Scraping data from websites with Puppeteer in Node.js is straightforward. It usually works out of the box on Windows, but it can fail to start on Linux servers that have no GUI.
I recently worked on a project that used Puppeteer to scrape data. I wrote the code on a Windows machine, where it worked fine. But when I deployed it to an AWS EC2 cloud server running Ubuntu 16.04, it didn’t run. The error was something like "Not able to start Chromium".
I spent two days on it, and the fix turned out to be simple: pass --no-sandbox in the args option when launching Puppeteer.
The steps are below:
1. Install Puppeteer
npm i puppeteer
2. Install the following dependencies
sudo apt-get install gconf-service libasound2 libatk1.0-0 libatk-bridge2.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
3. Now use the code below
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox']});
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.screenshot({path: 'google.png'});
  await browser.close();
})();
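Since the same script ran on Windows without the flag, one option is to add --no-sandbox only when the code runs on Linux. The sketch below is one possible way to do that, not the only one. The helper names buildLaunchOptions and takeScreenshot are made up for this example, and it assumes process.platform reports 'linux' on the Ubuntu server. Keep in mind that --no-sandbox turns off Chromium's sandbox, so it is best used only with pages you trust.

```javascript
// Hypothetical helper (not part of Puppeteer): pick launch options by platform.
// Assumption: the sandbox flag is only needed on Linux servers.
function buildLaunchOptions(platform = process.platform) {
  const args = [];
  if (platform === 'linux') {
    // Disables Chromium's sandbox; use only with trusted pages.
    args.push('--no-sandbox');
  }
  return { args };
}

// Hypothetical wrapper around the screenshot example from step 3.
async function takeScreenshot(url, path) {
  // Loaded here so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
}

// Example call:
// takeScreenshot('https://google.com', 'google.png').catch(console.error);
```

With this in place, the same file can run unchanged on a Windows development machine and on the Ubuntu server.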
Conclusion
Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. In this post, we learned how to install and run Puppeteer on Ubuntu.