How to fetch authenticated CSVs with Google Chrome (Headless) in NodeJS

puppeteer, chrome, headless, nodejs

Recently I had a use case where we needed to log in to a third-party site and fetch protected CSV files. While PhantomJS (via CasperJS) can often accomplish this task, we have had issues with its stability: it crashes often, especially in Docker.

But that’s okay because Google Chrome (headless) is up to the task and has quite an awesome, simple API to use.

Here is a NodeJS sample using puppeteer to interact with Google Chrome (Headless). First, we need to install the puppeteer module.

npm install puppeteer --save

Once the module has been installed, create a JavaScript file (test.js in this example) with the following contents.

const puppeteer = require('puppeteer');
// variables (placeholder credentials; replace with your own)
const USER = 'user@example.com';
const PASS = 'password';

(async () => {
  // create a browser instance
  // use the --no-sandbox and --disable-setuid-sandbox parameters depending on your kernel support
  const browser = await puppeteer.launch({
    executablePath: '/usr/bin/chromium-browser',
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu'],
  });
  // create a page instance
  const page = await browser.newPage();

  // export the browser and page variables so we can debug using Chrome Developer Tools
  Object.assign(global, { browser, page });

  // output script console messages in our terminal console
  page.on('console', msg => console.log(`chrome[${msg.text()}]`));

  // connect to the site
  await page.goto('https://site.com/users/sign_in', { waitUntil: 'networkidle0' });

  // log in to the site
  await page.click('#user_email');
  await page.keyboard.type(USER);
  await page.click('#user_password');
  await page.keyboard.type(PASS);
  // start waiting for the navigation before clicking submit; otherwise a fast
  // page load can finish before waitForNavigation() starts listening
  await Promise.all([
    page.waitForNavigation(),
    page.click('#new_user input[type="submit"]'),
  ]);

  // now we are going to tell the Chrome instance to use the fetch() function to download the content for us.
  // be sure to include the credentials so that any cookies and session variables are passed through and then
  // the downloaded content will be returned to our NodeJS script.
  const downloadUrl = 'https://site.com/fetch/file.csv';
  const downloadedContent = await page.evaluate(async downloadUrl => {
    const fetchResp = await fetch(downloadUrl, { credentials: 'include' });
    return await fetchResp.text();
  }, downloadUrl);

  console.log(`Downloaded: ${downloadedContent}`);

  await browser.close();
})();
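
One thing the sample does not check is whether the download actually succeeded. If the login failed or the session expired, fetch() will happily return an error body or the HTML of a login page. Here is a minimal sketch of how that could be guarded against. The helper name fetchTextInPage and the "HTML means we were bounced to a login page" heuristic are my own assumptions, not part of Puppeteer or of the site being scraped.

```javascript
// Hypothetical helper (not a Puppeteer API): run fetch() inside the logged-in
// page and refuse anything that does not look like a successful download.
async function fetchTextInPage(page, url) {
  // The callback runs in the browser context, so it only sees its argument.
  const result = await page.evaluate(async (targetUrl) => {
    const resp = await fetch(targetUrl, { credentials: 'include' });
    return {
      ok: resp.ok,
      status: resp.status,
      contentType: resp.headers.get('content-type') || '',
      body: await resp.text(),
    };
  }, url);

  if (!result.ok) {
    throw new Error(`Download failed with HTTP ${result.status}`);
  }
  // Assumption: an HTML response means we were redirected to a login page.
  if (result.contentType.includes('text/html')) {
    throw new Error('Expected CSV but received HTML (session may have expired)');
  }
  return result.body;
}
```

In the sample above, the page.evaluate() call could then be swapped for `const downloadedContent = await fetchTextInPage(page, downloadUrl);`.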

Now we just need to run our sample and watch the output.

node test.js
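
If you would rather not keep the login details in the script itself, they can be passed in when you run it. This is only a sketch: the function name readCredentials and the environment variable names SITE_USER and SITE_PASS are hypothetical, so pick whatever fits your setup.

```javascript
// Hypothetical helper: read the login details from environment variables
// instead of hard-coding them in the script.
function readCredentials(env = process.env) {
  const user = env.SITE_USER;
  const pass = env.SITE_PASS;
  if (!user || !pass) {
    throw new Error('Set SITE_USER and SITE_PASS before running the script');
  }
  return { user, pass };
}
```

The script would then start with `const { user: USER, pass: PASS } = readCredentials();` and be run as `SITE_USER=user@example.com SITE_PASS=password node test.js`.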
Damian Hodgkiss

Senior Staff Engineer at Sumo Group, leading development of AppSumo marketplace. Technical solopreneur with 25+ years of experience building SaaS products.
