Headless Chrome: A powerful tool for automated web testing and crawling
Key takeaways
- Starting with Chrome 59 (Chrome 60 on Windows), headless Chrome lets you programmatically simulate user interaction with websites and capture the results for testing. It uses the Chromium and Blink engines to reproduce the user experience in Chrome.
- Running headless Chrome in Node.js requires the chrome-remote-interface module (which provides a simple abstraction of commands and notifications) and the chrome-launcher module (which launches Chrome from Node.js across multiple platforms).
- After initializing the session and defining the domains, you can navigate the website, replicate user journeys, and capture results. You can also use the captureScreenshot function to take screenshots of pages as you navigate the site.
- Although headless Chrome is not yet fully integrated into tools like Selenium, its ability to render JavaScript makes it the best way to reproduce the user experience in a fully automated way, which makes it ideal for large-scale automated web crawling.
In our work, we often need to replicate user journeys repeatedly to make sure a page keeps providing a consistent experience as the website changes. The key to achieving this is being able to write a library of these test scripts, so that we can run assertions against them and maintain documentation of the results. This is exactly what headless browsers offer: a command-line tool that lets you programmatically simulate user interaction with your website and capture the results for testing.
PhantomJS, CasperJS and other tools have been used for this for many years. But, as is often the case with love, our hearts can move on to another. Starting with Chrome 59 (Chrome 60 for Windows users), Chrome ships with its own headless browser. Although it is not yet supported by Selenium, it uses the Chromium and Blink engines, which means it simulates the actual user experience in Chrome.
The code for this article can be found in our GitHub repository.
Run headless Chrome from the command line
Running headless Chrome from the command line is relatively easy. On a Mac, you can set an alias for Chrome and run it with the --headless command-line parameter:
alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
chrome --headless --disable-gpu --remote-debugging-port=9090 https://www.sitepoint.com/
On Linux, it's even easier:
google-chrome --headless --disable-gpu --remote-debugging-port=9090 https://www.sitepoint.com/
- --headless: Runs without a UI or any display server dependencies.
- --disable-gpu: Disables GPU hardware acceleration. This is currently required.
- --remote-debugging-port: Enables remote debugging over HTTP on the specified port.
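With --remote-debugging-port set, Chrome also serves a small HTTP endpoint for discovering debugging targets; /json/version, for example, returns browser details and the WebSocket URL that DevTools Protocol clients connect to. The sketch below fetches that endpoint using only Node's built-in http module. The helper name getDebuggerInfo is hypothetical, and it assumes Chrome was started with --remote-debugging-port=9090 as in the commands above.

```javascript
const http = require('http');

// Hypothetical helper: fetch the JSON document Chrome serves at
// http://<host>:<port>/json/version when --remote-debugging-port is set.
function getDebuggerInfo(port = 9090, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    const req = http.get({ host, port, path: '/json/version' }, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // drain the response so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err); // the endpoint did not return valid JSON
        }
      });
    });
    req.on('error', reject); // e.g. ECONNREFUSED if Chrome is not running
  });
}
```

With headless Chrome running, getDebuggerInfo(9090).then((info) => console.log(info.Browser, info.webSocketDebuggerUrl)) should print the browser version and the WebSocket address a protocol client would connect to.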
You can also interact with the requested page. For example, to print document.body.innerHTML to standard output, you can do the following:
google-chrome --headless --disable-gpu --dump-dom http://endless.horse/
If you're curious about what else is possible, you can find the complete list of parameters here.
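The same one-off commands can also be driven from a Node.js script, which is handy when you only need the serialized DOM rather than a full DevTools session. This minimal sketch wraps the --dump-dom invocation with child_process.execFile. The helper name dumpDom and the default binary name google-chrome are assumptions (on a Mac you would pass the full path used in the alias above), and the exec function is injectable so the helper can be exercised without Chrome installed.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: run `<binary> --headless --disable-gpu --dump-dom <url>`
// and resolve with whatever Chrome prints to stdout (the serialized DOM).
function dumpDom(url, { binary = 'google-chrome', exec = execFile } = {}) {
  const args = ['--headless', '--disable-gpu', '--dump-dom', url];
  return new Promise((resolve, reject) => {
    // maxBuffer is raised because a full page's HTML can exceed the 1 MiB default
    exec(binary, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) {
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}
```

For example, dumpDom('http://endless.horse/').then((html) => console.log(html)) mirrors the command-line call above.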
Run headless Chrome in Node.js
However, the focus of this article is not the command line, but running headless Chrome in Node.js. To do this, we need the following modules:
- chrome-remote-interface: A JavaScript API that provides a simple abstraction of commands and notifications.
- chrome-launcher: Allows us to launch Chrome from Node.js across multiple platforms.
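To make "abstraction of commands and notifications" more concrete: the DevTools Protocol exchanges JSON messages in which a command carries an id, a method and params, the matching response echoes that id with a result or an error, and notifications (events) carry a method and params but no id. The class below is a simplified, hypothetical sketch of the bookkeeping a client such as chrome-remote-interface handles for you. It is not that module's actual implementation, and the transport is simply a function you supply (in practice it would write to a WebSocket).

```javascript
// Hypothetical, simplified DevTools Protocol client core (not the real
// chrome-remote-interface code). `transport` sends a serialized string.
class MiniCdpSession {
  constructor(transport) {
    this.transport = transport;
    this.nextId = 1;
    this.pending = new Map();   // command id -> { resolve, reject }
    this.listeners = new Map(); // event name -> array of callbacks
  }

  // Commands: assign an id, send, and wait for the response with that id.
  send(method, params = {}) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.transport(JSON.stringify({ id, method, params }));
    });
  }

  // Notifications: register a callback for an event such as 'Page.loadEventFired'.
  on(method, callback) {
    if (!this.listeners.has(method)) this.listeners.set(method, []);
    this.listeners.get(method).push(callback);
  }

  // Feed every incoming message here; responses have an id, events do not.
  handleMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.id !== undefined) {
      const waiter = this.pending.get(msg.id);
      if (!waiter) return; // response to an unknown or already-settled command
      this.pending.delete(msg.id);
      if (msg.error) waiter.reject(new Error(msg.error.message));
      else waiter.resolve(msg.result);
    } else {
      for (const cb of this.listeners.get(msg.method) || []) cb(msg.params);
    }
  }
}
```

Matching responses by id is what lets a client expose each command as a Promise, which is why the Node.js code in this article can simply await protocol calls.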
Then we can set up our environment. This assumes that Node.js and npm are installed on your machine. If this is not the case, check out our tutorial.
mkdir headless
cd headless
npm init -y
npm install chrome-remote-interface --save
npm install chrome-launcher --save
After that, we want to instantiate a session using headless Chrome. Let's start by creating an index.js file in the project folder:
const chromeLauncher = require('chrome-launcher');
const CDP = require('chrome-remote-interface');

(async function() {
  async function launchChrome() {
    return await chromeLauncher.launch({
      chromeFlags: [
        '--disable-gpu',
        '--headless'
      ]
    });
  }
  const chrome = await launchChrome();
  const protocol = await CDP({
    port: chrome.port
  });

  // All subsequent code snippets go here
})();
First, we import the dependencies and then create a self-invoking async function that instantiates the Chrome session. Note that the --disable-gpu flag is required at the time of writing, but may not be needed when you read this, as it is only a workaround (as recommended by Google). We use async/await to make sure our application waits for the headless browser to start before performing the next steps.
Next, we take the domains we need from the protocol object and enable them:
const { DOM, Page, Emulation, Runtime } = protocol;
await Promise.all([Page.enable(), Runtime.enable(), DOM.enable()]);
The most important of these is the Page object: we will use it to access the content rendered to the UI. It is also where we specify where to navigate, which elements to interact with, and where we run our scripts.
Explore the page
After initializing the session and defining the domains, we can start navigating the website. We want to pick a starting point, so we use the Page domain enabled above to navigate to it with Page.navigate. This will load the page. We can then use the loadEventFired method to define the steps our application runs, executing code that replicates our user journey. In this example, we just grab the content of the first paragraph.
If you run the script using node index.js, you should see the text of that first paragraph printed to the console.
Go a step further: grab a screenshot
This is good, but we can just as easily replace the evaluated expression with any code that uses query selectors to click links, fill in form fields, and run a series of interactions. Each step can be stored in a JSON configuration file and loaded into your Node.js script to be executed in sequence. The results of these scripts can be verified with a testing framework such as Mocha, allowing you to cross-reference whether the captured values meet UI/UX requirements.
You will probably also want to capture screenshots of your pages as you navigate the site. Fortunately, the Page domain provides a captureScreenshot function that does exactly this.
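Putting the Page and Runtime domains together, the journey described in this section (navigate, wait for the load event, evaluate an expression, capture a screenshot) can be sketched as a single function that takes the protocol object created in index.js. The function name runJourney, the start URL https://example.com/, the output file name screenshot.png, and the first-paragraph expression are illustrative assumptions. Page.navigate, Page.loadEventFired, Runtime.evaluate and Page.captureScreenshot (with its format and fromSurface options) are DevTools Protocol methods exposed through chrome-remote-interface; the sketch also relies on loadEventFired() returning a Promise for the next event when called without a callback, which is worth checking against the version you install.

```javascript
const fs = require('fs');

// Hypothetical helper: one scripted user journey using the enabled domains.
// `protocol` is the chrome-remote-interface client created in index.js.
async function runJourney(protocol, {
  url = 'https://example.com/',                              // illustrative start page
  expression = "document.querySelector('p').textContent",   // first paragraph text
  screenshotPath = 'screenshot.png',
} = {}) {
  const { Page, Runtime } = protocol;

  // Subscribe before navigating so a fast load event is not missed.
  const loaded = Page.loadEventFired();
  await Page.navigate({ url });
  await loaded;

  // Evaluate the expression inside the page and read back its value.
  const { result, exceptionDetails } = await Runtime.evaluate({
    expression,
    returnByValue: true,
  });
  if (exceptionDetails) {
    throw new Error(`Evaluation failed: ${exceptionDetails.text}`);
  }

  // captureScreenshot resolves with base64-encoded image data.
  const { data } = await Page.captureScreenshot({ format: 'png', fromSurface: true });
  await fs.promises.writeFile(screenshotPath, Buffer.from(data, 'base64'));

  return result.value;
}
```

Inside the async function in index.js you might call console.log(await runJourney(protocol)), then release resources with await protocol.close() and await chrome.kill() so the headless browser does not keep running after the script finishes.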
The fromSurface flag is another flag that was needed for cross-platform support at the time of writing, and may not be required in future versions.
Run the script again using node index.js to check the result.
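The idea of storing each step in a JSON configuration file and executing the steps in sequence could look roughly like the sketch below. The step format (an array of objects with a name and a JavaScript expression), the steps.json file, and the helpers loadSteps and runSteps are all hypothetical. The only protocol call used is Runtime.evaluate, and each captured value is returned keyed by step name so that a test framework such as Mocha can assert on it.

```javascript
const fs = require('fs');

// Hypothetical step format (e.g. the contents of a steps.json file):
// [ { "name": "title", "expression": "document.title" },
//   { "name": "links", "expression": "document.querySelectorAll('a').length" } ]
function loadSteps(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}

// Run the steps one after another (never in parallel) against the page and
// collect each step's value, keyed by step name, for later assertions.
async function runSteps(Runtime, steps) {
  const results = {};
  for (const step of steps) {
    const { result, exceptionDetails } = await Runtime.evaluate({
      expression: step.expression,
      returnByValue: true,
      awaitPromise: true, // lets a step's expression return a Promise
    });
    if (exceptionDetails) {
      throw new Error(`Step "${step.name}" failed: ${exceptionDetails.text}`);
    }
    results[step.name] = result.value;
  }
  return results;
}
```

Running the steps sequentially matters because later interactions usually depend on the page state left by earlier ones; in a Mocha test you could then assert, for example, that results.title matches the expected page title.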
Conclusion
If you are writing automation scripts, you should start using Chrome's headless browser now. Although it still isn't fully integrated into tools like Selenium, the benefit of simulating Chrome's rendering engine cannot be overstated. It is the best way to reproduce the user experience in a fully automated way.
Here is some further reading:
- API documentation: http://m.miracleart.cn/link/fc56459a18776e2a100854c16a1fd78b
- Getting started with headless Chrome: http://m.miracleart.cn/link/ada77e9fac537039c9adb2787b9af7da
Please tell me your experience with headless Chrome in the comments below.
The above is the content of Quick Tip: Getting Started with Headless Chrome in Node.js.
