Web Crawler
The Web Crawler node is a powerful data extraction tool that visits URLs, evaluates the DOM, and extracts precise information using CSS selectors. Turn unstructured web pages into structured JSON data effortlessly for monitoring competitors, aggregating news, or scraping product prices.
What can you do with Web Crawler?
CSS Selector Engine
Surgically target exact elements on a complex page (like a product price tag, an article title, or an image source) using familiar, standard CSS selectors.
Automated List Extraction
Pull repeating lists of data (like all hyperlinks on a sitemap, or all products in a retail category) and output them as individual workflow items.
Headless DOM Browsing
Parses structured modern web pages and returns the clean extracted text or HTML attribute values directly to your subsequent processing nodes.
Detailed Usage & Configuration
The Web Crawler node provides targeted data extraction without the overhead of heavyweight scripting tools like Puppeteer. It requests a URL, parses the HTML into a DOM, and lets you query elements with jQuery-style CSS selectors.
1. Configuring Selectors
Once you input a target URL, define an output property name and its corresponding CSS Selector:
- `h1.article-title` extracts the main header text.
- `.product-price` extracts the numeric pricing string.
- `img.main-image`, paired with the Return Attribute `src`, extracts the image URL instead of its text.
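Conceptually, the selector step maps a `tag.class` pattern to either an element's text or one of its attributes. The sketch below mimics that behavior with Python's standard library; `SAMPLE_HTML`, `FirstMatch`, and `first_match` are illustrative names, not part of the node's API.

```python
# Minimal sketch of "selector -> text or attribute" extraction, stdlib only.
from html.parser import HTMLParser

SAMPLE_HTML = """
<html><body>
  <h1 class="article-title">Quarterly Price Watch</h1>
  <span class="product-price">19.99</span>
  <img class="main-image" src="https://example.com/widget.png">
</body></html>
"""

class FirstMatch(HTMLParser):
    """Capture the text (or an attribute) of the first tag.class match."""
    def __init__(self, tag, cls, attr=None):
        super().__init__()
        self.tag, self.cls, self.attr = tag, cls, attr
        self.result, self._capture = None, False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.result is None and tag == self.tag \
                and self.cls in attrs.get("class", "").split():
            if self.attr:                     # "Return Attribute" mode
                self.result = attrs.get(self.attr)
            else:                             # text mode: grab the next data chunk
                self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.result, self._capture = data.strip(), False

def first_match(html, tag, cls, attr=None):
    p = FirstMatch(tag, cls, attr)
    p.feed(html)
    return p.result

print(first_match(SAMPLE_HTML, "h1", "article-title"))       # Quarterly Price Watch
print(first_match(SAMPLE_HTML, "img", "main-image", "src"))  # https://example.com/widget.png
```

The real node accepts full CSS selector syntax; this sketch only handles the simple `tag.class` case to show the idea.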
2. Returning Arrays & Lists
By toggling "Return Array", a single selector like ul.nav-menu li a will output an array containing the text of every matching link, rather than just the first one. Combine this natively with the Loop Node to systematically iterate through all the links extracted.
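The difference from single-element mode is simply that every match is accumulated instead of stopping at the first. A stdlib sketch of that behavior, using illustrative names (`SAMPLE_NAV`, `AllLinks`) that are not part of the node's API:

```python
# Sketch of "Return Array": collect every <a> text into one list.
from html.parser import HTMLParser

SAMPLE_NAV = """
<ul class="nav-menu">
  <li><a href="/">Home</a></li>
  <li><a href="/products">Products</a></li>
  <li><a href="/about">About</a></li>
</ul>
"""

class AllLinks(HTMLParser):
    """Accumulate the text of every anchor element into one array."""
    def __init__(self):
        super().__init__()
        self.links, self._in_a = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True

    def handle_data(self, data):
        if self._in_a:
            self.links.append(data.strip())
            self._in_a = False

p = AllLinks()
p.feed(SAMPLE_NAV)
print(p.links)  # ['Home', 'Products', 'About']
```

A downstream Loop Node would then receive one workflow item per entry in that array.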
3. Real-World Limitations
This crawler retrieves static HTML markup returned by the server upon the initial request. It does not execute client-side JavaScript. If the target website heavily relies on React/Vue to lazily render data after the page loads, the crawler won't "see" that data. Always verify the raw page source first.
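The limitation is easy to see in a contrived server response shaped like a typical React/Vue shell: the element a selector would target ships empty, and the value exists only inside a script that never runs. `SERVER_HTML` below is an assumed example, not real node output.

```python
# A JS-rendered page often ships an empty placeholder in the static markup:
SERVER_HTML = '<div id="price"></div><script>render("$19.99")</script>'

# What a static crawler sees inside the placeholder (a string-slicing sketch,
# not how the node actually parses):
div_text = SERVER_HTML.split('<div id="price">')[1].split('</div>')[0]
print(repr(div_text))  # '' -- nothing for a selector like #price to extract
```

This is why checking the raw page source (e.g., "View Source" in a browser) is the fastest way to confirm whether the data you want is present before JavaScript runs.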
