Extract From File
The Extract From File node is the primary data ingestion step for your No-code Automation workflows. It turns messy, unstructured files into structured data, parsing large spreadsheets and walking deeply nested XML/JSON trees. Memory-safe streaming keeps memory usage bounded during heavy processing, sharply reducing the risk of out-of-memory crashes.
What can you do with Extract From File?
Universal Format Parsing
Transform <strong>CSV, XML, JSON, PDF</strong>, and <strong>Excel</strong> files directly into clean JSON arrays, saving hours of manual data entry.
Enterprise-Grade Stability
Process files as large as 100MB without crashing your workflow. The <strong>memory-safe streaming</strong> architecture reads files incrementally rather than loading them whole, keeping execution reliable under heavy load.
Surgical HTML Scraping
Scrape targeted datasets from raw <strong>HTML</strong> files. Supply standard <strong>CSS selectors</strong> to output structured arrays of pricing, text, or image links.
Detailed Usage & Configuration
The Extract From File node serves as the data processor whenever your automation workflow ingests external files, bridging raw files and structured JSON arrays.
1. Handling Input Sources
The node accepts data from three distinct sources:
- File Path: The most efficient method. Provide a direct path to a file; the node streams it from disk using native OS read-seeking, so large files are processed without memory bloat.
- Base64 Content: Suited to binary files downloaded dynamically by upstream HTTP Request nodes. The engine decodes the payload on the fly.
- Raw String: Inject raw XML, JSON, or CSV text payloads captured from Webhook triggers.
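The Base64 path can be modeled with a short Python sketch (standard library only); the function name and the simulated upstream payload are illustrative, not part of the node's API:

```python
import base64
import json

def decode_base64_payload(b64_content: str) -> bytes:
    """Decode a Base64 payload (e.g., a file body captured from an
    upstream HTTP Request node) back into raw bytes for parsing."""
    return base64.b64decode(b64_content)

# Simulate an upstream node handing over a small JSON file as Base64.
original = json.dumps([{"Email": "user@mail.com"}]).encode("utf-8")
payload = base64.b64encode(original).decode("ascii")

decoded = decode_base64_payload(payload)
print(json.loads(decoded))  # [{'Email': 'user@mail.com'}]
```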
2. Format-Specific Optimizations
The engine adapts its parsing behavior to the chosen format:
- Spreadsheets (CSV/XLSX): Toggling hasHeader maps the first spreadsheet row to JSON keys (e.g., [{"Email": "user@mail.com"}]), outputting an array ready for downstream Loop nodes.
- PDF Documents: The node scans every page of the document and returns the extracted text labeled by its original page number.
- HTML Scraping: Use the built-in CSS selector engine to scrape targeted DOM elements. For example, setting the selector to h2.article-title returns an array of all matching article headers.
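The hasHeader mapping can be illustrated with Python's standard csv module; this mirrors the behavior described above, not the node's actual implementation:

```python
import csv
import io

def parse_csv(text: str, has_header: bool = True) -> list[dict]:
    """Map each CSV row to a JSON-style dict.

    With has_header=True the first row supplies the keys, mimicking
    the node's hasHeader toggle; otherwise positional keys are used.
    """
    if has_header:
        return list(csv.DictReader(io.StringIO(text)))
    rows = csv.reader(io.StringIO(text))
    return [{f"col_{i}": v for i, v in enumerate(row)} for row in rows]

sample = "Email,Name\nuser@mail.com,Ada\n"
print(parse_csv(sample))
# [{'Email': 'user@mail.com', 'Name': 'Ada'}]
```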
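The CSS-selector scraping step can be sketched the same way. The real node supports full CSS selectors; this minimal stand-in handles only the tag.class form, just to show the idea:

```python
from html.parser import HTMLParser

class SimpleSelectorScraper(HTMLParser):
    """Minimal model of CSS-selector scraping: supports only the
    'tag.class' form (e.g., 'h2.article-title'). A real selector
    engine handles far more; this sketch only illustrates the idea."""

    def __init__(self, selector: str):
        super().__init__()
        self.tag, _, self.cls = selector.partition(".")
        self.matches: list[str] = []
        self._depth = 0  # >0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self._depth += 1
            self.matches.append("")  # start collecting a new match
        elif self._depth:
            self._depth += 1  # nested tag inside a current match

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data

def scrape(html: str, selector: str) -> list[str]:
    parser = SimpleSelectorScraper(selector)
    parser.feed(html)
    return [m.strip() for m in parser.matches]

html = (
    '<article><h2 class="article-title">First post</h2></article>'
    '<article><h2 class="article-title">Second post</h2></article>'
    '<h2>Not matched</h2>'
)
print(scrape(html, "h2.article-title"))
# ['First post', 'Second post']
```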
