Remove Duplicates

The Remove Duplicates node is a data-cleaning tool that scans incoming data arrays and removes identical or redundant items. Whether you are working with a large list of scraped emails, repeated database rows, or merged API responses, this node outputs a dataset in which each item appears only once.

Category: Core / Logic

What can you do with Remove Duplicates?

Deep Object Comparison

Compare entire JSON objects, including deeply nested structures. The node hashes each object to detect exact matches, so no manual field configuration is required.

Specific Field Targeting

Restrict deduplication to specific keys. For example, identify duplicate users by checking only the 'email' or 'data.user_id' field while ignoring metadata that varies between copies.

Enterprise Performance

Built on O(N log N) index tracking. Designed to deduplicate millions of records without triggering out-of-memory errors or server crashes.

Detailed Usage & Configuration

1. Deduplication Guide & Core Concepts

The Remove Duplicates node acts as an automated gatekeeper: it scans incoming data arrays and strips out redundant elements. Run it before batch insertions into databases such as MySQL or before dispatching mass email campaigns, where duplicates would otherwise produce repeated rows or repeated messages.

Configuring Compare Modes

  • All Fields (Full Object Match): Compares every key and value in the JSON structure. If two objects are identical across all keys and values, the redundant instance is removed.
  • Specific Fields (Targeted Match): Compares only the keys you list (e.g., email or phone_number). Two items count as duplicates when those keys match, even if other metadata (such as timestamps) differs.
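
The two compare modes can be illustrated with a minimal Python sketch. This is not nLink's implementation: the function name dedupe and its fields parameter are hypothetical, and the sketch assumes that key order inside an object does not affect equality and that the first occurrence is the one kept.

```python
import json

def dedupe(items, fields=None):
    """Remove duplicates, keeping the first occurrence (hypothetical helper).

    fields=None  -> "All Fields": compare the whole object.
    fields=[...] -> "Specific Fields": compare only the listed top-level keys.
    """
    seen = set()
    unique = []
    for item in items:
        if fields is None:
            # sort_keys makes the comparison independent of key order (assumption)
            key = json.dumps(item, sort_keys=True)
        else:
            # Only the listed keys take part in the comparison
            key = json.dumps([item.get(f) for f in fields], sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

rows = [
    {"email": "a@example.com", "ts": 1},
    {"email": "a@example.com", "ts": 2},   # same email, different timestamp
    {"ts": 1, "email": "a@example.com"},   # same content as the first row
]

all_fields = dedupe(rows)                    # full-object match
by_email = dedupe(rows, fields=["email"])    # targeted match
```

With All Fields, the second row survives because its timestamp differs; with Specific Fields on email, only the first row remains.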

2. Advanced Deduplication Features

🚀 Exclusive: Keep Preference (First or Last)

nLink lets you choose which item in each group of duplicates survives: Keep First retains the earliest occurrence, and Keep Last retains the most recent one. In both cases, the surviving items keep their original relative order.
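
As a rough illustration of the Keep First / Keep Last behavior, here is a Python sketch under stated assumptions. The names dedupe_keep, key_fn, and keep are hypothetical, and the sketch assumes that "original order" means survivors are emitted in ascending order of their original positions.

```python
def dedupe_keep(items, key_fn, keep="first"):
    """Keep one item per key, emitting survivors in their original order."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}  # comparison key -> index of the surviving item
    for index, item in enumerate(items):
        key = key_fn(item)
        # "last" always overwrites; "first" only records the first sighting
        if keep == "last" or key not in chosen:
            chosen[key] = index
    # Sorting the surviving indices restores the original relative order
    return [items[i] for i in sorted(chosen.values())]

rows = [
    {"email": "a@example.com", "v": 1},
    {"email": "b@example.com", "v": 1},
    {"email": "a@example.com", "v": 2},
]

first = dedupe_keep(rows, lambda r: r["email"], keep="first")
last = dedupe_keep(rows, lambda r: r["email"], keep="last")
```

Here first contains the rows at positions 0 and 1, while last contains the rows at positions 1 and 2.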

  • Ignore Case Sensitivity: When enabled, strings are normalized before comparison, so Admin@gmail.com and admin@gmail.com are treated as the same value and identified as the same user.
  • Dot Notation Support: Reach into nested objects using dot syntax (e.g., data.customer.id).
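
The following Python sketch shows one way dot-notation lookup and case-insensitive comparison could fit together. The helpers get_path and compare_key are hypothetical names, not part of nLink, and the sketch assumes that a missing path resolves to None and that only string values are case-normalized.

```python
def get_path(item, path):
    """Resolve a dot-notation path such as 'data.customer.id'; None if missing."""
    current = item
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def compare_key(item, paths, ignore_case=False):
    """Build the tuple of values that two items must share to be duplicates."""
    values = []
    for path in paths:
        value = get_path(item, path)
        if ignore_case and isinstance(value, str):
            value = value.casefold()  # case-insensitive normalization
        values.append(value)
    return tuple(values)

a = {"data": {"customer": {"email": "Admin@gmail.com"}}}
b = {"data": {"customer": {"email": "admin@gmail.com"}}}

same_when_ignoring_case = (
    compare_key(a, ["data.customer.email"], ignore_case=True)
    == compare_key(b, ["data.customer.email"], ignore_case=True)
)
same_when_case_sensitive = (
    compare_key(a, ["data.customer.email"])
    == compare_key(b, ["data.customer.email"])
)
```

The two records match only when case is ignored, which mirrors the Admin@gmail.com example above.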

3. Frequently Asked Questions (FAQ)

Will this node cause Out-Of-Memory (OOM) crashes with millions of rows?

It is designed not to. The Remove Duplicates engine represents each item by a 16-byte MD5 hash and tracks seen items with O(N log N) memory-index tracking. According to nLink, it consumes up to 99% less RAM than traditional caching systems.
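
To show why hashing helps with memory, here is a minimal Python sketch that remembers only a 16-byte MD5 digest per item instead of the item itself. It is an assumption-laden illustration, not the nLink engine: the names digest and dedupe_stream are hypothetical, items are assumed to be JSON-serializable, and Python's built-in set (a hash table) stands in for the index tracking described above.

```python
import hashlib
import json

def digest(item):
    """Return the 16-byte MD5 digest of a canonical JSON encoding of item."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).digest()

def dedupe_stream(items):
    """Yield each distinct item once, in order of first appearance."""
    seen = set()  # holds 16-byte digests only, never the full items
    for item in items:
        d = digest(item)
        if d not in seen:
            seen.add(d)
            yield item

records = [{"id": 1, "name": "x"}, {"name": "x", "id": 1}, {"id": 2, "name": "y"}]
unique = list(dedupe_stream(records))
```

Because the function is a generator, the input can be consumed one item at a time; memory then grows with the number of distinct digests rather than with the size of the items.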

Can I route the deleted duplicate items to another node?

Not directly. By design, this node acts as a destructive filter: removed duplicates are discarded and are not passed on to later nodes. If you need to route duplicates into a separate warning system (like Slack), we recommend evaluating array counts using the If / Else Node.
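
The count-based check suggested above amounts to comparing the number of items before and after deduplication. The Python sketch below only illustrates that logic; the function names and the threshold parameter are hypothetical, and in a workflow the comparison would be expressed as an If / Else Node condition.

```python
def count_removed(count_before, count_after):
    """Number of items the deduplication step dropped."""
    if count_after > count_before:
        raise ValueError("count_after cannot exceed count_before")
    return count_before - count_after

def should_alert(count_before, count_after, threshold=0):
    """True when more than `threshold` duplicates were removed."""
    return count_removed(count_before, count_after) > threshold

# Example: 1,000 items went in and 940 came out, so 60 were duplicates
removed = count_removed(1000, 940)
alert = should_alert(1000, 940)
no_alert = should_alert(500, 500)
```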