Delimiter Extractor Plugin
Parse text files with custom delimiters and automatically extract structured data for import into Flatfile
The Delimiter Extractor plugin is designed to parse text files that use non-standard delimiters to separate values. Its primary purpose is to automatically detect and extract structured data from these files when they are uploaded to Flatfile. It supports a variety of single-character delimiters such as ;
, :
, ~
, ^
, and #
.
This plugin is useful in scenarios where data is provided in custom formats that are not natively handled by the Flatfile platform’s standard CSV, TSV, or PSV parsers. It operates within a server-side listener, triggering on the file:created
event to process the file, identify headers, and structure the data into records for import.
Installation
Install the plugin using npm:
Configuration & Parameters
Required Parameters
The file extension (e.g., “.txt”, “.dat”) that the plugin should listen for. This is the first argument to the DelimiterExtractor
function.
The character used to separate values in the file. Supported delimiters are ;
, :
, ~
, ^
, #
.
Optional Parameters
If set to true
, the plugin will attempt to convert numeric and boolean strings into their corresponding types. For example, “123” becomes 123
and “true” becomes true
.
Controls how empty lines in the file are handled:
true
: Skips lines that are completely empty'greedy'
: Skips lines that contain only whitespace charactersfalse
: Includes all lines, even empty ones
A function that is applied to each individual cell value during parsing. The return value of the function will replace the original value. This is applied before dynamicTyping
.
The number of records to process in each batch or chunk when inserting data into Flatfile.
The number of chunks to process concurrently.
An advanced configuration object to control the header detection strategy. Allows for specifying explicit headers, looking for headers in specific rows, or using different detection algorithms. Default uses the ‘default’ algorithm, which selects the row with the most non-empty cells within the first 10 rows as the header.
An array of delimiter characters to try if a specific delimiter
is not provided. The parser will use the first one that successfully parses the data.
Enables debug logging.
Usage Examples
Basic Usage
Configure the listener to use the plugin for any .txt
file, specifying that the data is separated by a colon:
Advanced Configuration
This example shows a more detailed configuration for .data
files with type conversion, empty line handling, and value transformation:
Custom Header Detection
This example demonstrates how to use advanced header detection options to explicitly define headers:
Direct Buffer Parsing
For advanced use cases where you need to parse a buffer directly:
Troubleshooting
No Data Appears After Upload
If a file is uploaded but no data appears, check the following:
- File Extension: Ensure the file extension matches the one configured in
DelimiterExtractor(fileExt, ...)
- Delimiter: Verify that the
delimiter
option matches the actual delimiter used in the file - Empty Files: If the file is empty or contains no parsable data, the plugin will log “No data found in the file” to the console and produce no records
Unsupported File Types Error
Notes
Limitations
- The plugin explicitly does not support file types that are natively handled by Flatfile:
.csv
(comma-separated),.tsv
(tab-separated), and.psv
(pipe-separated) - The list of supported delimiters is fixed to:
;
,:
,~
,^
,#
- This plugin is intended to be run in a server-side listener environment within the Flatfile Platform
Error Handling
- The main
DelimiterExtractor
function includes a guard clause that throws anError
if an unsupported native file type is provided - The internal parsing function uses try-catch blocks to handle parsing errors, which are logged to the console and re-thrown, causing the associated Flatfile job to fail
Default Behavior
- By default, the plugin does not perform type conversion (
dynamicTyping: false
) - Empty lines are included in the output unless explicitly configured otherwise (
skipEmptyLines: false
) - The plugin processes 10,000 records per chunk with no parallel processing (
chunkSize: 10000
,parallel: 1
) - Header detection uses the ‘default’ algorithm, selecting the row with the most non-empty cells within the first 10 rows