Excel Extractor

The Excel Extractor plugin is designed to parse various Excel file formats (.xls, .xlsx, .xlsm, .xlsb, .xltx, .xltm) and extract structured data from them. When a user uploads a supported Excel file, this plugin automatically processes it, detects headers, and transforms the sheet data into a format that Flatfile can use. It offers extensive configuration for handling complex Excel files, including options for header detection, processing merged cells, and cascading data in hierarchical spreadsheets. This plugin is intended to be used in a server-side listener within the Flatfile platform.

Installation

Install the Excel Extractor plugin using npm:

npm install @flatfile/plugin-xlsx-extractor

Configuration & Parameters

The Excel Extractor accepts the following configuration options:

Basic Options

Parameter	Type	Default	Description
`raw`	boolean	`false`	Extract raw, underlying cell values instead of formatted text
`rawNumbers`	boolean	`false`	Extract raw numeric values instead of formatted numbers
`dateNF`	string	`undefined`	Specific date format string for interpreting dates
`chunkSize`	number	`10000`	Number of records to process in each batch
`parallel`	number	`1`	Number of chunks to process concurrently
`skipEmptyLines`	boolean	`false`	Skip rows that are entirely empty
`debug`	boolean	`false`	Enable verbose logging for troubleshooting

Advanced Options

Parameter	Type	Default	Description
`cascadeRowValues`	boolean	`false`	Fill empty cells with values from the cell above
`cascadeHeaderValues`	boolean	`false`	Fill empty header cells with values from the cell to the left
`headerDetectionOptions`	object	See below	Configure header row detection
`mergedCellOptions`	object	`undefined`	Define how to handle merged cells

Header Detection Options

Default configuration:

{
  algorithm: 'default',
  rowsToSearch: 10
}

Available algorithms:

'default' - Scans first 10 rows and selects the one with most non-empty cells
'explicitHeaders' - Use when headers are explicitly defined
'specificRows' - Define specific row numbers as headers
'dataRowAndSubHeaderDetection' - Advanced detection for complex header structures

Merged Cell Options

Configure treatment of merged cells with these options:

Treatment	Description
`applyToAll`	Copy merged cell value to all cells in the range
`applyToTopLeft`	Keep value only in top-left cell
`coalesce`	Keep first row/column and remove others
`concatenate`	Combine values with a separator

Usage Examples

import { listener } from '@flatfile/listener';
import { ExcelExtractor } from '@flatfile/plugin-xlsx-extractor';

export default function (listener) {
  listener.use(ExcelExtractor());
}

Configuration Example

import { listener } from '@flatfile/listener';
import { ExcelExtractor } from '@flatfile/plugin-xlsx-extractor';

export default function (listener) {
  listener.use(
    ExcelExtractor({
      rawNumbers: true,
      skipEmptyLines: true,
      chunkSize: 50000,
      debug: true,
    })
  );
}

Advanced Header Detection

import { listener } from '@flatfile/listener';
import { ExcelExtractor } from '@flatfile/plugin-xlsx-extractor';

export default function (listener) {
  listener.use(
    ExcelExtractor({
      headerDetectionOptions: {
        algorithm: 'specificRows',
        rowNumbers: [2], // 0-based index for the third row
      },
    })
  );
}

Merged Cell Handling

import { listener } from '@flatfile/listener';
import { ExcelExtractor } from '@flatfile/plugin-xlsx-extractor';

export default function (listener) {
  listener.use(
    ExcelExtractor({
      mergedCellOptions: {
        acrossColumns: {
          treatment: 'concatenate',
          separator: ' ',
        },
        acrossRows: {
          treatment: 'applyToAll',
        },
      },
    })
  );
}

Direct Parser Usage

import * as fs from 'fs';
import { excelParser } from '@flatfile/plugin-xlsx-extractor';

async function parseMyFile(filePath) {
  try {
    const fileBuffer = fs.readFileSync(filePath);
    const workbookData = await excelParser(fileBuffer, {
      skipEmptyLines: true,
    });
    console.log(workbookData['Sheet1'].data);
  } catch (error) {
    console.error('Failed to parse file:', error);
  }
}

Troubleshooting

Large File Handling

import { excelParser } from '@flatfile/plugin-xlsx-extractor';

async function parseLargeFile(buffer) {
  try {
    const workbookData = await excelParser(buffer);
    return workbookData;
  } catch (error) {
    if (error.message === 'plugins.extraction.fileTooLarge') {
      console.error('The file is too large to be processed. Please convert it to CSV first.');
    } else {
      console.error('An unexpected error occurred:', error);
    }
  }
}

Debug Mode

Enable debug mode for detailed logging:

ExcelExtractor({
  debug: true
})

This provides detailed logs about the extraction process, including detected headers, rows processed, and configuration options being applied.

Notes

Default Behavior

Header Detection: By default, the plugin scans the first 10 rows and selects the one with the most non-empty cells as the header row
Empty Rows: Empty rows are included as empty records unless skipEmptyLines is set to true
Merged Cells: Handled by the underlying library’s default behavior unless custom options are provided
Chunk Processing: Data is processed in batches of 10,000 records by default

Important Considerations

Server-Side Only: This plugin is designed to run in a server-side environment and should be used within a Flatfile listener
Duplicate Headers: If a sheet contains duplicate column headers, the plugin automatically makes them unique by appending a suffix (e.g., ‘Name’, ‘Name_1’, ‘Name_2’)
Empty Headers: Empty header cells are renamed to ‘empty’ (e.g., ‘empty’, ‘empty_1’)
Trailing Empty Rows: The parser automatically trims any fully empty rows from the end of a sheet before processing
Memory Limitations: The plugin has built-in handling for extremely large files that can cause memory issues, throwing a user-friendly error when files are too large

Cascading Behavior

Row Cascading: When cascadeRowValues is enabled, empty cells are filled with values from the cell above. The cascade resets on a completely blank row or a new value in the column
Header Cascading: When cascadeHeaderValues is enabled, empty header cells are filled with values from the cell to the left. The cascade resets on a blank column or a new value

Getting Started

Core Concepts

Coding Tutorial

Plugins

Advanced Guides

Embedding Flatfile

Reference

Installation

Configuration & Parameters

Basic Options

Advanced Options

Header Detection Options

Merged Cell Options

Usage Examples

Configuration Example

Advanced Header Detection

Merged Cell Handling

Direct Parser Usage

Troubleshooting

Large File Handling

Debug Mode

Notes

Default Behavior

Important Considerations

Cascading Behavior

Getting Started

Core Concepts

Coding Tutorial

Plugins

Advanced Guides

Embedding Flatfile

Reference

​Installation

​Configuration & Parameters

​Basic Options

​Advanced Options

​Header Detection Options

​Merged Cell Options

​Usage Examples

​Configuration Example

​Advanced Header Detection

​Merged Cell Handling

​Direct Parser Usage

​Troubleshooting

​Large File Handling

​Debug Mode

​Notes

​Default Behavior

​Important Considerations

​Cascading Behavior

Installation

Configuration & Parameters

Basic Options

Advanced Options

Header Detection Options

Merged Cell Options

Usage Examples

Configuration Example

Advanced Header Detection

Merged Cell Handling

Direct Parser Usage

Troubleshooting

Large File Handling

Debug Mode

Notes

Default Behavior

Important Considerations

Cascading Behavior