Ingesting Data from Files

Learn how to ingest data from CSV, Excel, and JSON files stored on SFTP servers or in cloud storage.

Overview

File-based ingestion is essential for working with data exports, spreadsheets, and file drops. In this guide, you'll learn how to:

  • Create connections to file sources (SFTP, ADLS)
  • Configure file ingestion tasks
  • Handle column mapping and schema inference
  • Work with different file formats

Prerequisites

  • An existing Production Line (see Production Lines)
  • Access credentials for your file source (SFTP server, Azure Data Lake, etc.)
  • Understanding of your source file format and structure

Step-by-Step Guide

1. Create a file source connection

  1. Navigate to Build > Connections
  2. Click New Connection
  3. Select your file source type:
    • SFTP for secure file transfer servers
    • Azure Data Lake Storage Gen2 for ADLS
    • Azure Blob Storage for blob containers
  4. Enter your connection details and credentials
  5. Test and save the connection

2. Create a file ingestion task

  1. Open your production line and navigate to the Graph view
  2. Add a new task using one of these methods:
    • Click the + button in the graph side menu
    • Right-click on an existing node and select Add Task from the context menu
  3. Enter a unique Code and Name for your task
  4. Select the appropriate ingestion activity from the Activity dropdown:
    • "Ingest Delimited File to Lakehouse" for CSV files
    • "Ingest Excel Worksheet to Lakehouse" for Excel files
    • "Ingest JSON File to Lakehouse" for JSON files
  5. Configure the task properties

3. Configure file settings

Depending on your file format, you may need to configure:

For Delimited Files (CSV):

  • Column delimiter (comma, tab, pipe, etc.)
  • Quote character
  • Header row settings
  • Encoding
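The delimited-file settings above map directly onto standard CSV parsing options. As a hedged sketch (the sample content, column names, and pipe delimiter are illustrative, not from the product), here is how a parser applies the delimiter, quote character, and header row using Python's standard library:

```python
import csv
import io

# Pipe-delimited content with a header row and a quoted field that
# contains the delimiter (values are illustrative).
raw = 'id|name|note\n1|"Acme | Co"|ok\n2|Widget|n/a\n'

# Delimiter and quote character correspond to the task's file settings.
reader = csv.reader(io.StringIO(raw), delimiter="|", quotechar='"')
rows = list(reader)

# Header row setting: treat the first row as column names.
header, data = rows[0], rows[1:]
print(header)   # ['id', 'name', 'note']
print(data[0])  # ['1', 'Acme | Co', 'ok']
```

Note how the quote character keeps `Acme | Co` as a single column even though it contains the delimiter; this is why the quote setting matters for free-text fields.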

For Excel Files:

  • Worksheet name or index
  • Header row settings
  • Data range
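To make the header-row and data-range settings concrete, the sketch below models a worksheet as rows of cells (the worksheet contents and column names are invented for illustration; the product's actual Excel reader may behave differently). A title row above the real header is a common reason these settings need adjusting:

```python
# A worksheet modeled as rows of cells (values are illustrative).
sheet = [
    ["Quarterly Report", None, None],   # title row, not data
    ["id", "region", "sales"],          # the real header row
    [1, "EMEA", 1200],
    [2, "APAC", 950],
]

# Header row setting: column names start at row index 1, not 0.
header_row = 1
header = sheet[header_row]

# Data range: everything after the header row becomes records.
records = [dict(zip(header, row)) for row in sheet[header_row + 1:]]
print(records[0])  # {'id': 1, 'region': 'EMEA', 'sales': 1200}
```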

For JSON Files:

  • JSON path expression
  • Array handling
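A JSON path expression points the task at the part of the document that holds the records, and array handling turns each array element into a row. As a minimal sketch (the payload structure and the dotted-path helper are assumptions, not the product's path syntax), the idea looks like this:

```python
import json

# Nested payload; the array of records sits under "data.orders",
# analogous to a JSON path like $.data.orders (structure is illustrative).
doc = json.loads(
    '{"data": {"orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]}}'
)

def resolve(obj, path):
    """Walk a dotted path into nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj

# Array handling: each element of the resolved array becomes one row.
orders = resolve(doc, "data.orders")
rows = [(o["id"], o["total"]) for o in orders]
print(rows)  # [(1, 9.5), (2, 3.0)]
```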

4. Set up column mapping

  1. Review the inferred schema
  2. Adjust column names if needed
  3. Set appropriate data types
  4. Add any computed columns
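The mapping steps above can be sketched as a simple rename-and-cast table applied to each source row, followed by a computed column. All column names, types, and the `amount_cents` computed column here are hypothetical examples, not part of the product:

```python
# Source rows as parsed from the file (names and values are illustrative).
source_rows = [{"CustID": "42", "FullName": "Ada", "amt": "20.5"}]

# Mapping: source column -> (destination name, type converter).
mapping = {
    "CustID": ("customer_id", int),
    "FullName": ("customer_name", str),
    "amt": ("amount", float),
}

# Apply renames and type casts to every row.
mapped = [
    {dest: cast(row[src]) for src, (dest, cast) in mapping.items()}
    for row in source_rows
]

# Computed column added after mapping (hypothetical example).
for row in mapped:
    row["amount_cents"] = round(row["amount"] * 100)

print(mapped[0])
```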

5. Run and verify

  1. Save your task configuration
  2. Run the ingestion task
  3. Verify the data in your Lakehouse

Key Concepts

  • Schema Inference: Automatic detection of column names and data types from the file structure
  • Column Mapping: The process of defining how source columns map to destination columns
  • Delimited File: A text file where columns are separated by a specific character
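To illustrate schema inference, a common approach is to try progressively wider types on each column's sample values and fall back to string. This is only a sketch of the general technique; the product's actual inference rules may differ:

```python
def infer_type(values):
    """Return the narrowest type name that fits every sample value."""
    for cast, name in ((int, "integer"), (float, "float")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

# Sample values per column, as read from a file (illustrative).
columns = {"id": ["1", "2"], "price": ["9.5", "3"], "name": ["a", "b"]}
schema = {col: infer_type(vals) for col, vals in columns.items()}
print(schema)  # {'id': 'integer', 'price': 'float', 'name': 'string'}
```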