# Ingesting Data from a Database
Learn how to ingest data from relational databases such as SQL Server or Azure SQL Database into your Lakehouse.
## Overview
Database ingestion is one of the most common data pipeline patterns. In this guide, you'll learn how to:
- Create a database connection
- Configure a task with an ingest activity
- Run the ingestion and verify data landed correctly
## Prerequisites
- An existing Production Line (see Production Lines)
- Database credentials or connection details
- Network connectivity to your source database
## Step-by-Step Guide
### 1. Create a database connection
Before you can ingest data, you need to create a connection to your source database:
- Navigate to Build > Connections
- Click New Connection
- Select the appropriate connection type (e.g., Azure SQL Database, SQL Server)
- Enter your connection details:
  - Server name
  - Database name
  - Authentication method
  - Credentials
- Test the connection to ensure it works
- Save the connection
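Outside the UI, the same connection details map onto a standard ODBC connection string. A minimal sketch, assuming SQL authentication; the server, database, and credential values are placeholders, not real endpoints:

```python
# Placeholder values throughout; substitute your own server/database/credentials.

def build_conn_str(server: str, database: str, user: str, password: str) -> str:
    """Assemble an ODBC-style connection string (SQL authentication assumed)."""
    return (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};"
        f"UID={user};PWD={password};"
        "Encrypt=yes;"
    )

conn_str = build_conn_str("myserver.database.windows.net", "SalesDb", "etl_user", "s3cret")
print(conn_str)

# Testing the connection (the UI's "Test the connection" step) would look like:
#   import pyodbc
#   with pyodbc.connect(conn_str, timeout=10) as conn:
#       conn.cursor().execute("SELECT 1")
```

The commented-out `pyodbc` snippet is only an illustration of what the platform's connection test does on your behalf; the UI handles this for you.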
### 2. Create an ingestion task
- Open your production line and navigate to the Graph view
- Add a new task using one of these methods:
  - Click the + button in the graph side menu
  - Right-click an existing node and select Add Task from the context menu
- Enter a unique Code and Name for your task
- Select an ingestion activity from the Activity dropdown (e.g., "Ingest from Azure SQL Database to Lakehouse")
- Configure the task properties:
  - Select your connection
  - Choose the source table or query
  - Configure the destination schema and table name
  - Set any column mappings if needed
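Conceptually, the task configuration captures a source-to-destination mapping. A hypothetical sketch of the properties above as a plain dictionary — the field names are illustrative, not the platform's actual schema:

```python
# Illustrative only: field names are hypothetical, not Insight Factory's real schema.
ingest_task = {
    "code": "INGEST_SALES",                    # unique task Code
    "name": "Ingest sales orders",             # display Name
    "activity": "Ingest from Azure SQL Database to Lakehouse",
    "connection": "azure-sql-sales",           # the saved connection from step 1
    "source": {"query": "SELECT * FROM dbo.SalesOrders"},
    "destination": {"schema": "raw", "table": "sales_orders"},
    "column_mappings": {},                     # empty = map columns by name
}

def validate_task(task: dict) -> list[str]:
    """Return the required fields that are missing (a minimal sanity check)."""
    required = ("code", "name", "activity", "connection", "source", "destination")
    return [f for f in required if not task.get(f)]

print(validate_task(ingest_task))  # → []
```

The `validate_task` helper mirrors the kind of check the UI performs before letting you save.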
### 3. Run the ingestion
- Save your task configuration
- Click Run to execute the ingestion
- Monitor the progress in the task details
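Monitoring amounts to polling the task's status until it reaches a terminal state. A sketch of that loop, assuming a hypothetical `get_status` callable and made-up state names — the real platform surfaces status in the task details view, not through this function:

```python
import time

TERMINAL = {"Succeeded", "Failed", "Cancelled"}  # assumed terminal states

def wait_for_task(get_status, poll_seconds: float = 0.0, max_polls: int = 100) -> str:
    """Poll get_status() until the task reaches a terminal state or we give up."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("task did not finish within the polling budget")

# Usage with a fake status source that finishes on the third poll:
statuses = iter(["Queued", "Running", "Succeeded"])
print(wait_for_task(lambda: next(statuses)))  # → Succeeded
```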
### 4. Verify your data
After the task completes:
- Check that the task status shows success
- Navigate to your Lakehouse to verify the data landed correctly
- Compare row counts against the source and review sample data
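A quick way to confirm the load is to compare source and destination row counts. A minimal sketch — the counts below are stand-ins for `SELECT COUNT(*)` results from the source table and the Lakehouse table:

```python
def verify_row_counts(source_count: int, dest_count: int) -> bool:
    """True when every source row landed in the destination."""
    if source_count != dest_count:
        print(f"Mismatch: source={source_count}, destination={dest_count}")
        return False
    print(f"OK: {dest_count} rows ingested")
    return True

# Stand-in counts; in practice these come from COUNT(*) queries on both sides.
print(verify_row_counts(10000, 10000))  # → True
```

Equal counts do not prove column-level correctness, so spot-checking sample rows, as described above, is still worthwhile.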
## Key Concepts
| Term | Definition |
|---|---|
| Connection | A saved configuration for connecting to an external data source |
| Ingestion | The process of extracting data from a source and loading it into the Lakehouse |
| Lakehouse | The Delta Lake-based data storage in Insight Factory |