Understanding Your Lakehouse Data
Learn how data is stored in the Lakehouse, including Delta tables, schemas, and data management concepts.
Overview
The Lakehouse is the central data store in Insight Factory, built on Delta Lake technology. In this guide, you'll learn about:
- Delta table structure and benefits
- Schema organisation
- Data partitioning concepts
- History and time travel capabilities
Prerequisites
- Basic understanding of data ingestion (see Ingesting Data from a Database)
- Familiarity with data storage concepts
Understanding Delta Tables
What is a Delta Table?
Delta tables are the storage format used in the Insight Factory Lakehouse. They provide:
- ACID transactions: Reliable data updates with full consistency
- Schema enforcement: Automatic validation of data structure
- Time travel: Access historical versions of your data
- Efficient updates: Support for merge, update, and delete operations
Delta Table Structure
Each Delta table consists of:
- Data files: Parquet files containing the actual data
- Transaction log: An ordered record of every change committed to the table
- Metadata: Information about schema, partitioning, and configuration
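The relationship between data files and the transaction log can be illustrated with a small model. The sketch below is a deliberate simplification: it assumes the log only records files being added or removed, and the names `Action` and `live_files` are hypothetical, not part of Insight Factory or Delta Lake. It shows how replaying the log yields the current set of data files, and how stopping the replay at an earlier version gives the historical view that time travel relies on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One entry in a simplified transaction log (hypothetical model)."""
    version: int  # commit number that recorded this action
    kind: str     # "add" or "remove"
    path: str     # data file the action refers to


def live_files(log, as_of_version=None):
    """Replay the log in commit order and return the files that make up the table."""
    files = set()
    # sorted() is stable, so actions within one commit keep their original order.
    for action in sorted(log, key=lambda a: a.version):
        if as_of_version is not None and action.version > as_of_version:
            break  # ignore commits made after the requested version
        if action.kind == "add":
            files.add(action.path)
        elif action.kind == "remove":
            files.discard(action.path)
    return files


log = [
    Action(0, "add", "part-0000.parquet"),
    Action(1, "add", "part-0001.parquet"),
    # Version 2 rewrites part-0000: the old file is removed and a new one added.
    Action(2, "remove", "part-0000.parquet"),
    Action(2, "add", "part-0002.parquet"),
]

print(sorted(live_files(log)))                   # files in the current table
print(sorted(live_files(log, as_of_version=1)))  # files as of version 1
```

Note that the removed file is only dropped from the logical view. In this model, as in the concept it illustrates, older versions stay readable because the log still describes them.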
Schema Organisation
Schemas in the Lakehouse
Data in the Lakehouse is organised into schemas (also called databases). Common patterns include:
- Raw schema: Landing zone for ingested data
- Curated schema: Cleaned and transformed data
- Published schema: Data ready for consumption
Naming Conventions
When configuring ingestion Tasks, you'll specify:
- Schema name: The logical grouping for your table
- Table name: The specific table within the schema
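As an illustration, the schema name and table name are typically combined into a single qualified name of the form `schema.table`. The sketch below assumes a simple naming rule (lowercase letters, digits, and underscores, starting with a letter). That rule and the helper `qualified_table_name` are assumptions for this example; Insight Factory's actual rules may differ.

```python
import re

# Assumed rule for this sketch only: lowercase letters, digits, and underscores,
# starting with a letter. The platform's real naming rules may differ.
_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def qualified_table_name(schema_name, table_name):
    """Validate both parts and join them as 'schema.table'."""
    for label, value in (("schema", schema_name), ("table", table_name)):
        if not _NAME_PATTERN.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"{schema_name}.{table_name}"


print(qualified_table_name("raw", "customer_orders"))
```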
Data Partitioning
What is Partitioning?
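Partitioning divides a large table into smaller, more manageable chunks based on the values of one or more columns. Benefits include:
- Faster query performance: queries that filter on a partition column can skip partitions that do not match
- Efficient data management: data in the same partition can be maintained or removed together
- Reduced scan times: less data is read per query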
Common Partitioning Strategies
| Strategy | Use Case | Example |
|---|---|---|
| Date-based | Time-series data | Partition by year/month/day |
| Category-based | Segmented data | Partition by region or product type |
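To make the date-based strategy concrete, the sketch below builds directory paths in the common `column=value` layout and shows how a filter on year and month narrows the set of partitions that need to be read. The column names `year`, `month`, and `day` and the helpers `partition_path` and `prune` are assumptions for illustration; the layout used by a given table depends on its configuration.

```python
from datetime import date


def partition_path(day):
    """Build a column=value style directory path for a date-partitioned row."""
    return f"year={day.year}/month={day.month:02d}/day={day.day:02d}"


def prune(partitions, year, month):
    """Keep only the partitions a query filtered on year and month must read."""
    prefix = f"year={year}/month={month:02d}/"
    return [p for p in partitions if p.startswith(prefix)]


partitions = [
    partition_path(d)
    for d in (
        date(2024, 1, 15),
        date(2024, 1, 16),
        date(2024, 2, 1),
        date(2023, 1, 15),
    )
]

# Only the two January 2024 partitions survive the filter.
print(prune(partitions, 2024, 1))
```

The same idea applies to category-based partitioning, with a path segment such as `region=emea` in place of the date columns.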
Schema Enforcement
How Schema Enforcement Works
When data is written to a Delta table:
- The incoming data schema is compared to the table schema
- Mismatches are handled according to configuration:
  - Strict mode: Reject data with schema mismatches
  - Merge mode: Add new columns automatically
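The comparison described above can be sketched in a few lines. This is a simplified model, not the platform's implementation: schemas are plain dictionaries of column name to type name, the function `enforce_schema` is hypothetical, and two behaviours are assumptions made for the example (a type conflict on an existing column fails in both modes, and columns missing from the incoming data are ignored).

```python
def enforce_schema(table_schema, incoming_schema, mode="strict"):
    """Return the table schema to use after a write, or raise on a mismatch.

    Both schemas are dicts mapping column name -> type name.
    """
    if mode not in ("strict", "merge"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Assumption: a column present in both schemas with different types
    # is rejected regardless of mode.
    conflicts = sorted(
        col for col, col_type in incoming_schema.items()
        if col in table_schema and table_schema[col] != col_type
    )
    if conflicts:
        raise ValueError(f"type mismatch in columns: {conflicts}")

    new_columns = {
        col: col_type for col, col_type in incoming_schema.items()
        if col not in table_schema
    }
    if new_columns and mode == "strict":
        # Strict mode: reject data whose schema does not match the table.
        raise ValueError(f"unexpected columns: {sorted(new_columns)}")

    # Merge mode (or no new columns): append any new columns to the schema.
    return {**table_schema, **new_columns}


table = {"id": "int", "name": "string"}
incoming = {"id": "int", "name": "string", "email": "string"}

print(enforce_schema(table, incoming, mode="merge"))
```

Calling the same function with `mode="strict"` raises a `ValueError` for the extra `email` column, which mirrors the reject behaviour of strict mode.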
Column Casting
Insight Factory can automatically cast data types during ingestion:
- String to numeric conversions
- Date/time parsing
- Boolean conversions
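The kinds of conversion listed above can be illustrated with a small casting helper. This is a sketch under stated assumptions: the type names (`int`, `float`, `date`, `bool`), the ISO `YYYY-MM-DD` date format, the accepted boolean spellings, and the function `cast_value` are all chosen for the example and are not Insight Factory's actual casting rules.

```python
from datetime import datetime

# Assumed boolean spellings for this sketch; real casting rules may differ.
_TRUE = {"true", "t", "yes", "y", "1"}
_FALSE = {"false", "f", "no", "n", "0"}


def cast_value(raw, target_type):
    """Cast one string value to the requested type, raising ValueError on failure."""
    text = raw.strip()
    if target_type == "int":
        return int(text)
    if target_type == "float":
        return float(text)
    if target_type == "date":
        # Assumes ISO dates such as 2024-03-01.
        return datetime.strptime(text, "%Y-%m-%d").date()
    if target_type == "bool":
        lowered = text.lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"cannot interpret {raw!r} as a boolean")
    raise ValueError(f"unsupported target type: {target_type!r}")


row = {"quantity": "42", "price": "19.95", "order_date": "2024-03-01", "shipped": "Yes"}
types = {"quantity": "int", "price": "float", "order_date": "date", "shipped": "bool"}

typed_row = {col: cast_value(val, types[col]) for col, val in row.items()}
print(typed_row)
```

Raising an error on values that cannot be cast, rather than silently substituting a default, keeps bad input visible at ingestion time.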
Key Concepts
| Term | Definition |
|---|---|
| Delta Table | A data storage format providing ACID transactions and schema enforcement |
| Schema | A logical grouping of related tables |
| Partitioning | Dividing data into smaller segments for performance |
| Time Travel | The ability to query historical versions of data |