
Understanding Your Lakehouse Data

Learn how data is stored in the Lakehouse, including Delta tables, schemas, and data management concepts.

Overview

The Lakehouse is the central data storage in Insight Factory, built on Delta Lake technology. In this guide, you'll learn about:

  • Delta table structure and benefits
  • Schema organisation
  • Data partitioning concepts
  • History and time travel capabilities

Prerequisites

Understanding Delta Tables

What is a Delta Table?

Delta tables are the storage format used in the Insight Factory Lakehouse. They provide:

  • ACID transactions: Each write either commits in full or not at all, so readers never see partially written data
  • Schema enforcement: Automatic validation of data structure
  • Time travel: Access historical versions of your data
  • Efficient updates: Support for merge, update, and delete operations

Delta Table Structure

Each Delta table consists of:

  1. Data files: Parquet files containing the actual data
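  2. Transaction log: An ordered record of every change (commit) made to the table, stored in the table's _delta_log directory
  3. Metadata: Information about the schema, partitioning, and configuration, recorded in the transaction log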
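To show how the data files and the transaction log work together, here is a toy model in plain Python. It is not the Delta Lake API: the class and method names are hypothetical, and the real log is a series of JSON commit files rather than an in-memory list. Each commit records which data files were added and removed, and the current table is found by replaying the log. Replaying only up to an earlier version is the idea behind time travel. Removed files stay on disk until they are cleaned up (VACUUM in Delta Lake), which is why earlier versions remain readable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """One entry in the toy log: the data files added and removed."""
    version: int
    add: list[str] = field(default_factory=list)
    remove: list[str] = field(default_factory=list)


class ToyDeltaTable:
    """Hypothetical in-memory table used for illustration only."""

    def __init__(self) -> None:
        self.log: list[Commit] = []

    def commit(self, add=(), remove=()) -> int:
        """Append one change to the log and return its version number."""
        version = len(self.log)
        self.log.append(Commit(version, list(add), list(remove)))
        return version

    def files_at(self, version: int | None = None) -> set[str]:
        """Replay the log up to `version` (default: latest) to find live files."""
        if version is None:
            version = len(self.log) - 1
        if not 0 <= version < len(self.log):
            raise ValueError(f"version {version} does not exist")
        live: set[str] = set()
        for entry in self.log[: version + 1]:
            live -= set(entry.remove)
            live |= set(entry.add)
        return live


table = ToyDeltaTable()
table.commit(add=["part-0.parquet"])                       # version 0
table.commit(add=["part-1.parquet"])                       # version 1
table.commit(add=["part-2.parquet"],
             remove=["part-0.parquet", "part-1.parquet"])  # version 2: rewrite
print(sorted(table.files_at()))   # ['part-2.parquet']
print(sorted(table.files_at(1)))  # ['part-0.parquet', 'part-1.parquet']
```

In Spark, reading an earlier version of a real Delta table is done with the versionAsOf read option.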

Schema Organisation

Schemas in the Lakehouse

Data in the Lakehouse is organised into schemas (also called databases). Common patterns include:

  • Raw schema: Landing zone for ingested data
  • Curated schema: Cleaned and transformed data
  • Published schema: Data ready for consumption

Naming Conventions

When configuring ingestion Tasks, you'll specify:

  • Schema name: The logical grouping for your table
  • Table name: The specific table within the schema
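The schema and table names combine into a two-part name, so the same table name can exist in several schemas (for example, raw.sales_orders and curated.sales_orders). The helper below is a hypothetical sketch of that combination. The naming rule it checks (letters, digits and underscores, not starting with a digit) is an assumption for illustration, not a documented Insight Factory constraint.

```python
import re

# Assumed naming rule for illustration only; check your platform's actual rules.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualified_table_name(schema: str, table: str) -> str:
    """Combine a Task's schema and table names into "schema.table" (hypothetical helper)."""
    for label, value in (("schema", schema), ("table", table)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"invalid {label} name: {value!r}")
    return f"{schema}.{table}"


print(qualified_table_name("raw", "sales_orders"))      # raw.sales_orders
print(qualified_table_name("curated", "sales_orders"))  # curated.sales_orders
```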

Data Partitioning

What is Partitioning?

Partitioning divides large tables into smaller, more manageable chunks based on column values. Benefits include:

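  • Faster queries: a filter on the partition column lets the engine skip partitions that cannot match, so less data is scanned
  • Easier data management: data for one partition value (for example, a single day) can be rewritten or deleted without touching the rest of the table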

Common Partitioning Strategies

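| Strategy       | Use case         | Example                              |
|----------------|------------------|--------------------------------------|
| Date-based     | Time-series data | Partition by year/month/day          |
| Category-based | Segmented data   | Partition by region or product type  |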
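The date-based strategy can be sketched in plain Python. Rows are grouped into Hive-style directory names such as year=2024/month=5/day=1, the layout Delta Lake uses for partitioned tables, and a query for one month then reads only the matching partitions. This is a toy model: the event_date column and the function names are hypothetical, and a real engine builds and prunes partitions for you.

```python
from collections import defaultdict
from datetime import date


def partition_path(d: date) -> str:
    """Hive-style partition directory for a date."""
    return f"year={d.year}/month={d.month}/day={d.day}"


def partition_rows(rows):
    """Group rows (dicts with a hypothetical 'event_date' key) by partition path."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_path(row["event_date"])].append(row)
    return dict(partitions)


def prune(partitions, year: int, month: int):
    """Keep only one month's partitions; the others are never read."""
    # The trailing slash stops month=1 from also matching month=10, 11, 12.
    prefix = f"year={year}/month={month}/"
    return {path: rows for path, rows in partitions.items() if path.startswith(prefix)}


rows = [
    {"event_date": date(2024, 5, 1), "amount": 10},
    {"event_date": date(2024, 5, 2), "amount": 20},
    {"event_date": date(2024, 6, 1), "amount": 30},
]
parts = partition_rows(rows)
print(sorted(prune(parts, 2024, 5)))
# ['year=2024/month=5/day=1', 'year=2024/month=5/day=2']
```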

Schema Enforcement

How Schema Enforcement Works

When data is written to a Delta table:

  1. The incoming data schema is compared to the table schema
  2. Mismatches are handled according to configuration:
    • Strict mode: Reject data with schema mismatches
    • Merge mode: Add new columns automatically
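The two modes can be illustrated with a small sketch in which schemas are plain {column: type} dictionaries. The mode names come from this guide, but the function is hypothetical, not Insight Factory code. Two behaviours are assumptions modelled on Delta Lake: a column whose type differs from the table's is rejected in both modes (merge mode only adds columns), and incoming data that omits an existing column is accepted, with that column read as null.

```python
from __future__ import annotations


class SchemaMismatchError(ValueError):
    pass


def enforce_schema(table_schema: dict[str, str], incoming: dict[str, str],
                   mode: str = "strict") -> dict[str, str]:
    """Return the table schema after the write, or raise on a mismatch."""
    if mode not in ("strict", "merge"):
        raise ValueError(f"unknown mode: {mode}")
    # Assumed: a type change on an existing column is rejected in either mode.
    conflicts = sorted(c for c in incoming
                       if c in table_schema and incoming[c] != table_schema[c])
    if conflicts:
        raise SchemaMismatchError(f"type conflicts: {conflicts}")
    new_columns = {c: t for c, t in incoming.items() if c not in table_schema}
    if new_columns and mode == "strict":
        raise SchemaMismatchError(f"unexpected columns: {sorted(new_columns)}")
    # Merge mode appends the new columns; existing rows have no value for them.
    return {**table_schema, **new_columns}


table = {"id": "int", "name": "string"}
incoming = {"id": "int", "name": "string", "region": "string"}
print(enforce_schema(table, incoming, mode="merge"))
# {'id': 'int', 'name': 'string', 'region': 'string'}
```

In strict mode the same call raises SchemaMismatchError because region is not part of the table schema.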

Column Casting

Insight Factory can automatically cast data types during ingestion:

  • String to numeric conversions
  • Date/time parsing
  • Boolean conversions
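The three kinds of conversion can be sketched as one casting function applied to raw string values. This is a hypothetical helper, not Insight Factory's casting logic: the accepted boolean spellings, the default date format, and the choice to turn blank values into None are all assumptions, and the platform's actual rules may differ.

```python
from __future__ import annotations

from datetime import datetime

# Assumed boolean spellings (compared case-insensitively).
TRUE_VALUES = {"true", "t", "yes", "y", "1"}
FALSE_VALUES = {"false", "f", "no", "n", "0"}


def cast_value(raw: str | None, target: str, date_format: str = "%Y-%m-%d"):
    """Cast a raw string to 'int', 'float', 'date' or 'bool'.

    Blank strings and None become None; unparseable values raise ValueError.
    """
    if raw is None or raw.strip() == "":
        return None
    text = raw.strip()
    if target == "int":
        return int(text)
    if target == "float":
        return float(text)
    if target == "date":
        return datetime.strptime(text, date_format).date()
    if target == "bool":
        lowered = text.lower()
        if lowered in TRUE_VALUES:
            return True
        if lowered in FALSE_VALUES:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    raise ValueError(f"unsupported target type: {target}")


print(cast_value(" 42 ", "int"))                             # 42
print(cast_value("01/05/2024", "date", date_format="%d/%m/%Y"))  # 2024-05-01
print(cast_value("Yes", "bool"))                             # True
```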

Key Concepts

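| Term        | Definition                                                                  |
|-------------|-----------------------------------------------------------------------------|
| Delta Table | A table stored in the Delta Lake format, which provides ACID transactions and schema enforcement |
| Schema      | A logical grouping of related tables                                        |
| Partitioning | Dividing data into smaller segments for performance                        |
| Time Travel | The ability to query historical versions of data                            |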