Incremental Loading Patterns
Learn how to implement incremental loading to efficiently process only new or changed data.
Overview
Incremental loading processes only new or modified data since the last run, significantly improving performance and reducing resource usage. In this guide, you'll learn about:
- Extract control queries and high-water marks
- Different delta update types
- Choosing the right incremental strategy
- Handling edge cases
Prerequisites
- Understanding of data ingestion (see Ingesting Data from a Database)
- Knowledge of your source data's change tracking mechanism
- Familiarity with task configuration
Understanding Incremental Loading
Why use incremental loading?
| Full Load | Incremental Load |
|---|---|
| Extracts all data every time | Extracts only new/changed data |
| Simple but slow for large tables | Faster and more efficient |
| High resource usage | Lower resource usage |
| Handles deletes naturally | Can miss deletes without change tracking |
Prerequisites for incremental loading
Your source data must have a way to identify changes:
- Timestamp column: LastModifiedDate, UpdatedAt
- Sequence/ID column: Auto-incrementing ID
- Change tracking: Native CDC or change tracking
Step-by-Step Guide
1. Configure extract control
Extract control queries allow Insight Factory to track what data has been processed:
- Open your ingestion task
- Navigate to Extract Control settings
- Configure:
  - Control column (e.g., LastModifiedDate)
  - Control variable name
  - Initial value for first run
2. Understanding high-water marks
The high-water mark is the maximum value of the control column processed in the last run:
```sql
-- Example: Extract control query
SELECT * FROM source_table
WHERE LastModifiedDate > @LastLoadDate
```
After each run, Insight Factory stores the new high-water mark for the next execution.
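The read-extract-advance cycle can be sketched with a small, self-contained example. This is not Insight Factory's internal implementation; the table and column names (`extract_control`, `high_water_mark`) are hypothetical, chosen only to illustrate the pattern:

```python
import sqlite3

# In-memory stand-ins for a source table and a control table that
# persists the high-water mark between runs (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (id INTEGER, LastModifiedDate TEXT);
    CREATE TABLE extract_control (control_name TEXT PRIMARY KEY, high_water_mark TEXT);
    INSERT INTO source_table VALUES
        (1, '2024-01-01'), (2, '2024-02-01'), (3, '2024-03-01');
    INSERT INTO extract_control VALUES ('source_table', '2024-01-15');
""")

def incremental_extract(conn):
    # Read the high-water mark stored by the previous run.
    (hwm,) = conn.execute(
        "SELECT high_water_mark FROM extract_control WHERE control_name = 'source_table'"
    ).fetchone()
    # Extract only rows modified after the high-water mark.
    rows = conn.execute(
        "SELECT id, LastModifiedDate FROM source_table WHERE LastModifiedDate > ?",
        (hwm,),
    ).fetchall()
    if rows:
        # Advance the high-water mark for the next run.
        new_hwm = max(r[1] for r in rows)
        conn.execute(
            "UPDATE extract_control SET high_water_mark = ? WHERE control_name = 'source_table'",
            (new_hwm,),
        )
    return rows

first = incremental_extract(conn)   # picks up rows 2 and 3
second = incremental_extract(conn)  # nothing new since the mark advanced
```

The second call returns nothing because the first call moved the mark past every existing row, which is exactly the behaviour that makes repeated incremental runs cheap.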
3. Delta update types
Configure how data is written to the destination:
| Update Type | Behaviour | Use Case |
|---|---|---|
| Append | Add new rows only | Log/event data |
| Merge | Insert new, update existing | Dimension tables |
| Replace | Replace matching partitions | Partitioned fact tables |
| Dimension (SCD2) | Track historical changes | Slowly changing dimensions |
4. Configure merge settings
For merge operations, configure:
- Business key columns: Columns that identify unique records
- Compare columns: Columns to check for changes (optional)
- Update behaviour: What happens when records match
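A merge keyed on a business key amounts to "insert if new, update if matched". A minimal sketch using SQLite's upsert syntax (the `dim_customer` table and its columns are assumptions for illustration, not Insight Factory configuration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'Ada', 'London')")

# Incoming delta: one changed record (id 1) and one new record (id 2).
incoming = [(1, "Ada", "Paris"), (2, "Grace", "New York")]

for row in incoming:
    # Business key: customer_id. Insert new rows, update existing ones.
    conn.execute(
        """
        INSERT INTO dim_customer (customer_id, name, city) VALUES (?, ?, ?)
        ON CONFLICT (customer_id) DO UPDATE
            SET name = excluded.name, city = excluded.city
        """,
        row,
    )

result = conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall()
```

After the merge, `result` holds the updated row for customer 1 and the newly inserted row for customer 2.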
5. Handle initial load
The first run needs special handling:
- Set an appropriate initial high-water mark value
- Consider running initial load during off-peak hours
- Monitor the first execution for performance
Advanced patterns
Handling deletes
Soft deletes (recommended):
- Source has an IsDeleted flag
- Include deleted records in extract
- Mark as deleted in destination
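The soft-delete flow can be sketched as follows. Including flagged rows in the extract lets the destination mark them without physically removing anything (table and flag names here are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT, IsDeleted INTEGER);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT, is_deleted INTEGER DEFAULT 0);
    INSERT INTO src VALUES (1, 'a', 0), (2, 'b', 1);
""")

# Extract includes deleted records so the flag propagates downstream.
for id_, val, is_deleted in conn.execute("SELECT id, val, IsDeleted FROM src"):
    conn.execute(
        """
        INSERT INTO dst (id, val, is_deleted) VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE
            SET val = excluded.val, is_deleted = excluded.is_deleted
        """,
        (id_, val, is_deleted),
    )

# Downstream queries filter on the flag instead of relying on row absence.
active = conn.execute("SELECT id FROM dst WHERE is_deleted = 0").fetchall()
```

Keeping the flagged row in the destination preserves history and avoids the full-table comparison that hard deletes would require.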
Hard deletes (complex):
- Requires full comparison periodically
- Or use CDC/change tracking at source
Partitioned incremental loading
For very large tables:
- Partition by date or category
- Only process recently modified partitions
- Combine with high-water mark for efficiency
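Partition pruning reduces the work to just the partitions touched since the last run. A sketch under the assumption that the source exposes per-partition last-modified metadata (the partition keys below are hypothetical):

```python
from datetime import date

# Hypothetical partition metadata: partition key -> last modification date.
partitions = {
    "2024-01": date(2024, 1, 31),
    "2024-02": date(2024, 2, 29),
    "2024-03": date(2024, 3, 15),
}
high_water_mark = date(2024, 2, 15)  # stored by the previous run

# Only partitions modified after the high-water mark need reprocessing.
to_process = [key for key, modified in partitions.items() if modified > high_water_mark]
```

Here only the February and March partitions are reprocessed; January is skipped entirely, which is where the efficiency gain comes from on very large tables.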
Multiple control columns
Some scenarios need multiple controls:
```sql
-- Example: Multiple control columns
WHERE ModifiedDate > @LastModifiedDate
   OR (ModifiedDate = @LastModifiedDate AND ID > @LastID)
```
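The two-column predicate is a lexicographic comparison on the pair (ModifiedDate, ID): rows are "new" if their date is later, or the date ties and the ID is higher. In Python this is exactly tuple comparison, which makes the logic easy to verify:

```python
# Rows as (ModifiedDate, ID) pairs; values are illustrative.
rows = [
    ("2024-03-01", 5),
    ("2024-03-01", 9),
    ("2024-03-02", 2),
]
# Composite high-water mark from the previous run.
last = ("2024-03-01", 5)

# (date > last_date) OR (date = last_date AND id > last_id)
# is precisely Python's lexicographic tuple comparison.
new_rows = [r for r in rows if r > last]
```

This composite mark prevents the "duplicate records" failure mode in the table below: a timestamp tie alone can no longer re-select an already-processed row.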
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| Missing records | Timestamp precision | Use higher precision timestamp |
| Duplicate records | Non-unique high-water mark | Add secondary sort column |
| Slow performance | Too much history | Adjust initial high-water mark |
| Data gaps | Source updates after extract | Add buffer time to query |
Key Concepts
| Term | Definition |
|---|---|
| High-Water Mark | The maximum value of the control column from the last run |
| Extract Control | Configuration for tracking incremental progress |
| Delta Update | Processing only changed data instead of all data |
| Business Key | Column(s) that uniquely identify a record for merge operations |