Ingest SQL Server Database as Parquet
Copy data from SQL Server database to parquet format in Azure Data Lake Storage Gen2.
Category: Ingest to Lakehouse | Tags: Ingestion
Execute query '<<ExtractQuery>>' against SQL Server database '<<DataLakeSystemFolder>>' and ingest as parquet into Data Lake location 'raw/<<DataLakeSystemFolder>>/<<DataLakeDatasetFolder>>'
To use this activity within the API, use an ActivityCode of SQLSERVER-ADLS.
Available Connections
SourceConnection:
TargetConnection:
Example JSON
An example of what the Task Config would look like for a task using this activity. Some of these variables would be set at the group level to avoid duplication between tasks.
{
"SourceConnection": "MY-SOURCE-CONN",
"ExtractQuery": "",
"DataLakeSystemFolder": "my_folder",
"DataLakeDatasetFolder": "data",
"DeltaTableUpdateType": "Replace",
"TargetConnection": "MY-TARGET-CONN",
"DeltaSchemaName": "example_schema",
"DeltaTableName": "my_table"
}
Variable Reference
The following variables are supported:
DataLakeDatasetFolder(Required) - Name of the folder in the Data Lake containing the dataset.DataLakeSystemFolder(Required) - Name of the parent (System) folder in the Data Lake containing the dataset.DeltaSchemaName(Optional) - The name of the Schema this transformation lives in.DeltaTableBusinessKeyColumnList(Optional) - Comma-separated list of Business Key columns in the Lakehouse Table. This is required if 'Lakehouse Table Update Type' is 'Dimension' or 'Merge'. If a value is specified, a uniqueness test is performed against this (composite) key for both the result of the Enrichment and the Lakehouse Table.DeltaTableName(Optional) - The name of the Table representing this transformation.DeltaTableUpdateType(Required) - Indicates what type of update (if any) is to be performed on the Lakehouse Table.DIUsToUseForCopyActivity(Optional) - Specifies the powerfulness of the copy executor. Value can be between 2 and 256. When left at default, the Data Factory dynamically applies the optimal DIU setting based on the source-sink pair and data pattern.ElevateToDelta(Optional) - Ingest directly to Lakehouse TableExtractControlQuery(Optional) - For incremental loads only, a SQL query to get a 'high-water' mark for extract control. For instance, this could be the maximum value of an modified_date or an identity column. NOTE: The column returned must be aliased as ExtractControl e.g. select max(modified_date) as ExtractControl from some_table.Show more details
**Extract Control Query **
For incremental data extraction, the SQL statement that will be executed on the source database to extract the high-water mark for this data prior to extracting. This could be the maximum value of a modified date or an identity column and will be used in the next data extraction.
NOTE: The column returned must be aliased as ExtractControl. For example:
select max(modified_date) as ExtractControl
from some_table;ExtractControlVariableName(Optional) - For incremental loads only, the name to assign the Extract Control variable in State Config for the ExtractControl value derived from the Extract Control Query above.ExtractControlVariableSeedValue(Optional) - The initial value to set for the Extract Control variable in State Config - this will have no impact beyond the original seeding of the Extract Control variable in State Config.ExtractQuery(Required) - SQL query to extract data from the source database.IsFederated(Optional) - Makes task available to other Insight Factories within this organisation.Links(Optional) - NULLMaximumNumberOfAttemptsAllowed(Optional) - The total number of times the running of this Task can be attempted.MinutesToWaitBeforeNextAttempt(Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.RetainHistory(Optional) - Should the raw files be saved to the History Container to preserve them?Show more details
**Retain History? ** By default, this flag is set to the value assigned in the Configuration item SaveRawFilesToHistory (signalled by the double triangle brackets around the Configuration item name e.g. <<SaveRawFilesToHistory>>). This default behaviour can be overridden here.
SourceConnection(Required) - Source connection to use.TargetConnection(Optional) - Target connection to use.