Run Enrichment Notebook to update Lakehouse Table
Run Enrichment Notebook that will create/update a Lakehouse Table.
Category: Enrich Lakehouse Table | Tags: Enrichment
Run Databricks Notebook '<<NotebookPath>>' and save results to Delta Table '<<DeltaSchemaName>>.<<DeltaTableName>>' (using Update Type '<<DeltaTableUpdateType>>')
To use this activity within the API, use an ActivityCode of ENRICH-DELTA.
Example JSON
The following is an example of what the Task Config would look like for a task using this activity. Some of these variables would be set at the group level to avoid duplication between tasks.
{
  "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
  "DeltaSchemaName": "example_schema",
  "DeltaTableName": "my_table",
  "DeltaTableUpdateType": "Replace",
  "NotebookParameters": { "Param1": "Value1", "Param2": "Value2" }
}
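The note about setting some variables at the group level can be pictured as a simple dictionary merge. This is a minimal sketch only: the helper name resolve_task_config and the rule that a task-level value overrides a group-level value of the same name are assumptions made for illustration, not documented behaviour of the platform.

```python
import json


def resolve_task_config(group_config: dict, task_config: dict) -> dict:
    """Combine group-level and task-level variables (hypothetical helper).

    Assumption: a value set on the task overrides the same key set on the group.
    """
    return {**group_config, **task_config}


# Variables shared by every task in the group (illustrative values).
group_config = {
    "DeltaSchemaName": "example_schema",
    "DeltaTableUpdateType": "Replace",
}

# Variables specific to a single task.
task_config = {
    "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
    "DeltaTableName": "my_table",
    "NotebookParameters": {"Param1": "Value1", "Param2": "Value2"},
}

resolved = resolve_task_config(group_config, task_config)
print(json.dumps(resolved, indent=2))
```

Under these assumptions, the resolved dictionary contains the same five variables as the Example JSON above, while the schema name and update type are stated only once for the whole group.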
Variable Reference
The following variables are supported:
AdditionalNotebooks (Optional) - The path to other notebooks, Python files etc., referenced by the main notebook.
ColumnsToExcludeFromRowChangedDetermination (Optional) - For MERGE and DIMENSION updates only. Comma-separated list of columns in the source data that are to be excluded from any 'row-is-changed' determination for merge-style updates. Note that load_date_utc is automatically appended to this list by default.
DatabricksClusterId (Optional) - The Databricks Cluster to use for this task.
DeltaSchemaName (Required) - The name of the Schema this transformation lives in.
DeltaTableBusinessKeyColumnList (Optional) - Comma-separated list of Business Key columns in the Lakehouse Table. This is required if 'Lakehouse Table Update Type' is 'Dimension' or 'Merge'. If a value is specified, a uniqueness test is performed against this (composite) key for both the result of the Enrichment and the Lakehouse Table.
DeltaTableComments (Optional) - Comments to add to the Lakehouse Table.
DeltaTableName (Required) - The name of the Table representing this transformation.
DeltaTablePartitionColumnList (Optional) - Comma-separated, ordered list of columns forming the Partitioning strategy of the Lakehouse Table.
DeltaTableUpdateType (Required) - Indicates what type of update (if any) is to be performed on the Lakehouse Table.
ExtractControlVariableName (Optional) - For incremental loads only, the name to assign the Extract Control variable in State Config for the ExtractControl value derived from the Extract Control Query above.
ExtractControlVariableSeedValue (Optional) - The initial value to set for the Extract Control variable in State Config. This has no impact beyond the original seeding of the Extract Control variable in State Config.
InsertUnknownRecord (Optional) - For DIMENSION updates only. When True, a record representing 'Unknown' will be added to the table (if it does not already exist).
IsFederated (Optional) - Makes the task available to other Insight Factories within this organisation.
IsSourceDataIncremental (Optional) - For MERGE and DIMENSION updates only. The source data is incremental if it is not a full dataset. This setting affects how MERGE and DIMENSION updates work: if it is False, MERGE will delete a row in the target where it is not in the source, and DIMENSION will end-date the current record in the target where it is not in the source.
Links (Optional) - No description provided.
MaximumNumberOfAttemptsAllowed (Optional) - The total number of times the running of this Task can be attempted.
MinutesToWaitBeforeNextAttempt (Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.
NotebookParameters (Optional) - Parameters for use in the Databricks Notebook, in JSON format, e.g. { "Param1": "Value1", "Param2": "Value2" }.
NotebookPath (Required) - The relative path to the Databricks Notebook.
PartitionDepthToReplace (Optional) - The number of columns in 'Lakehouse Table Partition Column List' (counting from the first column in order) to use in a Partition Replacement. NOTE: This cannot be greater than the number of columns defined in the 'Lakehouse Table Partition Column List'. Defaults to 1 if only one column has been specified in 'Lakehouse Table Partition Column List'.
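Several of the rules in the variable reference can be checked before a Task Config is submitted. The sketch below covers three of them: the four Required variables, the Business Key requirement for Dimension and Merge updates, and the limit on PartitionDepthToReplace. The helper names validate_task_config and split_columns are hypothetical, and comparing the update type case-insensitively is an assumption; the platform may apply its own validation differently.

```python
REQUIRED = ("NotebookPath", "DeltaSchemaName", "DeltaTableName", "DeltaTableUpdateType")


def split_columns(value):
    """Split a comma-separated column list, ignoring blanks and whitespace."""
    return [c.strip() for c in (value or "").split(",") if c.strip()]


def validate_task_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none (hypothetical helper)."""
    problems = [f"{name} is required" for name in REQUIRED if not config.get(name)]

    # Assumption: update type names are compared case-insensitively.
    update_type = str(config.get("DeltaTableUpdateType", "")).lower()
    business_keys = split_columns(config.get("DeltaTableBusinessKeyColumnList"))
    if update_type in ("dimension", "merge") and not business_keys:
        problems.append(
            "DeltaTableBusinessKeyColumnList is required for Dimension and Merge updates"
        )

    # PartitionDepthToReplace counts columns from the start of the partition list.
    partition_columns = split_columns(config.get("DeltaTablePartitionColumnList"))
    depth = config.get("PartitionDepthToReplace")
    if depth is not None and int(depth) > len(partition_columns):
        problems.append(
            "PartitionDepthToReplace cannot exceed the number of partition columns"
        )
    return problems


# A Merge task with no Business Key columns fails the second check.
merge_config = {
    "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
    "DeltaSchemaName": "example_schema",
    "DeltaTableName": "my_table",
    "DeltaTableUpdateType": "Merge",
}
for problem in validate_task_config(merge_config):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a config at once.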