Create/Update a Feature Store Table

Run an Enrichment Notebook that creates or updates a Lakehouse Table.

Category: Machine Learning | Tags: ML

How it works

Run the Databricks Notebook '<<NotebookPath>>' and save the results to the Delta Table '<<DeltaSchemaName>>.<<DeltaTableName>>', which will appear as a Feature Table in the Feature Store.
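To illustrate how the '<<...>>' placeholders map onto Task Config variables, the following Python sketch replaces each token with the matching value. It assumes a simple textual substitution; the helper name resolve and the PLACEHOLDER pattern are hypothetical and are not part of the Insight Factory API.

```python
import re

# Hypothetical helper: replaces each <<VariableName>> token with the matching
# value from a task-config dict. Assumption: placeholders are resolved by plain
# text substitution; how Insight Factory does this internally is not documented here.
PLACEHOLDER = re.compile(r"<<(\w+)>>")


def resolve(template: str, config: dict) -> str:
    """Substitute <<Name>> placeholders; raise KeyError if a variable is missing."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in config:
            raise KeyError(f"Task Config is missing variable '{name}'")
        return str(config[name])

    return PLACEHOLDER.sub(lookup, template)


config = {
    "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
    "DeltaSchemaName": "example_schema",
    "DeltaTableName": "my_table",
}
# Fully qualified name of the Delta Table that becomes the Feature Table.
print(resolve("<<DeltaSchemaName>>.<<DeltaTableName>>", config))  # example_schema.my_table
```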

To use this activity within the API, use an ActivityCode of ML-FEATURE-STORE-TABLE.

Example JSON

An example of the Task Config for a task that uses this activity. Some of these variables would typically be set at the group level to avoid duplicating them across tasks.

{
  "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
  "DeltaSchemaName": "example_schema",
  "DeltaTableName": "my_table",
  "DeltaTableUpdateType": "Replace",
  "DeltaTablePrimaryKeyColumnList": "",
  "NotebookParameters": { "Param1": "Value1", "Param2": "Value2" }
}
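The sketch below shows one way group-level and task-level variables might combine into the effective Task Config for this activity. It assumes that task-level values override group-level ones, and the split of variables between the two levels is illustrative only; effective_config and REQUIRED are hypothetical names. It checks only that each Required variable from the Variable Reference is present, not that its value is non-empty.

```python
import json

# Required variables for this activity, as listed in the Variable Reference.
REQUIRED = (
    "DeltaSchemaName",
    "DeltaTableName",
    "DeltaTablePrimaryKeyColumnList",
    "DeltaTableUpdateType",
    "NotebookPath",
)


def effective_config(group_config: dict, task_config: dict) -> dict:
    """Merge group-level defaults with task-level values.

    Assumption (hypothetical): task-level values override group-level ones.
    Only the presence of each required key is checked, not its value.
    """
    merged = {**group_config, **task_config}
    missing = [name for name in REQUIRED if name not in merged]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    return merged


# Illustrative split: schema and update type shared by every task in the group.
group = {"DeltaSchemaName": "example_schema", "DeltaTableUpdateType": "Replace"}
task = json.loads("""
{
  "NotebookPath": "/Users/fred.nurks@example.com/MyRepo/My Notebook",
  "DeltaTableName": "my_table",
  "DeltaTablePrimaryKeyColumnList": "",
  "NotebookParameters": { "Param1": "Value1", "Param2": "Value2" }
}
""")
config = effective_config(group, task)
print(config["DeltaSchemaName"], config["DeltaTableName"])  # example_schema my_table
```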

Variable Reference

The following variables are supported:

AdditionalNotebooks (Optional) - Paths to other notebooks, Python files, etc. that the main notebook references.
DatabricksClusterId (Optional) - The Databricks Cluster to use for this task.
DeltaSchemaName (Required) - The name of the Schema this transformation lives in.
DeltaTableComments (Optional) - Comments to add to the Lakehouse Table.
DeltaTableName (Required) - The name of the Table representing this transformation.
DeltaTablePartitionColumnList (Optional) - Comma-separated, ordered list of the columns that form the partitioning strategy of the Lakehouse Table.
DeltaTablePrimaryKeyColumnList (Required) - Comma-separated list of Primary Key columns in the Lakehouse Table. NOTE: Column names are case-sensitive.
Lakehouse Table Primary Key Column List: this key is not enforced, but none of the columns participating in the Primary Key can be null.
For a Timeseries Feature Table, add TIMESERIES after the name of one of your primary key columns. For example:
col1 TIMESERIES, col2
DeltaTableUpdateType (Required) - Indicates what type of update (if any) is to be performed on the Lakehouse Table.
ExtractControlVariableName (Optional) - For incremental loads only, the name to assign to the Extract Control variable in State Config, which holds the ExtractControl value derived from the Extract Control Query.
ExtractControlVariableSeedValue (Optional) - The initial value of the Extract Control variable in State Config. It has no effect beyond the initial seeding of that variable.
IsFederated (Optional) - Makes the task available to other Insight Factories within this organisation.
Links (Optional) - NULL
MaximumNumberOfAttemptsAllowed (Optional) - The total number of times the running of this Task can be attempted.
MinutesToWaitBeforeNextAttempt (Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.
NotebookParameters (Optional) - Parameters for use in the Databricks Notebook, in JSON format, e.g. { "Param1": "Value1", "Param2": "Value2" }.
NotebookPath (Required) - The relative path to the Databricks Notebook.
PartitionDepthToReplace (Optional) - The number of columns in 'Lakehouse Table Partition Column List' (DeltaTablePartitionColumnList), counting from the first column, to use in a Partition Replacement. NOTE: This cannot be greater than the number of columns defined in 'Lakehouse Table Partition Column List'. Defaults to 1 if only one column is specified in 'Lakehouse Table Partition Column List'.
SkipCreateVolumeAndSchema (Optional) - If the Schema and/or Volume already exists, you can skip this check to improve performance.
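The rules for DeltaTablePrimaryKeyColumnList and PartitionDepthToReplace can be sketched in Python as follows. The helpers parse_primary_keys and partition_depth are hypothetical. They assume that the TIMESERIES marker is written in upper case after at most one column name, that PartitionDepthToReplace must be at least 1, and that it must be set explicitly when more than one partition column is listed, because the reference only documents the single-column default.

```python
from typing import List, Optional, Tuple


def parse_primary_keys(column_list: str) -> Tuple[List[str], Optional[str]]:
    """Split a comma-separated key list and find the TIMESERIES column, if any.

    Column names are kept exactly as written because they are case-sensitive.
    Assumption: at most one column carries the upper-case TIMESERIES marker.
    """
    keys: List[str] = []
    timeseries_col: Optional[str] = None
    for entry in column_list.split(","):
        parts = entry.split()
        if not parts:
            continue  # tolerate empty entries, e.g. an empty list or trailing comma
        name = parts[0]
        if len(parts) == 2 and parts[1] == "TIMESERIES":
            if timeseries_col is not None:
                raise ValueError("Only one column can be marked TIMESERIES")
            timeseries_col = name
        elif len(parts) > 1:
            raise ValueError(f"Unexpected key entry: {entry.strip()!r}")
        keys.append(name)
    return keys, timeseries_col


def partition_depth(partition_column_list: str, depth: Optional[int] = None) -> List[str]:
    """Return the leading partition columns used for a Partition Replacement."""
    columns = [c.strip() for c in partition_column_list.split(",") if c.strip()]
    if depth is None:
        if len(columns) == 1:
            depth = 1  # documented default when exactly one column is listed
        else:
            # Assumption: no default is documented for other cases.
            raise ValueError("Set PartitionDepthToReplace explicitly")
    # Documented upper bound; the lower bound of 1 is an assumption.
    if not 1 <= depth <= len(columns):
        raise ValueError(f"Depth must be between 1 and {len(columns)}")
    return columns[:depth]


print(parse_primary_keys("col1 TIMESERIES, col2"))  # (['col1', 'col2'], 'col1')
print(partition_depth("year, month, day", 2))       # ['year', 'month']
```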
