Ingest Common Data Model (via Manifest) as Parquet

Copy data from Azure Data Lake Storage Gen2 in Common Data Model (CDM) format, using a Manifest, to Parquet format in Azure Data Lake Storage Gen2.

Category: Ingest to Lakehouse | Tags: Ingestion

To use this activity within the API, use an ActivityCode of ADLS-CDM-MANIFEST-ADLS.

Available Connections

SourceConnection:

TargetConnection:

Example JSON

An example Task Config for a task using this activity. Some of these variables can be set at the group level to avoid duplication between tasks.

{
    "SourceConnection": "MY-SOURCE-CONN",
    "CDMContainer": "",
    "CDMManifestFolderPath": "",
    "CDMManifestFileName": "",
    "CDMEntity": "",
    "DataLakeSystemFolder": "my_folder",
    "DataLakeDatasetFolder": "data",
    "TargetConnection": "MY-TARGET-CONN",
    "DeltaSchemaName": "example_schema",
    "DeltaTableName": "my_table"
}
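The note about group-level variables can be illustrated with a minimal Python sketch. It assumes that task-level values take precedence over group-level ones; that precedence rule, the `merge_config` helper, and the `MyEntity` value are hypothetical and are not taken from the product.

```python
import json

# Hypothetical group-level values shared by several tasks.
group_config = {
    "SourceConnection": "MY-SOURCE-CONN",
    "TargetConnection": "MY-TARGET-CONN",
    "DataLakeSystemFolder": "my_folder",
}

# Values that differ per task. "MyEntity" is a made-up entity name.
task_config = {
    "CDMEntity": "MyEntity",
    "DataLakeDatasetFolder": "data",
    "DeltaSchemaName": "example_schema",
    "DeltaTableName": "my_table",
}


def merge_config(group, task):
    """Return a new dict in which task-level keys override group-level keys.

    The override order is an assumption made for this sketch.
    """
    merged = dict(group)
    merged.update(task)
    return merged


effective = merge_config(group_config, task_config)
print(json.dumps(effective, indent=4))
```

Keeping connection names and the system folder at the group level means each task only has to state what is unique to it, such as the entity and the target table.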

Variable Reference

The following variables are supported:

  • CDMContainer (Required) - Container name of the CDM folder.

  • CDMEntity (Required) - Name of the entity defined in the Model.json file or manifest.

  • CDMFolderPath (Optional) - Root folder location of CDM folder.

  • CDMManifestFileName (Required) - Name of the manifest file. Default value is 'default'.

  • CDMManifestFolderPath (Required) - Folder path of the entity within the root folder.

  • DataLakeDatasetFolder (Required) - Name of the folder in the Data Lake containing the dataset.

  • DataLakeSystemFolder (Required) - Name of the parent (System) folder in the Data Lake containing the dataset.

  • DeltaSchemaName (Optional) - The name of the Schema this transformation lives in.

  • DeltaTableName (Optional) - The name of the Table representing this transformation.

  • DIUsToUseForCopyActivity (Optional) - Number of Data Integration Units (DIUs) to use, which determines the compute power available to the copy activity. Value can be between 2 and 256. When left at the default, Data Factory dynamically applies the optimal DIU setting based on the source-sink pair and data pattern.

  • ElevateToDelta (Optional) - Ingest directly to a Lakehouse table.

  • IsFederated (Optional) - Makes task available to other Insight Factories within this organisation.

  • Links (Optional) - NULL

  • MaximumNumberOfAttemptsAllowed (Optional) - The total number of times the running of this Task can be attempted.

  • MinutesToWaitBeforeNextAttempt (Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.

  • RetainHistory (Optional) - Should the raw files be saved to the History Container to preserve them?

    **Retain History?** By default, this flag is set to the value assigned in the Configuration item SaveRawFilesToHistory (signalled by the double angle brackets around the Configuration item name, e.g. <<SaveRawFilesToHistory>>). This default behaviour can be overridden here.

  • SourceConnection (Required) - Source connection to use.

  • TargetConnection (Optional) - Target connection to use.