
Ingest SAP CDC (Full then Incremental Load) as Parquet

Copies data from SAP CDC (a full load, then incremental loads) to Parquet format in Azure Data Lake Storage Gen2.

Category: Ingest to Lakehouse | Tags: Ingestion

How it works

Extracts the object '<<OdpName>>' from SAP CDC and ingests it as Parquet into the Data Lake location 'raw/<<DataLakeSystemFolder>>/<<DataLakeDatasetFolder>>'.
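As an illustration of how the '<<Name>>' placeholders in the location above map to Task Config variables, here is a minimal Python sketch. The function name `resolve_template` is hypothetical and is not part of the product; how the platform performs this substitution internally is not documented here, so treat this only as a way to preview the resulting path.

```python
import re


def resolve_template(template: str, variables: dict[str, str]) -> str:
    """Replace each <<Name>> placeholder with variables[Name].

    Hypothetical helper for illustration; not the platform's own logic.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value supplied for <<{name}>>")
        return variables[name]

    return re.sub(r"<<(\w+)>>", substitute, template)


path = resolve_template(
    "raw/<<DataLakeSystemFolder>>/<<DataLakeDatasetFolder>>",
    {"DataLakeSystemFolder": "my_folder", "DataLakeDatasetFolder": "data"},
)
print(path)  # raw/my_folder/data
```

With the values from the example Task Config on this page, the files would be expected under 'raw/my_folder/data'.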

To use this activity within the API, use an ActivityCode of SAP-CDC-FULL-INCREMENTAL-ADLS.

Available Connections

SourceConnection:

TargetConnection:

Example JSON

The following is an example of the Task Config for a task using this activity. Some of these variables would typically be set at the group level to avoid duplication between tasks.

{
  "SourceConnection": "MY-SOURCE-CONN",
  "OdpContext": "ABAP_CDS",
  "OdpName": "",
  "SapCheckpointKey": "",
  "KeyColumns": "",
  "DataLakeSystemFolder": "my_folder",
  "DataLakeDatasetFolder": "data",
  "TargetConnection": "MY-TARGET-CONN",
  "DeltaSchemaName": "example_schema",
  "DeltaTableName": "my_table"
}
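The example above leaves OdpName, SapCheckpointKey and KeyColumns blank, and all three are marked Required in the Variable Reference. A small pre-flight check such as the following Python sketch can catch blanks before a config is submitted. The helper `missing_required` is hypothetical and not part of the product, and the check assumes the whole config is in one place; variables set at the group level would need to be merged in first.

```python
import json

# Variables marked (Required) in the Variable Reference on this page.
REQUIRED = (
    "DataLakeDatasetFolder",
    "DataLakeSystemFolder",
    "KeyColumns",
    "OdpContext",
    "OdpName",
    "SapCheckpointKey",
    "SourceConnection",
)


def missing_required(config: dict) -> list[str]:
    """Return the required variables that are absent, null, or blank."""
    missing = []
    for name in REQUIRED:
        value = config.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


task_config = json.loads("""
{
  "SourceConnection": "MY-SOURCE-CONN",
  "OdpContext": "ABAP_CDS",
  "OdpName": "",
  "SapCheckpointKey": "",
  "KeyColumns": "",
  "DataLakeSystemFolder": "my_folder",
  "DataLakeDatasetFolder": "data",
  "TargetConnection": "MY-TARGET-CONN",
  "DeltaSchemaName": "example_schema",
  "DeltaTableName": "my_table"
}
""")

print(missing_required(task_config))
# ['KeyColumns', 'OdpName', 'SapCheckpointKey']
```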

Variable Reference

The following variables are supported:

  • DataLakeDatasetFolder (Required) - Name of the folder in the Data Lake containing the dataset.

  • DataLakeSystemFolder (Required) - Name of the parent (System) folder in the Data Lake containing the dataset.

  • DeltaSchemaName (Optional) - The name of the Schema this transformation lives in.

  • DeltaTableName (Optional) - The name of the Table representing this transformation.

  • ElevateToDelta (Optional) - Ingest directly to a Lakehouse Table.

  • IsFederated (Optional) - Makes task available to other Insight Factories within this organisation.

  • KeyColumns (Required) - Comma-separated string of the key columns of the dataset to extract.

  • Links (Optional) - NULL

  • MaximumNumberOfAttemptsAllowed (Optional) - The total number of times the running of this Task can be attempted.

  • MinutesToWaitBeforeNextAttempt (Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.

  • OdpContext (Required) - The context of the ODP data extraction.

  • OdpName (Required) - The name of the data source object to extract.

  • RetainHistory (Optional) - Should the raw files be saved to the History Container to preserve them?

    Retain History? By default, this flag is set to the value assigned in the Configuration item SaveRawFilesToHistory (signalled by the double angle brackets around the Configuration item name, e.g. <<SaveRawFilesToHistory>>). This default behaviour can be overridden here.

  • SapCheckpointKey (Required) - Checkpoint Key used for this activity's CDC workflow. WARNING - this must be unique for each dataset.

  • SizeOfSparkCompute (Optional) - The size of compute used in the Spark cluster.

  • SourceConnection (Required) - Source connection to use.

  • TargetConnection (Optional) - Target connection to use.
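Because SapCheckpointKey must be unique for each dataset, it can help to scan a set of Task Configs for reused keys before deploying them. The Python sketch below is one way to do that. The function `duplicate_checkpoint_keys`, the task names and the key values are all hypothetical and used only for illustration; nothing here is provided by the platform.

```python
from collections import defaultdict


def duplicate_checkpoint_keys(tasks: dict[str, dict]) -> dict[str, list[str]]:
    """Map each SapCheckpointKey used by more than one task to those task names."""
    tasks_by_key = defaultdict(list)
    for task_name, config in tasks.items():
        key = config.get("SapCheckpointKey", "")
        if key:  # blank keys are a separate problem; skip them here
            tasks_by_key[key].append(task_name)
    return {key: names for key, names in tasks_by_key.items() if len(names) > 1}


# Hypothetical task names and checkpoint keys, for illustration only.
tasks = {
    "ingest_customers": {"OdpName": "CUSTOMERS", "SapCheckpointKey": "ckpt-customers"},
    "ingest_orders": {"OdpName": "ORDERS", "SapCheckpointKey": "ckpt-orders"},
    "ingest_order_items": {"OdpName": "ORDER_ITEMS", "SapCheckpointKey": "ckpt-orders"},
}

print(duplicate_checkpoint_keys(tasks))
# {'ckpt-orders': ['ingest_orders', 'ingest_order_items']}
```

Any key reported here is shared by two or more tasks and should be changed so that each dataset has its own checkpoint.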