Export Lakehouse Query Results to CSV/Parquet/JSON File in SFTP (via SSH Public Key)

Export data from the Lakehouse to a CSV, Parquet, or JSON file on an SFTP endpoint (accessed via SSH Public Key).

Category: Export Data from Lakehouse | Tags: Export

How it works

Executes the query '<<DeltaQuery>>' against the Delta Lake and copies the results as a file to the SFTP endpoint, in the format specified by ExportFileFormat.

To use this activity within the API, use an ActivityCode of DELTA-QUERY-FILE-SFTP-SSHPUBLI.

Available Connections

SourceConnection:

TargetConnection:

Example JSON

An example of the Task Config for a task using this activity. Some of these variables may be set at the group level to avoid duplication between tasks.

{
  "SourceConnection": "MY-SOURCE-CONN",
  "TargetConnection": "MY-TARGET-CONN",
  "TargetFilename": ""
}
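
A fuller (hypothetical) Task Config might also set some of the optional variables described under Variable Reference. Every value below is purely illustrative, not a documented default, and the table name `orders` is an assumption:

```json
{
  "SourceConnection": "MY-SOURCE-CONN",
  "TargetConnection": "MY-TARGET-CONN",
  "TargetFilename": "orders_export.csv",
  "TargetRelativePath": "exports/orders",
  "ExportFileFormat": "CSV",
  "DeltaQuery": "select * from orders",
  "SaveExportToHistory": true
}
```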

Variable Reference

The following variables are supported:

  • DatabricksClusterId (Optional) - The Databricks Cluster to use for this task.

  • DeltaQuery (Optional) - Query to run against the Lakehouse.

  • DIUsToUseForCopyActivity (Optional) - Specifies the compute power (Data Integration Units) allocated to the copy executor. Value can be between 2 and 256. When left at the default, Data Factory dynamically applies the optimal DIU setting based on the source-sink pair and data pattern.

  • ExportFileFormat (Required) - The format of the file being exported.

  • ExtractControlQuery (Optional) - For incremental loads only, a SQL query executed against the source to retrieve a 'high-water' mark for extract control prior to extracting. For instance, this could be the maximum value of a modified_date or an identity column, and it will be used in the next data extraction. NOTE: The column returned must be aliased as ExtractControl. For example:

    select max(modified_date) as ExtractControl
    from some_table;
  • ExtractControlVariableName (Optional) - For incremental loads only, the name to assign the Extract Control variable in State Config for the ExtractControl value derived from the Extract Control Query above.

  • ExtractControlVariableSeedValue (Optional) - The initial value to set for the Extract Control variable in State Config - this will have no impact beyond the original seeding of the Extract Control variable in State Config.

  • IsFederated (Optional) - Makes task available to other Insight Factories within this organisation.

  • Links (Optional) - NULL

  • MaximumNumberOfAttemptsAllowed (Optional) - The total number of times the running of this Task can be attempted.

  • MinutesToWaitBeforeNextAttempt (Optional) - If a Task run fails, the number of minutes to wait before re-attempting the Task.

  • SaveExportToHistory (Optional) - Whether to save the exported data to history in the Lakehouse.

  • SourceConnection (Required) - Source connection to use.

  • TargetConnection (Required) - Target connection to use.

  • TargetFilename (Required) - The filename to save the exported results as.

  • TargetRelativePath (Optional) - The relative path on the SFTP endpoint under which to store the exported file.
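
The incremental-load variables above (ExtractControlQuery, ExtractControlVariableName, ExtractControlVariableSeedValue) follow a common high-water-mark pattern: read the stored mark, extract only rows newer than it, then advance the mark from the control query. A minimal sketch of that pattern, using an in-memory SQLite table as a stand-in for the Lakehouse; all function and table names here are illustrative, not part of the product API:

```python
import sqlite3

def run_incremental_extract(conn, state, control_query, extract_query):
    """Extract rows newer than the stored high-water mark, then advance it."""
    # Seeding behaves like ExtractControlVariableSeedValue: it only applies
    # when no ExtractControl value has been stored in state yet.
    mark = state.setdefault("ExtractControl", 0)
    rows = conn.execute(extract_query, (mark,)).fetchall()
    # The control query must alias its single column as ExtractControl.
    new_mark = conn.execute(control_query).fetchone()[0]
    if new_mark is not None:
        state["ExtractControl"] = new_mark
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (id INTEGER, modified_date INTEGER)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)", [(1, 10), (2, 20)])

state = {}  # stand-in for State Config
control = "select max(modified_date) as ExtractControl from some_table"
extract = "select id from some_table where modified_date > ?"

first = run_incremental_extract(conn, state, control, extract)   # both rows
conn.execute("INSERT INTO some_table VALUES (3, 30)")
second = run_incremental_extract(conn, state, control, extract)  # only the new row
```

After the first run the mark advances to 20, so the second run picks up only the row inserted afterwards.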