Enrichment Template
Input Parameters
All Notebook Parameters (if any) are contained in the dictionary variable 'params'. There are two ways to get an individual parameter from params; in both cases the parameter name is case-sensitive:
- Use dot-notation - refer to the example below:
params = { "Name": "Test", "Values": { "Title": "Results", "Results": [ { "Definition": "Core Sample", "Outcome": "Prospective" }, { "Definition": "Follow-up", "Outcome": "For review" } ] } }
params.Name produces 'Test'
params.Values.Title produces 'Results'
params.Values.Results[0] produces { "Definition": "Core Sample", "Outcome": "Prospective" }
params.Values.Results[1].Definition produces 'Follow-up'
- Use the search_dictionary function as follows:
var1 = search_dictionary(params, "parameter-name")
There is an optional third parameter to this function:
value_to_return_if_not_found - this is the value to return if the particular parameter is not found in params.
Note that value_to_return_if_not_found can take on any type (string, int, boolean, struct, ...), e.g., search_dictionary(params, "IncorrectlyNamedParameter", False) will return the boolean False if "IncorrectlyNamedParameter" is not found in params.
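The fallback behaviour described above can be illustrated with a small stand-in. The real search_dictionary is supplied by the template and its implementation is not shown here, so the function below (deliberately named search_dictionary_sketch to avoid shadowing it) is hypothetical. It assumes a depth-first, case-sensitive search through nested dictionaries and lists, and a default of None when no third argument is given; neither assumption is confirmed by the template.

```python
from typing import Any


def search_dictionary_sketch(data: Any, name: str, default: Any = None) -> Any:
    """Hypothetical stand-in for the template's search_dictionary.

    Assumption: performs a depth-first, case-sensitive search for the key
    `name` and returns `default` when the key is not found anywhere.
    """
    missing = object()  # sentinel, so stored values such as None or False still count as found

    def search(node: Any) -> Any:
        if isinstance(node, dict):
            if name in node:
                return node[name]
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            return missing
        for child in children:
            found = search(child)
            if found is not missing:
                return found
        return missing

    result = search(data)
    return default if result is missing else result


# Same sample structure as the dot-notation example
params = {
    "Name": "Test",
    "Values": {
        "Title": "Results",
        "Results": [
            {"Definition": "Core Sample", "Outcome": "Prospective"},
            {"Definition": "Follow-up", "Outcome": "For review"},
        ],
    },
}

title = search_dictionary_sketch(params, "Title")
not_found = search_dictionary_sketch(params, "IncorrectlyNamedParameter", False)
wrong_case = search_dictionary_sketch(params, "title", "n/a")  # names are case-sensitive
```

Passing an explicit fallback such as False makes a misspelled or wrongly cased parameter name easy to detect in later code.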
CAUTION: There is another dictionary variable, 'config', that contains all of the configuration sent to this notebook. In most cases you will have no use for 'config', but if you choose to use 'config' in this notebook, note the following:
- Access the individual parameters within config by using the search_dictionary function, e.g., search_dictionary(config, "ParameterName"). Dot-notation access does not apply to 'config'.
- Heed this WARNING - The individual parameter names within 'config' are subject to change outside of your control, which may break your code.
Enrichment Results
Add the code you need to perform your enrichment/extract in cell(s) below until the 'Notebook End' cell.
Important:
Ensure that the dataset resulting from executing the enrichment is stored in a Spark DataFrame called 'df_result', for example:
df_result = spark.sql("""
...your SQL Code here
""")
Returning metrics to record against the Task Run
If you would like to return one or more metrics regarding the running of the code in this notebook, simply declare a variable 'run_output' and populate it with a valid JSON string containing your metrics. At the end of the execution of this notebook, the value of run_output will be recorded against the Task Run record. For example, to record the version number of the model that is used to run inference, you might do something like:
run_output = '{ "model_version_number": 5 }'
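When there are several metrics, or their values are computed at run time, building the string with Python's standard json module is one way to keep run_output valid JSON. This is a minimal sketch: the 'rows_returned' metric and its value are hypothetical placeholders, not something the template defines.

```python
import json

# Collect metrics in an ordinary dictionary first.
metrics = {
    "model_version_number": 5,
    "rows_returned": 1250,  # hypothetical metric; replace with a value your code computes
}

# json.dumps handles quoting and escaping, so the result is always valid JSON.
run_output = json.dumps(metrics)
```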
Running this notebook directly in Databricks
This notebook can be run directly from your Databricks Workspace. If the notebook relies on Notebook Parameters, please read the following instructions:
- Add this line of code to a cell at the top of your notebook and run that cell.
dbutils.widgets.text('ParametersJSON', '{ "NotebookParameters": { "param1": "value1", "param2": "value2" } }')
- This will add a parameter to the notebook. Simply replace (or remove) the pre-canned parameters, 'param1', 'param2', and their values with your own.
- When you have finished running this notebook directly in Databricks, comment out the line of code you added or delete the cell entirely.
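With more than a couple of parameters, hand-writing the JSON default for the widget gets error-prone. The sketch below builds the same 'ParametersJSON' value from a Python dictionary. The variable names are illustrative, and the guard around dbutils is an assumption added only so the snippet also runs outside Databricks, where dbutils is not defined.

```python
import json

# Replace these pre-canned parameters with your own.
notebook_parameters = {"param1": "value1", "param2": "value2"}

# Wrap them in the same "NotebookParameters" envelope used by the widget default.
parameters_json = json.dumps({"NotebookParameters": notebook_parameters})

# In a Databricks notebook `dbutils` is predefined; elsewhere this block is skipped.
if "dbutils" in globals():
    dbutils.widgets.text("ParametersJSON", parameters_json)
```

As with the single-line version, comment out or delete this cell once you have finished running the notebook directly.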