Setting Up DataForge Compute Configuration with Databricks Custom Notebook

Ingestion, Parse, and Post Output processes can each be configured to run a custom notebook that you create, using the Compute configuration page.
 

Finding the Notebook Path

Before creating a Compute Configuration, find the path of the Notebook that contains the custom code. In the Databricks UI, open the Notebook and hover over the Notebook name; the full path of the Notebook will be displayed. Highlight the path and copy it, making sure to include the leading slash.
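A path copied from a hover display can easily pick up stray whitespace or lose its leading slash. The snippet below is a minimal sketch of a sanity check you could run on the copied value before pasting it into DataForge. The function name `check_notebook_path` and the example path are hypothetical and only for illustration; the check covers just the leading-slash requirement mentioned above, not whether the Notebook actually exists.

```python
def check_notebook_path(raw: str) -> str:
    """Return the trimmed path, or raise ValueError if it is unusable."""
    # Copying from the UI can include leading or trailing whitespace.
    path = raw.strip()
    if not path:
        raise ValueError("notebook path is empty")
    # The article requires the full path, including the leading slash.
    if not path.startswith("/"):
        raise ValueError(f"notebook path must start with '/': {path!r}")
    return path


# Hypothetical example path, for illustration only.
print(check_notebook_path("  /Users/someone@example.com/custom_ingest  "))
```

If the function raises, copy the path from the Databricks UI again rather than adding the slash by hand, since a missing slash usually means part of the path was not highlighted.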
 

 

Creating the Compute Configuration

Navigate to the Compute page in DataForge and click the New + button. A Compute Settings page will open. Locate the Job Task Type setting and select "Custom Notebook". A Notebook Path setting will appear; paste in the notebook path copied in the previous step. See the image below for an example. If the notebook uses libraries or packages other than the DataForge SDK, enter them under Parameters > Job Configuration > Libraries.
 
The Compute Configuration points toward the Custom Notebook
 

Applying the Compute Configuration 

Once the Compute Configuration is created, you will attach it to a Source (Ingest/Parse) or Output (Post Output) depending on what the notebook is intended for.
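As a quick reference, the three attachment routes can be summarized in a small lookup. This is only a sketch that restates the settings described in the sections that follow; the names `ATTACHMENT_GUIDE` and `describe_attachment` are illustrative and are not part of DataForge or its SDK.

```python
# Settings restated from this article; the dictionary and function names
# are illustrative and not part of DataForge.
ATTACHMENT_GUIDE = {
    "ingest": {
        "attach_to": "Source",
        "settings": [("Connection Type", "Custom")],
        "dropdown": "Custom Ingest Compute Config",
    },
    "parse": {
        "attach_to": "Source",
        "settings": [("Connection Type", "File"), ("Parser", "Custom")],
        "dropdown": "Custom Parse Compute Config",
    },
    "post_output": {
        "attach_to": "Output",
        "settings": [("Post Output Commands", "Custom Notebook")],
        "dropdown": "Custom Post Output Compute Config",
    },
}


def describe_attachment(process: str) -> str:
    """Build a one-line summary of where and how to attach the configuration."""
    entry = ATTACHMENT_GUIDE[process]
    steps = [f'set {name} to "{value}"' for name, value in entry["settings"]]
    steps.append(f'select the Compute Configuration in "{entry["dropdown"]}"')
    return f'{entry["attach_to"]}: ' + ", then ".join(steps)


for process in ATTACHMENT_GUIDE:
    print(f"{process} -> {describe_attachment(process)}")
```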

Attaching to a Source for Custom Ingestion

  1. Navigate to the Source Settings page where the Custom Ingestion will be used, or create a new Source. 
  2. Change the Connection Type to "Custom". 
  3. Locate the "Custom Ingest Compute Config" dropdown and select your newly created Compute Configuration. 
  4. Click save. 
The Custom Ingest Compute Config applied to a Source
 
The Source is now configured to use the specified Notebook when running ingestion processes. To test your custom code, click the Pull Now button or set a schedule on the Source to pull in data.
 

Attaching to a Source for Custom Parse

  1. Navigate to the Source Settings page where the Custom Parse will be used, or create a new Source. 
  2. Change the Connection Type to "File" (Custom Parse only runs for file-based sources). 
  3. Change the Parser option to "Custom". 
  4. Locate the "Custom Parse Compute Config" dropdown and select your newly created Compute Configuration. 
  5. Click save. 

The Source is now configured to use the specified Notebook when running parse processes. To test your custom code, click the Pull Now button or reset parse for an input in your Source.

Attaching to an Output for Custom Post Output

  1. Navigate to the Output Settings page where the Custom Post Output will be used, or create a new Output. 
  2. Locate the Post Output Commands option and select "Custom Notebook". 
  3. Locate the "Custom Post Output Compute Config" dropdown and select your newly created Compute Configuration. 
  4. Click save. 
The Custom Post Output Compute Config applied to an Output
 
The Output is now configured to run the specified Notebook after each Output process completes. To test your custom code, click the Reset Output option on the Mapping page, or reset output for a specific channel.
 
