Custom Ingest, Custom Parse, and Custom Post Output can each be configured to run a user-created notebook by using the Cluster Configuration page. In this example, we will create a new Cluster Configuration that points at our custom code and update a Custom Ingest Source to use it. Custom Parse and Custom Post Output setups follow nearly identical steps.
Finding the Notebook Name
Before creating a Cluster Configuration, first find the full name of the Notebook that contains the custom code. In the Databricks UI, open the Notebook and hover over its name in the top left corner. The full name (path) of the Notebook will be displayed. Highlight the path and copy or record it. Be sure to include the leading slash!
Displaying the Notebook Name
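A missing leading slash is an easy mistake to make when copying the path. The following is a minimal sketch, not part of DataForge or Databricks, of a check you could run on the copied value before pasting it into a Cluster Configuration. The function name check_notebook_path and the example path are hypothetical and used only for illustration.

```python
def check_notebook_path(raw: str) -> str:
    """Return the trimmed notebook path, or raise ValueError if it is unusable.

    Hypothetical helper: it only checks the two things the step above
    asks for (a non-empty value that starts with a leading slash).
    """
    path = raw.strip()  # drop whitespace picked up while copying
    if not path:
        raise ValueError("Notebook path is empty")
    if not path.startswith("/"):
        raise ValueError(f"Notebook path must start with a leading slash: {path!r}")
    return path


# Example with a made-up path; substitute the value copied from Databricks.
print(check_notebook_path("  /Shared/custom_ingest  "))
```

The check is deliberately narrow: it does not confirm that the Notebook exists, only that the copied text has the shape this step requires.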
Creating the Cluster Configuration
Navigate to the Cluster Configuration page in DataForge and click the New Cluster button. A Cluster Settings page will open. Locate the Job Task Type button and click "Custom Notebook". A Notebook Path control will appear. Enter the Notebook path recorded in the previous step into the Notebook Path control. See the image below for an example.
The Cluster Configuration points toward the Custom Notebook
Applying the Cluster Configuration to a Custom Ingestion Source
Now navigate to the Source Settings page for the Custom Ingest Source associated with the Notebook. Locate the "Custom Ingest Cluster Configuration" dropdown, select your newly created Cluster Configuration, and click Save. The Source is now configured to use the specified Notebook when running ingestion processes! To test it, click the Pull Now button or set a schedule on the Source to pull in data.
The Custom Ingest Cluster Config applied to a Source
That's it. The process is similar for Custom Parse: edit the "Custom Parsing Cluster Configuration" control on the Source Settings page. For Custom Post Output, edit the "Custom Post Output Cluster Config" control on the Output Settings page.
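As a quick reference, the three cases can be summarized in a small lookup table. This is only an illustrative sketch: the control and page names are the ones given in this article, while the dictionary CLUSTER_CONFIG_CONTROLS and the function where_to_set are hypothetical names and are not part of any DataForge API.

```python
# Page and control for each custom process, as named in this article.
CLUSTER_CONFIG_CONTROLS = {
    "Custom Ingest": ("Source Settings", "Custom Ingest Cluster Configuration"),
    "Custom Parse": ("Source Settings", "Custom Parsing Cluster Configuration"),
    "Custom Post Output": ("Output Settings", "Custom Post Output Cluster Config"),
}


def where_to_set(process: str) -> str:
    """Describe where the Cluster Configuration is selected for a process."""
    page, control = CLUSTER_CONFIG_CONTROLS[process]
    return f'Set "{control}" on the {page} page'


for name in CLUSTER_CONFIG_CONTROLS:
    print(f"{name}: {where_to_set(name)}")
```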