This guide provides steps to migrate Databricks interactive clusters and custom notebook package imports to the new DataForge SDK. The existing SDK jar and imports will no longer be supported after November 30, 2023.
Cluster configurations managed directly in the DataForge platform do not require any SDK changes. Custom notebooks referenced in custom cluster configurations, however, will need their import statements updated in the notebook code.
Migrating SDK Jar on Interactive Clusters
Open the Databricks workspace and navigate to the Compute page, which lists all clusters. If some clusters are not displayed, use the filters at the top of the page to show all clusters.
Open each cluster by clicking the cluster name, and then select the Libraries tab within the cluster.
If the old DataOps SDK jar is installed on the cluster, it will need to be replaced. Start by selecting the Install New button to install the new SDK on the cluster.
Select the "DBFS/S3" option and paste the DBFS mount path below into the File Path field.
The file path, regardless of whether you are using Azure or AWS, will be the following:
dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-$version.jar
Replace the $version portion with the version number your environment is currently upgraded to.
Example: dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-7.0.4.jar
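The version substitution above can be sketched as a small helper. This is purely illustrative; the function name is not part of any DataForge tooling:

```python
# Illustrative helper (not part of the DataForge SDK or CLI): builds the
# jar path for a given environment version.
def sdk_jar_path(version: str) -> str:
    return f"dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-{version}.jar"

# An environment currently on version 7.0.4:
print(sdk_jar_path("7.0.4"))
# -> dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-7.0.4.jar
```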
Click the Install button, and start the cluster if it is not already running, to confirm the library installs successfully. A green checkmark appears in the Status column of the library row when the installation succeeds.
Migrating Import Package Statements on Notebooks
Open Databricks and use the search bar at the top of the page to search all notebooks for "com.wmp.intellio".
Open each notebook and replace every reference to "com.wmp.intellio.dataops" in the notebook logic with "com.dataforgelabs". Everything after that prefix can be left as is. If helpful, use Ctrl+F in the browser to quickly find references that need to be replaced.
Before changing import statement
After changing import statement
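Because the migration is a literal prefix replacement, it can be sketched as a simple string substitution. The class name after the prefix below is hypothetical, used only to show that the remainder of the import is untouched:

```python
# Illustrative only: the old package prefix is replaced, and everything
# after it is left unchanged. "SomeClass" is a hypothetical name.
OLD_PREFIX = "com.wmp.intellio.dataops"
NEW_PREFIX = "com.dataforgelabs"

def migrate_import(line: str) -> str:
    return line.replace(OLD_PREFIX, NEW_PREFIX)

print(migrate_import("import com.wmp.intellio.dataops.sdk.SomeClass"))
# -> import com.dataforgelabs.sdk.SomeClass
```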
After updating all cluster SDK libraries and notebook import statements, test ingestions and notebooks to confirm everything is working correctly.