Requirements
The new version of DataForge requires Unity Catalog to be enabled in your Databricks workspace before the upgrade. If Unity Catalog is not enabled when the upgrade completes, job runs will fail with an error stating that the metastore does not exist. Follow the steps below to set up Unity Catalog if it is not already enabled and assigned to your workspace.
Azure:
- Open the Databricks workspace and select your account name in the top-right corner. Select Manage Account and log in as the Account Admin.
- Select the Data tab on the left-hand menu.
- Click the Create Metastore button to start a new metastore and use the following guidance:
- Name: choose your name for the metastore
- Region: Select the Region that your DataForge/Databricks workspace is hosted in. If you are unsure, look in the Resource Group in Azure to identify the region.
- ADLS Gen 2 path: leave blank unless you have a use case for a centrally managed metastore location. If you have a need for this, please contact DataForge support.
- Access Connector id: leave blank
- Click Create to create the metastore
- On the next page, check the box next to the Databricks workspace that DataForge is tied to, then click the Assign button to assign the workspace to the metastore.
AWS:
- Open the Databricks workspace and select your account name in the top-right corner. Select Manage Account and log in as the Account Admin.
- Select the Data tab on the left-hand menu.
- Click the Create Metastore button to start a new metastore and use the following guidance:
- Name: choose your name for the metastore
- Region: Select the Region that your DataForge/Databricks workspace is hosted in. If you are unsure, open the Workspaces option in the left menu to see which region the workspace is in.
- S3 Bucket Path: leave blank
- IAM role ARN: leave blank
- Click Create to create the metastore
- On the next page, check the box next to the Databricks workspace that DataForge is tied to, then click the Assign button to assign the workspace to the metastore.
Pre-Upgrade Process
DataForge
Open each Agent from Main Menu -> Agents, enable AutoUpdate, and save the change.
Terraform
1) Add a Terraform variable for "use_legacy_hub_database" and set the value to "yes"
2) Update Terraform variable "manualUpgradeVersion" and set the value to "7.1.x"
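If these values are managed in a `terraform.tfvars` file rather than through a hosted Terraform workspace UI, the two settings from the steps above might look like the following (the file layout is an assumption for illustration; only the variable names and values come from this guide):

```
use_legacy_hub_database = "yes"
manualUpgradeVersion    = "7.1.x"
```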
3) Log in to Auth0 and navigate to the Applications page. Open the "API Explorer Application" to find the following values:
- Domain
- Client ID
- Client Secret
Add the following variables using all capital letters, setting each one as an Environment variable rather than a Terraform variable. Leave the existing variables with similar names alone; they will be removed at a later date.
| Variable Name | Variable Value | Variable Category |
| --- | --- | --- |
| AUTH0_CLIENT_ID | Client ID from Auth0 App | Environment |
| AUTH0_CLIENT_SECRET | Client Secret from Auth0 App | Environment |
| AUTH0_DOMAIN | Domain value from Auth0 App | Environment |
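Before running the upgrade, a quick pre-flight check can confirm the three Environment variables are actually visible to the shell. This is a sketch: the `check_vars` helper is hypothetical (not part of DataForge or Terraform); only the variable names come from the table above.

```shell
# check_vars: hypothetical helper, not DataForge tooling.
# Prints an error and fails if any named environment variable is unset or empty.
check_vars() {
  for v in "$@"; do
    if [ -z "$(printenv "$v")" ]; then
      echo "Missing required environment variable: $v" >&2
      return 1
    fi
  done
  echo "All required variables are set."
}

# Usage: check_vars AUTH0_CLIENT_ID AUTH0_CLIENT_SECRET AUTH0_DOMAIN
```

In a hosted Terraform workspace, Environment-category variables are injected into the run's shell, which is where a check like this would run.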
Upgrade Process
After adding the four variables and values to Terraform, follow the standard upgrade guide for your cloud provider to complete the minor upgrade. Once the upgrade is complete, proceed to the post-upgrade steps below.
Post-Upgrade Steps
DataForge/Databricks
Open your DataForge environment and confirm the UI is up and running. Be sure to hard refresh the browser using Ctrl + F5 to avoid out-of-date cached versions of the UI.
Open your Databricks environment, then open and run the following notebook:
Workspace -> DataForge-managed -> "7.1-post-upgrade-enr-relation-check"
If the notebook returns any results, resave or fix the listed relations, rules, and rule templates. If many results are returned, it may be easier to replace "rtdf.show(1000,100)" in the notebook with "display(rtdf)" and rerun the notebook.
Submit a support request if assistance is needed or you have questions.
Databricks
Search for all notebooks that reference "import com.wmp.intellio.dataops.sdk" and replace these references with "import com.dataforgelabs.sdk". The previous SDK is no longer supported and may cause issues on the latest version.
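If your notebook sources are synced to a local directory (for example via Databricks Repos or a workspace export), the search-and-replace can be scripted. This is a sketch: the `notebooks/` directory and example file are assumptions for illustration, and GNU sed syntax is shown (BSD/macOS sed requires `-i ''`).

```shell
# Demo setup only, so the example is self-contained:
# create a sample notebook source containing the deprecated import.
mkdir -p notebooks
printf 'import com.wmp.intellio.dataops.sdk._\n' > notebooks/example.scala

# List files that still reference the deprecated SDK:
grep -rl 'com\.wmp\.intellio\.dataops\.sdk' notebooks

# Rewrite those references to the current SDK in place (GNU sed):
grep -rl 'com\.wmp\.intellio\.dataops\.sdk' notebooks \
  | xargs sed -i 's/com\.wmp\.intellio\.dataops\.sdk/com.dataforgelabs.sdk/g'
```

Notebooks stored only in the workspace would need to be exported (or synced via Repos) before a filesystem search like this can see them.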