DataForge 6.0 Release and Upgrade Process
The DataForge team is proud to release the 6.0 major version. In order to upgrade to 6.0.0, your environment must first be upgraded to 5.2.0 before making the jump to 6.0.
NOTE: The imageVersion variable in Terraform needs to be set to 6.0.2 to get the latest patched version of 6.0.
Examples:
Current Environment 2.5.0 -> 5.1.0 -> 5.2.0 -> 6.0.2
Current Environment 5.1.6 -> 5.2.0 -> 6.0.2
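For reference, below is a minimal sketch of what pinning the image version can look like in a Terraform variables file. The imageVersion variable name comes from the note above; the file name and exact layout are assumptions and will depend on how your environment's Terraform configuration is organized.
# Example only: set imageVersion in whichever .tfvars file (or variable default) your environment uses
imageVersion = "6.0.2"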
AWS: An AppStream 2.0 stack has been added to replace the Query Editor in RDS. If AppStream 2.0 has never been accessed in the AWS account before, go to the AppStream 2.0 console and click "Get Started" before running the Terraform apply. This creates the Amazon service roles for AppStream in the account, which are necessary for the apply to run successfully.
For the 6.0 major upgrade, follow the same upgrade process used for previous major/minor releases (links to guides below). After completing the upgrade, continue with the manual steps below to update any custom Clusters and Cluster Pools to 9.1 LTS and to update the Meta Monitor table to Postgres 14.
NOTE: If you are running DataForge in Azure, the last step will be to open each cluster configuration in your DataForge environment and resave the cluster as is.
AWS - New Version Upgrade Process
Azure - New Version Upgrade Process
End to End Process Overview:
- Follow upgrade process steps in links above
- Manually update Spark Version in DataForge Custom Cluster Configurations
- Manually update Instance Pool ID in DataForge Cluster Configurations with Custom Pool
- (Azure only) Resave each Cluster Configuration in DataForge one time
- Create New Databricks notebook to Update Meta Monitor Table to PG 14
Known Deployment Issues
In Azure environments, the Azure Active Directory Terraform provider was upgraded and may no longer work with the deprecated Microsoft Graph APIs on the App Registration used to authenticate Terraform with the Azure subscription.
Example error: ApplicationsClient.BaseClient.Get(): unexpected status 403 with OData error: Authorization_RequestDenied: Insufficient privileges to complete the operation
If this error is shown in the Terraform plan, please go to the App Registration and add the following Graph API permissions.
Databricks Runtime Version Change
As you may have seen, Databricks is ending support for the 7.3 LTS runtime. In order to avoid production issues, we are adding support for 9.1 LTS as part of the DataForge 6.0 release.
In addition to following the normal upgrade process (AWS or Azure), you will need to take manual steps to change any custom Cluster Configurations and Pools from 7.3 LTS to 9.1 LTS. These manual changes need to be made immediately after the upgrade to avoid job failures. Default configs/pools will auto-upgrade to 9.1 LTS as part of the 6.0 deployment.
Keep in mind that any jobs running with 7.3 LTS in a 6.0 DataForge environment will fail, and conversely any jobs running with 9.1 LTS in a <6.0 DataForge environment will fail. We've included the matrix below to help illustrate this point.
My DataForge version is:
| My Cluster Config Spark Version is: | 5.2.x - AWS | 5.2.x - Azure | 6.0.x - AWS | 6.0.x - Azure |
| --- | --- | --- | --- | --- |
| 7.3 | Fully Supported | Fully Supported | Azure File Output Broken; SQL Server Output Broken | Azure File Output Broken; SQL Server Output Broken |
| 9.1 | Azure File Output Broken; SQL Server Output Broken | Azure File Output Broken; SQL Server Output Broken | Fully Supported | Fully Supported |
Note: Any cluster config running 7.3 or using a 7.3 pool will have its name updated when deploying 6.0 to make it easy to find. The name will look similar to the examples below.
CHECK INSTANCE POOL DATABRICKS VERSION: OrigName
CHECK DATABRICKS VERSION: OrigName
Updating Custom Cluster Configurations in DataForge
Open your DataForge environment and navigate to the Cluster Configurations. After opening a cluster configuration that needs to be updated, navigate to the Parameters -> Cluster Configuration setting and change the Spark Version drop-down to 9.1. Below are screenshot examples of before and after the change. Once you are finished, save the change and move on to the next cluster configuration.
Before
After
Updating Cluster Pools in Databricks and Attaching in DataForge
Unfortunately, there is no way to update the runtime version on an existing Cluster Pool in Databricks at this time, so you will need to create a new Pool and re-associate your cluster configurations with it. Below is a screenshot of where to set the Runtime version to 9.1 LTS in Databricks when you create a new Pool.
After creating the new Pool, copy the instance pool id from the URL and add it to any Cluster Configurations in DataForge still using the old pool. The instance pool id is the last portion of the URL when you have the Pool open in Databricks, as shown below.
Optionally, you can also find this pool id by opening the pool, navigating to the Configuration tab, and selecting the Tags option to see the DatabricksInstancePoolId value.
Return to your DataForge environment, navigate the menu to System Configuration -> Cluster Configurations, and open the cluster configuration that needs to be updated. Under Parameters -> Cluster Configuration, paste the new pool id into the free-form text field "Instance Pool ID" and save your change. Repeat this step for each cluster configuration that needs updating to the new pool.
As a reminder, if you are running DataForge in an Azure environment, please be sure to open and save each of your Cluster Configurations (Menu -> System Configuration -> Cluster Configurations) again as the last step in the deployment.
Update Meta Monitor Table to PG 14
Open Databricks and create a new notebook. Paste the following code into a notebook cell and run the cell.
// Read the Postgres connection string from the Databricks secret scope
val pgConnectionStringRead = dbutils.secrets.get("sparky", "pg_read")

// Pull the meta-monitor refresh query stored in meta.system_configuration
val pgQuery = spark.read.format("jdbc")
  .option("url", pgConnectionStringRead)
  .option("query", "select value from meta.system_configuration where name='meta-monitor-refresh-query'")
  .load().head.getString(0)

// Drop the existing Meta Monitor tables so they can be recreated against Postgres 14
spark.sql("DROP TABLE IF EXISTS meta.pg_process")
spark.sql("DROP TABLE IF EXISTS meta.pg_input")
spark.sql("DROP TABLE IF EXISTS meta.pg_source")

// Recreate the tables as JDBC tables pointing at the Postgres 14 database
spark.sql(s"""CREATE TABLE IF NOT EXISTS meta.pg_process USING JDBC OPTIONS (url "$pgConnectionStringRead", dbtable "($pgQuery) pr")""")
spark.sql(s"""CREATE TABLE IF NOT EXISTS meta.pg_input USING JDBC OPTIONS (url "$pgConnectionStringRead", dbtable "meta.input")""")
spark.sql(s"""CREATE TABLE IF NOT EXISTS meta.pg_source USING JDBC OPTIONS (url "$pgConnectionStringRead", dbtable "meta.source")""")
Below is a screenshot of what it will look like:
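Optionally, you can run a quick sanity check in a second notebook cell to confirm the recreated tables resolve against the database. This snippet is only a sketch using standard Spark APIs and is not part of the documented upgrade steps:
// Optional sanity check (sketch): count rows in each recreated JDBC table
Seq("meta.pg_process", "meta.pg_input", "meta.pg_source").foreach { tableName =>
  val rowCount = spark.sql(s"SELECT COUNT(*) FROM $tableName").head.getLong(0)
  println(s"$tableName -> $rowCount rows")
}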
Rollback to 5.2.0
The provider upgrades in the 6.0.0 Terraform code for both AWS and Azure prevent going back to the 5.2.0 Terraform code, so the infrastructure cannot be rolled back. However, the 5.2.0 codebase does work on the 6.0.0 infrastructure, so the following steps will roll an environment back to running on 5.2.0.
Note: On-premise Agents that have already updated to 6.0.0 will not be able to revert to 5.2.0.
Stop the API, Agent, and Core containers so no processes can run:
- AWS
- Navigate to the deployment container logs in CloudWatch and find the name of the snapshotted database. The snapshot name should be logged in a message that looks like "RDS Cluster snapshot: deployment-2022-09-02-15-46-00 taken"
- Rename the current pg14 cluster and instance to add "-6-0" on the end. Ex: dev-pg14-cluster-wm -> dev-pg14-cluster-wm-6-0 and dev-pg14-db-instance-wm-6-0
- Restore the snapshot to the original pg14 cluster name freed up in the previous step. Ex: dev-pg14-cluster-wm
- Pick <env>-Database-<client> as the VPC security group
- Choose 2a as the availability zone
- Choose the Serverless v2 instance class
- Set the capacity range to 0.5-16
- Choose the DB Cluster parameter group that was on the existing cluster
- Rename the db instance to <env>-pg14-db-instance-<client> if it isn't already after the snapshot restore.
- Azure
- Restore the current pg14 cluster named <environment>-db14-<client> to the same name but with "-5-2" on the end. The restore point must fall in the window after the migration happened but before the Database deployment finished. Make sure the time picked is BETWEEN these two messages from the query below:
- ContainerInstanceLog_CL
| where ContainerGroup_s contains "Deployment" and (Message contains "pg_restore success" or Message contains "Database deployment complete")
- Delete the current cluster <environment>-db14-<client>
- Wait 10 minutes for a restore point on the "-5-2" cluster
- Restore the "-5-2" cluster to the "<environment>-db14-<client>" name
Change imageVersion in Terraform to 5.2.0, run plan and apply (the Postgres database might be edited but should NOT be recreated), and let the Deployment container run all the way through on 5.2.0.
Once the deployment has run, check the UI and make sure it's on version 5.2.0. Run processes to confirm the rollback was successful. For Azure environments, Cluster Configurations will need the following key value pairs added to the Spark Conf field if processes are intended to run on 5.2:
{"spark.driver.extraJavaOptions":"-Djava.security.properties=","spark.executor.extraJavaOptions":"-Djava.security.properties="}
Once the environment is ready to try 6.0 again, change the imageVersion variable in Terraform to 6.0.0, run plan and apply, and let the Deployment container run all the way through on 6.0.0.
Once the deployment has run, check the UI and make sure it's on version 6.0.0. Run processes to confirm the deployment was successful.
To clean up after the rollback, delete any cluster that still has "-5-2" or "-6-0" on the end.