7.0.8 Upgrade Guide

This guide covers upgrading from the 6.2.x versions of DataForge to the new 7.0.8 version.

Please review the respective guides for AWS and Azure below.

AWS Upgrades

Azure Upgrades

Note: The Sparky Pool created in Databricks is no longer used by DataForge and will remain on Databricks Runtime 9.1 LTS.  DataForge will leave the Sparky Pool as is, but recommends migrating all notebooks and clusters that reference it to a new or different pool.  All Job cluster configurations in DataForge will be updated to use the 11.3 LTS runtime.  All cluster configurations that use custom Databricks Pools will need to be updated with a new 11.3 Databricks Pool.
Note: The existing jar file paths for installing the DataOps SDK jar and the old import package names will no longer be supported after November 30th, 2023.  This includes the file paths in Azure (dbfs:/mnt/jars/dataops-sdk.jar) and AWS (s3://<Environment>-datalake-<Client>/dataops-sdk.jar).  The new file path for both cloud providers is dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-$version.jar.  Replace the $version portion of the file path with the version number your environment is running (e.g. dataforge-sdk-7.0.8.jar).  The package previously imported within notebooks, "com.wmp.intellio.dataops", is now "com.dataforgelabs".  Please follow the Migration guide for instructions on updating both artifacts as soon as possible to avoid issues.
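
If any notebooks or clusters outside of DataForge still attach the SDK jar directly, the path change would look roughly like the hedged sketch below.  This assumes the cluster library is managed with the databricks Terraform provider; the resource names and the cluster reference are hypothetical.  The import rename from "com.wmp.intellio.dataops" to "com.dataforgelabs" is a separate one-line change in each notebook, as covered in the Migration guide.

    # Hypothetical sketch: attaching the renamed SDK jar to a cluster managed
    # with the databricks Terraform provider. Replace 7.0.8 with the version
    # your environment is running. "databricks_cluster.analytics" is an
    # assumed reference and not part of the DataForge-managed infrastructure.
    resource "databricks_library" "dataforge_sdk" {
      cluster_id = databricks_cluster.analytics.id
      jar        = "dbfs:/mnt/processing-jars/dataforge-sdk/dataforge-sdk-7.0.8.jar"
    }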

AWS Upgrades:

Any environments not already on version 6.2.4 must first upgrade to 6.2.4 before upgrading to the 7.0.8 version.

Upgrade to 6.2.4:

  1. Follow the Github update steps in the normal upgrade guide. Stop before running a new Terraform Plan and Apply so you can make the additional changes below.
  2. Create a new Terraform Variable named "upgradeTo624" and set the value to "yes"
  3. Rename the existing imageVersion variable in Terraform to "manualUpgradeVersion" (or create a new variable named "manualUpgradeVersion") and set its value to "6.2.4".
  4. A Terraform variable named "deploymentToken" should already exist for each workspace with a secret value entered.  Please do not change it, as it is required for the Auto-Upgrade functionality to work.  If this variable does not exist, please submit a support request.  Once you have received the token value from DataForge support, add a new Terraform variable named "deploymentToken", mark it Sensitive, and set its value to the token that was provided to you.
  5. Add a new Terraform variable named "releaseUrl" and set the value equal to "https://release.wmprapdemo.com"
  6. Add a new Terraform variable named "whitelist" and mark the checkbox to the right so HCL is turned on.  The value should be set in this format below but be sure to add your own email domain to the list.  The contents of this whitelist can also be copied from the existing Auth0 whitelist rule to the new Terraform variable.["wmp.com","westmonroe.com","westmonroepartners.com","dataforgelabs.com","<customeremaildomain>"]
  7. Run Terraform Plan and Apply to upgrade to 6.2.4.  Follow the normal upgrade guide for steps on checking the Deployment container and CloudWatch for completion.
  8. In the Terraform workspace, delete the variable named "upgradeTo624", as it is no longer needed.
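
Steps 2 through 6 above are normally performed in the Terraform Cloud variables UI.  If workspace variables happen to be managed with the hashicorp/tfe provider instead, the equivalent definitions would look roughly like this hedged sketch; the resource names, the tfe_workspace.dataforge reference, and var.deployment_token are assumptions rather than values from this guide.

    # Hedged sketch of the 6.2.4 upgrade variables expressed with the
    # hashicorp/tfe provider. Most environments set these directly in the
    # Terraform Cloud UI instead.
    resource "tfe_variable" "upgrade_to_624" {
      workspace_id = tfe_workspace.dataforge.id   # assumed workspace reference
      key          = "upgradeTo624"
      value        = "yes"
      category     = "terraform"
    }

    resource "tfe_variable" "manual_upgrade_version" {
      workspace_id = tfe_workspace.dataforge.id
      key          = "manualUpgradeVersion"
      value        = "6.2.4"
      category     = "terraform"
    }

    resource "tfe_variable" "deployment_token" {
      workspace_id = tfe_workspace.dataforge.id
      key          = "deploymentToken"
      value        = var.deployment_token          # token provided by DataForge support
      category     = "terraform"
      sensitive    = true
    }

    resource "tfe_variable" "release_url" {
      workspace_id = tfe_workspace.dataforge.id
      key          = "releaseUrl"
      value        = "https://release.wmprapdemo.com"
      category     = "terraform"
    }

    resource "tfe_variable" "whitelist" {
      workspace_id = tfe_workspace.dataforge.id
      key          = "whitelist"
      category     = "terraform"
      hcl          = true                          # matches the HCL checkbox in the UI
      value        = jsonencode(["wmp.com", "westmonroe.com", "westmonroepartners.com", "dataforgelabs.com", "<customeremaildomain>"])
    }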

Upgrade to 7.0.8:

Before beginning the upgrade to 7.0.8, recreate any existing Databricks Pools that are in use.  Copy all of the same settings, but change the Databricks Runtime Version to 11.3 LTS.  The IDs of these new pools will need to replace the old pool IDs in all DataForge cluster configurations that use the "Job from Pool" setting.
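
If pools are managed as code rather than in the Databricks UI, a recreated pool would look roughly like the hedged sketch below, assuming the databricks Terraform provider.  The pool name, node type, and sizing values are placeholders and should be copied from the existing pool.

    # Hypothetical sketch of a recreated Databricks pool preloaded with the
    # 11.3 LTS runtime. Copy the sizing and node type from your existing pool.
    resource "databricks_instance_pool" "dataforge_pool_11_3" {
      instance_pool_name                    = "dataforge-pool-11-3"   # assumed name
      node_type_id                          = "i3.xlarge"             # match your current pool
      min_idle_instances                    = 0
      max_capacity                          = 10
      idle_instance_autotermination_minutes = 15
      preloaded_spark_versions              = ["11.3.x-scala2.12"]    # 11.3 LTS
    }

Note the new pool's ID after it is created; it replaces the old pool ID in the "Job from Pool" cluster configurations in step 13.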

  1. Follow the Github update steps in the normal upgrade guide to create a new branch of the 7.0.8 infrastructure repository. Stop before running a new Terraform Plan and Apply to make the additional changes below.
  2. In the Terraform workspace, delete the variable named "upgradeTo624", as it is no longer needed.
  3. In the Terraform workspace, delete the variable named "releaseUrl", as it is no longer needed.
  4. In Terraform workspace, update the variable "manualUpgradeVersion" and set the value to "7.0.8"
  5. In Terraform workspace, update the variable "whitelist" and add the value of "dataforgelabs.com"
  6. In Terraform workspace, add a new variable named "environment_id" and set the value equal to the value provided by the DataForge team.
  7. In Terraform workspace, update the variable "usageAuth0Secret" and set the value equal to the value provided by the DataForge team
  8. In Terraform workspace Runs, discard any pending Plan/Apply Runs
  9. Update Terraform Databricks Permissions (it is preferred that the DataForge Infrastructure team performs these steps to avoid issues, or that they are done live together on a call; a sketch of the underlying provider change follows this list).
    • For clients using Terraform Cloud:
      • The DataForge Infrastructure team will connect from a local machine to the customer Terraform Cloud workspace being upgraded and run the following commands, replacing the organization name and workspace name below with the Terraform Org Name and Terraform Workspace name (case sensitive):
        terraform {
          cloud {
            organization = "westmonroepartners"
            workspaces {
              name = "AWS-Development"
            }
          }
        }
        • In the Terraform CLI, run: rm .terraform.lock.hcl
        • Run: terraform init -upgrade
        • Run: terraform state replace-provider databrickslabs/databricks databricks/databricks and enter "yes" when prompted
        • Run terraform init -upgrade again to confirm the provider change succeeded
  10. Run Terraform Plan and Apply to upgrade to 7.0.8.  Follow the normal upgrade guide for steps on checking the Deployment container and CloudWatch for completion.
  11. In Databricks, open the Workflows tab and check that the mount_job has finished successfully.
  12. Once the Terraform Plan and Apply has completed, add a new Terraform Variable named "rotateSparkyKeys" and set the value to "no".
  13. For any cluster configurations in the DataForge environment that use the "Job from Pool" setting, update them in DataForge with the ID of the new pool created before these steps were run.
  14. Open Databricks and execute the notebook "/dataforge-managed/7.0-post-upgrade-enr-relation-check" to check the migration status of rule and relation parameters.  Review the notebook output and perform any listed actions.
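
For reference on step 9, the state replace-provider commands account for the Databricks provider moving from the databrickslabs registry namespace to the databricks namespace.  The 7.0.8 infrastructure repository already contains the updated configuration; the hedged sketch below only illustrates what is changing, and the version constraint shown is an assumption rather than the repository's actual value.

    # Illustrative only: the provider source change behind
    # "terraform state replace-provider databrickslabs/databricks databricks/databricks".
    terraform {
      required_providers {
        databricks = {
          source  = "databricks/databricks"   # previously "databrickslabs/databricks"
          version = "~> 1.0"                  # assumed constraint; use the value in the repository
        }
      }
    }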

Azure Upgrades:

Any environments that are currently on versions 6.2.x can upgrade directly to the 7.0.8 version and skip the Upgrade to 6.2.4 steps below.

For environments not currently on a 6.2.x version, please upgrade to the 6.2.4 version before upgrading to the 7.0.8 version by following the steps below.

Upgrade to 6.2.4:

  1. Follow the Github update steps in the normal upgrade guide. Stop before running a new Terraform Plan and Apply so you can make the additional changes below.
  2. Rename the existing imageVersion variable in Terraform to "manualUpgradeVersion" (or create a new variable named "manualUpgradeVersion") and set its value to "6.2.4".
  3. A Terraform variable named "deploymentToken" should already exist for each workspace with a secret value entered.  Please do not change it, as it is required for the Auto-Upgrade functionality to work.  If this variable does not exist, please submit a support request.  Once you have received the token value from DataForge support, add a new Terraform variable named "deploymentToken", mark it Sensitive, and set its value to the token that was provided to you.
  4. [Azure Only] Add a new Terraform variable named "legacyDatabricksSubnet" and set the value to "yes".
  5. Add a new Terraform variable named "releaseUrl" and set the value equal to "https://release.wmprapdemo.com"
  6. Add a new Terraform variable named "whitelist" and mark the checkbox to the right so HCL is turned on.  The value should be set in this format below but be sure to add your own email domain to the list.  The contents of this whitelist can also be copied from the existing Auth0 whitelist rule to the new Terraform variable.["wmp.com","westmonroe.com","westmonroepartners.com","dataforgelabs.com","<customeremaildomain>"]
  7. Run Terraform Plan and Apply to upgrade to 6.2.4.  Follow the normal upgrade guide for steps on checking the Deployment container and CloudWatch for completion.

Upgrade to 7.0.8:

Before beginning the upgrade to 7.0.8, recreate any existing Databricks Pools that are in use.  Copy all of the same settings, but change the Databricks Runtime Version to 11.3 LTS.  The IDs of these new pools will need to replace the old pool IDs in all DataForge cluster configurations that use the "Job from Pool" setting.

  1. Follow the Github update steps in the normal upgrade guide to create a new branch of the 7.0.8 infrastructure repository. Stop before running a new Terraform Plan and Apply to make the additional changes below.
  2. In the Azure Portal, open the Resource Group and then open the KeyVault resource.  Navigate to Access Policies in the left panel menu and edit the "Intellio Terraform" application policy to grant Secret Purge access.  The Purge option can be found under "Privileged Secret Operations" when editing the policy (a hedged HCL equivalent of this change follows this list).
  3. In the Terraform workspace, delete the variable named "upgradeTo624", as it is no longer needed.
  4. In the Terraform workspace, delete the variable named "releaseUrl", as it is no longer needed.
  5. In Terraform workspace, update the variable "manualUpgradeVersion" and set the value to "7.0.8"
  6. In Terraform workspace, update the variable "whitelist" and add the value of "dataforgelabs.com"
  7. In Terraform workspace, add a new variable named "environment_id" and set the value equal to the value provided by the DataForge team.
  8. In Terraform workspace, update the variable "usageAuth0Secret" and set the value equal to the value provided by the DataForge team
  9. In Terraform workspace Runs, discard any pending Plan/Apply Runs
  10. Update Terraform Databricks Permissions (it is preferred that the DataForge Infrastructure team performs these steps to avoid issues, or that they are done live together on a call).
    • For clients using Terraform Cloud:
      • The DataForge Infrastructure team will connect from a local machine to the customer Terraform Cloud workspace being upgraded and run the following commands, replacing the organization name and workspace name below with the Terraform Org Name and Terraform Workspace name (case sensitive):
        terraform {
          cloud {
            organization = "westmonroepartners"
            workspaces {
              name = "AWS-Development"
            }
          }
        }
        • In the Terraform CLI, run: rm .terraform.lock.hcl
        • Run: terraform init -upgrade
        • Run: terraform state replace-provider databrickslabs/databricks databricks/databricks and enter "yes" when prompted
        • Run terraform init -upgrade again to confirm the provider change succeeded
  11. Run Terraform Plan and Apply to upgrade to 7.0.8.  Follow the normal upgrade guide for steps on checking the Deployment container and CloudWatch for completion.
  12. In Databricks, open the Workflows tab and check that the mount_job has finished successfully.
  13. Once the Terraform Plan and Apply has completed, add a new Terraform Variable named "rotateSparkyKeys" and set the value to "no".
  14. For any cluster configurations in the DataForge environment that use the "Job from Pool" setting, update them in DataForge with the ID of the new pool created before these steps were run.
  15. Open Databricks and execute the notebook "/dataforge-managed/7.0-post-upgrade-enr-relation-check" to check the migration status of rule and relation parameters.  Review the notebook output and perform any listed actions.
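
For reference on step 2, granting Secret Purge to the "Intellio Terraform" application can also be expressed in HCL.  The guide's instruction is to make this change in the Azure Portal; the azurerm sketch below is only a hedged equivalent, and the Key Vault reference, object ID variable, and surrounding permission list are assumptions rather than values from this environment.

    # Hedged azurerm sketch of the access policy change described in step 2.
    # "azurerm_key_vault.dataforge" and "var.intellio_terraform_object_id"
    # are assumed references, not names taken from this environment.
    resource "azurerm_key_vault_access_policy" "intellio_terraform" {
      key_vault_id = azurerm_key_vault.dataforge.id
      tenant_id    = data.azurerm_client_config.current.tenant_id
      object_id    = var.intellio_terraform_object_id
      secret_permissions = [
        "Get", "List", "Set", "Delete", "Recover", "Backup", "Restore",
        "Purge",   # the Privileged Secret Operation added for the upgrade
      ]
    }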

 
