(Azure Workspaces) Creating the "dataforge" Unity Catalog catalog

When the DataForge workspace deployment runs, it attempts to connect to a catalog named "dataforge" in your Databricks workspace, so this catalog must exist before the deployment can complete successfully. All refining stages and hub tables for processed data are stored in this catalog.

If you would like to give your catalog a name other than "dataforge", please contact DataForge support. Using a different name requires following an additional process to ensure source hub tables are processed correctly.

To create the "dataforge" catalog, you will need to specify a storage location. The catalog can point to any storage location you prefer. For simplicity, if you used the Terraform Quickstart to deploy your Databricks workspace, we recommend using the default mnt_datalake storage location that it created.
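If you prefer to create the catalog from a SQL editor or notebook rather than the UI, the statement can be sketched as below. This is a minimal illustration, not a DataForge-provided script: the helper name `create_catalog_sql` is hypothetical, and the `abfss://` URL is a placeholder that must be replaced with the path behind your own storage location.

```python
from typing import Optional


def create_catalog_sql(catalog: str = "dataforge",
                       managed_location: Optional[str] = None) -> str:
    """Build a Unity Catalog CREATE CATALOG statement (hypothetical helper)."""
    stmt = f"CREATE CATALOG IF NOT EXISTS `{catalog}`"
    if managed_location is not None:
        # Keep the sketch simple: refuse paths that would need SQL escaping.
        if "'" in managed_location:
            raise ValueError("managed_location must not contain single quotes")
        stmt += f" MANAGED LOCATION '{managed_location}'"
    return stmt


# Placeholder path (assumption): substitute your own container, account, and folder.
PLACEHOLDER_LOCATION = (
    "abfss://<container>@<storage-account>.dfs.core.windows.net/<path>"
)

if __name__ == "__main__":
    print(create_catalog_sql(managed_location=PLACEHOLDER_LOCATION))
```

The printed statement can be pasted into a Databricks SQL editor once the placeholder is filled in. Omitting `managed_location` produces a statement that relies on the metastore's default storage, if one is configured.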

Follow the steps below to create an empty "dataforge" catalog and assign the permissions the deployment needs to run successfully.

1. Create a user that DataForge will use to run jobs, and assign it both Workspace Admin and Account Admin privileges. Make note of this user, as you will need to assign it multiple permissions and generate a personal access token for it later.

2. Create a new catalog named "dataforge":

  • Use the "Standard" catalog type.
  • Any storage location can be used, but "mnt_datalake" is recommended if you used the Terraform Quickstart.

3. Once the catalog is created, grant "ALL PRIVILEGES" on the catalog to the DataForge authorized user.

4. Open the Catalog page in Databricks, click the gear icon, and select the metastore assigned to your Databricks workspace. Navigate to the Permissions tab and assign the following permissions to your DataForge authorized user:

  • MANAGE ALLOWLIST
  • CREATE CONNECTION
  • CREATE CATALOG

 
