When the DataForge workspace deployment runs, it attempts to connect to a catalog named "dataforge" in your Databricks workspace, so this catalog must exist before the deployment can complete successfully. All refining stages and hub tables for processed data will be located in this catalog.
If you would like to give your catalog a name other than "dataforge", please contact DataForge support, as this requires following a process to ensure source hub tables are processed correctly.
To create the "dataforge" catalog, you will need to specify a storage location. You can either use your account-level metastore, if it contains a storage path, or create a storage credential and external location that point to a specific bucket.
The storage credential and external location can point to any bucket. For simplicity, we recommend pointing the catalog to "s3://<datalake_bucket_path>".
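As a rough illustration of the storage setup described above, the following Python sketch builds the SQL statements that would create an external location and then the catalog on top of it. It assumes Databricks Unity Catalog SQL syntax (`CREATE EXTERNAL LOCATION ... WITH (STORAGE CREDENTIAL ...)` and `CREATE CATALOG ... MANAGED LOCATION ...`) and that a storage credential already exists. The function name and the names `dataforge_location` and `dataforge_credential` are hypothetical placeholders, not values required by DataForge. Verify the statements against the Databricks documentation for your workspace before running them.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_catalog_sql(bucket_url: str,
                      credential_name: str,
                      location_name: str = "dataforge_location",
                      catalog_name: str = "dataforge") -> list[str]:
    """Return SQL statements for the external location and the catalog.

    location_name and credential_name are illustrative placeholders;
    the statement syntax is assumed from Databricks Unity Catalog SQL.
    """
    if "'" in bucket_url:
        # Keep the sketch simple: refuse values that would need escaping.
        raise ValueError("bucket_url must not contain single quotes")
    return [
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote_ident(location_name)} "
        f"URL '{bucket_url}' "
        f"WITH (STORAGE CREDENTIAL {quote_ident(credential_name)})",
        f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog_name)} "
        f"MANAGED LOCATION '{bucket_url}'",
    ]


if __name__ == "__main__":
    # Replace the placeholder path and credential name with your own values.
    for stmt in build_catalog_sql("s3://<datalake_bucket_path>",
                                  "dataforge_credential"):
        print(stmt + ";")
```

The generated statements can be pasted into a Databricks SQL editor or notebook. If your metastore already has a storage path you intend to use, the external location statement is not needed.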
1. Create a user that will be used to run jobs from DataForge, and assign it both Workspace Admin and Account Admin privileges. Make note of this user, as you will need to assign it multiple permissions and generate a personal access token for it later.
2. Create a new Catalog named "dataforge"
If you already have a storage location you would like to use, skip down to creating the catalog. Otherwise, create a storage credential and an external location first.
- External Location using a storage credential
  - You can use any bucket, but we recommend "s3://<datalake_bucket_path>"
  - Grant "ALL PRIVILEGES" to the DataForge authorized user or service principal
- Catalog named "dataforge" using the external location above
  - Use the "Standard" catalog type
  - You can use any bucket within the external location, but we recommend "s3://<path>"
  - Grant "ALL PRIVILEGES" to the DataForge authorized user or service principal
3. Once the catalog is created, grant "ALL PRIVILEGES" on the catalog to the DataForge authorized user.
4. Open the Catalog page in Databricks, click the gear icon, and select the metastore assigned to your Databricks workspace. Navigate to the Permissions tab and assign the following permissions to your DataForge authorized user:
- MANAGE ALLOWLIST
- CREATE CONNECTION
- CREATE CATALOG
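
The permission grants in the steps above can also be expressed as SQL rather than assigned through the UI. The Python sketch below generates those statements, assuming Databricks Unity Catalog `GRANT ... ON CATALOG`, `GRANT ... ON EXTERNAL LOCATION`, and `GRANT ... ON METASTORE` syntax. The function name, the example principal `dataforge-user@example.com`, and the location name `dataforge_location` are hypothetical. Treat this as an illustration to check against the Databricks documentation, not as the procedure DataForge itself runs.

```python
from typing import Optional

# Metastore-level privileges listed in step 4.
METASTORE_PRIVILEGES = ("MANAGE ALLOWLIST", "CREATE CONNECTION", "CREATE CATALOG")


def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_grant_sql(principal: str,
                    catalog_name: str = "dataforge",
                    location_name: Optional[str] = None) -> list[str]:
    """Return GRANT statements for the DataForge authorized user.

    principal is the user or service principal created in step 1.
    location_name is optional and only needed if you created an
    external location for the catalog (the name is a placeholder).
    """
    who = quote_ident(principal)
    statements = []
    if location_name is not None:
        statements.append(
            f"GRANT ALL PRIVILEGES ON EXTERNAL LOCATION "
            f"{quote_ident(location_name)} TO {who}"
        )
    statements.append(
        f"GRANT ALL PRIVILEGES ON CATALOG {quote_ident(catalog_name)} TO {who}"
    )
    for privilege in METASTORE_PRIVILEGES:
        statements.append(f"GRANT {privilege} ON METASTORE TO {who}")
    return statements


if __name__ == "__main__":
    # Hypothetical principal and location name, for illustration only.
    for stmt in build_grant_sql("dataforge-user@example.com",
                                location_name="dataforge_location"):
        print(stmt + ";")
```

Running the statements requires an identity that is allowed to grant these privileges, such as a metastore admin.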