Follow the steps in this guide, in order, to use the DataForge Terraform Quick Start tool to create all of the resources and infrastructure required for a new DataForge Workspace.
Google Chrome is the supported browser for DataForge. Using an unsupported browser may cause issues loading pages.
Microsoft Azure Portal
- In Azure Portal, create an App Registration with Contributor permission on your Azure subscription and the following Microsoft Graph API permissions, found under App Registration -> API Permissions -> Add a Permission -> Microsoft Graph (an illustrative Terraform sketch of this registration appears at the end of this section).
- Application.ReadWrite.All
- Directory.ReadWrite.All
- In Azure Portal, open the Quotas page, filter the Region to the region you will use for your Databricks and DataForge environment, and request an increase for the following quotas. Quota increases do not increase your cost; lower quotas mean fewer jobs can run at the same time in DataForge and Databricks, which may result in job failures. If Microsoft denies these quota requests, try again with a slightly lower amount.
- Total Regional vCPUs - increase to 100 (15 is the bare minimum)
- Standard DSv3 Family vCPUs - increase to 100 (15 is the bare minimum)
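If you prefer to keep this registration in code, the snippet below is a rough sketch of the same App Registration using the azuread Terraform provider. It is illustrative only and not part of the quick start: the display name is a hypothetical placeholder, the Microsoft Graph application ID is the well-known constant, and the two permission role IDs should be double-checked against the Microsoft Graph permissions reference before use. The Contributor permission on the subscription is still granted separately (for example, through the portal steps above).

```hcl
# Illustrative sketch only -- the portal steps above are the supported path.
terraform {
  required_providers {
    azuread = {
      source = "hashicorp/azuread"
    }
  }
}

resource "azuread_application" "dataforge_quickstart" {
  display_name = "dataforge-quickstart" # hypothetical name; choose your own

  required_resource_access {
    # Well-known application ID for Microsoft Graph.
    resource_app_id = "00000003-0000-0000-c000-000000000000"

    resource_access {
      # Application.ReadWrite.All (application permission) -- verify this role ID
      # against the Microsoft Graph permissions reference before applying.
      id   = "1bfefb4e-e0b5-418b-a88f-73c46d2cc8e9"
      type = "Role"
    }

    resource_access {
      # Directory.ReadWrite.All (application permission) -- verify this role ID as well.
      id   = "19dbc75e-c2e2-444c-a770-ec69d8559fc7"
      type = "Role"
    }
  }
}
```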
GitHub
- Create a GitHub account if you do not already have one and follow the signup instructions to confirm your account. The Free plan will work for this quick start: https://github.com/signup
- Once you are signed in to GitHub, open the DataForgeLabs Terraform Module Examples repo.
- Click the Fork option near the top-right and click the Create Fork button to fork the repo into your own account.
- The forked repository contains an "azure" directory with two files that define the default variables the quick start tool needs. You can change these files to do either of the following:
- Change the quick start tool to deploy the necessary resources into an existing Resource Group in your Azure subscription
- Customize the network Databricks will be deployed to, using the remaining Optional Inputs from the Inputs guide linked below
- Make either of these changes by replacing the variables in the "main.tf" and "variables.tf" files within the directory. Both files must have the same variables listed (see the sketch after this list for the general shape of the two files). Use the following guide to update your directory files with the appropriate variables: https://registry.terraform.io/modules/dataforgelabs/aws-databricks/dataforge/latest?tab=inputs
- Be sure the "main.tf" and "variables.tf" files contain the required inputs, along with any other variables you want to define, and commit the changes to your repository.
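For reference, the sketch below shows the general shape of the two files and how they stay in sync: a variable declared in "variables.tf" is passed through to the module call in "main.tf". The module source shown is a placeholder (keep whatever source and version your forked azure directory already declares), and every variable name except environment_prefix is hypothetical; the Inputs guide linked above is the authority on the required and optional inputs.

```hcl
# variables.tf -- declare every input the module call below uses.
variable "environment_prefix" {
  type        = string
  description = "Prefix for resources created by the quick start (alphanumeric characters and dashes only)"
}

variable "existing_resource_group_name" {
  type        = string
  description = "Hypothetical optional input: deploy into an existing Resource Group instead of creating one"
  default     = null
}

# main.tf -- pass the same variables through to the DataForge module.
module "dataforge_quickstart" {
  # Keep the source and version that your forked azure directory already uses.
  source = "dataforgelabs/azure-databricks/dataforge" # placeholder source

  environment_prefix           = var.environment_prefix
  existing_resource_group_name = var.existing_resource_group_name
}
```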
Terraform
- Sign up for a Terraform account if you do not already have one and confirm the new account through the confirmation email that is sent: https://app.terraform.io/public/signup/account
- Once you are signed in to Terraform, create a new Organization. The organization can be given any name you like as long as it follows the naming guidelines listed on the creation page.
- Create a new Workspace in Terraform:
- Choose the Version Control Workflow, as you will need to sync with your GitHub account.
- Select GitHub and GitHub.com for the version control provider.
- In the popup window that appears, select the Authorize Terraform Cloud button. If the popup window does not appear, you may need to adjust your browser settings to allow popups.
- On the next popup that appears, select the Install button. If you are seeing a Terraform page with a spinning icon that shows GitHub App Installation, look for that popup among your open windows.
- You should now be on a screen to choose your repository. Select the repository you forked from the DataForge Terraform Module Examples repo in the GitHub steps above.
- Expand the "Additional Options" section, enter "azure" into the Terraform Working Directory field, then select the Create option at the bottom of the page.
- Terraform will show a Configure Terraform Variables page that automatically lists the variables needed. Enter the value of each variable from your Azure account, using the Inputs guide to find where each value lives. For environment_prefix, use only alphanumeric characters and dashes; underscores will cause failures (an optional validation sketch appears after these steps). After entering the values, select the Save Variables option.
- Select the Start Run button, optionally give the run a name like "dataforge quickstart", and leave the Run Type as "Plan and apply (standard)". Select the Start button.
- When the Plan stage is complete, you will see a green checkmark and the message "Plan Finished". Scroll to the bottom of the page and select the Apply button to let Terraform stand up all the resources in your cloud environment.
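Because underscores in environment_prefix cause failures, you can optionally guard against them at plan time by adding a validation block to the variable declaration in your fork's "variables.tf". This is a hypothetical, optional addition rather than something the quick start requires:

```hcl
variable "environment_prefix" {
  type        = string
  description = "Prefix for resources created by the quick start"

  validation {
    # Reject underscores and other disallowed characters up front so a bad
    # prefix fails at plan time instead of partway through the apply.
    condition     = can(regex("^[A-Za-z0-9-]+$", var.environment_prefix))
    error_message = "environment_prefix may contain only alphanumeric characters and dashes."
  }
}
```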
When the Apply stage in Terraform is complete, you should see a green checkmark and the message "Apply Complete". You are now finished with the DataForge Terraform Quick Start, and all of the resources and infrastructure needed to request a new DataForge Workspace have been created. A new Databricks workspace will be available to open within the existing resource group (if you customized one) or within a new resource group named like "<environmentprefix>-DB-Workspace-RG".
Please return to the New DataForge Workspace Creation form to finish your setup. The Datalake Mount Path will be required during workspace creation; this quick start created it as "/mnt/datalake".
If issues arise or additional help is needed, please open a support request with the DataForge team and one of our members will assist you with getting the Quick Start working.