Cluster Configuration Overview

Cluster configurations allow users to define specific settings for a cluster. Multiple sources can then be linked to a single cluster configuration.
 
For recommendations on setting up cluster configurations for optimal performance and cost management, see the Production Workload Cluster Configuration Recommendations article.
 

Clusters List

Cluster configurations have their own page, accessed from the main menu: click System Configurations and select Cluster Configurations.
 
 
The cluster configurations table shows the major details for all existing clusters. Users can filter by cluster name and description, and sort by column value.
 
If a cluster has an associated job ID, users can view the job details or start a new job run. To start a new job run, click the launch icon under the Start column. To view job details, click the ID number under the Job ID column.
 
Clicking any other column opens the settings page for that cluster. To create a new cluster, click the NEW + button in the top-right corner.
 
 

Settings

The cluster settings page allows users to create and update cluster configurations for their sources. The settings for a new cluster configuration are described below.
 
  • Name*: A unique name.
  • Description*: A one-sentence summary describing the cluster.
  • Default Cluster Configuration: A flag that marks the cluster as the default. Once active, the toggle is disabled until another cluster is set as the default.
  • Cluster Type*: Create either a new Job, or a new Job from a pool (the default sparky-pool or a user-specified pool).
  • Scale Mode*: The number of workers can be managed automatically by Databricks (autoscaling) or set to a fixed value.
  • Job Task Type*: Jobs either execute a custom notebook in Databricks or use the DataForge Sparky Jar.
  • Notebook Path*: The full file path to the custom notebook. Required only when the custom notebook Job Task Type is selected.
Selecting DataForge jar as the Job Task Type uses the DataForge Sparky Jar for all process types except custom.
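
As a rough illustration only (not the exact payload DataForge generates), the sketch below shows how these settings could map onto a Databricks Jobs API cluster specification; the runtime version, pool ID, notebook path, and main class shown are placeholders.

    # Hypothetical sketch: how the settings above might map onto a Databricks
    # Jobs API cluster spec. All values are placeholders, not DataForge output.

    # Cluster Type: a new Job cluster, optionally drawn from a pool (e.g. sparky-pool).
    new_cluster = {
        "spark_version": "13.3.x-scala2.12",      # placeholder Databricks runtime
        "instance_pool_id": "<sparky-pool-id>",   # omit to create the cluster without a pool
        # Scale Mode: a fixed number of workers ...
        "num_workers": 4,
        # ... or autoscaling managed by Databricks (used instead of num_workers):
        # "autoscale": {"min_workers": 2, "max_workers": 8},
    }

    # Job Task Type: a custom notebook ...
    notebook_task = {"notebook_path": "/path/to/custom_notebook"}  # Notebook Path setting

    # ... or the DataForge Sparky Jar (used for all process types except custom).
    spark_jar_task = {"main_class_name": "<dataforge.sparky.MainClass>"}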
The Duplicate button next to Save creates a copy of the configuration in a new tab with the same settings and a name of "<configuration name> COPY". The duplicated configuration is not automatically attached to any objects.

 

Advanced Parameters

Depending on the selections made in the required parameters section, the advanced parameters section provides various sub-settings that help users tune jobs to their needs.
 
Descriptions for each parameter are included in the UI; refer to the Databricks documentation for in-depth details. If a description in the UI does not adequately explain a parameter's functionality, please submit a feature request support ticket.
 
Any user-modified parameters are displayed in bold.
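
As an illustration only, advanced parameters generally correspond to optional Databricks cluster settings layered on top of the required ones. The sketch below uses standard Databricks cluster spec fields with placeholder values; it is an assumption for illustration, not DataForge-specific parameter names or defaults.

    # Hypothetical sketch: the kinds of optional Databricks cluster settings that
    # advanced parameters typically correspond to. Keys are standard Databricks
    # cluster spec fields; values are placeholders, not DataForge defaults.
    base_cluster = {"spark_version": "13.3.x-scala2.12", "num_workers": 4}

    advanced_overrides = {
        "spark_conf": {"spark.sql.shuffle.partitions": "200"},  # Spark tuning
        "spark_env_vars": {"ENVIRONMENT": "prod"},              # environment variables
        "custom_tags": {"team": "data-engineering"},            # tags for cost tracking
    }

    # Conceptually, user-modified values are layered onto the base configuration;
    # parameters changed from their defaults are the ones the UI shows in bold.
    cluster_spec = {**base_cluster, **advanced_overrides}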
 
 
