Compute configurations allow users to select specific settings for compute. Several sources can then be linked to a compute configuration.
For recommendations on setting up compute configurations for optimal performance and cost management, visit the Production Workload Compute Configuration Recommendations.
Compute List
Compute configurations have their own page, accessed from the main menu: click System Configurations, then select Compute Configurations.
Select Compute Configurations
The compute configurations table shows all of the major details on any existing compute. Users can filter by compute names and descriptions as well as sort by column value.
To start a compute manually, click the launch icon in the Start column.
Clicking any other column opens the settings page of that compute. To create a new compute configuration, click the NEW + button in the top right corner.
Settings
The compute settings page allows users to create and update compute configurations for their sources. The example below shows the default settings for a new compute configuration.
- Name*: A unique name.
- Description*: A one-sentence summary describing the compute.
- Default Compute Configuration: A flag that marks the compute as the default. Once active, the toggle is disabled until another compute is selected as the default.
- Compute Type*: Create either a new Job (default, recommended) or a user-specified pool.
- (Databricks only) Scale Mode*: The number of workers can be automatically managed by Databricks or can be a fixed value.
- (Databricks only) Job Task Type*: Jobs either execute a custom notebook in Databricks or use the DataForge Jar.
- Notebook Path*: The full file path to the custom notebook. Only required when the custom notebook job task type is selected.
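The relationship between the required fields can be sketched as a small data model. This is only an illustration and not DataForge's API: the class name `ComputeConfig`, the field names, and the option strings (`"job"`, `"pool"`, `"automatic"`, `"fixed"`, `"dataforge_jar"`, `"custom_notebook"`) are assumptions chosen to mirror the UI labels above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical option values mirroring the UI choices; not DataForge identifiers.
COMPUTE_TYPES = {"job", "pool"}
SCALE_MODES = {"automatic", "fixed"}
JOB_TASK_TYPES = {"dataforge_jar", "custom_notebook"}


@dataclass
class ComputeConfig:
    name: str                             # must be unique across configurations
    description: str                      # one-sentence summary
    scale_mode: str                       # Databricks only
    job_task_type: str                    # Databricks only
    compute_type: str = "job"             # Job is the default, recommended type
    notebook_path: Optional[str] = None   # needed only for custom notebooks
    is_default: bool = False

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the settings are valid."""
        errors = []
        if not self.name.strip():
            errors.append("Name is required")
        if not self.description.strip():
            errors.append("Description is required")
        if self.compute_type not in COMPUTE_TYPES:
            errors.append("Compute Type must be a job or a pool")
        if self.scale_mode not in SCALE_MODES:
            errors.append("Scale Mode must be automatic or fixed")
        if self.job_task_type not in JOB_TASK_TYPES:
            errors.append("Job Task Type must be the DataForge Jar or a custom notebook")
        # Notebook Path is conditional: required only for the custom notebook task type.
        if self.job_task_type == "custom_notebook" and not self.notebook_path:
            errors.append("Notebook Path is required for the custom notebook job task type")
        return errors
```

For example, a configuration that selects the custom notebook task type without a path fails validation until a path (here a made-up one such as `/Shared/example_notebook`) is supplied.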
The Duplicate button near Save will create a copy of the configuration in a new tab with the same settings and a name of "<configuration name> COPY". The duplicated configuration is not attached to any objects automatically.
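The duplicate behavior described above can be sketched as follows. This is a hypothetical model rather than DataForge code: `ConfigRecord`, its fields, and `attached_sources` are illustrative names, and only the " COPY" suffix and the cleared attachments come from the description.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ConfigRecord:
    name: str
    description: str
    # Hypothetical stand-in for the objects (such as sources) linked to this configuration.
    attached_sources: Tuple[str, ...] = ()


def duplicate(config: ConfigRecord) -> ConfigRecord:
    """Copy every setting, append ' COPY' to the name, and attach nothing."""
    return replace(config, name=f"{config.name} COPY", attached_sources=())
```

Calling `duplicate` on a record named `nightly` yields a record named `nightly COPY` with the same description and no attached sources; the original record is unchanged.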
Advanced Parameters
Depending on the selections made in the required parameters section, the advanced parameters section will provide various sub-settings to help configurators tune jobs to their needs.
Descriptions for each parameter are included in the UI. Please submit a support ticket if a description in the UI does not adequately explain the functionality of a specific parameter or if questions remain.
Any user-modified parameters are displayed in bold.
A parameter worth noting specifically is Max Processes Per Compute. By default, every time a source kicks off a new process, it will start a new job run of the compute configuration that is attached.