Cluster and Process Configuration Overview

Terms and definitions

Object | Description
Cluster Configuration | Stores all configuration settings required for a Databricks job: the cluster configuration, the job configuration, and a few DataForge-specific parameters that control job execution. Each cluster configuration record is linked directly to its Databricks job through the unique job_id attribute.
Process Configuration | Consists of one default cluster configuration plus an optional set of cluster configurations, one for each specific process type. A process configuration is attached to each Source in DataForge.
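To make the relationships in the table concrete, here is a minimal Python sketch of the data model. It is not DataForge's actual schema or API. The class names, the name field, the process-type strings ("ingestion", "enrichment"), and the rule that a process type without its own cluster configuration falls back to the default are assumptions made for illustration. Only the job_id link and the default-plus-per-process-type structure come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterConfiguration:
    """Hypothetical stand-in for a DataForge cluster configuration record."""
    name: str    # illustrative label, not a documented DataForge field
    job_id: int  # unique link to the Databricks job this record controls


@dataclass
class ProcessConfiguration:
    """One default cluster configuration plus optional per-process-type ones."""
    default: ClusterConfiguration
    overrides: Dict[str, ClusterConfiguration] = field(default_factory=dict)

    def cluster_for(self, process_type: str) -> ClusterConfiguration:
        # Assumed resolution rule: use the process-type-specific cluster
        # configuration when one exists, otherwise fall back to the default.
        return self.overrides.get(process_type, self.default)


@dataclass
class Source:
    """Hypothetical DataForge Source with its attached process configuration."""
    name: str
    process_configuration: ProcessConfiguration


# Usage: a Source whose ingestion runs on a dedicated cluster configuration
# while every other process type uses the default one.
default = ClusterConfiguration(name="default-small", job_id=101)
ingest = ClusterConfiguration(name="ingest-large", job_id=202)

source = Source(
    name="orders",
    process_configuration=ProcessConfiguration(
        default=default,
        overrides={"ingestion": ingest},  # hypothetical process-type key
    ),
)

print(source.process_configuration.cluster_for("ingestion").job_id)   # 202
print(source.process_configuration.cluster_for("enrichment").job_id)  # 101
```

Keeping the overrides in a dictionary keyed by process type makes the fallback a single lookup, which mirrors the "one default plus optional specific configurations" structure described in the table.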

Below is a high-level diagram showing how cluster and process configurations relate to other DataForge metadata tables and to Databricks objects.