DataForge supports reading from and writing to tables stored in Databricks Unity Catalog. While DataForge ingests data from and outputs data to Unity Catalog, file storage for all other processing (CDC, Enrichment, etc.) remains in the data lake storage bucket.
Requirements
Unity Catalog must be enabled on the Databricks workspace, and a Unity Catalog must exist to read from or write to. For more information, see the Databricks Unity Catalog Get Started guide.
Source Connection
Set up a Source Connection to ingest data from a Unity Catalog. Navigate to the Connections page and create a new Connection using the following selections:
Connection Direction: Source
Connection Type: Table
Uses Agent: No
Catalog: Enter your Unity Catalog name here
Additional parameters related to Connection Metadata are optional.
Save the connection; it is now ready to attach to Sources. To use this Connection on a Source, select a Connection Type of Table when creating your Source. Below is an example of this Source Connection.
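A Table connection like the one above resolves tables through Unity Catalog's three-level namespace, `catalog.schema.table`, where the catalog comes from the connection and the schema and table come from the Source. The sketch below illustrates that naming convention only; the helper name and quoting rule are illustrative and not part of the DataForge API.

```python
# Sketch: how the Catalog parameter on the connection combines with a
# source's schema and table into a Unity Catalog three-level name.
# Helper name and quoting behavior are illustrative, not DataForge's API.
def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Return `catalog.schema.table`, backtick-quoting any part that
    contains characters outside letters, digits, and underscores."""
    def quote(part: str) -> str:
        return part if part.replace("_", "").isalnum() else f"`{part}`"
    return ".".join(quote(p) for p in (catalog, schema, table))

# In a Databricks notebook, such a name is what you would pass to
# spark.table(...), e.g.:
# df = spark.table(qualified_table_name("main", "sales", "orders"))
```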
Output Connection
Set up an Output Connection to publish data to a Unity Catalog. Navigate to the Connections page and create a new Connection using the following selections:
Connection Direction: Output
Connection Type: Table
Driver: Delta Lake
Credentials: Implicit (already selected)
Expand the parameters and fill in the following:
Catalog: Enter your Unity Catalog name here
Save the connection; it is now ready to attach to Outputs. To use this Connection on an Output, select an Output Type of Table when creating the Output. Below is an example of this Output Connection.
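The selections above amount to a small, fixed parameter set. As a sanity check, the sketch below verifies that an output connection carries each required selection; the field names and the validation itself are illustrative and not part of DataForge's configuration schema.

```python
# Sketch: checking that the Output Connection selections described above
# are all present. Field names are illustrative, not DataForge's schema.
REQUIRED_OUTPUT_SETTINGS = {
    "connection_direction": "Output",
    "connection_type": "Table",
    "driver": "Delta Lake",
    "credentials": "Implicit",
}

def validate_output_connection(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the connection
    matches every selection listed above and names a catalog."""
    problems = [
        f"{key} should be {expected!r}, got {params.get(key)!r}"
        for key, expected in REQUIRED_OUTPUT_SETTINGS.items()
        if params.get(key) != expected
    ]
    if not params.get("catalog"):
        problems.append("catalog: enter your Unity Catalog name")
    return problems
```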