Connections

A connection holds the credentials, network locations, and any other parameters required to access data in the location where it is either generated or staged for ingestion by DataForge.


Connections

The connections home page enables users to quickly search and access connections already configured in the DataForge platform.
 
Only Connections marked as Active are shown here, unless the Active Only toggle is set to off.
 
 

Connection Settings

The connection settings page allows users to provide the parameters DataForge needs to access the system.
 
  • Name*: A unique name
  • Description*: A description
  • Active: Allows users to disable the connection without deleting the configuration
  • Group: Allows users to include the connection in a group (requires a Connection Template to also be selected)
  • Connection Template: Allows users to select a connection template to normalize the Name (requires a Group to also be selected)
  • Connection Direction: Specifies if this connection is used to ingest or output data
  • Connection Type: Specifies the format or location style of the source or target data. Depending on the Type selected, the remaining parameters will change
  • Uses Agent: A visual indicator showing whether or not this connection will use an Agent (only available if Source Connection Direction is selected)
  • Agent*: When an Agent is required, selects the Agent to be used (only available if Source Connection Direction is selected)

The Duplicate button near Save will create a copy of the configuration in a new tab with the same settings and a name of "<configuration name> COPY". The duplicated configuration is not attached to any objects automatically.


API Connection Type

Options available:


Custom Connection Type

Used in the SDK as part of Custom Ingestion.
 
Parameters here are optional, as not all custom ingestion notebooks require parameterization.
 
These parameters should be a JSON object in the format {"key1": "value1", "key2": "value2"}.
 
For connections that do not need any parameters, enter an empty JSON object: {}
  • Public Connection Parameters*: Passed as plain text to the custom ingestion session
  • Private Connection Parameters*: Stored in the Databricks Secrets of the respective cloud service when the connection is saved (see the sketch below)
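
As an illustration, a custom ingestion notebook might read these parameters roughly as follows. This is a minimal sketch only: the parameter keys, secret scope, and key names are hypothetical placeholders, and the exact way DataForge passes the values into the session is defined by the SDK rather than shown here.

    import json

    # Public Connection Parameters are passed as plain text, e.g. a JSON string
    # like the following (keys and values are hypothetical placeholders):
    public_params_json = '{"base_url": "https://example.com/api", "page_size": "500"}'
    public_params = json.loads(public_params_json)
    base_url = public_params["base_url"]

    # Private Connection Parameters are stored in Databricks Secrets on save.
    # In a Databricks notebook, dbutils is available as a built-in object;
    # the scope and key names below are placeholders, not DataForge-defined names.
    api_token = dbutils.secrets.get(scope="my-secret-scope", key="api_token")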

Event Connection Type

Options available:


File Connection Type

  • Storage Technology*: Specifies the type of file storage the Agent or Databricks will attempt to access
  • File Path*: The folder/container path for DataForge to access when pulling or generating files (see the example paths below)
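
For example, depending on the Storage Technology selected, the File Path typically takes a form like the following (the bucket, container, and folder names are placeholders):

    s3://my-bucket/landing/orders/                                  (AWS S3)
    abfss://landing@mystorageaccount.dfs.core.windows.net/orders/   (Azure Data Lake Storage Gen2)

An Agent-based connection may instead point at a local or network folder path accessible from the machine where the Agent runs.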

Table Connection Type

  • Driver*: Which JDBC driver should be used

When using the Generic JDBC driver option, users need to enter the connection string, driver class path, and any sensitive parameters. The driver library also needs to be added to the libraries setting of the cluster configuration parameters that will be used for ingestion on any sources. For more information and examples, refer to the Generic JDBC Connection documentation.
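
As an illustration, a Generic JDBC configuration for a PostgreSQL database might use values along these lines (the host, port, database name, and library version are placeholders; the exact parameter names and library setup are covered in the Generic JDBC Connection documentation):

    Connection string:  jdbc:postgresql://db-host.example.com:5432/sales_db
    Driver class path:  org.postgresql.Driver
    Cluster library:    org.postgresql:postgresql:42.7.3 (Maven coordinate added to the libraries setting)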


Parameters

The parameters section changes dynamically based on the selections above. These parameters are typically optional to configure and are used for advanced configuration or specifications.
To utilize Connection Metadata for database connections, use the Metadata Refresh and Metadata Schema Pattern parameters. See Connection Metadata below for more detail.

Connection Metadata

Connection Metadata shows an optional list of all tables, referenced tables, and primary/foreign keys. To populate Connection Metadata, use the Metadata Refresh and Metadata Schema Pattern parameters on the Connection Settings page.

  • Metadata Refresh (includes four options):
    • Tables, Columns, and Keys collects the most granular information for each table, pulling the list of columns and keys defined in each table. This is the default and enables Talos AI to directly search for specific fields within connections for users.
    • Tables and Keys collects the table names and key column identifiers for each table
    • Tables collects only table/view names
    • None disables metadata collection for the connection
  • Metadata Schema Pattern (optional):
    • Specifies a SQL LIKE pattern used to filter the schemas included in metadata collection (see the example below)
Note: Connection Metadata only works with Table/Database connections at this time.
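
For example, a Metadata Schema Pattern of sales% limits metadata collection to schemas whose names start with "sales". Standard SQL LIKE syntax applies, where % matches any sequence of characters and _ matches a single character.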

With Connection Metadata, users have the option to directly create sources from connection tables, including referenced tables recursively. The Metadata Refresh parameter must be set to any option other than None to use this feature.

After selecting the checkbox next to a table the user wants, options will appear in the triple dot menu to also add referenced tables or referenced tables recursively. Add referenced tables recursively will include all tables referenced in the chain of tables.

Create new sources directly from Connection Metadata by selecting the Create Source(s) button in the triple dot menu at the top right above the table.

A source creation modal appears to set the source naming pattern for DataForge to use when creating the sources. If the connection's Metadata Refresh parameter is set to Tables and Keys, an option will appear to automatically Create Relations between all the sources. This option only works the first time the sources are created or when the user changes the naming pattern to recreate all of the sources as new with relations.

Toggle Initiate Data Pull on to start a new ingestion for each source when it is created, rather than manually running a new data pull on each source.

The Sources column of the Connection Metadata tab will provide a number and hyperlink to any existing sources set up to pull data from the specific table/view.  Click the hyperlink to view and/or open the existing sources.

The Refresh button will launch the Connection Test cluster to retest the connection and do a new scan of the database tables and views.
