Generic JDBC Connection

DataForge offers a generic JDBC connection type for ingesting data from external databases that are not covered by the pre-built connection drivers. Before data can be ingested via JDBC, a connection must be created, and either a cluster configuration or the Agent must be updated with the JDBC driver. DataForge recommends cluster-based ingestion, which is faster and more efficient than the Agent-based method.

Generic JDBC Connection

Create a new connection from the Connections page using the New + button. 

Provide the new connection with a name and description.  

Select the following options for the connection:

Connection Direction: Source

Connection Type: Table

Driver: Generic JDBC

 

Next, enter the JDBC Connection String in the format required by the database being connected to.
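Connection string formats differ from driver to driver. As a rough illustration only, the sketch below assembles strings in the formats commonly documented for PostgreSQL, MySQL, and SQL Server. The helper `build_jdbc_url`, the host name, the port, and the database name are all hypothetical placeholders and not part of DataForge; the driver vendor's documentation remains the authority on the exact format.

```python
# Hypothetical helper, not a DataForge API. The templates follow the URL
# formats commonly documented by each driver vendor; verify them against
# the documentation for the driver version actually in use.
JDBC_URL_TEMPLATES = {
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
}


def build_jdbc_url(kind: str, host: str, port: int, database: str) -> str:
    """Return a JDBC connection string for one of the known database kinds."""
    try:
        template = JDBC_URL_TEMPLATES[kind]
    except KeyError:
        raise ValueError(f"no template for database kind: {kind!r}") from None
    return template.format(host=host, port=port, database=database)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_jdbc_url("postgresql", "db.example.com", 5432, "sales"))
```

Passwords and tokens are left out of the string on purpose, since the connection provides a separate JDBC Sensitive Parameters field for them.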

Connection Parameters

Fill in the following optional connection parameter fields:

Driver: Enter the fully qualified class name of the JDBC driver used for the connection.

Example: com.databricks.client.jdbc.Driver

JDBC Sensitive Parameters: Enter any sensitive parameters such as passwords or tokens required for the connection in this field using JSON key/value format.

Example: {"PWD":"wraldkyjuzavf13zcuohzfwd6g07k4fg5lw"}

After all necessary fields are filled in, save the connection.
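A malformed JSON value in the JDBC Sensitive Parameters field is an easy mistake to make when typing it by hand. The following is a minimal sketch for generating that value, assuming the field expects a flat JSON object whose keys and values are all strings (as in the PWD example above). The function name `sensitive_params_json` and the environment variable `JDBC_PWD` are hypothetical.

```python
import json
import os


def sensitive_params_json(params: dict) -> str:
    """Serialize sensitive parameters as a flat JSON object of string pairs."""
    for key, value in params.items():
        # Assumption: only string keys and string values are accepted.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"parameter {key!r} must map a string to a string")
    # Compact separators give the same shape as the example in this article.
    return json.dumps(params, separators=(",", ":"))


if __name__ == "__main__":
    # JDBC_PWD is a placeholder variable name; reading the secret from the
    # environment keeps it out of scripts and shell history.
    password = os.environ.get("JDBC_PWD", "<password>")
    print(sensitive_params_json({"PWD": password}))
```

The parameter names themselves (for example PWD) are defined by the JDBC driver, so they should be taken from the driver's documentation.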

(Cluster-based) Ingestion Cluster for JDBC Driver

DataForge recommends this cluster-based method, also known as Sparky Ingestion. The cluster configuration used for source ingestion must have the JDBC driver library attached.

Generally, DataForge recommends creating a dedicated cluster configuration for each type of Generic JDBC driver. The same cluster configuration can then be shared by every source that uses that driver, while ingestion clusters for sources using other connection types avoid installing the library at runtime.

For recommendations on creating cluster configurations, refer to the Production Cluster Recommendations documentation.

In the cluster configuration, add the JDBC library in JSON format to the Libraries setting under Job Configuration Parameters.

Maven library example: [{"maven":{"coordinates":"com.databricks:databricks-jdbc:2.6.33"}}]
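The Libraries value is a JSON array with one object per library. As a sketch only, the snippet below generates that array from Maven coordinates so the brackets and quoting come out right. The helper `maven_libraries_json` is hypothetical, and it assumes plain three-part coordinates in group:artifact:version form.

```python
import json


def maven_libraries_json(*coordinates: str) -> str:
    """Build the Libraries JSON array from Maven coordinates."""
    libraries = []
    for coord in coordinates:
        # Assumption: coordinates use the three-part group:artifact:version form.
        parts = coord.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected group:artifact:version, got {coord!r}")
        libraries.append({"maven": {"coordinates": coord}})
    return json.dumps(libraries, separators=(",", ":"))


if __name__ == "__main__":
    # Reproduces the Maven library example from this article.
    print(maven_libraries_json("com.databricks:databricks-jdbc:2.6.33"))
```

Passing several coordinates produces one array with an entry for each, which may be useful if a driver depends on an extra library.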

Example JDBC Cluster Configuration

Once the Cluster Configuration is created, add it to a new or existing Process Configuration as either the Default Cluster or as an override for the Sparky Ingestion process type. For more information on Process Configurations, refer to the Process Configuration documentation.

Example process config with JDBC cluster for ingestions only

(Agent-based) Placing JAR in Agent Location

Before the Generic JDBC connection can be used, place the JAR for the JDBC driver into the DataForge folder ("C:/Program Files/DataForge") on the machine where the Agent is installed.
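Copying the JAR by hand works fine. For repeatable setups, the step could be scripted along the lines of the sketch below. The function `install_driver_jar` and the download path in the usage example are hypothetical; the default folder is the one named above. Writing to Program Files normally requires an elevated (administrator) session.

```python
import shutil
from pathlib import Path


def install_driver_jar(jar_path: str,
                       agent_dir: str = "C:/Program Files/DataForge") -> Path:
    """Copy a JDBC driver JAR into the Agent folder and return the new path."""
    source = Path(jar_path)
    target_dir = Path(agent_dir)
    if source.suffix.lower() != ".jar":
        raise ValueError(f"not a JAR file: {source}")
    if not target_dir.is_dir():
        raise FileNotFoundError(f"agent folder not found: {target_dir}")
    # copy2 preserves file metadata and returns the destination path.
    return Path(shutil.copy2(source, target_dir / source.name))


if __name__ == "__main__":
    # Placeholder path; point this at the downloaded driver JAR.
    print(install_driver_jar("C:/Downloads/my-jdbc-driver.jar"))
```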

Applying the JDBC Connection to a Source

In the source settings, select the Generic JDBC connection in the Connection drop-down. If the ingestion will be cluster-based, also change the Process Configuration selection to the Process Configuration that was set up with the JDBC cluster configuration.

Example source configured for Generic JDBC ingestions
