DataForge offers a generic JDBC connection type for ingesting data from external databases that are not covered by the pre-built connection driver options. Before data can be ingested via JDBC, a connection needs to be created and cluster configurations or agents need to be updated. DataForge recommends the cluster-based ingestion method over the agent-based method because it is faster and more efficient.
Generic JDBC Connection
Create a new connection from the Connections page using the New + button.
Provide the new connection with a name and description.
Select the following options for the connection:
Connection Direction: Source
Connection Type: Table
Driver: Generic JDBC
Continue to enter the JDBC Connection String, following the format applicable to the database being connected to.
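The exact connection string format depends on the target database; the host, port, and database names below are hypothetical placeholders shown for illustration only.
Example (SQL Server): jdbc:sqlserver://example-host:1433;databaseName=SalesDB
Example (PostgreSQL): jdbc:postgresql://example-host:5432/analytics
Example (Databricks): jdbc:databricks://example-workspace.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123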
Connection Parameters
Fill in the following optional connection parameter fields:
Driver: Enter the fully qualified class name of the JDBC driver being used for the connection.
Example: com.databricks.client.jdbc.Driver
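Other commonly used driver class names, listed here for reference and to be confirmed against the driver vendor's documentation: com.microsoft.sqlserver.jdbc.SQLServerDriver (SQL Server), org.postgresql.Driver (PostgreSQL).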
JDBC Sensitive Parameters: Enter any sensitive parameters such as passwords or tokens required for the connection in this field using JSON key/value format.
Example: {"PWD":"wraldkyjuzavf13zcuohzfwd6g07k4fg5lw"}
After all necessary fields are filled in, save the connection.
(Cluster-based) Ingestion Cluster for JDBC Driver
DataForge recommends this cluster-based method, also known as Sparky ingestion. The cluster configuration used during source ingestion also needs the JDBC driver library attached.
Generally, DataForge recommends creating a dedicated cluster configuration for each type of Generic JDBC driver connection. The same cluster configuration can then be reused across multiple sources, while ingestion clusters for sources using other connection types avoid installing the library at runtime.
For recommendations on creating cluster configurations, refer to the Production Cluster Recommendations documentation.
In the cluster configuration, add the JDBC library to the Libraries setting under the Job Configuration Parameters, using JSON format.
Maven library example: [{"maven":{"coordinates":"com.databricks:databricks-jdbc:2.6.33"}}]
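If the driver is not published to Maven, the Databricks library specification also accepts a JAR reference; the path below is a hypothetical file location shown only as a sketch of the format: [{"jar":"dbfs:/FileStore/jars/custom-jdbc-driver.jar"}]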
Example JDBC Cluster Configuration
Once the Cluster Configuration is created, add it to a new or existing Process Configuration as either the Default Cluster or as an override for the Sparky Ingestion process type. For more information on Process Configurations, refer to the Process Configuration documentation.
Example process config with JDBC cluster for ingestions only
(Agent-based) Placing JAR in Agent Location
Before the Generic JDBC connection can be used, place the JAR for the JDBC driver into the DataForge folder ("C:/Program Files/DataForge") on the machine where the agent is installed.
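For example, with a hypothetical SQL Server driver JAR, the resulting path would be C:/Program Files/DataForge/mssql-jdbc-12.4.2.jre11.jar (the driver file name is illustrative only).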
Applying the JDBC Connection to a Source
If the ingestion will be cluster-based, update the source's Process Configuration to the configuration created above for the Generic JDBC option, and select the Generic JDBC connection in the Connection drop-down.
Example source configured for Generic JDBC ingestions