User Manual
Sources
-
Sources Overview
A Source represents a single schema of data. Sources are the logical grouping for Inputs, Relations, and Ru...
-
Source Settings
The Settings tab for a Source allows a user to specify key information about the Source including input types ...
-
Raw Schema
The Raw Schema tab for Sources allows users to view the raw database attributes as well as raw metadata. ...
-
Dependencies
Dependencies allow configurators to modify the workflow engine to introduce waits to the processing queues ...
-
Relations
Relations define inter-source connections and enable users to configure lookups and cross-source aggregates. ...
-
Rules
Rules allow DataForge to modify and transform data. The Rules tab allows users to select, ...
Processing
-
Processing Queue
The Processing Queue tab provides an interactive overview of all processes completed, active, error...
-
Ingestion Queue
The Ingestion Queue tab provides a view of all ingestions that are waiting to run, cur...
-
Workflow Queue
The Workflow Queue tab provides a view of processes tha...
-
Job Runs
The Job Runs table shows detailed information on all job...
-
Resetting Processes
During the development lifecycle, users will need to reset their sources often to change sourc...
-
Recommended Cluster Configurations during Development
For environments that are undergoing daily development, it may be beneficial to set up your cluster configurat...
Outputs
-
Outputs Overview
Outputs specify where or how DataForge exposes or exports data to external systems. ...
-
Output Settings
In the Output Settings screen, users can see the various components that make up an Output, including tabs ...
-
Output Mapping
Output Mapping controls the way data is sent to its final destination. It allows a user to rename columns, app...
-
Process (Output History)
The Process page provides an operational dashboard of the processes completed or currently active for this Out...
Connections
-
Connections
A connection holds the credentials, network locations, and any other parameters required to access data in the...
-
Generic JDBC Connection
DataForge offers a generic JDBC connection type for ingesting data from external databases, outside of the pre...
-
Salesforce Connection
DataForge offers a pre-built connector for Salesforce. Before data can be ingested from Salesforce, users nee...
-
Kafka Events Connection
DataForge integrates directly with Kafka Event Topics for batch or streaming Source ingestion and Output pub...
-
Unity Catalog
DataForge supports reading from and writing to tables stored in Databricks Unity Catalog. While DataForge wil...
Schedules
-
Schedules
A schedule uses a CRON expression to determine how often source inputs are updated. Multiple sources can be as...
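As an illustration of the CRON format mentioned above, here is the standard five-field crontab syntax (a generic example, not a DataForge-specific schedule):

```
# ┌───────── minute (0-59)
# │ ┌─────── hour (0-23)
# │ │ ┌───── day of month (1-31)
# │ │ │ ┌─── month (1-12)
# │ │ │ │ ┌─ day of week (0-6, Sunday = 0)
# │ │ │ │ │
  0 6 * * 1   # fires at 06:00 every Monday
```

A schedule built on an expression like this would refresh the inputs of every Source attached to it once per week.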
Lineage
-
Lineage Overview
Lineage refers to a directed acyclic graph (DAG) generated by DataForge describing how data is processed, trac...
-
Lineage Edges: Dataflows and Relations
Edges are the dataflow arrows connecting nodes on lineage. All edges show the dataflow from le...
-
Lineage Legend
In Lineage, the different types of nodes are represented by combinations of colors and symbol...
-
Lineage Navigating Nodes
Once in a lineage session, users can navigate the dataflow related to any node v...
System Configuration
-
Global Service Configurations
WARNING: Changes to these settings can have a severe impact on the platform and break functionality if set inc...
-
Cleanup Configuration
Cleanup Configuration defines retention settings for data lake objects and metadata. It is accessible via m...
Projects
-
Projects Overview
Projects represent a group of configurations in DataForge that users control and are the primary vehicle for m...
-
Managing Projects with GitHub
Managing Project configurations with GitHub allows users to efficiently merge changes from one project to anot...
Import/Export
-
Export/Import Overview
In DataForge, import/export functionality allows users to copy groups of configurations, in the...
Users - Access Administration
-
Manage Users
This article explains how to manage user access to DataForge workspaces. There are ...
Templates and Tokens
-
Templates and Tokens Overview
Templates combined with tokens enable centralized deployment and management of reusable Rules and Relations a...
-
Tokens
Tokens can be managed and created in the Tokens page which is found by o...
-
Relation Templates
When configuring multiple Sources within DataForge, it is common for repeated ...
-
Rule Templates
When configuring multiple Sources within DataForge, it is common for repea...
-
Best Practices
Best practices, patterns, and recommendations for template configuration and management. ...
Agents
-
Agents
An Agent is a lightweight application that can be installed on a machine to allow data ingestion from ...
-
Logs
Agent logs can be found in the UI by clicking on the Logs icon on the Agents screen. Logs can...
-
Installing a New Agent
Details the requirements and configuration for installing an Agent on a server that can access an on-premise data ...
SDK
-
Custom Notebook SDK
The DataForge SDK allows users to define their own Scala code in a...
-
Setting up a Cluster to run Custom Notebooks
Before creating any Custom Notebooks for DataForge, it is best to set up the Databricks environment. This pa...
-
Setting Up Custom Processes for Automatic Processing
Custom Ingest, Parse, and Post Output can all be configured to run on the user's custom-created notebooks b...
-
Custom Connections
Custom Connections allow users to store sensitive parameters and pass them into Custom Ingest and Custom Par...
-
Parameters
Each Custom Ingest, Custom Parse, and Custom Post Output has a Custom Parameters object that...
-
Using Multiple Languages in a Databricks Notebook
Writing Databricks notebooks for Custom Ingest, Custom Parse, or Custom Post O...
Cloning
-
Cloning Overview
Cloning allows users to create copies of multiple sources/outputs and the relat...