User Interface

Overview

The user interface is the front end to DataForge that developers will interact with the most.
Users must log in to the DataForge user interface with a DataForge account. For Standard customers, authentication is handled directly in the user interface, and each customer has at least one Admin user to handle user management.  For Private Enterprise customers, authentication is handled with Auth0, so customers must be provided with an account before they can log in to and use the user interface.
This article provides a brief overview of the DataForge UI.  For a more in-depth explanation of each page and the features available, refer to the User Manual section of this documentation.
 
Left-hand main menu of the DataForge UI.
Within DataForge, users can set up data sources and outputs, enrich existing data, and perform troubleshooting. Everything can be reached from the main menu on the left side, which can be left open or closed based on user preference.
 
Sources, Outputs, and Connections can be viewed and filtered from their respective pages. Users will use these pages most often when configuring data in DataForge, since Sources, Outputs, and Connections are all required components of DataForge data ingestion and output processes.
 
Clicking on a specific Source on the Sources page will show all of the details and options related to that Source. The same is true for the Outputs and Connections pages.
 

Sources and Processing

The Sources page (the UI shown when 'Sources' is selected from the left-hand menu) shows the current status of all Sources as well as the status of the individual phases of the Source processes.
 
Users can view the status history of Sources and keep track of activity trends. When a Source is clicked on, the user is brought to the tab navigation of that Source.
 
Tabs within a Source
 
Sources are the main containers of the DataForge interface. There is one logical data flow per source. Connections, Outputs, and Agents are associated with Sources, and these other aspects of DataForge enable Sources to work through the Logical Architecture.
 
Within a Source, the tab structure along the top provides a framework for organizing and monitoring different aspects of the logical data flow. The general purpose of each tab is summarized below.
 

Sources Tabs

Settings: The main interface for setup and configuration of a Source. Set the Connection and define how to get data and how to process it.
Raw Schema: Metadata associated with the raw attributes ingested from the source connection, along with additional information about data profiles and how the raw attributes are used in the system.
Dependencies: Sources may be dependent upon other Sources. This tab displays additional data flows the Source may depend upon.
Relations: Additional data may be joined to a Source through the use of rules, relations, and output mappings. This tab displays the details of the relations between Sources (similar to SQL joins).
Rules: Documentation of row-level manipulations of the data, including both enrichments and validation rules. Adding new enrichments allows the user to apply business logic to the data.
Inputs: View of the associated raw data ingested through the Connection settings. The primary interface for ingesting new data and re-processing existing data.
Process: Overview of the logical data flow and the calculations enacted. This view checks the status of the logical data flow and displays individual process results along with logs and Databricks job runs.
Data View: Opens the source hub table in Databricks for users to query.

 

Processing

If errors occur, the DataForge user interface has troubleshooting functionality that allows customers to report and handle issues effectively.
 
The Processing page (opened from the main menu) allows users to monitor processes globally. From the Processing page, users can view and search all current and previous DataForge processes as well as any process dependencies.
 
Process Tab and Status of logical data flow
Each Source also has its own Process tab, with a similar look, representing only the processes from that Source.
 
Within the Process tab, logs record how data moves through the logical data flow. As seen in the above image, Process IDs 27183-27187 at the left of the image show this particular Source going through Ingestion, Capture Data Changes (CDC), Enrichment, Refresh, and Output (currently running). When errors occur, the Process tab helps identify at which stage the error occurred and provides specific logs for troubleshooting.

Outputs

The Output Mapping Interface
 
The settings of Outputs are similar to those of Sources but focus on where processed data should be published, with settings varying depending on whether the output is a file, database table, or virtual view.
 
Additional business logic may be applied to data during the Output stage in the Mapping tab. Here, specific columns can be chosen and additional customization can occur, such as aggregating data. Multiple Sources can be used in the output mapping to build a One Big Table output.
 

Connections

Connections are the means by which data is moved into and out of DataForge. Connection settings determine where data is ingested from or published out to. Sources and Outputs rely on Connections to dictate where data is read from or written to.
 


Connections Settings
 
Connections contain the information needed to reach the locations where data will be ingested from or output to. Depending on the settings chosen, the parameters required for a successful connection will vary. These parameters could be a file path (including ADLS/S3), credentials to access a database, or additional information to support a DataForge Agent.
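As a rough illustration of the kinds of parameters described above, the fragment below sketches hypothetical values for each connection style; the actual field names and layout in DataForge may differ:

```
# Hypothetical connection parameters (illustrative only; actual DataForge
# field names may differ):
#
# File connection - a cloud storage path (ADLS or S3):
#   path: s3://example-bucket/landing/orders/
#   path: abfss://landing@exampleaccount.dfs.core.windows.net/orders/
#
# Database connection - a location plus credentials:
#   jdbc_url: jdbc:postgresql://db.example.com:5432/sales
#   user: ingest_svc
#   password: (stored as a secret, not in plain text)
#
# Agent connection - additional parameters identifying the remote
# DataForge Agent that will perform the ingestion.
```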
 

Other Objects

Templates

Templates give users the ability to mass-apply and manage similar transformation logic across many objects at the same time, whether optional tokens, rule or relation templates, or even source and output templates.

Schedules

Users manage source data ingestions based on schedule configurations that are defined and attached to sources.  Schedules are set up using CRON syntax and include a tab for users to see all objects attached to the schedule in one view.
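For readers unfamiliar with CRON syntax, the standard expression has five fields (minute, hour, day of month, month, day of week). The example below is hypothetical, not taken from a DataForge workspace, and the exact CRON dialect DataForge accepts may vary:

```
# minute  hour  day-of-month  month  day-of-week
  0       6     *             *      1-5
# Runs at 06:00 every weekday (Monday through Friday).
```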

Agents

The Agents page allows users to manage remote Agent installations and parameters related to how remote Agents ingest appropriate data. Additional data on DataForge Agents can be found in the Agent Installation Guide.

Lineage

Lineage icons exist throughout the DataForge user interface, allowing users to create lineage graphs for specific objects or more globally. When a user starts a new lineage graph, they are redirected to the Lineage page for interaction.

System Configuration

Users can utilize system configurations for customized Cleanup configurations, Cluster and Process configurations, and to manage global service configurations like disabling all ingestions in the workspace or auto-upgrading to the latest version of DataForge.  Cleanup and Cluster/Process Configurations are assigned to specific sources, giving the user ultimate flexibility in compute resource management and post-processing cleanup.

Projects

Projects are a top-level container within a DataForge workspace and represent separate sets of Source and Output configurations.  Users utilize Project exports and imports to promote changes between projects.  Projects are used to follow DevOps best practices with tools such as Git clients.

Users

User management is handled in this page for all Standard DataForge customers.  Each customer has at least one admin user who can add and manage access for their organization.  Private Enterprise customers manage user access to DataForge through Auth0. 

Documentation (and Support)

The Documentation link in the main left-hand menu brings you to the Help Center website you are currently visiting.  From the Help Center, users can read up on product documentation, open a support request for product support issues or questions, or contribute to the community forums.

Databricks

For convenience, users have a direct link to the Databricks workspace that the DataForge workspace is tied to.  This makes it easy to switch back and forth, with both tools complementing each other based on the task at hand.

Sign Up and Log In (with a subscription)

When you first encounter the DataForge environment after creating a subscription and new workspace, you will be prompted to log in or sign up.  
DataForge login screen
 
Only Private Enterprise customers will see the sign-up option, as it is part of a flow with Auth0 user management. To sign up, the user is prompted to enter a username and a password. The admin of the DataForge environment must whitelist emails for the customer domain and any optional additional domains. If the email is appropriately whitelisted, a confirmation email is sent to the address used at sign-up, and the user is prompted through the next steps to log in.
