Inputs

The Source Inputs screen shows the processing status of each of a Source's inputs and lets users restart processing of one or more inputs from a specific processing step.


An "Input" is DataForge's atomic unit of data processing. Conceptually, an Input corresponds to a single file or scheduled Table pull from a configured Source.


Source Inputs Tab

The Inputs tab lets users monitor how each of a Source's Inputs is progressing through all processing stages. Filters at the top of the page let users filter by the status of each of the four processing stages and by file path name. The fields displayed are as follows:
 

Columns and statuses

  • Id - The input ID, used to identify this input in logs, processes, and the dependency queue
  • File Name - The source file name, not present for table sources
  • Received DateTime - The timestamp at which the input was pulled into DataForge
  • Size - The total file size of the source file/database pull
  • Record Count - The number of records appearing in the original input file/database pull
  • Effective Record Count - The number of records appearing in the hub table after CDC.
  • Status - Indicates whether an input has successfully completed all of its processing steps. Click the status to open the Process page filtered to that input, where you can view process and job run details or cancel an active process.
     
    Success - Everything has processed correctly
     
    Fail - A failure has occurred for this input
     
    Waiting - The input is waiting in the dependency queue
     
    Ingestion Queued - The input is waiting for the Agent to process it. This can indicate that the Agent has stopped or is not responding and needs to be restarted.
     
    In Progress - The input is currently running a process
     
    Launching Cluster - The input is launching a new cluster
     
    The input is queued for deletion
     
    Queued in Workflow - The input is waiting for the Workflow to release the process before continuing. Click the icon to see queue details.
     
    The input has passed processing but contains 0 records.
     
  • Current Process Type - Displays the process currently running for each input
  • Last Completed Process Type - Displays the last completed or attempted process type for each input
  • Checkbox - Used to select multiple inputs for deletion. After selecting the inputs you want to delete, choose Delete from the Select Action drop-down at the top of the page, then click Submit.
 
The Inputs Page

Three Dot Menu for an Individual Input

Contains data processing and reprocessing options. Starting any reprocess option also runs all downstream processes; e.g. Reset Change Data Capture (CDC) will perform enrichment, refresh, and output after it completes. If an option is greyed out, hover over it to see why it is not currently available.
  • Reset Parsing
    • Only present for file sources
    • Rereads data from the source file
    • Use this after adjusting the parsing parameters when a file was not read into DataForge correctly
  • Reset Change Data Capture (CDC)
    • Recalculates all CDC values and rewrites CDC files for a specific input
    • Use this when changing the CDC tracking fields or source refresh type
  • Reset Enrichment
    • Regenerates the enrichment query and runs it to rewrite the enrichment file for a specific input
    • Use this to test newly created enrichments
  • Reset Output
    • Regenerates the output query and output delete query for a specific input and runs them
    • Use this to repopulate outputs with newly mapped values
  • Delete
    • Deletes this input from DataForge and the hub table
    • Deleting an input can cause other inputs to reprocess in order to fill in data gaps
    • Use this to get rid of unwanted data
  • View Data
    • Navigates to the Data View tab to show the data for the selected input
  • View Raw Schema
    • Navigates to the Raw Schema tab to show the raw attributes brought in by the selected input
 
Example Menu with Invalid Options
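The rule that starting a reprocess option also runs every downstream process can be modeled as an ordered list of stages. The sketch below is a hypothetical illustration, not a DataForge API: the `Stage` enum and `stages_to_rerun` helper are invented names, and the stage order (Parsing, CDC, Enrichment, Refresh, Output) is an assumption based on the stages this article mentions.

```python
# Hypothetical model of the reprocessing cascade; NOT a DataForge API.
# Stage names and order are assumed from the stages named in this article.
from enum import IntEnum


class Stage(IntEnum):
    """Processing stages, in the order they are assumed to run for one input."""
    PARSING = 1
    CDC = 2
    ENRICHMENT = 3
    REFRESH = 4
    OUTPUT = 5


def stages_to_rerun(reset_from: Stage) -> list[Stage]:
    """Return the stage being reset plus every stage downstream of it."""
    # Iterating an IntEnum yields members in definition order.
    return [stage for stage in Stage if stage >= reset_from]


if __name__ == "__main__":
    # Reset CDC: CDC reruns, followed by enrichment, refresh, and output.
    print([s.name for s in stages_to_rerun(Stage.CDC)])
```

Under this model, Reset Output reruns only the Output stage, while Reset Parsing reruns every stage.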
 

Controlling All Inputs or New Data Pulls

Users can control all of the Inputs for a Source with the options below, available from the three-dot menu in the header row above the inputs. Some options may be unavailable depending on the current state of the Source.
 
Source-wide re-processing can be expensive if incorrectly or unnecessarily used on sources with hundreds or thousands of inputs.
 
  • Pull Data Now: Immediately generate a new Input for this Source (not available on watcher sources)
  • Reset All CDC: Reset the Change Data Capture phase for all inputs. For sources with a large number of inputs (500+) or very large data sizes (100GB+), refer to this guide on configuring a larger cluster for the Reset All CDC process.
  • Reset Output: Reset the Output phase for all inputs, for either a specific Output or all Outputs this Source is mapped to
  • Recalculate Changed: Recalculate new rules and changed rule expressions for all inputs.
  • Recalculate All: Recalculate all rules for all inputs.
  • Reset All Parsing: Reset the Parsing phase for all inputs
  • Delete Source Data: Delete all stored data for the Source.  Deletes all inputs (hub table, raw input data, rule results, metadata not used in rules or output mappings)
  • Delete Source Metadata: Delete all metadata for the Source.  Only available after using Delete Source Data option.
  • View Source Data: Opens the Data View tab to show the data for this Source
 
Options for all inputs

 

 


Sub-Source Inputs

The Inputs tab is not available in sub-sources. All inputs are managed within the parent source where the sub-source rule is calculated.

For full documentation, visit Sub-Sources
