Output Mapping

Output Mapping controls the way data is sent to its final destination. It allows a user to rename columns, append multiple Sources, and filter data.


Key Terms

  • Columns: Define the final schema of the Output data, such as the target file headers or table column names and types.
  • Channel: Union between a source and the output, that defines how attributes from the source flow into the output.
  • Mapping: Expression that ties a channel to a column.
  • Output Mapping: Collection of all mappings that collectively defines how the data moves from all channels into the output table.

Changes made to an output mapping will not take effect unless the user presses the Save button at the bottom right hand corner of the output mapping screen.


Adding a Channel

To enable mappings between an output and a source, the first step is to add the channel to the Output Mapping screen, thus generating a link between this Source and Output. Multiple channels can be added to the same output to build a One Big Table model output. DataForge will automap all mappings possible when a new channel is added to an output which has existing outputs and mappings. Automap will populate if the new channel has a direct or relation path to the same field as already mapped channels.
Click Add Source Mapping at the top of the mapping table and underneath the Mapping tab, as seen below.
 

 
This will bring up the Channel configuration modal. To select the source to map, click on the Select Source search bar/drop down menu circled below, begin typing the name of the source that needs to be mapped, and once the desired source appears in the dropdown menu, it can be selected.
 

 

Settings:

  • Filter Expression: Allows users to define a condition that will exclude rows of data from the output, for which the filter expression does not evaluate to true.
  • Include Rows: Dictates whether rows from the source channel hub tables should be included based on Validation rule results. Options are Passed, Warned, and Failed.
  • Operation Type: Allows the user to mark a channel as an aggregate. Default is "N/A". More information on aggregate channels can be found below.
  • Name: The name of the Channel defaults to name of the source itself. The user may want to set their own name for the channel, for instance to help distinguish between two Channels that come from the same source.
  • Description: Allows the user to briefly describe to other users the use case of the Channel.
  • Active: Dictates whether the channel is active or inactive on the output. Only Active channels are processed.

Parameters:

  • Custom Parameters: JSON formatted string with custom parameter key:value pairs. This field is only useful when applying a Custom Post-Output notebook.
  • Full Output Flag: Default is false. Set to true to run a full output of all input data from the source every time the Output process runs.
  • Key History: Default is false. Output key history for channels of Key sources - ignore for Time Series sources.
  • Allow Output Regeneration: Default is true. If set to false, the output will not be generated if triggered by an output reset or validation reset

Channel Menu Controls

To open the Channel control menu, click on the triple-dot menu (⋮) at the far right of the Source/Mapping Name column of the output mapping.
 
Channel Menu can be seen on the right
  • View/Edit Details: Opens the configuration details modal of the selected channel.
  • Automap: Maps attributes from the source of the selected channel to any columns that the user created that have column names that match an attribute name.
  • Add All Source Columns: Adds a column for each attribute of the source from the selected channel, and maps the corresponding attribute to the column automatically. The newly created column names will be that of their corresponding attributes.
  • Expand All Structs: Expands any struct column mappings to create new columns for every key in the struct schema for the next level down. Use recursively to expand multiple levels of structs.
  • Clear Mappings: Clears all expressions currently mapped to columns from the source of the selected channel.
  • Remove: Removes the channel in question from the output.
  • Lineage: Opens a lineage graph of the channel source to view origin or destination lineage.
  • Reset Channel Output: Resets Output for the selected channel for all source data (all input data).
  • Reset Source: Provides source level reset options for easy processing.  Note: These options reset all inputs on the source.

Note: It is best practice to manually add all Output Columns when configuring an enterprise grade system to adhere to destination naming convention and semantics.


Adding Columns

 
Add Column button
 
To add multiple columns at once, use the channel menu to the right of the channel name and select the Add All Source Columns option. This will create a column for every field in the source, including raw attributes and rules.
 
 
To add a single column, click the Add Column button in the top middle of the screen, seen next to the Active toggle in the image above.
 
When the create column modal opens (seen in the image below), a column name must be added. The column name should start with a letter and may contain only letters, numbers, and underscores.
 
 
Optionally, the user can add a description to the column, or explicitly set the datatype of the column. If no datatype is set, the datatype of the column will be automatically inferred based on what source attributes are mapped to the column.

Column Data Types

The Column Data Types listed in this drop down are the target system's available data types

Column Menu Controls

To open the column menu, click on the list icon button (circled in the image below) at the far right of each column header. None of these actions take effect unless the output mapping save button is pressed afterwards.
 
 
  • View/Edit Details: Opens the column configuration modal of the selected column.
  • Move Left: Shifts the column one place over to the left on the UI.
  • Move Right: Shifts the column one place over to the right on the UI.
  • Automap: Maps an expression to each channel in the output mapping that has a source attribute matching whose name matches the column name.
  • Clear Mappings: Clears all expressions mapped to the column across all channels in the output mapping.
  • Hide: Takes the column off the screen in the UI, without actually removing the column from the output.
  • Remove: Removes column from the output mapping.
  • Lineage: Opens a lineage graph of the output column to view origin lineage.
 

Mapping Expressions

Once at least one channel and column have been created, the user can start mapping attributes by clicking on the empty cell that lies at the intersection of the target channel and the target column, example below:
 
After clicking the empty cell to map a value to, the expression entry modal will appear:
 
 
Similar to Rule expressions, to begin mapping an attribute, type an opening bracket "[" to reveal a drop down of all sources that have an active relation chain to the source of the target channel and the target source of the channel, represented by [This].
 
 
To map an attribute from the target source of the channel select [This], or to map an attribute from a source that is related to the target source of the channel, choose one of the other sources in the drop down.
 
Once a source has been selected, typing a period "." reveals a drop down of all attributes of the selected source. Choose an attribute from the drop down to map it to the target field.
 
 
Mappings in struct columns will include additional options such as dot notation for identifying keys within structs to map to the column and an Expand button to expand the Struct into multiple columns, one column for each key in the struct.
 
 

Reset Processes For All Channels

There may be times when column mappings are changed for multiple channels and many channels need to rerun output processes. Other times, rule expressions may have changed for columns that are mapped in output channels. 

In both cases, an option exists in the Output Mapping screen to allow for quickly resetting Output or Recalculating (and subsequently Output) on all channels in a source.

Click the Reset button near the top of the mapping screen and choose whether to recalculate or reset output for all channels. This will trigger reset processes for all channels to run immediately.


Aggregate Channels

Aggregate channels allow users to output data at a higher grain than their actual data. A channel is set as aggregate by checking the Aggregate option of the Operation Type field on the channel details modal. Aggregate channels are denoted on screen by the icon in the picture below:
 
 
All columns of an aggregate channel fall into one of two categories:
  • GROUPS: Static fields by which rows with matching values will be grouped together. Akin to columns in a SQL "GROUP BY" clause.
  • MEASURES: Usually numeric fields that we will perform aggregate operations on.
Which category a column falls under is determined by the expression mapped to it for the aggregate channel.
 
 
If an aggregate function is selected, the column automatically becomes a MEASURE column. Conversely, if no aggregate function is selected, the column automatically becomes a GROUP column.

For more information on what each aggregate function does, please refer to Spark SQL documentation: https://spark.apache.org/docs/latest/api/sql/

Furthermore, if an aggregate function for a column has been selected, the user has the option aggregate on only distinct values within the selected aggregate field, by clicking the checkbox shown below:
 

Aggregate Channel with Group Example

Goal:

In this example, we want to set up an output aggregation for Sales Order Details to show the sum of revenue grouped by the territory_id.

Final Mapping:

Steps:

  1. After adding the channel and "revenue" and "territory_id" columns, we select the channel settings by selecting the triple-dot menu next to the channel name and select View/Edit Details.
  2. Change the Operation type to be Aggregate which tells DataForge that we are going to utilize an output aggregation.  Click Save in the bottom right of the dialogue box.
  3. Open the revenue column mapping and map in the correct field to the column. In this example, it is linetotal.
  4. Select the aggregate function of sum and save the mapping.
     
  5. Open the territory_id column mapping and map in the correct field to the column.  In this example, it is the territoryid field found in the [AdventureWorks - Sales.Customer] source.
  6. Leave the aggregate function blank to mark it as a group

End Result:

This mapping results in a view that gives the sum of revenue that is grouped by territory_id.

mceclip2.png

If other columns needed to be aggregated, we would follow the same steps (3 and 4) to specify an aggregation function.

If other columns needed to be grouped, we would follow the same steps (5 and 6) to map the columns and leave the aggregation functions blank. 

There can be multiple groups and/or multiple measures on a single channel mapping.

Updated

Was this article helpful?

0 out of 0 found this helpful