Output Mapping controls the way data is sent to its final destination. It allows a user to rename columns, append multiple Sources, and filter data.
Key Terms
- Columns: Define the final schema of the Output data, such as the target file headers or table column names and types.
- Channel: Union between a source and the output, that defines how attributes from the source flow into the output.
- Mapping: Expression that ties a channel to a column.
- Output Mapping: Collection of all mappings that collectively defines how the data moves from all channels into the output table.
Changes made to an output mapping will not take effect unless the user presses the Save button at the bottom right hand corner of the output mapping screen.
Adding a Channel
Settings:
- Filter Expression: Allows users to define a condition that will exclude rows of data from the output, for which the filter expression does not evaluate to true.
- Include Rows: Dictates whether rows from the source channel hub tables should be included based on Validation rule results. Options are Passed, Warned, and Failed.
- Operation Type: Allows the user to mark a channel as an aggregate. Default is "N/A". More information on aggregate channels can be found below.
- Name: The name of the Channel defaults to name of the source itself. The user may want to set their own name for the channel, for instance to help distinguish between two Channels that come from the same source.
- Description: Allows the user to briefly describe to other users the use case of the Channel.
- Active: Dictates whether the channel is active or inactive on the output. Only Active channels are processed.
Parameters:
- Custom Parameters: JSON formatted string with custom parameter key:value pairs. This field is only useful when applying a Custom Post-Output notebook.
- Full Output Flag: Default is false. Set to true to run a full output of all input data from the source every time the Output process runs.
- Key History: Default is false. Output key history for channels of Key sources - ignore for Time Series sources.
- Allow Output Regeneration: Default is true. If set to false, the output will not be generated if triggered by an output reset or validation reset
Channel Menu Controls
- View/Edit Details: Opens the configuration details modal of the selected channel.
- Automap: Maps attributes from the source of the selected channel to any columns that the user created that have column names that match an attribute name.
- Add All Source Columns: Adds a column for each attribute of the source from the selected channel, and maps the corresponding attribute to the column automatically. The newly created column names will be that of their corresponding attributes.
- Expand All Structs: Expands any struct column mappings to create new columns for every key in the struct schema for the next level down. Use recursively to expand multiple levels of structs.
- Clear Mappings: Clears all expressions currently mapped to columns from the source of the selected channel.
- Remove: Removes the channel in question from the output.
- Lineage: Opens a lineage graph of the channel source to view origin or destination lineage.
- Reset Channel Output: Resets Output for the selected channel for all source data (all input data).
- Reset Source: Provides source level reset options for easy processing. Note: These options reset all inputs on the source.
Note: It is best practice to manually add all Output Columns when configuring an enterprise grade system to adhere to destination naming convention and semantics.
Adding Columns
Column Data Types
The Column Data Types listed in this drop down are the target system's available data types
Column Menu Controls
- View/Edit Details: Opens the column configuration modal of the selected column.
- Move Left: Shifts the column one place over to the left on the UI.
- Move Right: Shifts the column one place over to the right on the UI.
- Automap: Maps an expression to each channel in the output mapping that has a source attribute matching whose name matches the column name.
- Clear Mappings: Clears all expressions mapped to the column across all channels in the output mapping.
- Hide: Takes the column off the screen in the UI, without actually removing the column from the output.
- Remove: Removes column from the output mapping.
- Lineage: Opens a lineage graph of the output column to view origin lineage.
Mapping Expressions
Reset Processes For All Channels
There may be times when column mappings are changed for multiple channels and many channels need to rerun output processes. Other times, rule expressions may have changed for columns that are mapped in output channels.
In both cases, an option exists in the Output Mapping screen to allow for quickly resetting Output or Recalculating (and subsequently Output) on all channels in a source.
Click the Reset button near the top of the mapping screen and choose whether to recalculate or reset output for all channels. This will trigger reset processes for all channels to run immediately.
Aggregate Channels
- GROUPS: Static fields by which rows with matching values will be grouped together. Akin to columns in a SQL "GROUP BY" clause.
- MEASURES: Usually numeric fields that we will perform aggregate operations on.
For more information on what each aggregate function does, please refer to Spark SQL documentation: https://spark.apache.org/docs/latest/api/sql/
Aggregate Channel with Group Example
Goal:
Final Mapping:
Steps:
- After adding the channel and "revenue" and "territory_id" columns, we select the channel settings by selecting the triple-dot menu next to the channel name and select View/Edit Details.
- Change the Operation type to be Aggregate which tells DataForge that we are going to utilize an output aggregation. Click Save in the bottom right of the dialogue box.
- Open the revenue column mapping and map in the correct field to the column. In this example, it is linetotal.
- Select the aggregate function of sum and save the mapping.
- Open the territory_id column mapping and map in the correct field to the column. In this example, it is the territoryid field found in the [AdventureWorks - Sales.Customer] source.
- Leave the aggregate function blank to mark it as a group
End Result:
This mapping results in a view that gives the sum of revenue that is grouped by territory_id.
If other columns needed to be aggregated, we would follow the same steps (3 and 4) to specify an aggregation function.
If other columns needed to be grouped, we would follow the same steps (5 and 6) to map the columns and leave the aggregation functions blank.
There can be multiple groups and/or multiple measures on a single channel mapping.
Updated