Overview
The sub-source feature in DataForge enables developers to easily work with and transform complex nested arrays (Array of Struct). Outside of DataForge, these complex nested arrays are difficult to work with and require a developer to manage multiple tables to normalize the data through flattening, exploding records, and generating join keys.
DataForge sub-sources allow developers to create relations and rules directly within the complex type data without manipulating data structures, which saves time and effort and reduces complexity.
Consider an Order data format as an example where a sub-source is the right solution.
The “order_detail” key contains an Array of JSON values, with each element representing details about the ordered product. This structure logically represents a nested order_detail table that is physically stored in the order.order_detail column. Array elements represent rows, and Struct keys (qty, price, …) represent columns of the nested table. The order and order_detail tables are logically related via a 1:M relation. This logical-only relation is implied by the order_detail data being contained in the parent order record. Physically, this data format is considered denormalized.
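To make the rows-and-columns reading concrete, here is a minimal Python sketch of such an order record. Only order_detail, qty, and price come from the description above; order_id, customer, product, and both helper functions are hypothetical names chosen for illustration and are not part of DataForge.

```python
# Hypothetical order record. Only order_detail, qty, and price appear in the
# text; every other key and value is an assumption made for illustration.
order = {
    "order_id": 1001,
    "customer": "Acme Corp",
    "order_detail": [  # Array of Struct: the logically nested order_detail table
        {"product": "widget", "qty": 2, "price": 9.5},
        {"product": "gadget", "qty": 1, "price": 24.0},
    ],
}


def nested_rows(record, array_key):
    """Return the nested table's rows: one dict per array element."""
    return list(record[array_key])


def nested_columns(record, array_key):
    """Return the nested table's columns: struct keys in first-seen order."""
    columns = []
    for element in record[array_key]:
        for key in element:
            if key not in columns:
                columns.append(key)
    return columns


rows = nested_rows(order, "order_detail")
columns = nested_columns(order, "order_detail")
print(len(rows), columns)
```

One parent order holds many order_detail elements, which is the 1:M relation carried by the data itself.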
From this data, a developer can create an Order source with a sub-source rule pointing to the order_detail array<struct<>> raw attribute. Within this sub-source, the developer can then build relations and rules directly within each order_detail struct.
Creating a new sub-source
To create a new sub-source, create a rule in the parent source that points to another attribute that has the ARRAY<STRUCT<...>> data type.
Set the rule's Type option to sub-source to enable the sub-source functionality.
After the rule is saved, a new sub-source is created and appears on the Sources page with the name "Source name-Rule name", which can be modified after creation.
Working with the sub-source
Sub-sources are listed as sources on the main Sources page. To transform the data, create relations and rules within the sub-source. However, sub-source processing actually occurs within the parent source. If new rules are added to the sub-source, recalculate the parent source to see the new rules take effect.
Each sub-source shows the same tabs as a regular source. However, some of the tabs are not available for use because they are controlled within the parent source and inherited. The sections below list which tab features are available.
Settings
The settings tab is available in sub-sources with the majority of settings inherited from the parent source. The sub-source can be renamed or given a new description in the settings tab.
Raw Schema
The raw schema tab is available in sub-sources. The raw schema is updated automatically from the schema of the sub-source rule in the parent source. Each raw attribute represents a top-level key of the source attribute struct.
Dependencies
The dependencies tab is not available in sub-sources. The sub-source inherits the dependencies of its parent source as the parent is calculated.
Relations
The relations tab is available in sub-sources. Each sub-source has an implicit relation with M:1 cardinality to its parent source. Additional relations can be added to the sub-source similar to working with relations in a regular source.
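As a rough illustration of the implicit M:1 relation, the Python sketch below pairs every nested element with its single parent record, so parent attributes are reachable from each element. The function name and the order_id and region fields are hypothetical; this models the idea only and is not how DataForge implements relations.

```python
def rows_with_parent(parent, array_key):
    """Pair each nested element with its parent (many elements, one parent)."""
    return [(element, parent) for element in parent[array_key]]


# Hypothetical parent record; order_id and region are assumed attribute names.
order = {
    "order_id": 1001,
    "region": "EU",
    "order_detail": [{"qty": 2}, {"qty": 1}],
}

pairs = rows_with_parent(order, "order_detail")
# Every element resolves to the same parent, so parent fields can be read per row.
regions = [parent["region"] for _, parent in pairs]
print(regions)
```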
Rules
The rules tab is available in sub-sources. Rule results are included inside each struct within the array of the parent sub-source rule. Rules can be added to a sub-source similar to a regular source with the following limitations:
- No validation rules allowed within the sub-source
- No unique flag available for sub-source rule(s)
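The statement that rule results are included inside each struct can be pictured with a small Python sketch. It assumes a hypothetical rule named line_total computed as qty * price; the helper apply_sub_source_rule and the order_id field are likewise illustrative only, and real rules are defined in the DataForge rules tab rather than in Python.

```python
import copy


def apply_sub_source_rule(parent, array_key, rule_name, rule_fn):
    """Return a copy of parent in which each struct gains a rule_name field."""
    result = copy.deepcopy(parent)  # leave the input record untouched
    for element in result[array_key]:
        element[rule_name] = rule_fn(element)
    return result


# Hypothetical parent record with the nested order_detail array.
order = {
    "order_id": 1001,
    "order_detail": [
        {"qty": 2, "price": 9.5},
        {"qty": 1, "price": 24.0},
    ],
}

enriched = apply_sub_source_rule(
    order, "order_detail", "line_total", lambda d: d["qty"] * d["price"]
)
print(enriched["order_detail"])
```

The array keeps its shape and element count; each struct simply carries one more key per rule.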
Inputs
The inputs tab is not available in sub-sources. All inputs are managed within the parent source where the sub-source rule is calculated.
Processes
The process tab is not available in sub-sources. All processes are managed within the parent source where the sub-source rule is calculated.
Data View
The data view tab is not available in sub-sources. To view data in a sub-source, query the parent source to view the sub-source rule results.
Processing and recalculating the sub-source
All processing for sub-sources happens within the parent source. The sub-source is a representation of the same nested complex array in the original parent source rule and is processed whenever the parent source is processed.
Outputs with sub-sources
Sub-sources can be mapped directly as a channel to any output. Sub-source outputs support all regular output functionality. Since sub-sources don't have inputs or processes, the output process is executed in the context of the parent source. When run, the output process parameters will contain the following structure:
- source_id: points to parent source of the sub-source
- output_id: points to the output the sub-source is mapped to
- output_source_id: points to the channel that contains the sub-source mapping
- input_id: points to the input of the parent source
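For orientation, the parameters listed above could be pictured as a mapping like the one in this Python sketch. The four key names come from the list; the numeric IDs are placeholders, and the has_expected_keys helper is a hypothetical check, not a DataForge function.

```python
# Placeholder IDs for illustration; real values are assigned by DataForge.
output_process_parameters = {
    "source_id": 101,        # parent source of the sub-source
    "output_id": 7,          # output the sub-source is mapped to
    "output_source_id": 42,  # channel that contains the sub-source mapping
    "input_id": 5001,        # input of the parent source
}

EXPECTED_KEYS = {"source_id", "output_id", "output_source_id", "input_id"}


def has_expected_keys(params):
    """Check that all four documented keys are present (extra keys are allowed)."""
    return EXPECTED_KEYS.issubset(params)


print(has_expected_keys(output_process_parameters))
```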
Process visibility
Sub-source output (and manual reset output) processes are visible in the process tab of the parent source, process tab of the output the sub-source is mapped to, and the global processing page.
Output table refresh / deletes
Sub-source output channels inherit the refresh type of their parent source. For example, if the parent source is set to Key refresh, the sub-source process generates the output delete query as it would for the parent, using an s_key delete algorithm. If the parent source is set to Full refresh, the sub-source process generates an output delete query that fully replaces the channel data set. The tracking system output fields of the sub-source output always reference the parent source.
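The difference between the two refresh types can be sketched conceptually in Python. This is one plausible reading and not the actual delete query DataForge generates: it assumes that a Key refresh removes existing output rows whose s_key appears in the incoming data before inserting, while a Full refresh replaces the whole channel data set. The function name, the row layout, and the "key" and "full" labels are hypothetical.

```python
def refresh_channel(existing_rows, incoming_rows, refresh_type):
    """Return the channel's rows after a refresh (conceptual model only)."""
    if refresh_type == "full":
        # Full refresh: the incoming rows replace the entire channel data set.
        return list(incoming_rows)
    if refresh_type == "key":
        # Key refresh (assumed behavior): delete rows whose s_key is being
        # reloaded, then append the incoming rows.
        incoming_keys = {row["s_key"] for row in incoming_rows}
        kept = [row for row in existing_rows if row["s_key"] not in incoming_keys]
        return kept + list(incoming_rows)
    raise ValueError(f"unknown refresh type: {refresh_type}")


existing = [{"s_key": "A", "qty": 1}, {"s_key": "B", "qty": 2}]
incoming = [{"s_key": "B", "qty": 20}, {"s_key": "C", "qty": 30}]

after_key = refresh_channel(existing, incoming, "key")
after_full = refresh_channel(existing, incoming, "full")
print(after_key)
print(after_full)
```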