Rules

Rules allow DataForge to modify and transform data.


Rules Tab

The Rules tab allows users to select, edit, remove, or add a Source's Rules. By default, only Active Rules are listed. The Active Only toggle changes this setting.
 
Active rule toggle switch
 
To create a new Rule, click the New Rule button. This opens the create/edit Rule modal.
 
New Rule Button
 
To edit an existing rule, click any column of that rule's row in the table other than the expression column (circled in the picture below). If a rule references another rule in its expression, a link appears in the expression column (boxed in the picture below); clicking the link opens the edit rule modal for the referenced rule.
 
All columns besides the expression column in the Rule Table can be clicked to open the edit Rule modal.
 

Enrichments vs Validations vs Sub Sources

In DataForge, there are three distinct types of Rules: Enrichments, Validations, and Sub Sources.
 
An Enrichment is a rule type that defines an individual SQL expression resulting in a new column of any supported data type.
 
A Validation is a specific type of Enrichment focused on data quality rules that only return Boolean data type columns. A validation verifies that a given column value (or group of column values) for each row adheres to a rule defined in the validation expression. All validations of a source are aggregated into the s_validation_status_code system attribute with Pass/Fail/Warn values for each column.
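
For instance, a validation expression on a hypothetical UnitPrice attribute could be:

[This].UnitPrice >= 0

Records where the expression evaluates to false are flagged according to the rule's Warn/Fail/Ignore setting.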
 
A Sub-Source is a specific type of rule that creates a child source underneath the parent source to help transform complex nested array data (array of struct). When a sub-source rule is created pointing to an array of structs, a new source object is created and tied to the rule. A sub-source acts as its own source for relations and rules, letting users build logic within the array of structs, with an implicit relation to the parent source provided. All processing is done at the parent Source level and applied to the sub-source.
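
As an illustration, a sub-source rule's expression points at the array-of-struct column to expand (the order_lines attribute here is hypothetical):

[This].order_lines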
 

Rule Parameters

 
Create Rule Modal
 
Fields Available:
Type
Default: Enrichment
The type of the Rule. Validations mark records as Pass/Fail based on a Boolean expression in the Expression field. Sub-sources create new sources tied to the rule for array-of-struct types.

Rule Name*
Default: None
The user-defined name of the Rule.

Attribute Name*
Default: None
The name of the new column created by the Rule. This is what appears in attribute dropdowns of expressions and within expressions. When creating a new rule, this field is automatically filled in with valid formatting based on what the user enters into the Rule Name field.

Description
Default: None
The user-defined description of the Rule.

Expression Data Type
Default: Interpreted from expression
The data type of the result of the Expression, automatically inferred by DataForge. This field is not controlled by the user.

Metadata Icon (struct type only)
Default: Interpreted from expression result
Struct data rules display a metadata icon next to the expression data type. Opening the metadata icon shows the struct schema fields.

Attribute Data Type
Default: Expression Data Type
The data type of the Enriched Attribute. DataForge will attempt to convert the Expression Data Type to the Attribute Data Type. Leave as the default for no conversion.

Recalculation Mode
Default: Snapshot
Snapshot means the field will never be recalculated for data inputs, even after attributes the rule depends on are updated, unless a user resets enrichment or uses the Recalculate options for prior inputs. Keep current means that if an attribute value a record depends on is updated, the dependent attribute is also updated during processing.

When expression is false, set to
Default: Warn
The flag set on records that fail to be converted to another data type. Warn, Fail, or Ignore are the possible options. For Validations only.

Expression
Default: None
Use DataForge QL/Spark SQL syntax to set the Rule transformation logic.

Active
Default: TRUE
Allows the user to set this Rule as Active or not. If active, the Rule is applied on new inputs and source resets.
 
Click Save to save the Rule. Clicking Save and Create Validation will also create an extra Validation column that marks whether values were successfully converted from the Expression Data Type to the specified Attribute Data Type.
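
Conceptually, the generated validation amounts to a cast check. A minimal sketch, assuming a hypothetical string attribute order_date_string being converted to a date (the actual generated expression is managed by DataForge):

[This].order_date_string IS NULL OR CAST([This].order_date_string AS date) IS NOT NULL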
 
 

Rule Expressions and Syntax

Rule expressions are made up of a mix of Spark SQL and DataForge QL. Specifically, DataForge QL is used to access source data attributes, which are used within a Spark SQL expression the way column names would be used in a normal Spark SQL context.
 
For a list of Spark SQL functions that can be used in Rules, please see the latest Spark SQL Built-In Functions: https://spark.apache.org/docs/latest/api/sql/index.html
 
To begin entering a rule expression, either:
1) Enter an open bracket "[" to reveal a drop-down of sources with a relation chain to the current source and [This] Source.  This allows you to select attributes and rules from the current source or related sources.
OR
2) Enter a backtick " ` " to reveal a drop-down of Spark SQL functions. The user can then continue filling out the expression however they would like, as long as attributes are accessed using DataForge QL and the expression as a whole follows Spark SQL syntax, as in the sketch below.
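
A minimal sketch combining the two syntaxes (the FirstName and LastName attributes are hypothetical):

concat(upper([This].LastName), ', ', [This].FirstName)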
 
See Example Expressions for visual examples of different types of rules to get started.
 

Complex Data Types

Struct and array data types can be used in rules and templates. Structs can be navigated within a rule using dot notation, such as struct.key. Arrays can be navigated by using the array index (zero based), such as array[0]. DataForge recommends using sub-source rules where columns contain an array of structs.

The expression editor intelli-sense will help navigate nested complex types by listing child keys for each nested level. For working with complex nested arrays, DataForge recommends using Databricks higher-order functions.
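
For instance, a Spark higher-order function such as transform can apply logic to each element of an array column (the order_lines array and its price and quantity keys are hypothetical):

transform([This].order_lines, line -> line.price * line.quantity)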

Navigating complex data types in the expression editor

 

Using Relations in Rule Expressions

Relations can be traversed in a rule expression to write rules that reference attributes and rules from other sources. When a relation is used in the rule expression, Expression Parameters will populate and provide users visibility into the relation being used with options to change the relation chain direction.

As an example, a rule needs to be written from Source "Customer Data" to pull an attribute from Source "Sales Data" to calculate total sales per customer by traversing a relation between the two sources. 

Before the rule can be written, the relation must exist between the two sources. In this case, the relation is named "FK_SalesOrderHeader_Customer_CustomerID" and the cardinality is 1:M from Customer to Sales.

 To start, the rule is written to reference the "totaldue" attribute from the Sales Data Source.

Since a relation is used, the Expression Parameters panel populates with the [Sales Data].totaldue attribute. However, DataForge recognizes that this relation is 1:M, so the rule expression needs to include an aggregate function. To make the rule valid, a SUM() aggregate is wrapped around the expression.
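
Following this example, the resulting expression would be:

sum([Sales Data].totaldue)
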
The relation path can also be modified when there is more than one relation path available between the two sources. Modifying the relation path provides a search option to quickly find specific sources or relations to use. Once the user selects the destination source, DataForge will pick the best relation path available and display it in the Expression Parameters. Users can change any part of the path using the presented drop-downs.  Where applicable, additional hops in the relation chain can be added with an "Add Next" button to expand the traversal. Relation paths displayed in Expression Parameters have intuitive labels formatted as [From Source Name]->relation name->[To Source Name].
After changing the relation chain using the drop-down, DataForge automatically adds any additional relations needed to complete the relation chain.  If there is no drop-down available, the relation is the only available option to complete the path.
Hovering over relation chains in the Expression Parameters will show a tooltip of the relation expression for easy understanding while writing the rule.
Clicking the parameter name in the Expression Parameters will highlight where the parameter is used in the Rule Expression. This is especially helpful for longer or more complex rules where multiple parameters are referenced, like the example below.
 

Example Expressions

Multiplying an attribute from a related source to an attribute of the current source through a primary relation

[This].ListPrice * [Related Source Name].Quantity

Aggregating a column from a primary related source with many cardinality

sum([Related Source Name].SalesTotal)

Aggregating a column from a non-primary related source with many cardinality

sum([Related Source Name].SalesTotal)

Rounding a column down to two decimal places

round([This].AverageCost, 2)

Window function aggregating one attribute of the current source while partitioning on another

sum([This].TotalCost) OVER (PARTITION BY [This].CustomerID)

Validation rule to determine if a numeric column of the source is above a specified threshold

[This].TotalCost > 100

Returning key from struct field using dot notation

[This].struct.key

Returning first index field from an array

[This].array[0]
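
Conditional enrichment using a CASE expression (the attribute and category labels here are hypothetical)

CASE WHEN [This].TotalCost > 100 THEN 'High' ELSE 'Low' END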

 


When will my rule get processed?

Enrichment

  • All traversal rules, regardless of keep current
  • Rules that do not contain window functions and do not depend on a window function rule, regardless of keep current

Refresh

  • All window function rules (window function rules must be keep current)
  • All rules that depend on window functions (rules that depend on window functions must be keep current)

Attribute Recalculation

  • All keep current traversal rules WHEN the related source gets new data
  • All rules that depend on the above keep current traversal rules (rules dependent on keep current rules must be keep current)

Data Profiles

Clicking the Data Profile icon brings up the data profile of that raw attribute. Different data types provide different data profile statistics.
 
Data Profile options
 
A modal showing the data profile appears when the data type label is clicked. Older data profiles for the source can be accessed using Select profiling timestamp.
 
The Data Profile
 
Data profiles provide the following statistics:

Common

  • Attribute Type
  • Data Type
  • Number of Rows
  • Min
  • Max
  • Unique %
  • Null %
  • Top 5 Values
  • Bottom 5 Values
  • Distribution Percentiles (10%, 25%, 50%, 75%, 90%)

Text

  • Min Length
  • Max Length
  • Avg Length
  • Numeric %
  • Blank %
  • Special Char %

Numeric

  • Average
  • Median
  • Standard Deviation
  • Zero %

Timestamp

  • Average
  • Median
  • Standard Deviation

Sub-Source Rules

Rules can be added to a sub-source similarly to a regular source, with the following limitations:

  • No validation rules allowed within the sub-source
  • No unique flag available for sub-source rule(s)

For full documentation, visit Sub-Sources
