Processing Units Introduction
Components of Calculation
- Flat base weight for each process, with different base weights by process type
- Added flat weight for Refresh and Output process types, by Source Refresh Type
- Added calculated complexity weight for processes leveraging DataForge Rules or Output Mappings
- Logarithmic volume weight for Change Data Capture and Refresh process types
PUs are only charged for successful processes. Failed processes have zero PU cost.
Controlling PU consumption
Actions that increase PU consumption:
- Increasing the number of runs/refreshes the system processes each month
- Adding more Sources, Rules, Outputs, and Output Mappings
- Using primarily the Key Refresh Type
- Switching Rules from Snapshot to Keep Current
- Pulling all data from source systems rather than incremental data
Actions that reduce PU consumption:
- Moving slowly changing tables to a less frequent refresh cadence
- Leveraging the Full, None, Sequence, or Timestamp Refresh Type for small-volume tables or tables that can support these alternative Refresh Types
- Leveraging Hard Dependencies rather than Keep Current to avoid Recalculation processes
- Using dynamically injectable tokens in the Source Query configuration to pull only incremental data, rather than the full table, for each Input
- For deployed/finalized Sources, disabling optional processes such as Data Profiling that may not be actively used outside of initial development workflows
Base Weight
Process Name | Weight |
capture_data_changes | 2 |
manual_reset_all_capture_data_changes | 2 |
manual_reset_all_processing_from_cdc | 20 |
manual_reset_capture_data_changes | 2 |
custom_ingestion | 5 |
custom_parse | 5 |
custom_post_output | 5 |
manual_reset_custom_parse | 5 |
input_delete | 3 |
enrichment | 1 |
manual_reset_all_enrichment | 1 |
manual_reset_enrichment | 1 |
import | 10 |
ingestion | 1 |
loopback_ingestion | 1 |
sparky_ingestion | 1 |
cleanup | 0.5 |
meta_monitor_refresh | 0.5 |
manual_reset_all_output | 1 |
manual_reset_output | 1 |
output | 1 |
manual_reset_parse | 2 |
manual_reset_sparky_parse | 2 |
parse | 2 |
sparky_parse | 2 |
data_profile | 1 |
attribute_recalculation | 1 |
manual_attribute_recalculation | 1 |
refresh | 1 |
Refresh Type Weight
Refresh Type | Weight |
Key | 1 |
Timestamp | 0.5 |
Sequence | 0.5 |
Full | 0.2 |
None | 0.1 |
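As a rough illustration of how the two flat components might combine, the sketch below adds the Refresh Type weight to the base weight for `refresh` and `output` processes. It assumes the components are simply summed and that only these two process names count as Refresh and Output process types. `flat_weight` and the dictionaries are hypothetical names, and only a few rows of the Base Weight table are copied in.

```python
# Subset of the Base Weight table (process name -> weight).
BASE_WEIGHT = {
    "capture_data_changes": 2.0,
    "enrichment": 1.0,
    "ingestion": 1.0,
    "output": 1.0,
    "parse": 2.0,
    "refresh": 1.0,
}

# Refresh Type Weight table.
REFRESH_TYPE_WEIGHT = {
    "Key": 1.0,
    "Timestamp": 0.5,
    "Sequence": 0.5,
    "Full": 0.2,
    "None": 0.1,
}

# Assumption: only these process names receive the Refresh Type weight.
REFRESH_TYPE_PROCESSES = {"refresh", "output"}


def flat_weight(process_name: str, refresh_type: str) -> float:
    """Base weight plus, where applicable, the Refresh Type weight (assumed additive)."""
    weight = BASE_WEIGHT[process_name]
    if process_name in REFRESH_TYPE_PROCESSES:
        weight += REFRESH_TYPE_WEIGHT[refresh_type]
    return weight


print(flat_weight("refresh", "Key"))   # 2.0
print(flat_weight("refresh", "Full"))  # 1.2
print(flat_weight("parse", "Key"))     # 2.0 (no Refresh Type weight added)
```

Under these assumptions, a `refresh` process on a Key source carries twice the flat weight of the base value, while the same process on a Full source carries only 1.2.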
Rules Weight
- Rule with compiled length <= 250 characters = +0.03 weight
- Rule with compiled length > 250 characters = +0.08 weight
- See the compiled expression under meta.enrichment -> expression_parsed
- Compiled expressions are used to normalize against any non-primary relation traversal syntax differences and source name lengths.
- Compiled expressions ensure users are not penalized for using descriptive object names or long-form syntax for their business logic
- Expressions that include an aggregate function over a MANY relation traversal = +0.05 weight
- Expressions that include a window function = +0.05 weight
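The rule weights above can be sketched as a small function. This is an illustration only: it assumes the aggregate and window-function increments stack on top of the length-based weight, which the list implies with its "+" notation but does not state outright. `rule_weight` and its parameters are hypothetical names.

```python
def rule_weight(compiled_length: int,
                aggregates_over_many: bool = False,
                uses_window_function: bool = False) -> float:
    """Complexity weight for one Rule, following the list above.

    compiled_length is the character length of the compiled expression
    (meta.enrichment -> expression_parsed).
    """
    # Length-based weight: the threshold is 250 characters.
    weight = 0.03 if compiled_length <= 250 else 0.08
    # Assumption: each of the following increments is added independently.
    if aggregates_over_many:
        weight += 0.05
    if uses_window_function:
        weight += 0.05
    return round(weight, 2)


print(rule_weight(120))                             # 0.03
print(rule_weight(400, aggregates_over_many=True))  # 0.13
```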
Output Mapping Weight
- Base Mapping Weight (e.g. [This].mycolumn) = +0.01 weight
- Mappings including a traversal through a relation = +0.03 weight
- Aggregate Function Mappings = +0.05 weight
Input and Hub Table Volume Weight

Data Volume | Weight |
1 KB | 0.04 |
1 MB | 0.32 |
10 MB | 0.64 |
100 MB | 1.28 |
1 GB | 2.56 |
10 GB | 5.12 |
As the table shows, the volume weight doubles with each order of magnitude (10x) of data volume.
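The doubling pattern can be reproduced with a short function fitted to the table values. This is a curve inferred from the published rows, not a confirmed platform formula: it assumes decimal units (1 KB = 1,000 bytes), and its behavior below 1 KB or between the listed sizes is a guess. `volume_weight` is a hypothetical name.

```python
import math


def volume_weight(size_bytes: float) -> float:
    """Volume weight fitted to the table: 0.04 at 1 KB, doubling per 10x of data.

    Assumption: decimal units (1 KB = 1,000 bytes). Values between the
    table rows are interpolated by this fit rather than documented.
    """
    orders_of_magnitude_above_1kb = math.log10(size_bytes / 1_000)
    return 0.04 * 2 ** orders_of_magnitude_above_1kb


for label, size in [("1 KB", 1e3), ("1 MB", 1e6), ("10 MB", 1e7),
                    ("100 MB", 1e8), ("1 GB", 1e9), ("10 GB", 1e10)]:
    print(f"{label:>6}: {volume_weight(size):.2f}")
```

Because the weight grows so slowly relative to volume, a table 1,000 times larger contributes only 8 times the volume weight under this fit.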
Single Day Duplicate Source-Process Discount
In some scenarios, data must be refreshed multiple times per day. Additionally, with complex or circular Keep Current rules, a single new Input can cause a cascade of refreshes to be generated to guarantee data accuracy.
To account for these common scenarios, all Refresh and Attribute Recalculation processes are counted per source per day, and that count is used in a discount formula to reduce the PUs for each of those sources.
Example:
Source Name | Process Type | Standard PU |
SourceA | ingest | 1 |
SourceA | capture_data_changes | 2.5 |
SourceA | enrichment | 3.5 |
SourceA | refresh | 2 |
SourceA | output | 4 |
SourceA | attribute_recalculation | 2 |
SourceA | output | 4 |
Common Scenarios:
- There is no discount applied for Sources refreshed less than once per day
- Refreshing a source 2 times per day costs ~1.5x as much as once per day (~25% discounted)
- Refreshing a source 10 times per day (once per hour during business hours) costs ~5x as much as once per day (~50% discounted)
- Refreshing a source 24 times per day costs ~10x as much as once per day (~58% discounted)
- Refreshing a source 96 times per day (once every 15 minutes) costs ~32x as much as once per day (~66% discounted)
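The discount percentages in these scenarios follow directly from the cost multiples: the discount is the share saved compared with paying the once-per-day price for every refresh. The sketch below only checks that arithmetic. It does not reproduce the discount formula itself, which is not given here, and `implied_discount` is a hypothetical name.

```python
# (refreshes per day, approximate cost as a multiple of one refresh per day)
SCENARIOS = [(2, 1.5), (10, 5), (24, 10), (96, 32)]


def implied_discount(refreshes_per_day: int, cost_multiple: float) -> float:
    """Fraction saved relative to paying full price for every refresh."""
    return 1 - cost_multiple / refreshes_per_day


for n, multiple in SCENARIOS:
    discount = implied_discount(n, multiple)
    print(f"{n:>2} refreshes/day at ~{multiple}x: ~{discount:.1%} discounted")
```

The 96-refresh case works out to roughly 66.7%, which matches the listed ~66% once the approximate multiples are taken into account.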
This discount structure was introduced to provide significant discounts for refresh cadences of up to 10 times per day, with smaller additional discounts beyond that.
Refresh cadences above 10 times per day are supported but often require substantial additional operational overhead, resulting in additional product support and platform complexity needs.
NOTE: These discounts are not calculated as part of the process_history.ipu_usage field. Clients seeking to analyze usage by querying this field may need to account for this discount manually.