Alloy Refinement Process
Ore
Unrefined data captured exactly as delivered from the source.
Ingestion is the first step in the refinement process: data is copied from external data sources into the DataForge framework, ultimately creating a new table in Databricks or Snowflake. As DataForge ingests data, it identifies and stores information such as the source database, original file location, and source schema within Ember, resulting in a self-managing data lake.
Because file data can vary in format and nature, the Parse step runs after Ingestion for file-based sources. It standardizes the format of all data, such as column naming and data types, before further refinement.
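As a rough illustration of what standardizing names and data types can mean, the sketch below normalizes column names and casts string values from a file row. It is not DataForge's parser; the `SCHEMA` mapping, the column names, and both helper functions are assumptions made up for this example.

```python
import re
from datetime import date

def normalize_name(name: str) -> str:
    """Lower-case a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()

# Hypothetical target schema: normalized column name -> converter function.
SCHEMA = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}

def parse_row(raw: dict) -> dict:
    """Rename columns and cast each value to the type declared in SCHEMA."""
    row = {}
    for key, value in raw.items():
        column = normalize_name(key)
        cast = SCHEMA.get(column, str)  # columns missing from SCHEMA stay strings
        row[column] = cast(value.strip())
    return row

# A raw row as it might arrive from a delimited file: everything is text.
raw_row = {"Order ID": "1001", " Order-Date ": "2024-03-05", "Amount": " 19.90 "}
parsed = parse_row(raw_row)
```

After this step every row uses the same column names and typed values, which is what later stages rely on when comparing and enriching records.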
Both of these steps are facilitated by the Connections and Sources that you define.
Mineral
Purpose-built change detection isolates only the new or updated records.
In the Mineral stage, DataForge compares newly ingested data with the data that is already refined and detects which records are new or changed. This Capture Data Changes process is driven by the settings on your Source, such as the refresh type, which you can optionally change.
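The idea of isolating new or updated records can be sketched with a key-based comparison. This is only one possible approach, written under the assumption that each record has a unique key; how DataForge actually detects changes depends on the Source settings, and the function names and sample rows here are hypothetical.

```python
import hashlib
import json

def row_hash(row: dict) -> str:
    """Stable hash of a row's contents, independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_changes(incoming: list, refined: list, key: str) -> dict:
    """Split incoming rows into new, changed, and unchanged relative to refined."""
    known = {row[key]: row_hash(row) for row in refined}
    result = {"new": [], "changed": [], "unchanged": []}
    for row in incoming:
        if row[key] not in known:
            result["new"].append(row)
        elif known[row[key]] != row_hash(row):
            result["changed"].append(row)
        else:
            result["unchanged"].append(row)
    return result

# Hypothetical data: refined rows already processed, incoming rows just ingested.
refined = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
incoming = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
    {"id": 3, "status": "open"},
]
changes = detect_changes(incoming, refined, key="id")
```

Only the rows in `changes["new"]` and `changes["changed"]` would need to move on to the next stage.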
Alloy
Business logic, joins, and enrichment are applied only to the incremental batch, improving both performance and predictability.
With a clean, well-formatted table listing all the records that need updating, it is time to enrich your data with Alloy through data quality checks and additional business logic. You create and manage rules for each dataset through the Source Relations and Rules interfaces, and these rules drive transformations and processing during the Enrichment step. DataForge provides a flexible yet guided framework for data management that helps you create always-valid relations between sources. With these relations you can transform your data using any combination of references across your datasets without concern for data duplication or bad joins.
The Alloy process runs all of your business logic and rule transformations, appending new columns to the existing table containing only the records that need updating.
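Conceptually, enrichment appends computed columns to each record in the incremental batch. The sketch below shows that pattern in plain Python; in DataForge, rules are defined through the Rules interface rather than as code like this, so the `RULES` mapping, its column names, and the `enrich` function are illustrative assumptions.

```python
from typing import Callable

# Hypothetical rules: new column name -> function computing it from a row.
RULES: dict[str, Callable[[dict], object]] = {
    "total": lambda row: round(row["quantity"] * row["unit_price"], 2),
    "is_valid_quantity": lambda row: row["quantity"] > 0,  # data quality check
}

def enrich(batch: list[dict], rules: dict) -> list[dict]:
    """Return copies of the batch rows with one new column per rule."""
    enriched = []
    for row in batch:
        out = dict(row)  # copy so the original batch row is not mutated
        for column, rule in rules.items():
            # Rules run in order, so a later rule may read an earlier column.
            out[column] = rule(out)
        enriched.append(out)
    return enriched

# Only the records flagged as new or changed are passed in.
batch = [
    {"id": 2, "quantity": 3, "unit_price": 4.5},
    {"id": 3, "quantity": 0, "unit_price": 9.99},
]
enriched = enrich(batch, RULES)
```

Because the rules run against the incremental batch only, the cost of enrichment scales with the number of changed records rather than the size of the full dataset.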
Ingot
The enriched batch is merged into the full dataset through a consistent refinement process that ensures clean, canonical results.
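Merging an enriched batch into the full dataset behaves like an upsert keyed on a record identifier. The following is a minimal sketch under that assumption; the `merge_batch` function and the sample rows are hypothetical and do not describe DataForge's internal merge logic.

```python
def merge_batch(hub: list[dict], batch: list[dict], key: str) -> list[dict]:
    """Upsert batch rows into hub rows: replace matching keys, append new ones."""
    merged = {row[key]: row for row in hub}
    for row in batch:
        merged[row[key]] = row  # existing keys keep their position, new keys append
    return list(merged.values())

# Hypothetical full dataset and enriched incremental batch.
hub = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
batch = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
hub_after = merge_batch(hub, batch, key="id")
```

Each key appears exactly once in the result, so repeated runs with the same batch leave the dataset unchanged.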
Product
Final data outputs are materialized for analytics, operational systems, and downstream consumers.
The Product process is an optional layer where you combine and publish your refined data to external locations for downstream consumption. After data is fully refined in your source hub table, the Output step publishes it to one or more destinations you choose, such as tables, materialized views, files, or events.
The Product phase lets you use the pre-existing relation logic to combine and stack transformed datasets across your Sources into a single location. This phase typically contains very limited transformation logic and instead focuses on mapping datasets and data fields from the source hub tables to the final location(s).
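The mapping-and-stacking idea can be sketched as a per-source column mapping applied to hub table rows. The source names, column names, and `build_output` function below are assumptions for illustration only; in DataForge, these mappings are configured on the Output rather than written as code.

```python
# Hypothetical mappings: source name -> {output column: hub table column}.
OUTPUT_MAPPINGS = {
    "web_orders": {"order_id": "id", "customer": "customer_name", "total": "total"},
    "store_orders": {"order_id": "receipt_no", "customer": "buyer", "total": "amount"},
}

def build_output(hub_tables: dict, mappings: dict) -> list[dict]:
    """Stack rows from several hub tables into one output with shared columns."""
    output = []
    for source, columns in mappings.items():
        for row in hub_tables[source]:
            # Pure field mapping: no transformation logic is applied here.
            output.append({target: row[origin] for target, origin in columns.items()})
    return output

# Hypothetical refined hub tables for two Sources.
hub_tables = {
    "web_orders": [{"id": 10, "customer_name": "Ada", "total": 25.0}],
    "store_orders": [{"receipt_no": 77, "buyer": "Lin", "amount": 12.5}],
}
combined = build_output(hub_tables, OUTPUT_MAPPINGS)
```

Keeping the output step to field mappings means the business logic stays in the enrichment rules, where it is applied once per Source.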
Ember Metadata