DataForge Version maintenance updates (archived)

This archived page lists maintenance updates for DataForge versions that are no longer supported. 

Important!

This documentation has been retired. The DataForge versions listed are no longer supported. See DataForge release notes and version support.

DataForge Version releases

Maintenance Updates by version:

DataForge Version 8.0

DataForge Version 7.1

DataForge Version 7.0

DataForge Version 6.2

DataForge Version 6.1

DataForge Version 6.0

DataForge Version 8.0

  • September 25, 2024 (8.0.4)
    • Cleanup fails to update meta.process fix
    • Parametrize max relation hops to prevent API crashes
    • Remove spark.sql.decimalOperations.allowPrecisionLoss from default cluster spark configs
    • Ingestion fails when struct has fewer fields than previously
  • September 3, 2024 (8.0.3)
    • Add All Source Columns after columns are added/mapped breaks output positions (writes incorrect data in columns)
    • Cleanup is failing with error process_history_v2
    • Enrichment parser timeout fix
    • Keep current rule using Self-relation with multiple relation hops causes query gen error
    • Add customer billing portal button to accounts page
    • Add "reselect cloud provider" option to signup flow after selecting cloud and before creating workspace
    • Send email when new customer backs out of payment entry
    • Add subscription type to signup UI
    • Add subscription limit tracking & enforcement for Plus subscription edition
  • August 19, 2024 (8.0.2)
    • Rule expression validator fails when expression uses same parameter twice or more
    • Unchecking unique flag on rule does not remove uniqueness rule and has to be deleted in postgres
    • Sparky ingest isn't adding trust server certificate (and maybe more) to connection correctly
    • Prevent endless deployment loop + send notification when container crashes repeatedly
    • Add s3 endpoint to SaaS environments to avoid costs for deployment artifact download
    • Virtual or delta output fail query gen when Timestamp attr is mapped to Date output column (corrected to invalid mapping - not allowed anymore)
    • Removed other invalid output type mapping options. Refer to schema evolution matrix in source settings documentation for valid combinations.
  • August 2, 2024 (8.0.1)
    • Workflow queue put wait on a source for the wrong output
    • Output with Array/Struct column fails all channel outputs that aren't mapped to the array
    • Changing schedule on custom refresh source prompts Reset All CDC required
    • Enrichment failed due to raw attribute "view" is missing in last input
    • Large complex attribute schema breaks data normalization
    • Import process should not validate inactive rules have parameters since we don't save them
    • Blank column shows after adding long source to output
    • Azure SaaS workspaces do not have default cluster config node type and saving (default) throws error with both cloud node types
    • Double type is invalid for sql server output
    • Custom notebook process end not sending to correct API route in SaaS
    • Output column can't be remapped to decimal from double
    • Update enrichment datatype and datatype_schema to account for cast_datatype
    • Add infra permissions so core can update deployment task definition/container image
  • July 3, 2024 (8.0.0)
    • Agent is not applying compression to avro files
    • Having a space in Output column name for file output throwing PARSE_SYNTAX_ERROR error
    • Changing refresh type full to key leaves existing relations and rules with incorrect cardinality
    • Raw attribute normalization does not work properly for s_output_source_id
    • Reset all CDC breaks if run immediately after import of compound key keyed source
    • Switching from scheduled to watcher doesn't clear out record in ingestion queue
    • Relation Convert to Template tooltip with Primary Flag
    • Refresh browser or connection metadata view when switching projects
    • Sensitive connection parameters with ( in the param value can't be extracted and print in notebook runs
    • Fix scroll bar/buttons on Rule editor page
    • Agent logs don't scroll/load past certain number of records
    • Processing page filters spread really far apart on bigger screen
    • Import doesn't restart if core goes through reboot
    • Data View sort column doesn't allow sort descending anymore (click column sort twice)
    • Manual CDC failed with error message- Cannot convert Avro field 'linetotal'
    • Getting group message "No tokens defined for any sources within cloned group" when sources have tokens
    • Rule Template Applying - Updated By and Updated Date are wrong
    • Cleanup causes issues when it runs alongside a virtual output when channel has no data
    • Trim_string_data parameter is ignored in Sparky Ingest
    • Select Action / Remove Linked sources is not visible on rule and relation template linked sources page
    • Importing same project twice showing relation object changed
    • Add scrollbar on Account Management page
    • Opening project import url opens imports from correct project but wrong project name dropdown
    • Connection dropdown doesn't load on source settings page randomly
    • Changing file type is showing parse reset message - should show message that file type can't be switched
    • None.get error message during connection saving for invalid json
    • Error on ingestion for connection metadata created sources
    • Token list is broken in the test source on the Relation Template page
    • Project import failing on "unable to map relations"
    • Import failed : ERROR: too few arguments for format()
    • Schedule not picked up by Core after cloning sources
    • Complex data types are failing during refresh
    • Rule expression validation fails with AGG(...) + [..].x pattern
    • Remove Recalculate changed
    • The save output mapping button should not be available to click when the source is not selected
    • After import, "next ingestion" should not be populated on all sources page
    • Wrong updated userid on the source setting page

DataForge Version 7.1

  • May 13, 2024
    • After Key bulk reset CDC, history table AND enr files are hyper-partitioned
    • Clone of Group containing unique rule template causes error
    • Unable to clear output channel filter
  • April 9, 2024
    • Email alert sending now supports TLS 1.2
    • Added option for custom "send from", configured through Terraform variable using smtpCustomSendFrom variable or updated in Service Configuration UI page
    • Azure environments can now configure S3 connections and pull files from S3 using Sparky ingestion. Currently only Sparky Ingestion and Scheduled initiation type are supported.
  • March 28, 2024
    • Long aggregate expressions time out in rule & rule template parser
    • Rule template parser gets lost and returns bad relation chain
    • Wrong export format for relation paths
    • Allow data security mode to be saved on cluster configs
  • March 22, 2024
    • Token used in rule template returns token attribute name rather than value of attribute name
    • Processes do not show up when switching projects
    • Related sources list not showing any other than This when creating or changing rules
    • Validation template can't be saved "Validation rule expression is not boolean"
    • Inputs page not loading inputs past certain date / # of inputs
  • February 28, 2024
    • Custom SDK saves ingested raw attributes as string datatype
  • December 15, 2023
    • Output process tab doesn't seem to show in-progress outputs - it only showed up when it hit completed
    • Shorten connection test error printed in Agent logs
    • Auto-casting to timestamp in rules does not allow for specifying format and is buggy
    • Output Process Tab Filtering
    • Hitting X to wipe out cluster config job configuration libraries removes [] - throws wrapped error on save
    • Usage disable-ingestion flag didn't flip back to false after upgrade wasn't able to start
    • Support Self Relation with Unique Rule in relation expression
    • Support Self Relation with Window Function Rule in relation expression
    • Output Add All Source Columns maps Numeric type to String column type
    • Source status failed not matching input status success
    • Sparky File ingestion ignores connection path suffix
    • Key bulk reset CDC breaks for file source when reading same file with no date parameter specified
    • Clone fails if a non-templated rule references a templated relation
    • Inactive dependencies marked active_flag=true in project exports
    • Relation Template expression is uneditable until reselecting the source to test
    • Custom ingestion flipping disable ingestion after 3 failed ingestions
    • Custom notebook cluster configs are allowed to be saved as Default Cluster Configuration, which shouldn't be allowed
    • Can't access running process record when I have to scroll down because it auto refreshes
    • Check for incorrect rule/relation parameters, warn, and write clearer error message than query gen
    • Prevent adding multiple columns in the same output with same name
    • Renaming a Group saves the name locally even though the actual save failed
    • Forward slash in template name creates second level folder in project export/import
    • Refresh process spams history.process.record_counts
    • Project import sometimes doesn't update validation action code on validation rule
    • Update sparky ingestion to match agent ingestion on raw file save path when file extension isn't listed
    • If Driver Node is reclaimed we don't launch retries and things get stuck
    • Attribute Recalc waiting to run on downstream_lookup_enrichment and
    • Using date() function in rule expression blows up API
    • Clusters changed between cdc reset and output
    • We allow saving a source as custom ingest with no custom cluster configuration attached
    • After an import, connection value is empty in UI until manual refresh.
    • Prevent period (.) for Hub View Schema
    • Add a check for source name modification for a source with group and source template name
    • Import fails when raw attribute or enrichment is removed from source
    • Source process tab date filter functionality forward buttons not working
    • Custom parse ignores force_case_insensitive parameter
    • Agent file ingest fails with multiple directories in file mask
    • Misspelling - null values detected message on rule validation
    • Import doesn't validate connection type
    • Remove single quotes around timestamp token for SFDC ingestion
    • Remove single quotes around timestamp token for SFDC ingestion 7.1
    • Processes are running multiple times when cluster is interrupted in AWS
    • Hub table gets dropped and re-created multiple times
    • Templatizing the source breaks relation templates pointing to it.
    • Rule template can be created that references "actual" source name of templatized source
    • Dropdown list of source template names in rule expression editor has duplicates
    • Incorrect type mapping from sql server in sparky ingestion
    • Connection test creates duplicates
    • When API is not available, Agent takes long time to restart
    • Processes get stuck in queued if disable ingestion is enabled at project and sources created from metadata
    • Spaces in token names don't work when using token in attribute name of rule expression
    • Port not added to connection string correctly in Sparky for MySQL (and others)
    • Process date filter returning blank dates
    • ENR query fails when JOIN is using unique enrichment attribute

DataForge Version 7.0

  • January 5, 2024
    • Salesforce documentation hyperlinks were not opening correctly to the api docs
    • Salesforce API ingestions with Max Records parameter not downloading all batches of data
  • December 12, 2023
    • Workflow doesn't prevent manual reset cdc on source with previously failed refresh
    • Add connect and login timeout to Core system config connection on startup
    • Remove single quotes around timestamp token for SFDC ingestion
  • November 22, 2023
    • Output mappings not fully importing from project
    • Select enrichment parameters deleted during import
    • Added check for import/export version number
    • Unable to delete group if connection is tagged in use
    • Removed deprecated scripts from database deployment during upgrades
    • Validation of rule template erroring with message "error : Values not defined for token(s): "
  • October 5, 2023
    • Group renaming fails on save
    • Output channel IPUs not updated after cloning
    • Custom refresh source fails during refresh when keep current rule using window function exists
    • Delete input and Bulk reset CDC may duplicate data
    • Clone (output channel clone without Group) does not map cloned Output column mappings
    • Unable to process large inputs (10GB+)
  • September 21, 2023
    • Update Source File Type description to clarify multipart should not be used with Watcher sources
    • Reset All CDC fails on switching from Key to Custom Refresh when schema changed
    • Source parser attribute is null in Postgres but displays as Spark in UI
    • Connection cloud_credentials are blank after cloning
    • s3 file connection allows "Public bucket" credential when using agent (removed)
    • Can't update Raw Attribute Description anymore
    • Convert query token substitution to string for spark ingestion to match agent ingestion
    • Optimize reads for custom refresh bulk reset CDC
  • September 15, 2023
    • Changing from Key refresh to Custom Refresh fails on Reset All CDC
    • Key Delete Breaks when raw_attribute only exists in input(s) being deleted
    • Delta Lake Output with multiple channels creates loop of output processes
    • Key CDC bulk reset breaks hub_history table
    • Custom Data refresh delete input fails
  • August 22, 2023
    • Direct Mappings do not detect Relations
    • Refresh calls effective record ranges function excessively during timeseries,none,sequence refresh
    • Enrichments with Date type failing to save and restarting API
  • July 19, 2023
    • Sparky file ingest is missing file archiving
    • Add the option to delete or remove existing sources from the dependencies page through the UI
    • Keyed source is allowed to keep processing if refresh failed but a queued input is deleted
    • Output records for deleted inputs are not deleted after reset all output
    • Bad character in fixed width schema breaks file ingestion
    • Source and output settings are not reverted after error during save.
    • Deleting Token from Source page throwing error message
    • Manual attribute recalculation process waits on any pre-refresh process in the queue, creating circular wait in case of error
    • Data Profile showing blank results for Rules
    • Cluster Applied Objects page is not showing the source list when the Cluster is used as a process override cluster
    • Mark Input as Failed if latest output retry is failed when multiple outputs are mapped
    • Source filters error out with ' in search
    • Output: changing Output Type from Virtual to File is throwing error message
    • Delete group connection if group is deleted from project and doesn't exist in other projects
    • Databricks Null Pointer error on Failure
    • Hub View Name incorrect when creating source from connections page
    • Output mapping non-primary relations not validating
    • Do not allow mapping unmanaged external source to Output -> Source Mapping
    • Do not allow Keep Current rules to be used with unmanaged source in relation/rule expression
    • Schedule page is showing two Name columns
    • Cloning a group sets hub_view_name to NULL for all new sources
    • Searching for Agent name doesn't work but agent code does
    • Process Configuration UI Search doesn't work
    • Relation Expression can't save when casting part of expression from String to INT
    • Enrichment parser does not allow writing enrichment using a relation out and into the same source
    • CDC Failed from Driver Node reclaimed but source kept processing newer inputs and didn't queue them
    • s_key using decimal column is calculated differently in regular CDC vs Reset_all_cdc, leading to data duplication
    • Job Runs table is empty 
    • Adding source id to URL jumps to the right source but keeps the wrong project where it doesn't exist
    • Input status is not updated when enrich/recalc or output query fails to generate
    • Deleting source can get you to a null Project for the session
    • Deadlock happening during processing
    • Core failed to reconnect after Postgres shut down/restarted
    • Get Date From File Name broken when using only date with no time component
    • Deployment needs to be rerunnable by restarting deployment task only after failure
    • Deployment container fails to rollback and restart Core, API, Agent, Usage Agent containers on failure
    • Catch error if a valid Cron expression is saved that results in invalid date
    • Reset all CDC hangs
    • Can not clear Generic JDBC password once it's been saved
    • Extra slash in file path archive folder setting on watcher source creates infinite loop of errors
    • A big source name causes logout button and source status to go off the screen
    • CDC and Input Delete break after Raw Attribute with same name as Key Column and different data type are ingested
    • Source settings parameters are not properly bolding or auto-expanding when non-default
    • IPU is hidden in SaaS versions
    • Remove reset options from triple-dot menu for inputs that failed on ingestion
    • Admin user can't uncheck the Active flag on a non-admin user in the UI

DataForge Version 6.2

  • May 30, 2023
    • Add log message to process logs when a watcher source failure triggers disable initiation to flip
    • Input not updated to Success when output runs (process complete success)
    • Keyed source allows later inputs to process when Refresh failed from driver node being shut down (spot availability)
    • Reset Enrichment not an option when needed after failing from other source Hub view does not exist error
    • Agent Restart Fails Ingestions but Retries don't run after the restart
    • Fix deprecated image for Appstream
    • Remove the use of ACLs from S3
    • Increase Source Page load to 100 rows initially and increment in 100 row chunks
  • April 6, 2023
    • Core gets stuck on microservice restart
    • Attribute Recalculation with Self-Relation in Rules on Multiple Sources triggers loop
    • Can't save or update rule template with "null values detected" message
  • March 24, 2023
    • Import breaks due to not converting Output Column Mapping Expressions to lowercase column aliases during lowercasing process
    • Import breaks due to lower-casing of column aliases but not converting Relation Expression column aliases to lowercase
    • Self-related enrichment expression is not parsed correctly and returns wrong result
    • Add user-agent tag to all databricks calls
  • March 9, 2023
    • Fixed attribute lowercasing conversion with additional hub table column name lowercasing.  The bug resulted in new attributes being created during Refresh and null values.  With this fix, a new data pull or reset corrects the hub table.
    • Fixed Custom Post Output process failing by correcting missing parameters in request process.
  • February 21, 2023
    • Cleanup is running slow and not fully utilizing cluster capacity
    • Dependency edit dialog is broken
    • Deadlock during prc_proces_end during update of job_run table
    • Queued Ingestion processes not able to be deleted, automatically failing processes associated with dead agents also not working
    • Attribute recalc overrides hub table view definition of full refresh source with non-current input_id
    • Clone import errors when tokens are used in relation expressions
    • Output template view name does not convert ${GROUP} token dashes to underscores
    • Relation and Rule templates continue spinning when they are invalid rather than giving an error
    • Key Source: Changing the Key Columns values multiple times does not show "Save Changes and Reset CDC" popup option
    • Usage-Agent errors out with Java heap space message
    • Unable to save boolean values in service (system) configuration screen
    • Input last_completed_process_type is not updated when process fails
    • Increase wait time before Core deletes process if agent is unhealthy
    • Private UI container results in Environment and Client parameters as "undefined" in UI
    • Multipart sources throwing error in cleanup
    • Import does not import post output commands option (post output notebook not imported)
    • UI displays 2-3 redundant authentication prompts on startup
    • Inactivating relation in source environment throws errors when importing to second environment (can't inactivate relation on import)
    • Reset all CDC does not calculate record count
    • Azure Spot with Fallback does not save
    • Custom Ingestion Process stuck In Progress for hours and core restart didn't clear
    • Full refresh sources have redundant key refresh attributes
    • Prevent certain characters (/"&) in terraform db passwords OR escape them in deployment app/everywhere
    • All Auth0 rules are "global"
    • Grey out recalculate button if no passed CDC inputs on source

DataForge Version 6.1

  • January 12, 2023
    • Relation validation throws intermittent error
    • Unable to clone templated relation that uses enrichment template
  • December 2, 2022
    • Cross AWS Account S3 ingestion always gives 403 forbidden error
    • Custom ingestion record count may be incorrect
    • Globally-scoped processes keep retrying indefinitely when spark job fails
    • Change hub table check wait time from geometric progression to constant
  • November 11, 2022
    • Multipart file ingestion creates input/process for every file in the directory
    • Remove nullability schema check for sql server output connector
    • Output is skipped with 0 records when job run is switched between refresh and output
    • Manual reset enrichment on key source input may result in data loss
    • SQL Server and Snowflake outputs do not delete temp table after error
    • Timestamp sources not updating same time ranges / keeping old records
    • Agent runs out of disk space in Azure when pulling large tables
    • Icon links are missing from search-select component dropdowns
    • Inactivated source continues scheduled ingestion and processing
    • Agent log UI is slow, overloads DB and may crash API
    • Changing tracking columns on source does not show reset CDC popup
    • Delete Data generates error when hub table does not exist
    • Effective Record Count higher than Record Count
    • Relation Expression Validation Error covering whole page and blocking other attributes
    • Import/Clone logs have disappeared from UI
    • Watcher sources trying to ingest when no file is present (xml error after)
    • Unlinking a source from Rules Templates page is throwing an error message
    • SVC_Create and Update error trying to create/save dependency
    • Source status doesn't match input statuses
    • Hub View is not updated during schema check
    • Core doesn't come back when Databricks api call fails
    • Deadlocks between meta.prc_process_get_next (sparky) and meta.r_update_job_run (core)
    • Disable ingestion didn't stop new input from running on schedule
    • Failed cleanup keeps retrying indefinitely
    • Reset all CDC on key source overwrites hub table with incorrect decimal types
    • Source Data Deletes processing very slowly - potentially colliding with mini-sparky
    • Graph Legend on Lineage doesn't show up on Azure - shows as a broken picture link
    • Dead agent does not fail Inputs and keeps them permanently queued
    • Heartbeat route throws error when Agent calls with invalid agent code
    • Our config values do not override play API defaults
    • Spark output connection None.get
    • Self-Relation with the same name as a normal Relation breaks the ability to use Self-Relation
    • Change name in output column adds new column when promoting and does not remove old column
    • Source delete is allowed when active (Q or I) processes exist
    • Ingestion breaks on raw attribute normalization when columns come in with _2 already on them
    • Removing output columns from source environment that are still used in target environment breaks import
    • Clearing Spark Conf value and saving the cluster config showing an error message
    • Add count of enrichments/rules to rules page
    • Inactive Rules are no longer highlighted in red on Enrichments list page
    • Manual Refresh required to see new updated relations and rules in table
    • Output column data type doesn't clear out

DataForge Version 6.0

  • September 28, 2022
    • r_update_job_run blows up during ingestion
    • File Mask needs to have Inbox/ to find file correctly
  • September 13, 2022
    • Output query writes all enrichments as null
    • Upgrade Spark JDBC driver used in API to address log4j vulnerability
  • September 9, 2022
    • Schema unlock error is not logged in pg logs and missing key info
    • Link relation template to source blows up
    • Rule Expression Validation Error covering whole page and blocking other attributes
    • Export includes sources that have been loopbacks in the past
    • Fix column widths on Sources table from cutting off data
    • Needs to refresh the Rule Templates page manually after adding Source to see new Linked Source and Updated By name is incorrect
    • Show better error message in Databricks log when Hive query is incorrect
    • Bulk reset CDC will fail when source schema varies
    • Input stuck in workflow queue when hard dependency source (key) pulls in zero record input
    • Output Failure Alerts not working
    • Support link on Unauthorized page points to wrong DataForge support URL
    • Enriched fields not being aliased in GROUP BY of aggregate output query
    • Users are able to reset CDC and enrichment on single input that failed custom ingestion
    • Remove redundant updates to job_run_status table
    • Import process gets stuck in 'I' indefinitely when core restarts
    • Import does not work right now for sources with custom notebook and no connection attached
    • Reset all CDC fails after converting source Full->Key
    • Can't use "Save and Reset All CDC" when changing source from Full to Keyed and only has zero record inputs
    • Can't unlink rule template from sources when validation is auto generated
    • Workflow is not checked when Cluster is terminated
    • Cannot remove config for driver type on cluster config
    • Source that has a rule created with a relation is always able to enqueue recalculate using changed only button - even if nothing has changed
    • If relation template points to source template, grey out sources in new linked source list that have no group assigned
    • Rules tab no longer displays column headers with no rules
    • UI is not showing error message for Relation Templates
    • Needs to refresh the Input page after adding rule to see "Recalculate Change" option
    • Space out Save, Save and Create Validation buttons
    • Deadlock on Input Delete from Broadstreet
    • Cancel button on Schedule Page does not work
    • Output column name restrictions aren't followed when switching from Table to File type

       
