- Service Configuration Viewable from UI
- Ingestion Queue Tab Added to Processing
- New History Tables Available in Postgres
- Source Changes Require CDC Reset to Prevent Data Duplication
- Additional Tables Managed by Cleanup Process
- Simplified Postgres Metastore Querying Through Databricks
- Group Tokens Available in Virtual Output View Names
- Source Level Triple Dot Menu Available in Sources Page
- Databricks Accessible from DataForge Menu
- Configuration Import Performance Optimization
Service Configuration Viewable from UI
By: Mary Adams
Users will find a new option under the System Configuration dropdown on the main menu: Service Configuration. This table allows users to view the meta.system_configuration records directly from the UI.
Users are also able to edit the values of some configurations directly in the table. To enable editing, click the edit icon in the Action column; this enables customization of that specific configuration's value.
Once a value has been edited, it is highlighted orange. If an erroneous value is entered, an error message is displayed. Multiple configurations can be edited before saving. To save changes, click the Save button in the upper right corner; clicking Cancel undoes all edits.
After saving, a message may appear asking users to restart impacted services (Core/API) in order for the configuration changes to take effect. This must be done manually and is not currently possible from the DataForge user interface.
Ingestion Queue Tab Added to Processing
By: Mary Adams
The Processing page has a new feature, the Ingestion Queue. Located just before the Processing tab, it allows users to view active and completed ingestions from DataForge. Similar to the Job Runs table, the Ingestion Queue can be filtered on multiple values.
Opening the Filter Type dropdown gives users several filter options: Status, Agent, Connection, Schedule, and Source. Once a type has been selected, users can either search for the desired value or scroll through the populated dropdowns. Once a filter is applied, it appears as a filter tag on the screen. Multiple filter types can be used, but only one filter of each type is currently allowed.
The Ingestion Queue automatically refreshes its data. If there are more rows than can be displayed on screen, the Load More... option will be available.
Users can also navigate to other areas of the DataForge user interface from the Ingestion Queue. The Source, Connection, Agent, and Schedule columns are all linked to their respective pages. Clicking a link opens the desired page in a new tab.
New History Tables Available in Postgres
By: Alec Judd
New history tables have been added to Postgres for querying: history.source and history.system_configuration. These tables track changes made to the application and can be used for validation or change tracking. Tracking includes any changes performed via the user interface or through manual Postgres database updates.
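As a sketch of how these tables support change tracking, the snippet below lists recent source changes. It assumes the SDK-based querying option described under Simplified Postgres Metastore Querying Through Databricks below; the method name and the history.source column names are illustrative assumptions, not the documented API or schema.

```scala
// Hypothetical sketch: list source changes captured over the last 7 days.
// DataForgeMetastore.select and the column names below are assumptions
// for illustration -- check the actual history.source schema in Postgres.
val recentSourceChanges = DataForgeMetastore.select(
  """SELECT source_id, updated_by, updated_datetime
    |FROM history.source
    |WHERE updated_datetime > now() - interval '7 days'
    |ORDER BY updated_datetime DESC""".stripMargin
)
recentSourceChanges.foreach(println)
```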
Source Changes Require CDC Reset to Prevent Data Duplication
By: Alec Judd
Source changes now require a CDC reset to avoid data duplication and other processing issues. Previously, when a user made source setting changes that required a CDC reset, such as changing the Data Refresh Type, a popup asked whether the user would like to reset or save without resetting. The new validation requires a CDC reset whenever such settings change, preventing users from bypassing the suggestion and causing data issues. The validation checks for any data refresh type changes and for CDC parameter changes that would be impacted.
Additional Tables Managed by Cleanup Process
By: Alec Judd
Three large history tables have been added to the existing cleanup process to help manage database size and lower costs: history.source_schema, history.workflow, and history.process.
These three new table cleanups are configurable on the new Service Configuration page in the user interface, via the following setting names and default values:
- Source-schema-history-retention-interval (default 30 days)
- Workflow-history-retention-interval (default 30 days)
- Process-history-retention-interval (default 6 months)
For more information on the Service Configuration page, please visit Service Configuration Viewable from UI.
Simplified Postgres Metastore Querying Through Databricks
By: Alec Judd
A new option to query (select, update) the Postgres Metastore tables has been enabled through the DataForge SDK. This allows users to query Postgres tables without needing to launch and install additional services such as AWS AppStream and pgAdmin.
For instructions on how to use this option, please visit Accessing Postgres through Databricks.
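As the changelog entry below notes, usage involves attaching the SDK library jar to a Databricks cluster and running a Scala notebook on that cluster. The sketch below follows that pattern; the object and method names are placeholders rather than the documented SDK API, so consult Accessing Postgres through Databricks for the actual calls.

```scala
// Scala Databricks notebook cell (DataForge SDK jar attached to the cluster).
// DataForgeMetastore and its select/update methods are placeholder names --
// see "Accessing Postgres through Databricks" for the real SDK API.

// Select: read current service configurations from the metastore.
val configs = DataForgeMetastore.select(
  "SELECT name, value FROM meta.system_configuration ORDER BY name"
)
configs.foreach(println)

// Update: adjust a retention setting (the name/value columns are assumed).
// Writes modify live metadata -- use with care.
DataForgeMetastore.update(
  "UPDATE meta.system_configuration SET value = '60 days' " +
  "WHERE name = 'Process-history-retention-interval'"
)
```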
Group Tokens Available in Virtual Output View Names
By: Alec Judd
Group tokens are now available for use in Virtual Output view names, reducing manual editing during the cloning process. To use the Group token, an Output Name Template must be set up. The Group token can be entered in the "View Name" setting of the Output Name Template, which is then applied to outputs.
Any virtual output the Output Name Template is applied to will have the group name prepended to the view name, making it unique across groups. When the View Name option is used in the Output Name Template, the view name is not editable on the output itself.
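For example (illustrative only, since the exact token syntax may differ): if an Output Name Template's View Name uses the Group token with a base name of customer_view, a virtual output in group Finance receives the view name Finance_customer_view, while the same configuration cloned into group Sales yields Sales_customer_view, with no manual renaming required.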
Source Level Triple Dot Menu Available in Sources Page
By: Alec Judd
The source-level triple dot menu that was previously available only in the header row of the Inputs page is now also available on the Sources page. It provides the same options as on the Source Inputs page, in another easily accessible location.
Databricks Accessible from DataForge Menu
By: Alec Judd
A direct link to Databricks has been added to the DataForge main menu to streamline navigation to the Databricks environment home page. Databricks authentication/login procedures have not changed, so users may need to re-authenticate after clicking the link.
Configuration Import Performance Optimization
By: Alec Judd
Import performance has been improved through several optimizations: imports now perform incremental updates of changed relations and enrichments, a new validation ensures other existing rules and relations remain valid, and parsing of the import file has been optimized. Previously, large YAML imports took a very long time to process. These improvements make initial imports slightly faster and subsequent imports with changes drastically faster, improving the end-user experience.
Full Changelog
- Update Snowflake JDBC Driver Version
  - Updated the JDBC driver to version 3.13.24.
- Update DataForge Roles in AWS
  - DataForge roles have been consolidated into three groups that users can be assigned to. These groups are specific to each environment, and their permissions have been updated to include the latest infrastructure changes.
- Add methods to query Postgres to SDK
  - Added static methods to the DataForge SDK that execute select and update queries against the DataForge Postgres metadata database.
  - Usage: attach the SDK library jar to a Databricks cluster, then start a Scala Databricks notebook on that cluster.
- Do not allow users to change refresh type or tracking columns without resetting CDC
  - Users must now reset CDC for a source when the refresh type or any CDC-tracking values change. This ensures correct CDC and refresh operation and protects against data duplication and loss.
- Add history.source table and update on all changes (including import)
  - Created the history.source table, which logs all parameter changes to the source table, whether performed via the UI or manually via database updates.
- Extend error logging for agent API calls
  - Added extended logging of API errors to the local agent file log to help troubleshoot agent connectivity issues.
- Clean redundant data returned by initial schedule initialization in Core
  - Improved performance of source schedule initialization by removing redundant data (sources with no associated schedules).
- OpenJDK image deprecated; swap to Amazon Corretto image in all container Docker bases
  - Base image for containers changed from OpenJDK 8 to Amazon Corretto 8.
- Show "Expression Validating" while expression is validating
  - While a user is modifying a rule or relation expression, the UI now displays a yellow "Expression Validating" message below the expression textbox.
- Add "Go to Databricks" link to main hamburger menu
  - Added a "Databricks" link to the main menu to streamline navigation to the Databricks environment home page.
- Add history.system_configuration table
  - The history table captures all changes to the system_configuration table to help with troubleshooting.
- Add ability to delete queued ingestions
  - Added the ability to delete inputs in Q status. When a queued input is deleted, associated ingestion queue records are also deleted.
- Update meta.system_configuration via UI
  - A Service Configurations table has been added under System Configs, enabling users to see their environment's service configurations from the UI. Users can also edit certain system configurations directly in the table.
- Clean up system_configuration + add read-only flag
  - Added columns to the system_configuration table to support edits and change tracking.
- Update default label in dropdown for value dropdown in cluster config and make spark dropdown sorted
  - On the cluster configuration Parameters tab, sorted the Spark version and node type lists.
  - Fixed "(Default)" label formatting.
- ECS/Container Instance Agents should restart task or container when "Restart Agent" button is pressed
  - The Restart Agent button on the Agents page now entirely stops the ECS task or Azure Container Instance, guaranteeing a full Agent restart.
- Remove 5.2->6.0 leftover agent code
  - Removed legacy pre-6.0 API routes, agent code, and database code.
- Create UI for viewing Ingestion Queue
  - This new feature allows users to view meta.ingestion_queue from the UI. The new table can be found as a tab on the Processing page. Users can filter with filter tags, similar to the Job Runs table.
- Add 3 tables to cleanup: history.source_schema, history.process, meta.workflow_queue_history
  - Added three large history tables to the cleanup process to help manage database size and lower costs.
- Provide triple dot menu for each source on Sources page to kick off jobs before opening
  - Added a triple-dot menu in a new column on the Sources list page. The menu enables all source-level manual reset operations available for each source type.
- Import performance optimization
  - Optimized import performance:
    - Switched to incremental updates of changed relations and enrichments.
    - Added validation to ensure other existing rules and relations are still valid.
    - Optimized parsing of the import file.
- Output_view_name needs to be manually changed when cloning to avoid overwriting data
  - Added the output view name to the output object template. This allows users to parameterize the view name for virtual outputs and clone them.
- Create AWS temp credential Actor in Agent
  - Default AWS temporary credentials for S3 access are now cached in the Agent and refreshed when they expire.
- CCI maintenance job code from Sparky and parameter from UI (it's on connections for the sql_server driver)
  - Deprecated the SQL Server connection parameter for CCI job maintenance (not supported in Azure SQL).
- Get Date From File Name is not being used in Core or Sparky parse - either remove param or make it work
  - Fixed the non-functional "Get Data From File Name" parameter and updated the parameter description.
- Validate Output Source filter
  - Added UI validation for the output filter expression.
- Various Updates to Terraform Deployment docs
  - Terraform documentation updates for AWS and Azure deployments.