General FAQ

  • Which cloud technologies can be used to host DataForge?

    • DataForge can be hosted on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
  • Is DataForge capable of connecting to other data sources directly like Salesforce?

    • Yes, DataForge has a built-in connector for Salesforce. The platform can also connect to systems that are not built directly into the platform through the DataForge SDK: as long as the source or target system is reachable from a Databricks notebook, data can be ingested into or pushed out of DataForge. Common examples include JDBC connectors and API calls (see the sketch below). For more information on using Databricks notebooks with DataForge, see the documentation on the platform SDK.
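
      As an illustration only, here is a minimal PySpark sketch of the kind of JDBC read a custom Databricks notebook might perform. The hostname, database, table, and credentials are placeholders, and the DataForge SDK calls that would hand the resulting DataFrame back to the platform are not shown:

          from pyspark.sql import SparkSession

          # Obtain the active Spark session (Databricks provides one
          # automatically; this line also makes the sketch runnable elsewhere).
          spark = SparkSession.builder.getOrCreate()

          # Read from an external system over JDBC. The URL, table name, and
          # credentials below are placeholder values, not DataForge settings.
          df = (
              spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://example-host:5432/sales")
              .option("dbtable", "public.orders")
              .option("user", "example_user")
              .option("password", "example_password")
              .load()
          )

          # From here, a DataForge SDK notebook would register df with the
          # platform for ingestion (SDK call omitted).
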
  • Will we have a dedicated ETL tool that can handle the workload we need?

    • Data integration is native functionality within DataForge. The platform handles integrations through the Connections and Agents that are set up in your workspace and applied to Sources. In addition, schedules can be created in the platform and attached to a source to handle ingestion, rules, and outputs. The sketch below illustrates how these pieces relate.
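
      The following Python sketch is purely illustrative: the class and field names are hypothetical stand-ins for concepts that are actually configured through the DataForge UI, not a real DataForge API.

          from dataclasses import dataclass

          # Hypothetical models of the concepts named above; illustrative only.

          @dataclass
          class Connection:
              name: str            # e.g. a JDBC or cloud-storage connection
              connection_type: str

          @dataclass
          class Agent:
              name: str            # executes work against a Connection

          @dataclass
          class Schedule:
              cron: str            # when ingestion, rules, and outputs run

          @dataclass
          class Source:
              name: str
              connection: Connection
              agent: Agent
              schedule: Schedule

          # A source ties the pieces together: the agent uses the connection
          # to ingest data on the schedule; rules and outputs run downstream.
          orders = Source(
              name="orders",
              connection=Connection("warehouse_jdbc", "jdbc"),
              agent=Agent("on_prem_agent"),
              schedule=Schedule("0 2 * * *"),  # nightly at 2:00 AM
          )
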
  • What does DataForge offer above and beyond Databricks alone?

    • Although DataForge utilizes Databricks for processing, Databricks is only one piece of the software and process flow that DataForge manages across the data lifecycle. DataForge provides a convenient, efficient user interface for interacting with infrastructure, data engineering and integration, and source and target systems. Boundaries and guidelines are built into the platform to speed time to value and make day-to-day data configuration and processing easier for the end user to manage. In addition, DataForge drastically reduces the complexity of finding and solving data issues.
  • How does DataForge handle multiple developers at the same time? 

    • DataForge does not limit the number of developers who can work in the same environment at the same time. If more than one person edits the exact same object, the latest saved change overwrites any earlier changes.
  • Who can change what and where in the platform?

    • By default, any user with access to the DataForge environment can change any of the available configurations. Access can be restricted by workspace (e.g., Dev, QA, Prod) through user management.
  • How well does this platform optimize storage, compute, and transfer costs across the various cloud technologies?

    • The DataForge platform provides levers that customers can use to manage and optimize performance, storage, and cost. Built-in processing efficiencies optimize the compute used during the end-to-end data lifecycle. For storage, DataForge includes configurable cleanup settings that reduce the overall storage needed to maintain the platform. The sketch below illustrates the general technique.
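
      As an illustration of the general storage-cleanup technique on a Databricks-based stack (not DataForge's actual implementation, which is configured through the platform UI), old Delta Lake file versions can be removed once a retention window has passed. The table path and retention value below are placeholders:

          from delta.tables import DeltaTable
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          # Placeholder path to a Delta table; not a DataForge setting.
          table = DeltaTable.forPath(spark, "/mnt/datalake/orders")

          # Remove data files no longer referenced by table versions older
          # than 168 hours (7 days), reclaiming storage.
          table.vacuum(retentionHours=168)
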
  • Does DataForge handle unstructured data?

    • Yes, but unstructured data is not the platform's primary use case. DataForge works best with structured or semi-structured data that arrives in a repeatable format.
  • Can DataForge handle data with characters in other languages?

    • Yes, DataForge can handle special characters in the dataset through native Apache Spark functionality. However, special characters cannot be used in column headers; the sketch below shows one way to sanitize headers before ingestion.
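
      Since Spark allows renaming columns, one workaround for the header restriction is to sanitize column names before the data reaches DataForge. This is a generic PySpark sketch, not a DataForge feature, and the replacement rule is an assumption to adjust to your naming standards:

          import re
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          # Example DataFrame with special characters in its column headers.
          df = spark.createDataFrame(
              [(1, "a"), (2, "b")],
              ["ordre n°", "catégorie"],
          )

          # Replace anything outside [A-Za-z0-9_] with an underscore.
          def sanitize(name: str) -> str:
              return re.sub(r"[^A-Za-z0-9_]", "_", name)

          df = df.toDF(*[sanitize(c) for c in df.columns])
          df.printSchema()  # columns become ordre_n_ and cat_gorie
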
  • Are there any limitations on data sizes that DataForge can handle?

    • There are no known limits on the data sizes DataForge can handle. Depending on data size and refresh frequency, some compute-resource tuning will likely be needed to find the right fit for the job; a generic example follows.
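
      As a generic illustration of the kind of compute tuning involved on a Spark-based stack (cluster sizing itself is managed through Databricks, and the values below are placeholders rather than DataForge recommendations):

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.getOrCreate()

          # Increase shuffle parallelism for a large ingest; 200 is Spark's
          # default, and the right value depends on data volume and cluster size.
          spark.conf.set("spark.sql.shuffle.partitions", "800")

          # Let Spark's adaptive query execution coalesce small shuffle
          # partitions automatically after the fact.
          spark.conf.set("spark.sql.adaptive.enabled", "true")
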