Viewing Source Data

Every source is backed by a hub table in Databricks that holds all of the input data brought into the source.


Querying Source Data in Hub Tables

All source data is visible via hub tables in Databricks. Hub tables are stored in the hive_metastore catalog. Clicking the Data View tab in any source opens the hub table definition in the Databricks catalog.

To query a particular source, open a Databricks SQL query or notebook (a cluster must be created first) and run a query against either the hub table, named dataforge.hub_<source_id>, or the source view name from Source settings. See the Databricks documentation to learn more about using the SQL Editor or Notebooks.

The Source ID appears in the site URL when the Source is open, and the main Sources page also lists Source IDs. To query by source view name, qualify the view with the project schema, which is shown in the project settings on the Projects page.

Examples:

select * from dataforge.hub_1
select * from project_schema.source_view_name
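
For notebook users, the naming pattern above can be sketched as a few small Python helpers that build the query text. This is a minimal illustration rather than part of the product: the function names (hub_table_name, source_view_name, preview_query), the 100-row default limit, and the assumption that Source IDs are positive integers are all hypothetical choices made for the example.

```python
import re

# Schema used by the hub table examples in this article.
HUB_SCHEMA = "dataforge"

# Assumption: schema and view names are plain, unquoted SQL identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def hub_table_name(source_id: int) -> str:
    """Return the hub table name for a source, e.g. dataforge.hub_1."""
    # Assumption: Source IDs are positive integers (bool is excluded explicitly).
    if isinstance(source_id, bool) or not isinstance(source_id, int) or source_id < 1:
        raise ValueError(f"source_id must be a positive integer, got {source_id!r}")
    return f"{HUB_SCHEMA}.hub_{source_id}"


def source_view_name(project_schema: str, view_name: str) -> str:
    """Return the schema-qualified source view name."""
    for part in (project_schema, view_name):
        if not _IDENTIFIER.match(part):
            raise ValueError(f"not a plain SQL identifier: {part!r}")
    return f"{project_schema}.{view_name}"


def preview_query(table: str, limit: int = 100) -> str:
    """Build a SELECT that previews the first `limit` rows of a table or view."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    return f"SELECT * FROM {table} LIMIT {limit}"
```

In a Databricks notebook, the resulting text can be passed to spark.sql, for example display(spark.sql(preview_query(hub_table_name(1)))). Adding a row limit is optional, but it keeps a first look at a large hub table inexpensive.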

Viewing Sub-Source Data

To view data in a sub-source, query the parent source and look at the rule that drives the sub-source, where all sub-source attributes and rules are managed.

For full documentation, see Sub-Sources.
