As of DataForge version 6.0.0, the Postgres metastore runs on an Aurora Serverless V2 cluster and RDS Query Editor is not supported currently on this type of RDS cluster. To access the Postgres metastore, users will now have two options: Databricks or AWS AppStream 2.0 stack which is included in the 6.0.0 Terraform deployment. Below you will find instructions for both methods.
Appstream will take longer to deploy as you will need to download software (PgAdmin) each time it's used and set up the server through the connection string. However, this option is useful for when you want to see all of the tables and columns accessible in the metastore. Databricks can be used for quicker querying if you know the table names that you want to access.
Accessing Postgres through Databricks (Available with 6.1 version)
In Databricks, create a new cluster that you will use for querying Postgres and install the DataForge SDK library to this cluster. If you are unfamiliar how to do this, please follow the steps in this guide. It is possible to use the mini-sparky cluster to query postgres although it is not recommended because long running queries could impact processing running in DataForge.
After a cluster with the SDK library is created/available, create a new Databricks Notebook which will be used to query the Postgres database.
import com.dataforgelabs.sdk.Pg._
display(select("select * from meta.source"))
update("update meta.source set source_description = 'try' where source_id = 4467")
Line 1: Required to access the Postgres Database through our SDK
Line 3: Example of query to view table results. Inside the double quotes you can adjust the select statement to query the table data that you are looking for.
Line 5: Example of query to update a table in Postgres. Updating postgres tables directly is typically not recommended unless absolutely necessary or as advised by our support team.
Accessing Postgres through AWS Appstream
In the AWS Console, navigate to the AppStream 2.0 service and click "Stacks". Select the stack that's naming convention matches the environment that is being connected to and click "Action" then "Create Streaming URL".
Enter a User ID of your choice - currently the User ID can be any value and if the URL expiration is set to a large value, the User ID can be used to reconnect to the URL. Once a User ID has been entered, and a URL expiration is selected, click "Get URL" then "Launch in Browser".
Select "Desktop" on the new tab that opens and wait about 2 minutes for the session to become available. Once the session is available, a standard Windows desktop should appear on the screen. When prompted by the browser to allow copy and paste, hit "Allow".
pgAdmin will now need to be installed on the AppStream instance. Launch the Firefox browser and navigate to the following link to see the list of pgAdmin downloads. https://www.pgadmin.org/download/pgadmin-4-windows/
Select the most recent version and then click on the .exe file on the next page to download it to the AppStream instance.
Once the file is downloaded, use the .exe file to install pgAdmin with default settings to the C:/ drive. Install mode should be "Install for me only (recommended)".
Once pgAdmin is installed, find the program by searching the start menu, and launch the pgAdmin application. A prompt will appear to set a master password - set the password to anything since it can be reset later and pgAdmin will be wiped away once the instance expires.
Right click on Servers and then Register -> Server
Give the server a name and click "Connection". In "Connection", enter the credentials for the Postgres metastore server. These can be found in the environment specific private secret in Secrets Manager. Click Save.
After clicking Save, the database should connect and be ready for queries. Right click on the environment database and go to "Query Tool" to run queries against the metastore.
Updated