Working With Multiple Languages
Custom processes run from a Databricks notebook must be written in a language supported by the DataForge SDK. That requirement does not limit the rest of the notebook to those languages. Databricks notebooks natively support Scala, Python, R, and SQL, and cells written in different languages can work together in the same notebook.
Mixing languages is useful in many cases. For example, if you are applying machine learning to data you are bringing into DataForge, you can write the machine learning steps in Python or R cells and then pass the results to Scala cells that ingest the data into DataForge.
Begin each cell with a language magic command of the form %<language>, such as %python, %scala, %r, or %sql, to set the language for that cell.
An easy way to share data between languages is through temporary views. For example, register the DataFrame as a global temporary view in the first cell with createOrReplaceGlobalTempView, then read it in the second cell, written in another language, with spark.table("global_temp.<temp view name>"). Below is a simple example of this pattern.
Cell 1:
%python
# Create a DataFrame; the sample rows and schema stand in for your own existing dataset
data = [(1, "alpha"), (2, "beta")]
schema = "id INT, name STRING"
pyDF = spark.createDataFrame(data, schema)
# Register the DataFrame as a global temp view so cells in other languages can read it
pyDF.createOrReplaceGlobalTempView("pyDF_temp")
Cell 2:
%scala
// Read the global temp view created in the Python cell into the Scala DataFrame df1 for further use
val df1 = spark.table("global_temp.pyDF_temp")