Details the requirements and configuration steps for installing an Agent on a server that can access an on-premises data source.
Prerequisites (Requirements for Installation)
- The machine or VM must be running a Windows operating system: Windows 7 / Server 2008 R2 or later.
- The latest version of the Amazon Corretto JDK 21 should be installed on the destination machine. This recommendation applies to Azure environments as well, since the Java distribution is independent of the cloud provider DataForge is used with. Oracle JDK 21 is also sufficient. If neither is installed, navigate to the site link and download and install the Corretto JDK that matches the architecture and operating system of the machine the Agent will be installed on.
- Since the Agent initiates all of its own connections, outbound connections to various cloud resources on the public Internet are required. If a firewall limits outbound internet access, the following resources should be allowed through the firewall (exact domain names vary by environment); a connectivity check example follows this list. Note that these are all secure endpoints, so if SSL inspection is enabled at the firewall, special consideration is needed to ensure that the certificate presented to the machine hosting the Agent is trusted.
- AWS S3 / Azure Data Lake Storage via HTTPS (port 443)
- DataForge API endpoint via HTTPS (port 443)
- Auth0 via HTTPS (port 443)
- File and database sources that will be accessed through the DataForge Agent must be reachable from the machine the Agent is installed on. For database sources, many customers install the Agent software directly on the database server, but network connectivity from the Agent machine to the database is all that is required. Note that the Agent performs data pulls and uploads to AWS / Azure, so avoid segmenting the Agent machine from its sources in a way that forces traffic to cross a limited-capacity network segment.
- The SQL user account intended for the Agent to use to access source databases needs to be set up with native database engine authentication. In the case of SQL Server, Windows / Azure AD authentication is not supported for the Agent user and only SQL Server authentication can be used.
- Mixed-mode authentication can still be enabled to allow for integrated authentication for non-DataForge related loads.
- A Windows user account with a username and password should be created to run the Agent. This user must have access to any folders containing source data as well as the ability to run services. The command whoami can be run in a command prompt to determine the current user ID. If a dedicated user is not created, there is an option to run the Agent as Local System, but this is not recommended.
- If installing with Authentication Protocol 2.0, access to IAM in AWS or access to the Azure Storage Account that hosts the datalake is recommended.
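To confirm that the Agent machine can reach the required endpoints before installing, a quick connectivity check can be run from PowerShell (available on Windows 8.1 / Server 2012 R2 and later). The hostnames below are placeholders; substitute the storage, API, and Auth0 endpoints used in your environment.
# Test outbound HTTPS (port 443) connectivity from the Agent machine
Test-NetConnection -ComputerName s3.amazonaws.com -Port 443            # or your Azure storage endpoint
Test-NetConnection -ComputerName <your-dataforge-api-host> -Port 443
Test-NetConnection -ComputerName <your-tenant>.auth0.com -Port 443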
Machine Guid Command
reg query HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography /v MachineGuid
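If PowerShell is preferred, the same registry value can be read with a one-liner, for example:
(Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Cryptography' -Name MachineGuid).MachineGuid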
Setting up a New Agent Configuration
Agent Inputs | Detail
Name | Agent name on the backend, and the name referenced by sources and other DataForge elements within the UI.
Description | Any necessary clarification and detail.
Code | Agent code on the backend; this should ideally be the same as Name.
Machine Guid | A key identifying the machine the agent is running on; see below for the terminal command to obtain it.
Authentication Protocol | The 1.0 protocol uses Auth0 for agent authentication and will be deprecated in a future release. 2.0 uses DataForge-managed authentication and is recommended for all new agent installs. 1.0 is not selectable in SaaS environments.
IAM Access Key (AWS Databricks cloud only) | Authentication protocol 2.0 parameter: the IAM access key for the user that has access to write files to the datalake bucket.
IAM Secret Key (AWS Databricks cloud only) | Authentication protocol 2.0 parameter: the IAM secret key for the user that has access to write files to the datalake bucket.
Storage Account Name (Azure Databricks cloud only) | Name of the storage account that hosts the datalake container.
Storage Account Key (Azure Databricks cloud only) | Access key for the storage account that hosts the datalake container.
Datalake Container Name (Azure Databricks cloud only) | Name of the datalake container.
Save the Agent and keep the config file available
When the Agent is saved, a prompt will appear to Save and Download Config File. Be sure to save this config file, as it will be needed after the Agent MSI installation below.
Setting up 2.0 Protocol for AWS Databricks Cloud
If Databricks is running in AWS, an IAM user will need to be created with access to the S3 datalake that Databricks uses to process DataForge files. It is recommended to create one user per Agent and to grant it only the permissions needed to write objects, with no delete permissions. A recommended policy to attach to the user is shown below (replace <datalake-bucket> with your datalake bucket name):
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<datalake-bucket>",
      "Sid": ""
    },
    {
      "Action": [
        "s3:PutObjectAcl",
        "s3:PutObject",
        "s3:GetObjectVersion",
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::<datalake-bucket>/*",
      "Sid": ""
    }
  ],
  "Version": "2012-10-17"
}
Once the user is created, go to Security Credentials on the IAM user, and generate Access Keys. Enter the IAM Access Key and IAM Secret Key in the agent parameters page with 2.0 selected.
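If IAM is managed from the command line rather than the console, the same setup can be sketched with the AWS CLI. The user name, policy name, and policy file path below are examples only; the console steps described above work equally well.
# Create the agent user, attach the datalake policy saved locally as agent-policy.json,
# and generate the access key pair to enter in the agent parameters
aws iam create-user --user-name dataforge-agent
aws iam put-user-policy --user-name dataforge-agent --policy-name dataforge-agent-datalake --policy-document file://agent-policy.json
aws iam create-access-key --user-name dataforge-agent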
Setting up 2.0 Protocol for Azure Databricks Cloud
If Databricks is running in Azure, access to the Azure Storage Account and the datalake storage container that Databricks uses to process DataForge files will need to be configured using a storage account key.
- Storage account name is the name of the storage account.
- Datalake container name is the name of the datalake container that is mounted for Databricks access.
- Storage account key can be found in the "Access Keys" section of the storage account. Make sure the value selected is "Key" and not "Connection string".
Finally, Networking on the storage account needs to be configured so that the Agent is able to access the container.
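If the Azure CLI is available, the storage account key can also be retrieved from a terminal. The account and resource group names below are placeholders.
# List the access keys for the storage account that hosts the datalake container
az storage account keys list --account-name <storage-account> --resource-group <resource-group> --output table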
Finding the Agent MSI Install File
Installing the DataForge Agent (Windows)
The service account username MUST include the domain or local server name, for example <domain>\username or <local server name>\username. This can be found by opening a command prompt and running "whoami" in the terminal. The user must have the ability to log on as a service and run services. The user must also be able to access any folders containing source data if the Agent will be performing file ingestions. An alphanumeric password is recommended.
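For example, running whoami while logged in as the intended service account shows the fully qualified name to enter during installation (the server and account names below are hypothetical):
C:\> whoami
productionserver\svc-dataforge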
Place the Agent-Config.bin file and restart the Agent service
Troubleshooting installation
Common Errors with Agent Install
- If the service is running and no log files are being created in the C:/logs/dataforge directory, there is a good chance that Java is not installed correctly on the server. Please consult the Java requirements at the beginning of this guide. Run "java -version" in a terminal to confirm that Java is installed; you should get back a message similar to:
openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment Corretto-21.0.3.9.1 (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.3.9.1 (build 21.0.3+9-LTS, mixed mode, sharing)
- The service account doesn't have access to the Program Files location, or the service account doesn't have permission to create a service on the server.
- Service account username doesn't include the domain, e.g. user123 was entered instead of productionserver\user123. Make sure to run "whoami" in a terminal on the server to get the fully qualified username with a domain for the service account.
- Service account password has special characters. It is recommended to use an alphanumeric password at this time.
- There is already a service with the same name on the server from a previous installation attempt. Use "sc delete" command in a terminal window to remove the existing service before trying again. The service will be named IntellioAgent_<agent-code>. An example of the command is below in the next section regarding uninstalling the DataForge Agent.
- Machine GUID is incorrect or is unable to be acquired by the agent. Check the agent logs (C:/logs/agent.log) and see if the Machine GUID is populated in the logs. If it is empty, the agent will need to be uninstalled and reinstalled with the Machine GUID value entered in the configuration-key prompt in the MSI installer.
- Config file is misplaced or incorrect. Make sure that the agent-config.bin file is placed in the appropriate location and that the service is restarted after the file is in place. The file must also be named "agent-config.bin".
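Once agent-config.bin is in place, the service can be restarted from an elevated PowerShell prompt; the service name follows the IntellioAgent_<agent-code> pattern mentioned above.
# Confirm the Agent service exists, then restart it so it picks up agent-config.bin
Get-Service -Name "IntellioAgent_*"
Restart-Service -Name "IntellioAgent_<agent-code>"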
If the install is failing with a message similar to the following image, navigate to this location on the server: C:\Users\<username>\AppData\Local\Temp. Sort by date modified in your file explorer and look for the most recent file that starts with "MSI". The contents of the file will give more information about the MSI install error.
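A quick way to locate that log file from PowerShell, for example:
# Show the most recently modified MSI* file in the current user's temp folder
Get-ChildItem "$env:TEMP\MSI*" | Sort-Object LastWriteTime -Descending | Select-Object -First 1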
Uninstalling the DataForge Agent
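As referenced in the troubleshooting list above, a service left behind by a previous installation can be removed from an elevated command prompt before reinstalling (replace <agent-code> with the code configured for the Agent):
sc stop IntellioAgent_<agent-code>
sc delete IntellioAgent_<agent-code>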