Managing Projects with GitHub

Managing project configurations with GitHub lets users merge changes from one project to another through pull requests, with a view of the exact changes along the way.

GitHub Integration

Open a project and navigate to the Git tab to start the integration setup. Each workspace is tied to a specific repository, but can be reset as needed.

Add or update a GitHub personal access token to use as the connection credentials.

Choose which Git repo, branch, and optional path you want the project to connect to.

Use push and pull to keep the project and GitHub in sync: push sends the latest project changes to GitHub for review, pull requests, and merging, and pull brings changes from GitHub back into the project. Both regular and force options are available for push and pull. The DataForge user's information is included in the commit message sent to GitHub, so each change can be traced back to the user who made it.

Git Branching Strategies

Beyond setting up the integration, the first step is choosing the Git branching strategy that best fits your environment and development team. Projects are not inherently tied to any specific object in GitHub, so any of these workflows is viable, although there may be reasons to prefer one over another. Below are some starter links explaining different approaches to managing these workflows.

DataForge recommends using the Gitflow branching strategy for migrating changes and managing projects.

Be sure you are familiar with the general concepts of GitHub: creating an initial repository, working with branches, creating pull requests, and handling merge conflicts.

Users can choose which Git tools to use to manage the branch and merge process. Common GUI client tools include GitKraken, GitHub Desktop, GitLens, and Sourcetree. This guide uses GitKraken.

The rest of this article demonstrates the Gitflow branching strategy, our recommended approach for customers.

Gitflow Branching with DataForge Projects explained

Scenario: a change needs to be made to a production (live) source in DataForge, but it should be developed and tested separately first.

We currently have two DataForge environments deployed, named Dev and Prod.  Each environment has one project named Default that holds all configurations.  We want to set up a process using Gitflow to allow changes to be made in a separate project and migrated through to production.

As part of the Gitflow branching process, we will set up multiple projects in DataForge and multiple branches in GitHub, as illustrated by the diagram below, which shows the loose connection between the projects and branches.

As the diagram shows, the master branch is associated with the Default project in Prod, and the next branch is associated with the Default project in Dev. Each developer also creates their own project in the DataForge Dev environment.

When changes are needed, developers import the configurations from the next branch (or the Default project in Dev) into their own project in the Dev environment as a starting point. After making the necessary configuration changes, the developer exports their project configurations from DataForge, creates a new feature branch in GitHub, commits the configurations to the feature branch, and creates a pull request from the feature branch to the next branch.

Once the pull requests are approved and merged into next, the configurations from the next branch are imported into the Default project of the Dev environment, where the changes can be unit tested. Once all changes have been unit tested, we create a pull request from next to master. After the pull request is approved and the changes are merged into master, the configurations from master are imported into the Default project of the Prod environment, where the changes go live.

After this, the cycle repeats as more changes are needed.  This end-to-end process is demonstrated in the diagram below.
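As a rough illustration of the branch-to-project mapping described above, the following sketch models it as a small Python data structure. It is a hypothetical helper, not part of DataForge or GitHub: the names ImportTarget, import_target, and pull_request_base are invented for this example, and it covers only the regular feature flow (hotfix branches follow a different path).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImportTarget:
    """A DataForge environment and project that a branch is imported into."""
    environment: str
    project: str


# Hypothetical model: long-lived branches and the Default projects they feed.
LONG_LIVED = {
    "master": ImportTarget(environment="Prod", project="Default"),
    "next": ImportTarget(environment="Dev", project="Default"),
}


def import_target(branch: str) -> ImportTarget:
    """Return where a long-lived branch's configurations are imported."""
    if branch not in LONG_LIVED:
        # Feature branches are committed to and merged, but never imported
        # into a Default project.
        raise ValueError(f"{branch!r} is not a long-lived branch")
    return LONG_LIVED[branch]


def pull_request_base(branch: str) -> Optional[str]:
    """Return the branch a pull request from `branch` should target."""
    if branch == "master":
        return None       # end of the promotion path
    if branch == "next":
        return "master"   # unit-tested changes go to production
    return "next"         # any feature branch goes to next first
```

For example, pull_request_base("feature1-reverserule") returns "next", and import_target("next") points at the Default project in Dev.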

 

Initial Setup of GitHub repository with two branches

Create a new repository in GitHub using the GitHub UI. 

Creating new public repository named dataforge-projects

After the repository is created, we will populate it by uploading the configuration files from the Default project in the Prod environment and committing the changes.  

Export the Default project in the Prod environment. Projects are managed on the Projects page, available from the main menu.

Open your file explorer and extract (unzip) the exported project folder.

Upload the sub-folders from the extracted project folder to your new GitHub repository. This initial upload commits the production Default project configurations to the master branch of the repository.

Drag the sub-folders of the project to GitHub to upload them and commit the changes
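If you prefer to script the extraction, here is a minimal Python sketch, assuming the export is a standard zip archive. The function name extract_export is hypothetical. It returns the extracted sub-folders, which are the items to upload to the repository.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_export(zip_path: Path, dest: Path) -> List[Path]:
    """Unzip an exported project and return its top-level sub-folders.

    These sub-folders are what gets uploaded to the new GitHub repository.
    """
    # extractall creates `dest` if needed; only use it on exports you trust,
    # since it writes whatever paths the archive contains.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p for p in dest.iterdir() if p.is_dir())
```

For example, extract_export(Path("Default.zip"), Path("Default-extracted")) would return the sub-folders to drag into GitHub; both file names here are placeholders for your actual export.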

After making the initial commit, we have a master branch that holds the latest configurations from the Default project in the production environment.

Click on the "1 branch" option to open the branches page.

Add a new branch using the New branch option near the top of the page and name it "next". Use the master branch as the source so the new branch is pre-populated with the same configurations.
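The branch can also be created without the UI through the GitHub REST API, where a branch is a Git reference pointing at a commit. The sketch below only builds the requests using Python's standard library. The owner your-org, the token, and the helper names are placeholders for illustration, and the token is assumed to have write access to the repository.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token: str) -> dict:
    """Standard GitHub REST headers; `token` is a personal access token."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def get_branch_sha_request(owner: str, repo: str, branch: str, token: str) -> urllib.request.Request:
    """GET /repos/{owner}/{repo}/git/ref/heads/{branch}; the SHA is at ["object"]["sha"]."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/git/ref/heads/{branch}"
    return urllib.request.Request(url, headers=_headers(token))


def create_branch_request(owner: str, repo: str, new_branch: str, sha: str, token: str) -> urllib.request.Request:
    """POST /repos/{owner}/{repo}/git/refs creates `new_branch` at commit `sha`."""
    body = json.dumps({"ref": f"refs/heads/{new_branch}", "sha": sha}).encode("utf-8")
    url = f"{API_ROOT}/repos/{owner}/{repo}/git/refs"
    return urllib.request.Request(url, data=body, headers=_headers(token), method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To create next from master, send get_branch_sha_request for master, read the SHA from the response's object.sha field, then send create_branch_request with "next" and that SHA.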

Once the master and next branches are created, we clone the GitHub repository to our local computer using GitKraken.

In GitKraken, choose File -> Clone Repo and choose the local folder to clone the repository into. Choose the dataforge-projects repository and select the Clone the Repo option.

Open the repository after the clone is complete. The master and next branches appear in the Remote repository, and the master branch initially also appears in the Local repository. Going forward, developers do not commit changes directly to either of these branches. Instead, they create new feature branches in GitKraken and commit changes to those, then make a pull request in GitHub to move approved changes from the feature branch into the next branch.
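For reference, the clone step corresponds to the following Git commands. This sketch assumes Git is installed and on the PATH; your-org in the URL is a placeholder, and clone_commands and run are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import List


def clone_commands(repo_url: str, dest: Path) -> List[List[str]]:
    """Command-line equivalent of File -> Clone Repo in GitKraken."""
    return [
        ["git", "clone", repo_url, str(dest)],
        # Lists the local master branch plus the remote-tracking branches
        # (origin/master and origin/next) that GitKraken shows under Remote.
        ["git", "-C", str(dest), "branch", "--all"],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run(clone_commands("https://github.com/your-org/dataforge-projects.git", Path("dataforge-projects"))) would clone the repository and print its branches.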

Because we want a clean starting point, the last one-time setup step is to import the configurations currently in the next branch into the Default project in the development environment.

There are two ways to do this: check out the next branch locally in GitKraken, or download the branch as a zip folder from GitHub. This example uses the first option.

In GitKraken, double-click the next branch to check out its files into the local repository folder. Open the local repository folder (the location chosen when cloning the repository) to find the contents of the next branch.

Select and copy all folders. In a different location, create a new zip folder named dataforge-projects-next (following a repo-branch naming pattern). Open the zip folder and paste the copied next branch contents into it.

At the end of these steps, we have a zipped folder with sub-folders representing our configurations currently in the next branch.
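The copy-and-zip step can also be scripted. This minimal sketch, assuming the branch is already checked out in the local repository folder, zips only the top-level folders (matching "select and copy all folders") and skips the hidden .git folder, which holds Git's own data rather than configurations. The function name is hypothetical, and it is an assumption that DataForge accepts a zip produced this way just as it accepts one created by hand.

```python
import zipfile
from pathlib import Path


def zip_branch_folders(repo_dir: Path, zip_path: Path) -> None:
    """Zip the top-level sub-folders of a checked-out branch for import.

    Loose files at the repository root and the .git folder are left out.
    """
    folders = sorted(p for p in repo_dir.iterdir() if p.is_dir() and p.name != ".git")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for folder in folders:
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Store paths relative to the repo root so each
                    # sub-folder sits at the top level of the zip.
                    archive.write(path, path.relative_to(repo_dir).as_posix())
```

For example, zip_branch_folders(Path("dataforge-projects"), Path("dataforge-projects-next.zip")) would produce the zip folder described above; write the zip outside the repository folder so it is not committed by accident.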

Lastly, open the development DataForge environment and navigate to the Projects page to import the zip folder into the Default project (which represents the next branch).

Once the import is complete, the Default project is up to date with the next branch.

Updating developer projects and making changes

When a configuration change is needed, the developer starts by creating or updating their own project in the development environment with the latest configurations from the next branch or from the Default project in the same environment.

If the developer does not already have their own project, use the New + button on the Projects page to create a new developer project. Be sure to select Save to create the project.

Export the Default project first.

Import the exported project folder into the developer project.

Once the import is complete, the developer project is up to date with the latest changes from the next branch (via the Default project), and the developer can start making changes.

With the developer project selected, we add a new rule named "reverse account balance" to the source "Databricks JDBC Samples - tpch.customer".

Now that the changes are completed, the developer can continue to follow the workflow process to migrate the changes.

Creating a new feature branch and committing changes

After the changes are complete in the developer project, the developer exports the developer project so its configurations can be committed to a new feature branch.

Next, the developer creates a new branch. This example uses GitKraken, but the branch can also be created directly in GitHub, as in the initial setup steps.

Open GitKraken, right-click the remote next branch, and choose Create branch here.

Give the new feature branch a name such as "feature1-reverserule". Double-click the new feature branch, or right-click it and choose Checkout, to switch the local repository folder to the new feature branch.
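The GitKraken steps above map to two Git commands. This is a hedged sketch that assumes the remote is named origin (Git's default for a clone) and uses the example branch name from this article; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def feature_branch_commands(branch: str, base: str = "next", remote: str = "origin") -> List[List[str]]:
    """Equivalent of "Create branch here" on the remote next branch, then Checkout."""
    return [
        # Refresh remote-tracking branches so the new branch starts from the latest next.
        ["git", "fetch", remote],
        # Create the feature branch from origin/next and check it out in one step.
        ["git", "checkout", "-b", branch, f"{remote}/{base}"],
    ]


def run(commands: List[List[str]], repo_dir: Path) -> None:
    """Run each command inside the local repository folder."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For example, run(feature_branch_commands("feature1-reverserule"), Path("dataforge-projects")) would leave the local folder on the new feature branch.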

With the feature branch checked out, copy the sub-folders from the exported (and extracted) developer project into the local Git repository folder. When Windows prompts, choose Replace the files in the destination.
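A minimal Python sketch of this copy, assuming the developer export has already been extracted to a folder. Like the Windows Replace option, it overwrites matching files but does not delete files that exist only in the repository, so a configuration removed in DataForge may leave its old file behind that needs deleting by hand. The function name is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def copy_export_into_repo(export_dir: Path, repo_dir: Path) -> List[str]:
    """Copy each sub-folder of an extracted export into the repository folder.

    Existing files are overwritten; files present only in the repo are kept.
    """
    copied = []
    for folder in sorted(p for p in export_dir.iterdir() if p.is_dir()):
        # dirs_exist_ok=True (Python 3.8+) merges into existing folders
        # instead of failing when the destination already exists.
        shutil.copytree(folder, repo_dir / folder.name, dirs_exist_ok=True)
        copied.append(folder.name)
    return copied
```

The return value lists the folders that were copied, which makes it easy to spot an unexpected or missing folder before staging.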

In GitKraken, the developer now sees the file changes listed for the feature branch. Choose Stage all changes.

After staging all changes, add a commit message that describes the change and choose the commit option to commit the staged changes.

After committing the changes to the feature branch, the developer uses the Push option to push the latest feature branch configurations from the local repository to the remote GitHub repository.

Choose Submit to finish pushing to the remote repository. Afterward, the new feature branch and its configurations appear in GitHub.
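Staging, committing, and pushing can likewise be done from the command line. This sketch assumes the feature branch is checked out and the remote is named origin; the commit message and helper names are examples only.

```python
import subprocess
from pathlib import Path
from typing import List


def commit_and_push_commands(branch: str, message: str, remote: str = "origin") -> List[List[str]]:
    """Equivalent of Stage all changes, Commit, then Push in GitKraken."""
    return [
        ["git", "add", "--all"],           # stage every added, changed, or deleted file
        ["git", "commit", "-m", message],  # commit with a descriptive message
        # Publish the branch to GitHub and set it as the upstream for later pushes.
        ["git", "push", "--set-upstream", remote, branch],
    ]


def run(commands: List[List[str]], repo_dir: Path) -> None:
    """Run each command inside the local repository folder."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

Note that git commit exits with an error when nothing is staged, so run stops there if the copied files matched the branch exactly.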

Creating a Pull Request

Now that the feature branch exists in GitHub, the developer switches to the next branch and starts a new pull request using the Compare & pull request option.

On the following page, change the base branch from master to next. Further down, GitHub lists all of the changes included in the pull request for a final review before submitting.

Select the Create pull request option to submit. 
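Pull requests can also be opened through the GitHub REST API (POST /repos/{owner}/{repo}/pulls), where head is the feature branch and base is next. The sketch only builds the request; your-org, the token, and the function name are placeholders. The JSON response includes the pull request's number, which identifies it in later API calls.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def create_pull_request_request(owner: str, repo: str, head: str, base: str,
                                title: str, token: str, body: str = "") -> urllib.request.Request:
    """POST /repos/{owner}/{repo}/pulls opens a pull request from `head` into `base`."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen and decoding the body with json.load gives the created pull request, including its number.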

Approve the Pull Request into Next

When the changes are ready to be tested, a developer needs to merge the pull request into the next branch.

Open the GitHub repository and select the Pull requests tab. In this example, there is only one pull request to review, from the feature1 branch to next. If multiple developers are working in DataForge, there may be multiple pull requests to review and merge together.

Open the pull request and review the Commits tab to confirm the changes are still wanted. When ready, select Merge pull request on the Conversation tab, then select Confirm merge to finalize the merge into next.

Once the commit is merged, this pull request is complete. 
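The review-and-merge step has REST API equivalents: listing open pull requests that target next, and merging one with PUT /repos/{owner}/{repo}/pulls/{pull_number}/merge. This sketch only builds the requests; your-org, the token, and the pull request number 1 in the test are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token: str) -> dict:
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def list_open_pull_requests_request(owner: str, repo: str, base: str, token: str) -> urllib.request.Request:
    """GET /repos/{owner}/{repo}/pulls filtered to open pull requests targeting `base`."""
    query = urllib.parse.urlencode({"state": "open", "base": base})
    return urllib.request.Request(f"{API_ROOT}/repos/{owner}/{repo}/pulls?{query}",
                                  headers=_headers(token))


def merge_pull_request_request(owner: str, repo: str, pull_number: int, token: str,
                               merge_method: str = "merge") -> urllib.request.Request:
    """PUT /repos/{owner}/{repo}/pulls/{pull_number}/merge merges an open pull request."""
    body = json.dumps({"merge_method": merge_method}).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/pulls/{pull_number}/merge",
        data=body, headers=_headers(token), method="PUT")
```

The default merge_method of "merge" creates a merge commit; "squash" and "rebase" are the other accepted values, subject to the repository's merge settings.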

The next step is to update the Default project in the development environment with the latest configurations from the next branch.

Update Default project with configurations from next branch for testing

To unit test the changes, import the configurations from the next branch into the Default project of the development DataForge environment. This lets us reuse data already ingested in the project while applying the updated configurations.

In GitKraken, select the next branch under the Local section, then use the Pull option at the top to update the local next branch with the latest changes from the remote next branch.

Double-click the next branch under the Local section, or use the Checkout option, if it is not already checked out.

Copy the sub-folders from the local Git repository folder and paste them into a zip folder.

Import the zipped folder into the Default project in the development DataForge environment. 
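As an alternative to pulling, checking out, and copying folders by hand, Git's built-in archive command can write a branch's committed files directly into a zip. This sketch assumes the remote is named origin; the helper name and zip file name are examples. Two differences from the manual copy: git archive also includes any files at the repository root (such as a README), and it contains only committed changes. Whether DataForge accepts extra root-level files in an import is not covered here, so check the archive contents before importing.

```python
import subprocess
from pathlib import Path
from typing import List


def branch_zip_commands(branch: str, zip_name: str, remote: str = "origin") -> List[List[str]]:
    """Build commands that export a branch's latest committed files to a zip."""
    return [
        # Update remote-tracking branches (origin/next, origin/master).
        ["git", "fetch", remote],
        # Write the branch's tree straight into a zip; .git is never included.
        ["git", "archive", "--format=zip", f"--output={zip_name}", f"{remote}/{branch}"],
    ]


def run(commands: List[List[str]], repo_dir: Path) -> None:
    """Run each command inside the local repository folder."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)
```

For the production import, the same helper would be called with "master" and a name such as dataforge-projects-master.zip.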

When complete, users can perform any testing necessary to confirm the changes are working as expected before moving them to production.

Create a Pull Request from Next to Master

After the changes have been tested and confirmed working, a developer creates a pull request in GitHub from the next branch to the master branch.

In GitHub, switch to the master branch and select the Compare & pull request option.

Confirm the base branch is master and the compare branch is next.  Select the Create pull request option.

Approve the Pull Request from Next to Master

In the GitHub repository, open the Pull requests tab and select the open pull request from next to master.

Review the committed changes and, when ready, select Merge pull request and finalize by selecting Confirm merge.

When the merge is completed, the Master branch will contain the latest changes from the Next branch.

The last step is importing the Master branch configurations into the Default project in the production environment.

Update Default project in production with configurations from Master

The last step in the workflow is to import the Master branch configurations into the Default project in the production environment to push the changes to production.

In GitKraken, select the master branch under the Local section and use the Pull option to bring the local master branch up to date with the remote master branch.

Double-click the master branch under the Local section or use the Checkout option if not already done.

Copy the sub-folders from the local Git repository folder and paste them into a zip folder.

Import the zipped folder into the Default project in the production DataForge environment. 

When complete, the configuration changes are migrated to the production project and are live.

Repeat cycle for more developer changes

When more changes are needed, repeat the cycle starting with updating developer projects and making changes.

Making a Hotfix Change

At times, developers may need to make a hotfix directly to the production environment. Following the Gitflow process, developers create a hotfix branch from the master branch.

Import the configurations from the hotfix branch into a new project in the development DataForge environment.

Once the changes are made in the hotfix project, developers export the project configurations and commit them to the hotfix branch. Afterward, create one pull request from the hotfix branch to master and another from the hotfix branch back to next. Once the pull requests are approved and merged, import the contents of each branch into the corresponding project: master into the Default project in Prod, and next into the Default project in Dev.
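The hotfix flow can be summarized in a short sketch: branch from master, then open two pull requests from the same branch so the fix reaches production and is not lost from next. The "hotfix-" prefix and the helper name are hypothetical conventions, and the remote is assumed to be named origin.

```python
from typing import Dict, List, Tuple


def hotfix_plan(name: str, remote: str = "origin") -> Tuple[str, List[List[str]], List[Dict[str, str]]]:
    """Return the hotfix branch name, the Git commands to create it, and the PRs to open."""
    branch = f"hotfix-{name}"  # hypothetical naming convention
    commands = [
        ["git", "fetch", remote],
        # Hotfixes start from production (master), not from next.
        ["git", "checkout", "-b", branch, f"{remote}/master"],
    ]
    # After committing and pushing the hotfix configurations, open both of these.
    pull_requests = [
        {"head": branch, "base": "master"},
        {"head": branch, "base": "next"},
    ]
    return branch, commands, pull_requests
```

Merging into next as well keeps the next branch, and therefore the Default project in Dev, from reintroducing the bug on the next regular release.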
