Clear Street — Modernizing the brokerage ecosystem
Engineering · 10 min read
Jun 30, 2022

GitLab to GitHub Part 1: Hot Swapping Clear Street’s CI System

Clear Street Engineering

Since our founding in 2018, Clear Street has used a self-hosted GitLab instance for code storage and Continuous Integration (CI). GitLab was a satisfactory product for us in our early, high-growth days because it supported many features out of the box, such as CI, issue tracking, PyPI, NPM, Docker registry, and feature flags. However, over the years, we experienced pain points with GitLab. Internal demand for a better code storage, code review, and CI platform grew organically among engineers.

Eventually, we knew it was time for change. About a year ago, we decided to migrate our code storage to GitHub Enterprise Cloud, and to use GitHub Actions with self-hosted CI runners as our CI system.

In this two-part post, we’ll outline why we decided to migrate to GitHub, how we set up CI in GitHub, the steps we took to migrate, and the outcome of our migration.

Let’s start with part one: Why was a migration necessary?

A Brief History Lesson

When Clear Street was a young startup, GitLab was a useful all-in-one platform. We used it to store code (our git monorepo), track issues, run CI, expose a feature flag server, and even store Docker images and library packages (i.e., PyPI and NPM registries).

Over time, as the engineering team grew and our systems became more complex and high-load, we decoupled these components into dedicated, resilient systems. For example, we now run a high-availability Artifactory cluster to store Docker images and library packages and a multi-node Unleash instance for feature flag management.

In the end, we were using GitLab only for code storage, code review, and CI. However, our almost 150 (and growing!) engineers ran into various pain points while working on our main monorepo, Fleet. The top issues were:

  • Git push and pull operations took more than a minute.
  • CI jobs were slow to start. Jobs were often queued for a long time, not because CI runners were unavailable but because the internal GitLab queue was sluggish when processing CI job requests. We’d vertically scaled resources and tuned sidekiq configurations to help mitigate the issue, but overall performance gradually degraded.
  • The last few log lines from failed CI jobs were sometimes missing, making debugging CI failures challenging.
  • The code diff viewer in GitLab was not performant. Viewing large diffs in the GitLab UI was impossible and often caused the browser window to hang or crash.

Moreover, as a startup, we have a relatively small team that needs to prioritize where we spend our time. We came to the consensus that we didn’t want to keep managing a self-hosted code storage and CI orchestration system if a SaaS system could meet our needs. That way, our engineers could focus on solving other pressing problems instead of tuning code storage servers.

So Many Choices, What to Choose?

Based on the above, we decided to migrate off of self-hosted GitLab, and using GitHub Enterprise Cloud for code storage was a natural choice. GitHub has improved monorepo performance over the years, and its UI is performant and modern, showing large diffs and providing a clean and familiar user interface. The question became which CI system we wanted to use.

We won’t dive into our evaluation of alternative CI systems, but our top contenders were CircleCI, GitHub Actions, Codefresh, and Buildkite. We decided on GitHub Actions because of its native integration with GitHub and the growing community of open-source pre-built Actions that we could use in our Workflows.

Spike it Up!

The initial CI design phase involved understanding GitHub Actions, the Continuous Integration/Continuous Delivery (CI/CD) platform provided by GitHub, and ensuring that it met all of Clear Street’s CI usage needs. In pursuit of this goal, we developed several POCs to showcase the viability of GitHub Actions to replace what we had achieved using GitLab CI.

During the early investigative stages, we focused on using actions-runner-controller to configure and provision self-hosted runners within our Kubernetes cluster. Subsequently, we conducted a spike to create a basic GitHub Workflow that would perform linting, testing, and building for a Go service. Continuing the exploration of GitHub CI, we examined third-party open-source custom GitHub actions that would play a crucial role in constructing our workflows. The endeavor also motivated us to develop custom actions tailored to Clear Street's specific CI demands.

To Merge Train or Not to Merge Train, That is the Question

GitLab has supported merge trains for some time, which enabled us to sequentially queue merge requests and validate the compatibility of changes before incorporating them into the main branch. Merge trains let us order the flow of changes into the main branch, helping keep it “green.”

At the time of our migration, GitHub lacked native support for merge trains. As such, we explored bors-ng as a substitute for GitLab's merge train functionality. It satisfied our prerequisites, and we knew we could set up something similar to merge trains in GitHub.

However, one notable drawback of merge trains is that they make merges sequential. This sequencing leads to significant delays during the code merge process, especially when a change is queued behind another change with slow CI. While some CI jobs, such as builds, can be optimized for time, CI time will inevitably grow as we add increasingly sophisticated tests (read more about our efforts using Behavior Driven Tests here!).
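To make the delay concrete, here is a minimal sketch of the queueing effect. It rests on simplifying assumptions that are not from our actual setup: every pipeline starts at the same moment, a change can merge only after every change ahead of it has merged, and the CI durations are hypothetical numbers.

```python
from itertools import accumulate


def merge_times_with_train(ci_minutes):
    """Minutes until each queued change merges when merges are strictly ordered.

    Assumption: all pipelines start together at t=0, and a change can only
    merge once every change ahead of it in the queue has merged.
    """
    # A running maximum: each change waits for the slowest pipeline so far.
    return list(accumulate(ci_minutes, max))


def merge_times_without_train(ci_minutes):
    """Minutes until each change merges when it only waits for its own CI."""
    return list(ci_minutes)


# Hypothetical CI durations (minutes) for four changes, in queue order.
queue = [5, 40, 6, 7]
with_train = merge_times_with_train(queue)
without_train = merge_times_without_train(queue)
print(with_train)     # [5, 40, 40, 40]
print(without_train)  # [5, 40, 6, 7]
```

In this toy model, the two fast changes queued behind the 40-minute pipeline each wait 40 minutes instead of 6 or 7.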

We decided not to port merge train functionality to GitHub because of these time delays and GitHub’s lack of native merge train support. Merging without a train works well for us because, given the large size of our main monorepo, changes rarely conflict with each other. We also set up alerts to notify teams if a merge breaks the main branch so they can prioritize a fix. Broken main branches have rarely been an issue for us in the months we’ve been using the workflow, and when they do happen, our engineers fix the problem quickly. Our engineers are delighted with the tighter iteration loop enabled by faster merges.

Tag, You’re It!

Clear Street embraces deploying pre-release artifacts automatically created by our CI system from unmerged branches. This functionality empowers developers to rapidly test changes in the development environment before officially committing the code to the main branch. It also enables “hotfix” use cases where an on-call engineer can deploy a pre-release artifact to the production environment to resolve an ongoing production incident.

In a merge train, GitLab CI creates a temporary commit that no longer holds any meaning once CI runs, so none of our artifacts were tagged using the commit hash. To mitigate this, we tagged pre-release artifacts with the merge request ID and merge train artifacts with a merge train ID. As a result, multiple pushes to the same merge request would overwrite previous pre-release artifacts.

Removing merge trains lets us reliably tag all our artifacts with a commit hash and a human-readable timestamp. These tags solve our previous issue with overwriting pre-release artifacts for the same merge request and provide developers with more information to associate an artifact with the respective code changes.
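The difference between the two tagging schemes can be sketched as follows. The tag formats (`mr-<id>` and `<timestamp>-<short hash>`), the function names, and the sample hashes are all hypothetical; they only illustrate why a per-merge-request tag gets overwritten while a per-commit tag does not.

```python
from datetime import datetime, timezone


def merge_request_tag(merge_request_id):
    """Old-style tag: one tag per merge request (format is hypothetical)."""
    return f"mr-{merge_request_id}"


def commit_tag(commit_sha, built_at):
    """New-style tag: UTC timestamp plus short commit hash (format is hypothetical)."""
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{stamp}-{commit_sha[:8]}"


# A dict stands in for an artifact registry: pushing an existing tag overwrites it.
registry_old, registry_new = {}, {}

# Two pushes to the same (hypothetical) merge request 1234.
pushes = [
    ("3f2a9c1e7b5d4a60", datetime(2022, 6, 30, 14, 0, 0, tzinfo=timezone.utc)),
    ("9b8e7d6c5a4f3e21", datetime(2022, 6, 30, 15, 30, 0, tzinfo=timezone.utc)),
]
for sha, built_at in pushes:
    registry_old[merge_request_tag(1234)] = sha
    registry_new[commit_tag(sha, built_at)] = sha

print(sorted(registry_old))  # ['mr-1234']
print(sorted(registry_new))  # ['20220630-140000-3f2a9c1e', '20220630-153000-9b8e7d6c']
```

With the merge-request tag, only the artifact from the latest push survives; with the commit-based tag, both artifacts remain addressable and each tag points back to an exact commit.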

Wrangling a Monorepo

As discussed, Clear Street uses a monorepo for our main code repository, Fleet. While monorepos have many benefits, they come with challenges when building an efficient CI system. The first major challenge was to find a way to build only code that has changed and to invoke CI jobs associated with those changes.

In our current setup, each project within the monorepo is isolated in its own directory. Shared code is published to Artifactory as libraries and consumed by applications as a remote artifact. So, for example, we can say, “If directory src/team/services/payment changes, run lint, test, and build for payment.”

We found that paths-filter, a third-party open-source GitHub action, met all our requirements to detect these directory diffs. Teams define a YAML file with a list of applications and the directory they live in. Because we have a consistent directory structure, we use a simple pre-commit script to list subdirectories and generate a list of applications. Here is an example file:

# AUTOGENERATED FILE - DO NOT EDIT.
---
web: # The web service lives in the directory src/team/services/web
- src/team/services/web/**
payment: # The payment service lives in the directory src/team/services/payment
- src/team/services/payment/**
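
A generator for a file like this could look roughly like the sketch below. It is an illustration, not our actual pre-commit script: the function name `render_filters`, its parameters, and the temporary directory used for the demo are assumptions, and it emits the file without the explanatory comments.

```python
import tempfile
from pathlib import Path

HEADER = "# AUTOGENERATED FILE - DO NOT EDIT.\n---\n"


def render_filters(repo_root, services_rel="src/team/services"):
    """Return filter-file text with one entry per immediate subdirectory."""
    services = Path(repo_root) / services_rel
    # Each subdirectory name doubles as the application name.
    apps = sorted(p.name for p in services.iterdir() if p.is_dir())
    body = "".join(f"{app}:\n- {services_rel}/{app}/**\n" for app in apps)
    return HEADER + body


# Demo against a throwaway directory tree that mimics the layout above.
with tempfile.TemporaryDirectory() as repo_root:
    for app in ("web", "payment"):
        (Path(repo_root) / "src/team/services" / app).mkdir(parents=True)
    generated = render_filters(repo_root)

print(generated)
```

Sorting the application names keeps the generated file stable between runs, so the pre-commit hook only produces a diff when a directory is added or removed.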

We pass these files to paths-filter, and the output feeds into a matrix to run CI for only the affected apps, like so:

jobs:
  changes-team-services:
    runs-on: [self-hosted, small]
    permissions: # Omitted
    outputs:
      packages: ${{ steps.filter.outputs.changes }}
    steps:
      #... clone repo ...
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: src/team/services/filter.yaml

  test-team-services:
    needs: changes-team-services
    if: ${{ needs.changes-team-services.outputs.packages != '[]' && needs.changes-team-services.outputs.packages != '' }}
    strategy:
      fail-fast: false
      matrix:
        package: ${{ fromJSON(needs.changes-team-services.outputs.packages) }}
    runs-on: [self-hosted, medium]
    steps:
      #... clone repo ...
      - name: Test ${{ matrix.package }}
        run: make -C src/team/services/${{ matrix.package }} test
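
Conceptually, the first job maps changed files to filter names and emits them as a JSON list for the matrix. The sketch below mimics that idea in plain Python. It is not how paths-filter is implemented: the function name `changed_packages` is hypothetical, and `fnmatchcase` only approximates the glob matching the action performs.

```python
import json
from fnmatch import fnmatchcase


def changed_packages(filters, changed_files):
    """Return the names of filters that match at least one changed file."""
    return [
        name
        for name, patterns in filters.items()
        if any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns)
    ]


# Same shape as the filter file shown earlier.
filters = {
    "web": ["src/team/services/web/**"],
    "payment": ["src/team/services/payment/**"],
}

# Hypothetical list of files touched by a pull request.
changed = ["src/team/services/payment/handler.go", "README.md"]
packages = changed_packages(filters, changed)
print(json.dumps(packages))  # ["payment"]
```

When nothing under a service directory changes, the list is empty and serializes to `[]`, which is the case the `if:` condition on the test job guards against.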

Once we were satisfied that our CI design met our requirements, we were ready to begin the migration process. Read part 2 to learn more about our migration approach, testing, lessons learned, and next steps for GitHub at Clear Street.

Thank you to our authors Joseph DeChicchis and Sachin Ananth Navale!
