GitLab to GitHub Part 2: Ready, set, migrate!
Clear Street has used a self-hosted GitLab instance for code storage and Continuous Integration (CI) since our founding in 2018. Rapid growth led to internal demand for better code storage and review, culminating in a migration to a new solution: GitHub Enterprise Cloud.
Where GitLab suffered from slow git push/pull operations and slow CI job starts, GitHub offers improved monorepo performance and a modern, clean user interface. Part one of this series covered why we moved our CI from GitLab to GitHub and how we designed the new CI, including proofs of concept (POCs), merge trains, and monorepos. Part two discusses how we approached migration, testing, and lessons learned.
We split the migration process into three phases:
- Spike GitHub Actions with a test repo.
- Migrate our lower-impact repos to GitHub and build tooling for the main monorepo (Fleet) migration.
- Migrate our main monorepo (Fleet) to GitHub.
Overall, the migration took two and a half quarters, with a whole quarter dedicated to migrating our main monorepo Fleet.
We first migrated our Infrastructure as Code (which contains Terraform configurations of our infrastructure) and Retool (which includes Retool configurations) repos to GitHub. These repos have less usage than our main monorepo, so some downtime in CI does not result in a significant productivity hit for our engineers.
In all of our migrations, the first step was to enable code sync between GitLab and GitHub. We used GitLab’s built-in mirror functionality to push code to GitHub. Because GitLab was still the source of truth for all code merges, any diverged code in GitHub was forcibly overwritten.
Confidence Unleashed
Once we configured the repo mirroring, we set up dual CI between GitLab and GitHub. The Infrastructure as Code and Retool repos didn’t have too much CI to move over, so we manually validated the new GitHub CI and cut over once we had confidence in our GitHub setup. However, we built a tool called Octomigrator for our main monorepo migration to make validating CI parity easier.
Octomigrator helped validate GitHub CI for our monorepo Fleet by creating a corresponding GitHub pull request whenever a merge request was opened in GitLab. This triggered GitHub CI for the code change, and Octomigrator reported the result of GitHub CI as a comment on the GitLab merge request (see example screenshot). We also added a GitLab job called check_github_workflow_state that fails the GitLab pipeline if the corresponding GitHub Workflow fails, and we enabled it gradually as we gained confidence in GitHub CI.
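To make the gating idea concrete, here is a minimal Python sketch of the decision a job like check_github_workflow_state might make. It is not Octomigrator's code, which this post does not show. The `WorkflowRun` type, the `enforce` flag (standing in for "gradually enabled"), and the set of conclusions treated as passing are all assumptions of the sketch. The `status` and `conclusion` fields mirror the values GitHub reports for workflow runs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Conclusions this sketch treats as passing (an assumption, not a documented policy).
PASSING_CONCLUSIONS = {"success", "skipped", "neutral"}


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical, trimmed-down view of one GitHub workflow run."""
    name: str
    status: str                  # "queued", "in_progress", or "completed"
    conclusion: Optional[str]    # only set once status == "completed"


def summarize(runs: Iterable[WorkflowRun]) -> str:
    """Collapse all runs for a change into "pending", "failure", or "success"."""
    runs = list(runs)
    # No runs yet, or any run still in flight: nothing can be decided.
    if not runs or any(r.status != "completed" for r in runs):
        return "pending"
    if any(r.conclusion not in PASSING_CONCLUSIONS for r in runs):
        return "failure"
    return "success"


def gate_exit_code(runs: Iterable[WorkflowRun], enforce: bool) -> int:
    """Exit code for the GitLab job: 1 fails the pipeline, 0 lets it pass.

    With enforce=False the job only observes, so failures never block a merge.
    A real job would presumably poll while the state is "pending".
    """
    return 1 if enforce and summarize(runs) == "failure" else 0
```

Keeping enforcement behind a flag is one way to run the check in report-only mode first and turn it into a hard gate later, one component at a time.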
This system was crucial for our migration because we could effectively get to a state where GitHub CI was fully operational before cutting over from GitLab to GitHub. We even started deploying artifacts built using GitHub CI before fully cutting over to GitHub.
Once we had confidence in the GitHub Workflows for all components in our Fleet monorepo, the actual cutover was as simple as disabling writes to GitLab, enabling writes to GitHub, and having engineers switch their Git remote origins to point to GitHub.
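For the last step, switching remotes, a small helper can compute the new origin URL for each engineer's clone. The sketch below is illustrative only: the GitLab hostname, the `example-org` organization name, and the assumption that each repository keeps its name on GitHub are hypothetical. The only real command involved is `git remote set-url origin <url>`.

```python
import re
from typing import List

GITHUB_HOST = "github.com"  # assumes the standard github.com host


def github_remote_for(gitlab_url: str, github_org: str) -> str:
    """Map a GitLab remote URL (SSH or HTTPS form) to an SSH GitHub remote.

    Assumes the repository keeps the same name after the migration.
    """
    # The repository name is the last component after any ':' or '/'.
    repo = re.split(r"[:/]", gitlab_url.rstrip("/"))[-1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"git@{GITHUB_HOST}:{github_org}/{repo}.git"


def set_url_command(gitlab_url: str, github_org: str) -> List[str]:
    """Build the git command that repoints 'origin' at GitHub."""
    return ["git", "remote", "set-url", "origin",
            github_remote_for(gitlab_url, github_org)]
```

For example, `set_url_command("git@gitlab.example.com:eng/fleet.git", "example-org")` builds the argument list for `git remote set-url origin git@github.com:example-org/fleet.git`, which could then be passed to `subprocess.run`.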
Team Work Makes the Dream Work
We’d like to note that the work to migrate CI to GitHub was distributed across the engineering organization. Our monorepo is organized into subfolders, most of which are owned by different engineering teams. Each team has a slightly different CI tailored to its use cases, and a central Infra team doesn’t have full context on all of them.
As such, we recruited team representatives whom we worked closely with to migrate each team’s CI to GitHub. A nice side effect of distributing this work was that each team now had a person with some knowledge of GitHub who could help their teammates acclimate to GitHub instead of all communication needing to flow through a central Infra team.
Actions in Action
One of the most appealing features of GitHub CI is the ability to write custom code to interact with the repository. This lets us create efficient, tailored, and automated workflows that improve the CI process by boosting reusability and reducing errors. Custom code also eliminates the need to write complex steps for simple operations and to copy and paste those steps into many files. We use custom GitHub Actions at various stages of the CI Workflow lifecycle, including setting up a job, validating branch names, enforcing testing policies, and uploading or downloading artifacts.
At Clear Street, we love monorepos, and custom actions are no different. Hence, we created a monorepo of custom GitHub actions. This monorepo holds actions that can be shared across multiple repositories (i.e., our main monorepo Fleet and other repositories tailored to narrower use cases). If an action is only used in a specific repository, we add it to that repository’s .github/actions folder. On GitLab we used CI templates, but they are not as powerful as GitHub Actions, which offer much more robust composability and a thriving open-source marketplace.
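As an example of the kind of logic a small custom action can encapsulate, here is a hedged Python sketch of a branch-name check. The naming convention (a `feature/`, `fix/`, or `chore/` prefix), the length limit, and the function name are invented for illustration. The post does not describe Clear Street's actual rules or the language their actions are written in.

```python
import re
from typing import List

# Hypothetical convention: "<type>/<lowercase-slug>", e.g. "fix/queue-alerts".
ALLOWED_BRANCH = re.compile(r"^(feature|fix|chore)/[a-z0-9][a-z0-9._-]*$")
EXEMPT_BRANCHES = {"main"}   # long-lived branches skip the check (assumption)
MAX_LENGTH = 100             # arbitrary limit chosen for this sketch


def validate_branch_name(name: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    if name in EXEMPT_BRANCHES:
        return []
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"branch name is longer than {MAX_LENGTH} characters")
    if not ALLOWED_BRANCH.match(name):
        problems.append(
            "branch name must look like feature/<slug>, fix/<slug>, or chore/<slug>"
        )
    return problems
```

Inside a workflow, an action wrapping this function could read the branch from the `GITHUB_HEAD_REF` environment variable on pull request events and exit non-zero when the returned list is not empty. Packaging the rule once keeps every workflow that uses it consistent.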
Testing…1, 2, 3
One of our goals during the transition from GitLab to GitHub was to improve our CI system monitoring. We track metrics such as the count of queued jobs, queue duration, job duration, the number of CI runners, and the number of underlying Kubernetes nodes we use for CI. We also monitor the health of our CI runner Kubernetes Pods by watching resource utilization and Kubernetes events, and we alert on conditions such as a high job queue duration and issues with GitHub itself. Together, these give us comprehensive insight into the health of our CI platform.
Our test repo has the same CI runner configuration as production, so we can use it to trial new GitHub CI Workflow features or use cases. This gives us a quick feedback loop without unintentionally disrupting existing CI and lets us test upgrades to our CI runners. Moreover, this test repository was made available to all teams during the migration, giving them a place to explore and build a deep understanding of the new CI system.
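A queue-duration alert of the kind described above can be reduced to a small calculation over job timestamps. The following Python sketch is an assumption-laden illustration: the `Job` type, the snapshot fields, and the 300-second threshold are hypothetical and do not describe the actual monitoring stack.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical record of one CI job."""
    id: int
    queued_at: datetime
    started_at: Optional[datetime] = None   # None while the job is still queued


def queue_snapshot(jobs: Iterable[Job], now: datetime) -> Dict[str, float]:
    """Compute two of the metrics named above for jobs still waiting."""
    waits = [(now - j.queued_at).total_seconds()
             for j in jobs if j.started_at is None]
    return {
        "queued_jobs": len(waits),
        "oldest_wait_seconds": max(waits, default=0.0),
    }


def should_alert(snapshot: Dict[str, float], max_wait_seconds: float = 300.0) -> bool:
    """Fire when the longest-waiting job exceeds the (assumed) threshold."""
    return snapshot["oldest_wait_seconds"] > max_wait_seconds
```

Alerting on the oldest waiting job rather than on an average is one design choice among several. It catches a single stuck job quickly, at the cost of being noisier during planned scale-downs.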
Judgement Day
The cutover of our smaller repos (i.e., Infrastructure as Code and Retool) and the final cutover of the Fleet monorepo to GitHub went very smoothly. We had few reported issues once we cut over.
Our CI metrics also show that CI job queue time is low, with a p90 queued time of ~80 seconds and a p95 queued time of ~125 seconds (see screenshot). There is a long tail of job-queued duration because we aggressively scale down our CI runner count during non-business hours, resulting in longer initialization time for off-hours CI jobs (e.g., nightly cron jobs). We autoscale our GitHub CI runners based on demand. On a typical day, we’ll reach around 1000 concurrent runners (i.e., Kubernetes Pods) running on around 50 Kubernetes Nodes (i.e., EC2 instances).
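For readers less familiar with percentile metrics, the sketch below shows one common way to compute a p90 or p95 from raw queue durations, using the nearest-rank method. The sample durations are made up to illustrate a long tail and are not the measurements behind the figures above. The metrics system that produced those figures may well use a different percentile definition.

```python
from typing import Sequence


def percentile(values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it.

    p is an integer between 1 and 100.
    """
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Invented queue durations in seconds; the last one models an off-hours job
# that had to wait for runners to scale back up.
sample_queue_seconds = [12, 20, 35, 40, 55, 60, 70, 80, 125, 600]

p50 = percentile(sample_queue_seconds, 50)
p90 = percentile(sample_queue_seconds, 90)
p95 = percentile(sample_queue_seconds, 95)
```

With only ten samples, p95 lands on the single slow job, which shows how a long tail pulls the upper percentiles away from the median.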
A month after the cutover, we conducted an internal survey of our engineering organization to understand our users’ experience with the new platform and how they felt about the migration. Overall, our users found the new GitHub CI more effective than the old GitLab CI and reported that Git operation performance had improved.
Engineers were somewhat mixed on whether they liked the GitHub UI more, although more than half of respondents appear to prefer GitHub’s UI over GitLab’s. Everyone seems to have had a positive experience with the migration. We received a couple of small actionable items from the written feedback (e.g., “How can I tune my notifications better?” “Can we prune stale branches?”), but nothing that indicated significant developer pain points on the new platform.
Here are the survey results. The prompt was “Please choose the option that best matches your experience for the following questions.” with the scale being “1 (strongly disagree), 2 (disagree), 3 (neutral), 4 (agree), 5 (strongly agree)”.
To Infinity and Beyond
We achieved a lot during the migration from GitLab to GitHub, but we’re just getting started. There are plenty of improvements and new features we can build to make the GitHub experience even better. Here are a few we have in mind today:
- More robust runners: We currently support different runner sizes (i.e., small, medium, large, xlarge, and 2xlarge). These runners have different CPU and memory resource allocations. Furthermore, the runner pool is set up on a per-repository basis, meaning all teams share it. A significant limitation of this approach is that the runners must be relatively generic, and customization for individual teams or projects is currently out of reach. We would like to introduce more tailored runners according to the needs of various use cases.
- New Custom Actions: We want to examine current workflows to identify new custom Actions that can further streamline and standardize our CI.
- Improve CI efficiency: It is always an ongoing task to improve the cost efficiency of CI by tuning our autoscaling, optimizing runner resource allocation, and exploring new approaches to cost saving.
The migration was a terrific success, but some things didn’t go as planned. Here’s what we learned for next time:
- The GitLab to GitHub mirror was slow, taking anywhere from a few minutes to 15 minutes to sync, which slowed iteration loops and CI times during the phase when CI was running side-by-side.
- GitHub experiences frequent performance degradations. GitHub usually resolves these issues quickly, so the impact on our system is often not noticeable, but the problems are more frequent than we expected.
- The migration was on the wishlist for over a year before we decided to focus our efforts on it. Even though extensive migrations can feel daunting, we learned how much value they can deliver. Looking to the future, we’ll be more willing to dive into similar system-wide changes when needed.
Successfully implementing a migration requires careful planning and collaboration across teams. Migrating our CI from GitLab to GitHub was a long and detailed process, but the results speak for themselves. We’re now better positioned to keep up with the demands of our rapidly growing business.
If solving complex engineering challenges interests you, visit our careers page.
Thank you to our authors Joseph DeChicchis and Sachin Ananth Navale!