Clear Street — Modernizing the brokerage ecosystem
Engineering · 7 min read
Jul 7, 2022

Hot Swapping a Bank Part 2: Integration and Migration Prep

Clear Street Engineering

BK is Clear Street’s proprietary transaction processing system, used to process more than $3 billion in daily trading volume. BK is the newer, faster version of our legacy bookkeeping system, Bank. Where Bank suffered from low throughput, an awkward data model, and messy points of extension, BK is more agile and scalable, which lets it support our growing business.

In part 1 of this series, I outlined how we identified areas for improvement within Bank and used a data validation system to ensure we were solving those issues when we built BK. In part 2, I’ll discuss how we approached compatibility layers and data migration.

Speaking in “Bank”

After thorough testing, our next challenge was to integrate existing services with BK. Initially, we aimed to have all clients of the system migrate to BK pre-release. As we began to understand the complexity of the release, it became apparent that this would create significant risk and take a lot of time.

We chose to build a Façade service to release BK without needing to update multiple services in different programming languages, while simultaneously ensuring every single service kept the same behavior between the old system and the new one.

This service implements the APIs of Bank, fully backed by BK, and concentrates all the necessary adapter logic into a single service. The process was not simple. It required a lot of testing and some trade-offs, but in the end, it enabled us to test other services using BK without even touching them.

To make the switch as seamless as possible, we used a few interesting approaches:

  • Our Kubernetes service for the Façade took over Bank’s name (DNS and Consul registration), and we renamed Bank to legacy-Bank. This connected services to BK the same way they always connected to Bank, allowing the Façade to properly handle clients’ requests.

The Façade’s configuration supported multiple modes:

  • bank-only: The Façade is just a reverse proxy, and every request is simply forwarded to the legacy service (Bank).
  • bank-primary-bk-secondary: Requests are proxied to Bank and also forwarded to BK. If BK fails to process a request for any reason, the failure does not affect the client’s request. The data for all reads still comes from Bank.
  • bk-primary-bank-secondary: Requests go to BK first, and if BK fails to process one, the client receives an error. Requests are also forwarded to Bank, but a Bank failure does not fail the client’s request, because BK’s response is the one returned.
  • bk-only: Bank no longer sees any data. It’s effectively 100% BK speaking the Bank API.
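The four modes above can be sketched as a small dispatcher. This is a minimal illustration, not our actual Façade code: the `Backend` type, `handle`, and `_best_effort` are hypothetical names, and the sketch assumes the secondary system is called after the primary has succeeded, which the mode descriptions do not specify.

```python
from enum import Enum
from typing import Callable

# Hypothetical backend type: takes a request dict, returns a response dict.
Backend = Callable[[dict], dict]


class Mode(Enum):
    BANK_ONLY = "bank-only"
    BANK_PRIMARY_BK_SECONDARY = "bank-primary-bk-secondary"
    BK_PRIMARY_BANK_SECONDARY = "bk-primary-bank-secondary"
    BK_ONLY = "bk-only"


def _best_effort(backend: Backend, request: dict) -> None:
    """Call the secondary backend and swallow any failure."""
    try:
        backend(request)
    except Exception:
        pass  # a real service would log and count this failure


def handle(mode: Mode, request: dict, bank: Backend, bk: Backend) -> dict:
    """Route one request according to the configured mode."""
    if mode is Mode.BANK_ONLY:
        return bank(request)
    if mode is Mode.BANK_PRIMARY_BK_SECONDARY:
        response = bank(request)      # Bank errors propagate to the client
        _best_effort(bk, request)     # BK errors are hidden from the client
        return response
    if mode is Mode.BK_PRIMARY_BANK_SECONDARY:
        response = bk(request)        # BK errors propagate to the client
        _best_effort(bank, request)   # Bank errors are hidden from the client
        return response
    return bk(request)                # bk-only: Bank is never called
```

Keeping the mode as a single configuration value means each rollout milestone is a configuration change rather than a code change.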

We orchestrated multiple deployments with several milestones until the final release, when Bank was no longer needed. These milestones were:

  • Release the Façade service deployed in bank-only mode, and verify that introducing this new hop does not create any issues.
  • Multiplex data to BK, deployed in bank-primary-bk-secondary mode. This feeds both systems from the same source, which is incredibly helpful in validating that the systems are equivalent.
  • Make BK primary, deployed in bk-primary-bank-secondary mode. At this milestone, we had decided that BK was the source of truth, and we kept Bank around in case of catastrophic failure (which never happened!).
  • Run tests in bk-only mode, which ensured that BK was getting hit as hard as possible by all of our automated tests, because ultimately this is what we care about the most.

At this point, I’ve only discussed the RPC compatibility layer, but it wasn’t this simple. Bank also contained an event layer, which hydrates our Data Warehouse and other downstream services. For this use case, we introduced a service called the “Façade Stream,” a Kafka consumer and producer, which translated the BK event stream into the Bank equivalent. Introducing the Façade Stream was difficult for a few reasons, but primarily because it required us to rehydrate data in different ways because of critical differences in the Bank and BK data models.
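The shape of such a translator can be sketched without any Kafka client. Everything here is an assumption made for illustration: the field names, the `lookup_account` enrichment callback, and the `publish` callback that stands in for a Kafka producer. The real Bank and BK event schemas are not shown in this post.

```python
from typing import Callable, Iterable


def translate(bk_event: dict, lookup_account: Callable[[str], dict]) -> dict:
    """Map one BK-shaped event to a Bank-shaped event (hypothetical fields)."""
    # Rehydrate data that the BK event does not carry but Bank consumers expect.
    account = lookup_account(bk_event["account_id"])
    return {
        "txn_id": bk_event["id"],
        "account_number": account["number"],
        "amount": bk_event["amount"],
    }


def facade_stream(
    bk_events: Iterable[dict],
    lookup_account: Callable[[str], dict],
    publish: Callable[[dict], None],
) -> int:
    """Consume BK events, publish Bank-shaped events, return the count."""
    count = 0
    for event in bk_events:
        publish(translate(event, lookup_account))
        count += 1
    return count
```

In a real deployment `bk_events` would come from a Kafka consumer and `publish` would wrap a Kafka producer; keeping `translate` a pure function makes the mapping easy to test on its own.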

The outputs of BK’s Façade Stream and Bank’s event stream are meant to match very closely. Both landed in our common data warehouse, Snowflake, which enabled us to validate these data streams against each other at scale through SQL.
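We ran this comparison in SQL, but the idea is easy to show in a few lines of Python. The sketch below is an in-memory stand-in for a full outer join between the two streams; the `txn_id` key and the row layout are hypothetical.

```python
def diff_streams(bank_rows, facade_rows, key="txn_id"):
    """Compare two collections of row dicts, matched on a shared key."""
    bank = {row[key]: row for row in bank_rows}
    facade = {row[key]: row for row in facade_rows}
    shared = bank.keys() & facade.keys()
    return {
        # Present in Bank's stream but never produced by the Façade Stream.
        "missing_in_facade": sorted(bank.keys() - facade.keys()),
        # Produced by the Façade Stream with no Bank counterpart.
        "missing_in_bank": sorted(facade.keys() - bank.keys()),
        # Present on both sides but with differing field values.
        "mismatched": sorted(k for k in shared if bank[k] != facade[k]),
    }
```

An empty result in all three buckets is the signal that the two streams agree for the rows compared.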

The Old Is New Again

A critical factor I have not yet mentioned is that all of the data in Bank also needed to be available in BK. To complicate things further, Bank and BK have fairly different data models. Well, that’s a little gremlin that reared its ugly head multiple times as we worked towards migrating the data from Bank and loading it into BK.

We wrote an application that we called the “Bootstrapper” to fetch data from Bank, map it into the BK equivalent, and load it into BK’s datastore. The process of “bootstrapping” is fairly complex: mapping Bank’s data model into BK’s required handling many edge cases within that translation and re-enriching data to make it “whole” so that BK could function properly.
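A heavily simplified sketch of that fetch, map, and load loop follows. The record fields, the single edge case (a missing currency), and the `load` callback are all hypothetical; the real Bootstrapper handled far more cases than this.

```python
from dataclasses import dataclass, field


@dataclass
class BootstrapReport:
    loaded: list = field(default_factory=list)    # ids written to BK
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def map_bank_to_bk(record: dict) -> dict:
    """Map one Bank record to a BK record, raising on unmappable input."""
    if record.get("currency") is None:
        raise ValueError("missing currency")
    return {
        "id": record["txn_id"],
        "amount": record["amount"],
        "currency": record["currency"].upper(),
    }


def bootstrap(bank_records, load) -> BootstrapReport:
    """Map and load each record; collect failures instead of aborting."""
    report = BootstrapReport()
    for record in bank_records:
        try:
            mapped = map_bank_to_bk(record)
        except (KeyError, ValueError) as exc:
            report.rejected.append((record, str(exc)))
            continue
        load(mapped)
        report.loaded.append(mapped["id"])
    return report
```

Collecting rejected records with a reason, rather than failing on the first one, is one way to keep discovering edge cases on every run.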

BK’s core database is AWS Aurora (Postgres flavor). One of the many wonderful things about it is that it supports loading and unloading data to and from S3. The Bootstrapper leveraged this to perform large data operations within its migration workflows.
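For readers who have not used this feature, Aurora PostgreSQL exposes S3 imports through the `aws_s3` extension, via `aws_s3.table_import_from_s3` and `aws_commons.create_s3_uri`. The helper below only builds a parameterized statement for that call; the helper name, table, bucket, and key are hypothetical, and this is not necessarily how the Bootstrapper invoked it. Running the statement also requires the extension to be installed and the cluster to have an IAM role with access to the bucket.

```python
def s3_import_statement(table, columns, bucket, key, region,
                        options="(format csv)"):
    """Build a parameterized S3 import statement for Aurora PostgreSQL.

    Returns (sql, params) suitable for a driver that uses %s placeholders.
    """
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, aws_commons.create_s3_uri(%s, %s, %s))"
    )
    # column_list is a comma-separated string; options are COPY-style options.
    params = (table, ",".join(columns), options, bucket, key, region)
    return sql, params
```

With a psycopg-style cursor, the result would be passed as `cursor.execute(sql, params)`. The reverse direction, unloading query results to S3, uses `aws_s3.query_export_to_s3`.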

To put it simply, this process was extremely complex. We were overjoyed when we got to delete the code for this application. When migrating data, it’s critical to plan ahead, be ready to continuously discover edge cases, and factor in additional time, because it will take much longer than expected.

The Bootstrapper lived for a whole year, and we were making adjustments to get all the data correctly loaded into BK until the very end. Once we handled compatibility and proper data migration, we were ready for deployment, which I’ll cover in part 3.
