GraphQL In a Larger Company: A Success Story

This is my retelling of how one of my previous employers (let’s call them “Company X”) adopted GraphQL to great success. Company X had hundreds of employees, with a sizable proportion of them in the development department. I will try to explain the rationale behind the move, how the developers thought at different stages and how the thinking changed.

The background

Company X had two products (after an acquisition) in the streaming space. Let’s call them A and B. Both products had several different clients (TVs, web, mobile apps, marketing projects, etc). When I joined Company X they were using BFFs (backend-for-frontend) to fetch data from different microservices.

The BFFs led to many developers having to implement the same functionality many times over. The work was also being done by client developers who neither wanted to write backend code nor were good at writing it efficiently.

The Chief developer at Company X wanted to address these issues by trying GraphQL, and successfully introduced it in 2017 on the web app of product A (the one with the most traffic). In hindsight, this was not a decision the Chief was happy with, as the app crashed when a real-world event drove many thousands of visitors to it at the same time. The downtime made national news the next day.

However, neither the Chief nor the management of Company X were deterred from continuing to explore GraphQL. It was introduced to a few more of product A’s apps, and they started to see productivity gains, more consistent design and more communication between teams. The schema design forced collaboration (in a good way!).

This is when I joined Company X.

Summary of the migration to GraphQL

I joined the web team to work on product B (less traffic than A, but still a lot), where GraphQL had not been introduced at all. After the success of GraphQL in product A, Company X wanted to introduce it to product B as well.

As I mentioned before, GraphQL was replacing BFFs. BFFs were created by client developers, so at Company X it seemed natural that client developers would develop the GraphQL schema together. However, it turns out a lot of client developers actually want to develop clients, not servers (maybe that is why they are client developers 🤯). We web developers were more used to writing some backend code though, so most of the work on the GraphQL server ended up with us.

Migration of a feature started with us in the web team moving the client code at the same time as we were building the schema. When we were happy with what we wrote, we released it to production. Then we asked the other client teams to try to use what we wrote.

Sometimes the client teams could use whatever we made without any modifications, but sometimes they were limited by our schema design. That led to either us iterating on the schema, or them modifying the client design. It turns out a lot of the use cases they were asking for were just random differences between the apps that were not even desirable according to our design team.

During the migration work I think we made every mistake you could think of. GraphQL was still fairly new to the community, so we had to learn on the job. But this gave me the confidence to say that a GraphQL schema can very much be evolved. We deprecated the parts of the schema which were mistakes, monitored the usage of those fields, helped the teams to migrate when they needed it, and then removed the fields.
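As a rough illustration of the deprecate, monitor, remove loop described above, here is a minimal TypeScript sketch. The schema fragment, the field names (`Movie.name`, `Movie.title`), the client names and the helpers `recordUsage` and `clientsStillUsing` are all made up for this example and do not come from Company X. The `@deprecated` directive itself is part of the GraphQL specification.

```typescript
// Hypothetical schema fragment: `name` was a design mistake, `title` replaces it.
// `@deprecated(reason: ...)` is a built-in GraphQL directive.
const typeDefs = /* GraphQL */ `
  type Movie {
    id: ID!
    title: String!
    name: String! @deprecated(reason: "Use title instead.")
  }
`;

// Fields we want to remove, written as "Type.field" (an assumed naming convention).
const deprecatedFields = new Set<string>(["Movie.name"]);

// field -> (client name -> number of operations that selected the field)
type Usage = Map<string, Map<string, number>>;

// Call once per executed operation with the fields it selected.
// How the field list is extracted depends on your GraphQL server.
function recordUsage(usage: Usage, client: string, fields: string[]): void {
  for (const field of fields) {
    if (!deprecatedFields.has(field)) continue;
    let perClient = usage.get(field);
    if (perClient === undefined) {
      perClient = new Map<string, number>();
      usage.set(field, perClient);
    }
    perClient.set(client, (perClient.get(client) ?? 0) + 1);
  }
}

// Clients that still select a deprecated field. These are the teams to help migrate.
function clientsStillUsing(usage: Usage, field: string): string[] {
  const perClient = usage.get(field);
  return perClient === undefined ? [] : Array.from(perClient.keys());
}

const usage: Usage = new Map();
recordUsage(usage, "web", ["Movie.id", "Movie.title"]);
recordUsage(usage, "tv", ["Movie.id", "Movie.name"]);
```

In this sketch a field is a candidate for removal once `clientsStillUsing` returns an empty list for the period you monitor.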

All in all, the complete migration was done in less than a year, while we still kept up with the expectations from management to deliver new features.

Who owned the schema?

As I mentioned earlier, the Chief developer had the idea of all client teams collaborating on the schema. But in reality, it did not turn out that way. The web team took almost all of the responsibility for the schema design. And not all web developers either, only those with a keen interest in GraphQL.

Since we were quite few, we ended up helping each other with the GraphQL designs of both product A and product B.

Being few led to a lower than ideal bus factor for the GraphQL schema, but it also gave us insights into how we could make different parts of the schema even more coherent and nice to use. Those design wins from keeping the schema design centered around a few individuals are what started my belief that distributing schema design (like Federation) is a mistake unless your company is heavily siloed.

Performance implications

GraphQL is slower than ad-hoc BFF endpoints, as GraphQL adds overhead that a hand-written BFF endpoint can simply skip. To mitigate this we used DataLoaders (and asked backend teams to implement batch support when needed), traced operation usage and implemented a caching layer inside the GraphQL server. With these techniques we achieved response times low enough that requests felt instant to our users.
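To make the DataLoader idea concrete, here is a minimal sketch of the batching technique in TypeScript. It is a simplified stand-in for the `dataloader` npm package, not its actual implementation, and `TinyLoader`, `fetchMoviesByIds` and the movie shape are hypothetical names. The idea is that every `load` call made in the same tick is collected, so the backend gets one batch request instead of one request per resolved field.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (err: unknown) => void;
}

// Simplified illustration of DataLoader-style batching with a per-instance cache.
class TinyLoader<K, V> {
  private readonly batchFn: BatchFn<K, V>;
  private queue: Pending<K, V>[] = [];
  private cache = new Map<K, Promise<V>>();

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;

    const promise = new Promise<V>((resolve, reject) => {
      // The first key queued in a tick schedules a single flush as a microtask.
      if (this.queue.length === 0) {
        Promise.resolve().then(() => this.flush());
      }
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    // Assumes batchFn returns values in the same order as the keys it was given.
    this.batchFn(batch.map((item) => item.key)).then(
      (values) => batch.forEach((item, i) => item.resolve(values[i] as V)),
      (err) => batch.forEach((item) => item.reject(err)),
    );
  }
}

// Hypothetical backend call with batch support; the counter shows how often it is hit.
let backendCalls = 0;
async function fetchMoviesByIds(ids: string[]): Promise<{ id: string; title: string }[]> {
  backendCalls += 1;
  return ids.map((id) => ({ id, title: `Movie ${id}` }));
}

// In a real server you would typically create one loader per incoming request.
const movieLoader = new TinyLoader(fetchMoviesByIds);
```

Resolvers then call `movieLoader.load(id)` independently of each other, and the loader turns those calls into a single `fetchMoviesByIds` request. This is also why backend batch support matters: without a batch endpoint there is nothing to collapse the calls into.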

To be able to handle the extreme spikes of traffic that product A in particular experienced, we used serverless functions and CDN caching. CDN caching was a pain, because it forced us to split requests into private parts (like how much of a movie a user has seen) and public parts, so that we could cache the public parts. This made for a worse user experience than necessary, but the DX implications were even worse! Instead of just asking for movie { progress { percentage } } you had to make two requests and stitch the data together.
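To illustrate the DX cost, here is a TypeScript sketch of what such a split can look like from a client’s point of view. The query names, the `movieProgress` field and the `stitchMovie` helper are assumptions made for illustration; only movie { progress { percentage } } comes from the text above.

```typescript
// What you would like to write: one query mixing public and per-user data.
const idealQuery = /* GraphQL */ `
  query Movie($id: ID!) {
    movie(id: $id) { id title progress { percentage } }
  }
`;

// What CDN caching forces instead: a cacheable query with only public data...
const publicQuery = /* GraphQL */ `
  query PublicMovie($id: ID!) {
    movie(id: $id) { id title }
  }
`;

// ...and a separate, uncached query for the private part (hypothetical field).
const privateQuery = /* GraphQL */ `
  query MovieProgress($id: ID!) {
    movieProgress(movieId: $id) { percentage }
  }
`;

interface PublicMovie {
  id: string;
  title: string;
}

interface Progress {
  percentage: number;
}

interface Movie extends PublicMovie {
  progress: Progress | null;
}

// Client-side stitching: combine both responses into the shape the UI wanted all along.
function stitchMovie(publicMovie: PublicMovie, progress: Progress | null): Movie {
  return { ...publicMovie, progress };
}
```

Every client has to repeat this stitching, and the UI has to decide what to show while the second request is still in flight.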

Nowadays I think you should try to avoid CDN caching for as long as possible for apps with private data, unless you have a good answer for how to handle requests with mixed data.

Serverless functions were a success. The limitation was that we could not use subscriptions, but trying to keep an open socket connection to all of our users was so daunting that we pushed it to the future, and honestly we did not really have any good use cases for it either.

We considered serverless functions at the edge, but came to the conclusion that it would be slower than centralized functions, as we had too much communication with our centralized microservices which would be slow from an edge location.

Different clients, different schemas

We had two main products (A and B). They shared some microservices, but not all. We also made some minor products, which were integrated into both A and B. This led to a lot of thinking about whether we should try to expose a single schema to support all products, or more focused schemas to cater to the different products.

To this day, I’m still not sure what the correct answer is. An API which is too generic will put a lot of responsibility on client developers (like what happens with REST APIs), but an API which is too specific might not be flexible enough to support future use cases. As always, I think the answer is “it depends”.

Having different schemas gave us more power to control the implementation for the different products. For example, product A could have some limitations to support more CDN caching that were not necessary for product B. Product B was subscription based, which included data that made no sense for the free product A. Conversely, product A had ad data which product B had no use for.

However, we had to implement some GraphQL logic multiple times. Some might think that we should have used stitching or federation to reuse logic, but we tried that, and hated everything about it. It made the system harder to reason about, to deploy, to iterate on design-wise, and to scale.

Concluding thoughts

Company X gave me a ton of insights about GraphQL, and since then I have worked for several other companies that use GraphQL. A lot of the problems are common across companies, and some of the advice I give new clients is:

  • Avoid distributing the schema design unless you have to
  • Avoid CDN caching unless you have to
  • Embrace how GraphQL resolving works. Use “DataLoaders”, not “DataSources”
  • Use deprecations if you are not happy with the design
  • Monitor your schema usage
  • Load test your GraphQL servers
  • Measure GraphQL runtime performance

A lot of these insights have gone into the design of the GraphQL usage & monitoring product I’m making, Hubburu. Sign up here for free.

Peter Nycander