API Monitoring for GraphQL

Monitoring APIs has long been best practice in production environments. Tools like New Relic, Datadog and Splunk have become popular in this category. In the world of REST APIs, monitoring data is usually grouped by HTTP endpoint and verb. You can typically see how many times each endpoint has been called, its execution time and request metadata. Fancier features like performance tracing and request statistics (for example p95 execution time) are usually also available.

In a typical GraphQL setup, you only have one HTTP endpoint: /graphql. Hooking up one of these tools will by default group all calls into a single entry, which is close to useless. You can usually group by other parameters, such as the OperationName of GraphQL operations, which makes it better. Unlike with REST endpoints, however, you can’t know the shape of the data from the OperationName. Without knowing what data is being sent over the wire, you can’t safely remove deprecated parts of the API, nor can you get any real insight into how the different resolvers behave in different contexts.
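To make the point about OperationName concrete, here is a minimal sketch in TypeScript. The FieldSelection type is a simplified, hypothetical stand-in for a parsed operation (it is not the AST of any real GraphQL library), and the User fields are invented for illustration. Two clients send an operation with the same name, yet the fields they touch differ.

```typescript
// Simplified, hypothetical stand-in for a parsed GraphQL selection.
interface FieldSelection {
  parentType: string; // type the field is selected on, e.g. "User"
  fieldName: string; // selected field, e.g. "firstName"
  children?: FieldSelection[]; // nested selections, if any
}

// Flatten a selection tree into sorted, unique "Type.field" strings.
function fieldCoordinates(selections: FieldSelection[]): string[] {
  const seen: Record<string, boolean> = {};
  const visit = (nodes: FieldSelection[]): void => {
    nodes.forEach((node) => {
      seen[node.parentType + "." + node.fieldName] = true;
      if (node.children) {
        visit(node.children);
      }
    });
  };
  visit(selections);
  return Object.keys(seen).sort();
}

// Both operations are named "GetUser" by their clients, but select different fields.
const webGetUser: FieldSelection[] = [
  {
    parentType: "Query",
    fieldName: "user",
    children: [
      { parentType: "User", fieldName: "id" },
      { parentType: "User", fieldName: "firstName" },
    ],
  },
];

const iosGetUser: FieldSelection[] = [
  {
    parentType: "Query",
    fieldName: "user",
    children: [
      { parentType: "User", fieldName: "id" },
      { parentType: "User", fieldName: "fullName" },
    ],
  },
];

console.log(fieldCoordinates(webGetUser)); // fields used by the web client
console.log(fieldCoordinates(iosGetUser)); // fields used by the iOS client
```

Grouping by OperationName alone would put both calls in the same bucket, even though only one of them touches User.fullName.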

Monitoring and Versioning

In REST APIs, you usually handle versioning by deprecating an endpoint and creating a new version of it. Once usage of the old endpoint has stopped, you can remove it. In GraphQL, you don’t deprecate endpoints or OperationNames (the client can set those to any value!). You deprecate fields and enum values (input fields and object types do not have official deprecation support yet). One thing is the same as with REST APIs: you need to know when usage has stopped.

Knowing when usage has stopped means knowing every time someone uses a deprecated part of the schema, at least if you care about not breaking clients that are in production. You could do this by logging in your resolver logic, but that can be a tedious process. If your GraphQL API only serves a single web client, you might be able to skip monitoring, as you can search the client code locally for usage. Because it is the web, you will be fine as soon as every client refreshes the page, since they will then get the latest version of your JavaScript files.
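As a rough illustration of the "logging in your resolver logic" approach, the sketch below wraps a resolver so that each call is counted. Everything here is an assumption made for the example: the resolver signature is reduced to a single parent argument (real GraphQL resolvers also receive arguments, context and info), the in-memory counter stands in for whatever logging you actually use, and the User type and its fields are invented.

```typescript
// A resolver in this sketch is just a function from a parent object to a value.
type SimpleResolver<P, R> = (parent: P) => R;

// In-memory usage counts keyed by "Type.field" (hypothetical store).
const deprecatedUsage: Record<string, number> = {};

// Wrap a resolver so that every call is counted before delegating to it.
function trackDeprecated<P, R>(
  coordinate: string,
  resolve: SimpleResolver<P, R>
): SimpleResolver<P, R> {
  return (parent: P): R => {
    deprecatedUsage[coordinate] = (deprecatedUsage[coordinate] || 0) + 1;
    return resolve(parent);
  };
}

interface User {
  firstName: string;
  lastName: string;
}

const userResolvers = {
  // Deprecated in this example: clients should select firstName and lastName instead.
  fullName: trackDeprecated(
    "User.fullName",
    (user: User): string => user.firstName + " " + user.lastName
  ),
  // Not deprecated, so not tracked.
  firstName: (user: User): string => user.firstName,
};

const ada: User = { firstName: "Ada", lastName: "Lovelace" };
userResolvers.fullName(ada);
userResolvers.fullName(ada);
userResolvers.firstName(ada);

console.log(deprecatedUsage); // only the deprecated field has been counted
```

The tedious part is visible even in this small example: every deprecated field has to be wrapped by hand, and the wrapper has to be removed again together with the field.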

But if you have more clients, searching for usage locally is harder. With some clients, especially iOS and Android apps, it can be difficult to know whether everyone is on the newest version. You need to know that the update cycle has completed before you can continue with your removal. This is where Hubburu comes in. Hubburu is a tool for monitoring all usage of your schema, including things that are hard to see from a server-side perspective, such as incoming enum values and the shape of the operations that clients send. Hubburu hooks into the server’s GraphQL runtime as a plugin and sends reports back to our servers about which parts of the schema are being used.
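The question "has the update cycle completed?" can be sketched as a query over usage reports. The code below is only an illustration of the idea: the UsageReport shape, the client names and versions, and the clientsStillUsing helper are all hypothetical and do not describe Hubburu's actual report format or API.

```typescript
// One usage report for a part of the schema (hypothetical shape).
interface UsageReport {
  coordinate: string; // e.g. "User.fullName"
  clientName: string; // e.g. "ios"
  clientVersion: string; // e.g. "2.4.0"
  seenAtMs: number; // Unix timestamp in milliseconds
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return the clients that used `coordinate` within the last `windowDays` days.
function clientsStillUsing(
  reports: UsageReport[],
  coordinate: string,
  nowMs: number,
  windowDays: number
): string[] {
  const cutoffMs = nowMs - windowDays * DAY_MS;
  const clients: Record<string, boolean> = {};
  reports.forEach((report) => {
    if (report.coordinate === coordinate && report.seenAtMs >= cutoffMs) {
      clients[report.clientName + "@" + report.clientVersion] = true;
    }
  });
  return Object.keys(clients).sort();
}

// Made-up sample data: the web client stopped using the field long ago,
// but an older iOS version was still seen two days ago.
const sampleNowMs = 100 * DAY_MS;
const sampleReports: UsageReport[] = [
  { coordinate: "User.fullName", clientName: "web", clientVersion: "3.1.0", seenAtMs: sampleNowMs - 60 * DAY_MS },
  { coordinate: "User.fullName", clientName: "ios", clientVersion: "2.4.0", seenAtMs: sampleNowMs - 2 * DAY_MS },
  { coordinate: "User.id", clientName: "ios", clientVersion: "2.5.0", seenAtMs: sampleNowMs - 1 * DAY_MS },
];

const blockers = clientsStillUsing(sampleReports, "User.fullName", sampleNowMs, 30);
console.log(blockers.length === 0 ? "safe to remove" : "still used by: " + blockers.join(", "));
```

An empty result for a reasonably long window suggests the deprecated field can be removed; a non-empty one tells you which client versions still need to be phased out first.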

Monitoring is essential for feedback

If you design your GraphQL API without any frontend expertise, or without any insight into possible use cases, it can be hard to know whether what you design is easy to use. You might fall back on exposing the database structure translated to GraphQL. That is often a good way to go, but far from always.

One example I have seen in the wild was a UI that labeled an entity as being “in use” by other entities. In the database, that meant the existence of entries in one of many join tables. Exposing all of those relationships would lead to a very clunky API. You would put the burden of knowing all the different ways the entity can be in use on the consumer. If you added another relationship later on, all the clients would need to be updated to reflect the new meaning of “in use”. A better way to model that would be an inUse boolean.
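A minimal sketch of what such an inUse field could look like on the server is shown below. The Label entity, the two join tables and the resolveInUse function are hypothetical names chosen for the example; the original case is not described in that much detail. The point is that the knowledge of which tables count as “in use” lives in one place on the server.

```typescript
// Hypothetical join tables: each row links a label to something that uses it.
interface JoinTables {
  projectLabels: { labelId: string; projectId: string }[];
  taskLabels: { labelId: string; taskId: string }[];
}

interface Label {
  id: string;
}

// Resolver for a hypothetical `Label.inUse` boolean field.
// Adding a new relationship later only means adding another check here;
// clients that select `inUse` do not need to change.
function resolveInUse(label: Label, db: JoinTables): boolean {
  return (
    db.projectLabels.some((row) => row.labelId === label.id) ||
    db.taskLabels.some((row) => row.labelId === label.id)
  );
}

// Made-up sample data.
const sampleDb: JoinTables = {
  projectLabels: [{ labelId: "label-1", projectId: "project-1" }],
  taskLabels: [{ labelId: "label-2", taskId: "task-1" }],
};

console.log(resolveInUse({ id: "label-1" }, sampleDb)); // used by a project
console.log(resolveInUse({ id: "label-3" }, sampleDb)); // not referenced anywhere
```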

However, it could very well be that the consuming clients also need to reference the actual entities to display them in a list. In that case, an inUse boolean would not help them at all, and offering both options would be the best design. As a backend developer, it can be hard to get insight into how your API is actually being used in the real world. By looking at the shape of the queries that clients are sending, and ideally also observing the UI being rendered from that data, you can see how your API can be improved. This will probably lead to smaller payloads and opportunities to optimize backend code.

With a tool like Hubburu or Apollo Studio, you can see all operations being sent and look for odd-looking queries. These queries can be a good starting point for improving your API.

Peter Nycander