Common GraphQL design mistakes

Peter Nycander

When consulting for companies implementing GraphQL, I tend to see the same design mistakes come up again and again. This article goes through a few of the lessons I have learned after almost four years of GraphQL design, so that you don't have to repeat those mistakes.

Adding the field type: String

It might not be obvious to everyone that this is a mistake, but it is a clear code smell. GraphQL is built around its type system, so using type: String instead of the built-in __typename does not take advantage of that. What you want in most cases is an interface. Each possible value of type: String gets its own GraphQL type which implements that interface.

This allows you to make a smarter schema, because odds are that not every field is shared among all the different types. Another benefit is that reading the schema tells the consumer what possible values to expect. That hinting can also be achieved with an enum, but there is little to no downside to using the interface instead.
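To make the difference concrete, here is a minimal sketch. The MediaItem, Media, Movie and Series types and their fields are hypothetical names picked for illustration, not taken from a real schema.

```graphql
# Code smell: the client must compare strings to find out what it received,
# and every field has to be nullable because it only applies to some values.
type MediaItem {
  id: ID!
  title: String!
  type: String!          # e.g. "MOVIE" or "SERIES"
  durationMinutes: Int   # only meaningful for movies
  episodeCount: Int      # only meaningful for series
}

# Interface-based alternative: each former value of `type` becomes its own type.
interface Media {
  id: ID!
  title: String!
}

type Movie implements Media {
  id: ID!
  title: String!
  durationMinutes: Int!
}

type Series implements Media {
  id: ID!
  title: String!
  episodeCount: Int!
}
```

With the second design a client can select __typename and use fragments such as `... on Movie { durationMinutes }` to ask only for the fields that exist on each type.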

Mirroring the backend models

When starting with GraphQL, it is quite easy to just expose all the fields that you get from the backend services or database models. The point of GraphQL is to let clients pick and choose from the data, right? Well, kind of, but you should think before exposing too much data.

The best way is usually to go client first. Ask yourself the question: What do the consuming clients actually need? It could be that you have a Country object with fields such as countryCode, fullName, currency etc., but all the frontend needs is the image URL and alt text for a flag of that country. The client would probably have to look at countryCode to generate a URL, hope that the two always match, and concatenate fullName with some other string to build the alt text. The server should probably just have exposed:

type CountryFlagImage {
  url: String!
  alt: String
}

This way of designing is often referred to as a demand-oriented schema. Demand-oriented schema design deserves its own article, because it is a fascinating subject. But the big reasons to use it are that 1. you will design a schema more aligned with the product rather than the database, and 2. it will decouple your database design from your GraphQL schema (which might be useful when you change the database design later on).
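As a rough sketch of where such a type could live, assume a Country type that exposes only what the client asked for. The flag field name and the id field are my assumptions for the example.

```graphql
type CountryFlagImage {
  url: String!
  alt: String
}

type Country {
  id: ID!
  # The server builds the URL and the alt text, so clients never have to
  # derive them from countryCode or fullName. Backend fields such as
  # currency are left out until a client actually needs them.
  flag: CountryFlagImage!
}
```

If the flag images later move to a different URL scheme, only the resolver changes and the clients keep working.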

Limiting the API to suit your caching requirements

A common way to deal with high volumes of traffic is to cache requests using HTTP caching, which can be leveraged in browsers and CDNs (Content Delivery Networks). To get a decent cache hit ratio you need to make sure that the resources are the same for every user. You would not want the request for a user's avatar or username to get cached and served to the wrong user!

GraphQL is most commonly used through HTTP POST requests, which are usually not cached. However, there are ways to send GraphQL requests using HTTP GET, either by sending the full query text in the URL or by using persisted queries.

In a typical GraphQL schema you usually have a mix of private and public data. The public data is cacheable, but the private data is not. This creates a problem if you want to leverage HTTP caching: you can really only cache fully public data.

Optimizing for cache hits, however, pushes you to reuse the exact same query in multiple places. Instead of putting private data like progressPercentage: Float on a Movie type, you would need a separate query to get the progress, and then try to stitch the data together after the fact. This is what we are used to doing with REST APIs, and it means we are not getting the full benefits of GraphQL.
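The sketch below shows the shape a schema tends to take under that constraint. Everything except Movie and progressPercentage (the MovieProgress type, the title field and the root fields) is a hypothetical name used for illustration.

```graphql
# Shape the schema drifts toward when every response must be HTTP-cacheable.
type Movie {
  id: ID!
  title: String!
}

type MovieProgress {
  movieId: ID!
  progressPercentage: Float
}

type Query {
  # Public: identical for every user, so it can be cached.
  movie(id: ID!): Movie
  # Private: fetched separately and stitched to the movie on the client.
  movieProgress(movieId: ID!): MovieProgress
}

# Without the caching constraint, the private field can live on the type:
#
# type Movie {
#   id: ID!
#   title: String!
#   progressPercentage: Float
# }
```

The first version forces every client to join Movie and MovieProgress on movieId, which is the same work a REST client would do.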

I would recommend going as far as you can with caching inside your server logic, using something like Redis, and only considering HTTP caching as a last resort when designing your GraphQL API. There has been some progress in this field with products like GraphCDN. I have not tried it yet, but it could be worth checking out.

Not providing an id field

Frontend clients like Apollo or Relay rely on having a field that uniquely identifies an object. Those clients use the id field, possibly together with the __typename of an object, to create a normalized cache. Having a normalized cache is essential if you want to make sure that mutations update the data in all the places it is being used.

A normalized cache means that every unique object is only stored once, and other objects only store a reference to that normalized object instead of the complete thing.

Sometimes you are in a situation where you don't have a real id field to provide. I have found that in those cases there is usually a combination of other fields that uniquely identifies the object. You can then concatenate those values and hash them to generate an id. It might feel dirty, but the point is not to have a true id. The point is to provide a stable identity for an object across queries and mutations, to avoid stale UI.

If you don't provide an id, the data is considered part of the parent object. Sometimes that is exactly what you want, but you should default to always providing an id.
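A minimal schema sketch of both cases follows. The SubtitleTrack type, its fields and the updateProgress mutation are hypothetical, and the hashing itself would happen in the resolver, which is not shown here.

```graphql
type Movie {
  id: ID!
  title: String!
  progressPercentage: Float
  subtitleTracks: [SubtitleTrack!]!
}

# No natural id: assume a track is uniquely described by the movie id plus
# the language. The resolver could hash those two values to produce `id`.
type SubtitleTrack {
  id: ID!
  language: String!
  url: String!
}

type Query {
  movie(id: ID!): Movie
}

type Mutation {
  # Returning the updated Movie, including its id, lets a normalized cache
  # overwrite the stored object, so every view showing it is refreshed.
  updateProgress(movieId: ID!, progressPercentage: Float!): Movie
}
```

As long as the mutation's selection set includes id, a client with a normalized cache can match the result to the object it already holds.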

Creating reliance on parent objects

Sometimes you have data that describes the relationship between two objects, like the access level of a user to an entity. There is no strict rule on how to model this in GraphQL. You could model it with a connecting type, such as:

enum AccessLevel {
  ADMIN
  READ_WRITE
  READ_ONLY
  NONE
}

type User {
  id: ID!
  entityRelation(id: ID!): EntityRelation!
}

type EntityRelation {
  accessLevel: AccessLevel!
  entity: Entity
}

type Entity {
  id: ID!
}

Another way is to skip the relation, and add a field to the Entity:

enum AccessLevel {
  ADMIN
  READ_WRITE
  READ_ONLY
  NONE
}

type User {
  id: ID!
  entity(id: ID!): Entity
}

type Entity {
  id: ID!
  accessLevel: AccessLevel!
}

Since you are resolving from the User type, you might think that the user is already "in scope" of the query. However, with a normalized cache, which is the norm, the access level is stored on the entity itself, keyed by the entity's id. The same entity reached through a different user would overwrite that value, so the field does not describe the relationship to the user at all.

If you want to add the field to the Entity, I suggest adding a userId: ID! argument to the accessLevel field, such as:

enum AccessLevel {
  ADMIN
  READ_WRITE
  READ_ONLY
  NONE
}

type User {
  id: ID!
  entity(id: ID!): Entity
}

type Entity {
  id: ID!
  accessLevel(userId: ID!): AccessLevel!
}

You have to consider the potential for information leaks in the implementation of the resolver, so that a user can't ask for the access level of an arbitrary user id. But this is a totally valid schema design.
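A query against that last schema could look like the sketch below. It assumes a root field user(id: ID!): User on the Query type, which is not part of the schema shown above.

```graphql
query EntityAccess($userId: ID!, $entityId: ID!) {
  user(id: $userId) {
    id
    entity(id: $entityId) {
      id
      # The user is stated explicitly, so the value no longer depends on
      # which parent object the entity was reached through.
      accessLevel(userId: $userId)
    }
  }
}
```

Normalized caches generally store a field's value per set of arguments, so the access levels of two different users on the same entity can sit side by side instead of overwriting each other.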

Relying on the parent object can still be an option if done carefully, using something like Local Context in GraphQL Java. But take care that the same type returns the same value when it is reached through different paths.