We will not include a GraphQL tutorial here: the GraphQL homepage already has an excellent walkthrough of the specification, and we strongly suggest going through it to get a good understanding of the language. The goal of this post is to share our impressions of the hackathon experience.
From Star Wars to GitHub
The canonical GraphQL sample schema exposes some data from the Star Wars movies and their characters, and most examples in the tutorial are based on it. Though it is useful for getting a first glimpse of the language, we quickly wanted to start experimenting with a richer model.
GitHub has released an early access version of a GraphQL-powered API, including a GraphiQL UI for running interactive queries on the spot. We found that GraphiQL works like a charm, and the query autocomplete based on the exposed schema is great for quickly navigating through the data graph. The GitHub GraphiQL explorer quickly became our playground for working through increasingly complex examples of GraphQL usage, including mutations on the new GitHub Projects.
Implementing our own
After a nice stroll through the GitHub GraphQL schema, we decided to pick one of our applications and expose a simple endpoint for testing. Having a simple yet interesting model composed primarily of workers, workgroups and work entries, Brium turned out to be a good candidate. We used the graphql-ruby gem for implementing the endpoint and defining the schema, and graphiql-rails to easily mount a GraphiQL interface for testing our implementation. The graphql-ruby-demo was also quite helpful for guiding the implementation.
We managed to get a simple implementation up and running in no time, retrieving workers and their work entries from the relational database. The Relay helpers included in graphql-ruby made implementing pagination via connections a breeze (though we did face an incompatibility with Rails 3, which causes entire collections and associations to be loaded into memory).
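To make the idea of a connection concrete, here is a minimal sketch of cursor-based pagination in plain Ruby. It is not the graphql-ruby implementation: the connection, encode_cursor and decode_cursor helpers are hypothetical names, and the cursor is assumed to be a Base64-encoded offset. Note that it slices an in-memory array, which is precisely what you do not want against a large table; a real implementation would translate the cursor into a LIMIT/OFFSET (or keyset) query.

```ruby
require "base64"

# Hypothetical helpers: the cursor is assumed to be an opaque, Base64-encoded offset.
def encode_cursor(index)
  Base64.strict_encode64("offset:#{index}")
end

def decode_cursor(cursor)
  Integer(Base64.strict_decode64(cursor).delete_prefix("offset:"))
end

# Returns a Relay-style connection: edges (cursor + node) plus page info.
def connection(items, first:, after: nil)
  start = after ? decode_cursor(after) + 1 : 0
  slice = items[start, first] || []
  edges = slice.each_with_index.map do |node, i|
    { cursor: encode_cursor(start + i), node: node }
  end

  {
    edges: edges,
    page_info: {
      end_cursor: edges.empty? ? nil : edges.last[:cursor],
      has_next_page: start + slice.length < items.length
    }
  }
end

workers = %w[ana bruno carla diego elena]

page1 = connection(workers, first: 2)
page2 = connection(workers, first: 2, after: page1[:page_info][:end_cursor])
page3 = connection(workers, first: 2, after: page2[:page_info][:end_cursor])
```

The client only ever sees opaque cursors, so the server stays free to change how they are computed without breaking existing queries.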
It was clear, however, that the naive implementation was far from scalable.
GraphQL brings great flexibility to clients, allowing them to ask for exactly whatever data they need. And even though the server-side implementation to support the queries for each node of the graph is straightforward, a naive implementation of these may yield very inefficient executions as the client starts sending more complex queries on the graph. Even in an extremely simple example with only two related models (a worker has-many entries), performance issues from N+1 queries were already noticeable.
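The N+1 pattern is easy to reproduce without a real database. The sketch below uses a hypothetical in-memory FakeDB that counts every "query" it receives, and a naive resolver (also hypothetical, not graphql-ruby code) that resolves the entries of each worker independently, the way a per-field resolver would.

```ruby
# In-memory stand-in for a relational database that counts every "query".
class FakeDB
  attr_reader :query_count

  def initialize(workers, entries)
    @workers = workers
    @entries = entries
    @query_count = 0
  end

  # Stands in for: SELECT * FROM workers
  def all_workers
    @query_count += 1
    @workers
  end

  # Stands in for: SELECT * FROM entries WHERE worker_id = ?
  def entries_for(worker_id)
    @query_count += 1
    @entries.select { |e| e[:worker_id] == worker_id }
  end
end

# Naive resolution of a query shaped like { workers { name entries { hours } } }:
# the entries field is resolved separately for each worker.
def resolve_workers_naively(db)
  db.all_workers.map do |worker|
    {
      name: worker[:name],
      entries: db.entries_for(worker[:id]).map { |e| { hours: e[:hours] } }
    }
  end
end

workers = (1..5).map { |i| { id: i, name: "worker-#{i}" } }
entries = workers.flat_map do |w|
  [{ worker_id: w[:id], hours: 4 }, { worker_id: w[:id], hours: 2 }]
end

db = FakeDB.new(workers, entries)
result = resolve_workers_naively(db)

# One query for the workers plus one per worker.
puts "queries issued: #{db.query_count}"
```

With five workers the resolver issues six queries, and the count grows linearly with the number of workers returned.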
Though some GraphQL implementations suggest preloading associations to prevent this issue, it seemed that Facebook's dataloader was the answer to this problem. This library provides batching and caching capabilities, so all requests to the storage layer that need to happen as the result of evaluating a GraphQL query can be resolved together. Since we were working in Ruby, we turned to Shopify's graphql-batch, which has a very similar philosophy to dataloader, using RecordLoaders to load all records for a batched set of ids with a single SQL IN query. This approach, together with a policy object to implement a basic authorisation layer (surprisingly overlooked in most example implementations), allowed us to solve the performance issues we were facing.
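The batching idea can be sketched in a few lines of plain Ruby. This is a simplified illustration, not the graphql-batch API: the EntriesByWorkerLoader class is a hypothetical name, lazy values are modelled as lambdas rather than promises, and the "database" is an in-memory array. The point is only that ids are collected first and then fetched together, with results cached per id.

```ruby
# Simplified batch loader: collects ids, then loads them all in one "query".
class EntriesByWorkerLoader
  attr_reader :query_count

  def initialize(entries)
    @entries = entries
    @pending = []
    @cache = {}
    @query_count = 0
  end

  # Queues the id and returns a lazy value; nothing is fetched yet.
  def load(worker_id)
    @pending << worker_id unless @cache.key?(worker_id)
    -> { flush; @cache.fetch(worker_id) }
  end

  private

  # Stands in for: SELECT * FROM entries WHERE worker_id IN (...)
  def flush
    ids = @pending.uniq
    return if ids.empty?

    @pending = []
    @query_count += 1
    grouped = @entries
              .select { |e| ids.include?(e[:worker_id]) }
              .group_by { |e| e[:worker_id] }
    ids.each { |id| @cache[id] = grouped.fetch(id, []) }
  end
end

entries = [
  { worker_id: 1, hours: 4 },
  { worker_id: 1, hours: 2 },
  { worker_id: 2, hours: 8 }
]

loader = EntriesByWorkerLoader.new(entries)

# First pass: every resolver asks for its entries, but only gets a lazy value.
lazy_values = [1, 2, 3].map { |worker_id| loader.load(worker_id) }

# Second pass: forcing the first value triggers a single batched load.
resolved = lazy_values.map(&:call)

puts "queries issued: #{loader.query_count}"
```

All three workers are served by one batched lookup, and asking again for an id that was already loaded is answered from the cache.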
Still, we couldn't shake off the feeling that the GraphQL endpoint was opening the door to data requests that are potentially expensive to resolve. Ensuring that any request in a slightly complex data model (complex enough to be interesting, at least) could be answered in a performant way was next to impossible, due to the flexibility offered to the clients. Unlike REST, where we knew exactly what would happen on every endpoint, here optimising every possible query was simply not feasible.
Some quick googling showed that we were not the only ones discussing this issue, and it seems that some implementers work with a pre-approved list of queries that can be sent by their clients, thus considerably reducing the cases to support. This approach, however, is only viable for internal APIs; generic ones such as GitHub's must be able to handle anything thrown at them.
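A pre-approved list can be as simple as a set of query digests checked before execution. The sketch below is one possible shape for it, not something taken from a specific library: the ApprovedQueries class is a hypothetical name, and it assumes that collapsing whitespace is enough normalisation (a real implementation would more likely compare parsed queries, or have clients send an identifier instead of the query text).

```ruby
require "digest"
require "set"

# Registry of pre-approved queries, stored as SHA-256 digests.
class ApprovedQueries
  def initialize(queries)
    @digests = queries.map { |q| digest(q) }.to_set
  end

  # True only if this exact query (ignoring whitespace layout) was registered.
  def approved?(query)
    @digests.include?(digest(query))
  end

  private

  # Collapse runs of whitespace so indentation and line breaks do not matter.
  def digest(query)
    Digest::SHA256.hexdigest(query.split.join(" "))
  end
end

registry = ApprovedQueries.new([
  "{ workers { name entries { hours } } }"
])

same_query_reformatted = <<~QUERY
  {
    workers {
      name
      entries { hours }
    }
  }
QUERY

unknown_query = "{ workers { name workgroups { workers { entries { hours } } } } }"

puts registry.approved?(same_query_reformatted)
puts registry.approved?(unknown_query)
```

The server would run this check before handing the query to the GraphQL executor and reject anything that is not on the list.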
Deriving the schema
Another discussion that came up was what should drive the design of the data shape exposed in the GraphQL API: the requests from the clients, or the underlying model of the application. An internal API, where queries can be approved in advance, seems to be a better use case for the former, while a generic API for third-party services such as GitHub's felt closer to the latter.
Considering the complexity involved in supporting arbitrary queries over the entirety of the exposed model, it was clear that our first experiments with GraphQL (when they occur) would be driven by concrete UI use cases. Unfortunately, we didn't get to test-drive any GraphQL-oriented client libraries, such as Relay, to test the benefits of a flexible API in a component-driven UI, or any other client library that could harness the power of a strongly-typed API specification to catch communication errors early.
All in all, GraphQL seems to be a very interesting improvement over RESTful interfaces, allowing clients to fetch exactly the data they need in a single roundtrip to the server, but at the expense of increased complexity that needs to be handled server-side. Although we haven't yet experienced the benefits of implementing a Relay-powered UI with a GraphQL backend, the requirements imposed on the server seem difficult to offset. Nevertheless, the GraphQL ecosystem feels very young and promising, and as more implementations continue to mature, we believe that the hiccups we faced today could have easy resolutions.