In this post we will cover how to stream data from a database via HTTP in JSON format. Crystal allows us to build an implementation that does not stress the resources as much, by avoiding unnecessary memory allocations of intermediate object when possible. This can be accomplished thanks to the APIs of these modules: JSON, Http::Server::Response and DB.
Our goal is to build two HTTP endpoints:
GET /that will list all the tables in the DB
GET /:table_namethat will respond with the table content as JSON (NDJSON actually)
We will use Kemal to avoid dealing with routing manually.
The following steps will render a basic skeleton of the app consisting of two empty endpoints and a database access point:
- specify the database_url,
- open a connection pool,
- setup Kemal routing handlers,
- start Kemal server.
For convenience we’ll now focus on SQLite, but we’ll extend the solution to other drivers at the end of the article. Now is good time to check that your
shard.yml has at least:
Note: You can grab a quick sample database from chinook.
Get the tables
Let’s assume that a method
table_names(db) will return the names of the tables as an array of
String, then a JSON encoded result could be programmed as follows:
Usually a database does not have a huge amount of tables. So, holding an array of all the tables names does not sound too heavy. The
#to_json method will create a string with the JSON representation and send it to the client. We will cover more advanced techniques to generate JSON later in this post.
We can get the table names from SQLite with:
#query_all returns an array with all the rows of the query result. Friendly reminder: use it with caution because the array of all the returned rows could be quite large.
Get the rows
Let’s assume that a method
write_ndjson(io, col_names, rs) will write an NDJSON entry for the current row in the
io with the specified column names. We could program a response of the whole table as follows:
Before jumping into the details of how we could code
write_ndjson let’s analyze the HTTP response.
Usually the server will build the dynamic response in a memory buffer before sending it, placing headers before the actual content starts, such as the
Content-Length information. This is how the listing of the table names works, and you can see that for yourself with a
$ curl -v http://0.0.0.0:3000.
In this case, since each row is a self contained json, we could stream one row at a time and this will minimize the memory usage on the server side. The client will be able to process the data as it comes. This mechanism is called chunked response if you want to go into the details.
To allow the response to be streamed we need to flush the output buffer right after each row is encoded as json.
The next step is how to weave the data coming from the DB to the response IO without creating many intermediate structures. In some DB libraries the approach is to get an array or hash of the full row. Crystal-db is designed to provide access as straightforward and as fast as possible, without intermediaries or multiples objects. The data is already waiting to be read in the database connection. It has certain fixed order if the connection is via a socket (like in MySQL or PostgreSQL), but this is not a problem since we are building a JSON object.
JSON.build we start a JSON::Builder that will provide convenient methods to emit a JSON straight into the IO.
The database can return values of many types. Most of them can be encoded directly into JSON: String, Ints, etc. But some of them can’t, like
json_encode_field(json, col, value) is used to encode the value in an efficient way.
Note that this method streams a json array directly from
Bytes, yet there is no actual Array in memory. The
Bytes could be a huge BLOB so it would be desirable to avoid creating a temporal object just to encode it as JSON. The data is already in memory, let’s just use it.
It’s fascinating how many concepts can fit in less than 50 lines of code.
Don’t forget to check the full source code which can be used with any of the crystal-db supported drivers: MySQL, PostgreSQL or SQLite!