Improved graph computations with MariaDB
MariaDB is the database of choice for the backend systems powering some of our larger mobile applications. One reason we opted for MariaDB is its support for pluggable storage engines, which greatly expand its versatility. One of the more specialized storage engines available is OQGRAPH, which allows you to access your data as though it was a graph. While working with it we identified some opportunities for improvement and decided to contribute those improvements back to the official MariaDB releases.
The main purpose of databases is storing and querying data, but not all data is equal. General-purpose databases normalize on common scenarios to deliver good performance in most cases. However, if you know your data well and specifically optimize towards it, you can get a lot more out of a database. Special-purpose databases can deliver such optimized workflows depending on your needs. Better performance, more reliability or better compression of data, for instance.
MariaDB’s pluggable storage engines provide a way to get special-purpose database features into a general-purpose database system. One such pluggable storage engine that we use at M2mobi is OQGRAPH. Contrary to regular storage engines it does not actually store any data, but rather provides a different view on the data stored. In the case of OQGRAPH, this view represents your data as a graph.
We illustrate the above with an example. Let’s say you have a simplistic social network platform with users who can have relations with each other. Those relations would be stored in the database so you could easily retrieve, for example, a list of ‘friends’. But what if you want to know ‘friends of friends’? Or you want to know through which people two users are connected to each other?
This is where graph computation comes in. Using OQGRAPH you could create a graph view over the relationships between users and then run algorithms over that. For example, a ‘shortest path’ algorithm could give you the list of people connecting two users with each other. Using a ‘breadth first’ algorithm, you could look for all friends of friends of a user.
There exist special purpose systems for these kinds of computations already, but typically it involves talking to multiple different systems. The benefit of OQGRAPH is that you can talk to one system and issue a comparably simple query to get the information you want with less round-trips.
We used OQGRAPH in multiple projects and over time discovered some opportunities for improvement. The benefit of basing your systems on open source software is that you can actually look into them and see if you can implement those improvements yourself. We took that opportunity and, after some trial and error in internal testing, proposed these improvements as contributions back to MariaDB. Recently these contributions have been accepted and will be available as part of future releases of MariaDB.
To explain these improvements we have to refine our social network example a bit. OQGRAPH provides a directed graph. If you consider a “friend” relationship between user A and user B, this means there is a “friend” relationship from user A to B, and also one from user B to A. Let’s assume that when user A adds user B as a “friend”, it only creates the relationship from user A to B, but not the one from user B to A. (This is more or less how circles work in Google+). The effective outcome of this is that the list of friends of user A will be different from the list of users who have user A as a friend.
Making the latter query possible is the first improvement that we posted (actually a bug fix) and is already released as part of MariaDB 10.0.33, 10.1.29, 10.2.10 and 10.3.3.
The second improvement we worked on provides a new algorithm to OQGRAPH that speeds up looking for “leaves”. In our social network example this would be users that other users added as friends, but don’t have any friends themselves or users who have added friends, but no other users added them as friend back. Using the “leaves” algorithm you could quickly identify whether a user has a connection that would match either criteria.
This improvement was merged into MariaDB 10.3 and was released as part of MariaDB 10.3.3 as well.
While it took some time until our improvements were accepted, it was a great experience working with both the MariaDB and OQGRAPH teams and we’re looking forward to working with them again in the future.