When a smartphone stores an address, we assume that we can find the address again using the right query. Just like all matching items in an online store can be listed using the right search terms. There’s just one problem: it’s not so much a matter of course as we believe. A logic error in the database management system (DBMS), which manages the stored information, can return incorrect responses without us even noticing the error.
A surprising number of errors in all systems
These types of logic error are surprisingly common. Or rather, they were until recently. Researchers at the Advanced Software Technologies Lab of ETH Zurich have now developed a tool that automatically detects logic errors in database software. It has already found over 450 programming defects in widely used database management systems.
“We were surprised ourselves at how many errors are hidden in popular database management systems,” Manuel Rigger explains. “For example, using our methods we discovered over 150 programming errors in the SQLite database systems, which runs on almost all smartphones and in many web browsers as well. And that’s by no means an extreme case. The rate is similar in all systems we’ve examined so far.”
Complex programs and hard-to-detect bugs
There are two main reasons why the ETH computer scientists are finding so many bugs. First, modern database management systems are very complex and can comprise several million lines of code. This makes it practically unavoidable that programming errors will occur. Second, logic errors are very difficult to find. In contrast to bugs that cause the system to crash, logic errors rarely have an obvious effect. Usually a comparison with the correct result is necessary to detect them. Up until now, that has only been possible with a lot of manual effort.
The only automatic test procedure actually used in practice was designed over 20 years ago. It compares queries in different database systems. Since every manufacturer makes specific expansions and adaptations to the query standard SQL (Structured Query Language), it is only possible to formulate unambiguously comparable queries for a small area of the systems. That’s because the query languages of manufacturers sometimes differ substantially, much like the dialects of a human language.
Two weeks of work lost
However, Rigger personally thinks there’s a third reason why there has not been more determination to track down errors thus far: in the field of database management systems, work is currently focused on making systems more powerful using artificial intelligence methods like machine learning. Troubleshooting naturally seems less attractive in comparison.
Rigger’s own background is in software testing and the development of compilers which translate various programming languages to machine languages. He became aware of the problem of errors in database management systems through his own frustrating experience when he lost two weeks of work because of an error in a widely-used database management system.
Three methods for finding three types of error
That’s when the testing specialist decided to tackle the problem, together with the head of the Advanced Software Technologies Lab, Zhendong Su. Their solution consists of three different methods that can reliably compare queries. In “query partitioning”, queries are automatically decomposed into multiple parts and checked to verify that the whole is the same as the sum of the parts.
The second method targets errors caused by optimisation mechanisms in database management systems. Queries are modified so that the optimisation mechanism can no longer.
In contrast to the first two methods, which can be executed automatically even without detailed knowledge of the database management system in question, the third requires specific knowledge of the database management system and individual implementation of the comparison. In “Pivoted Query Synthesis”, a whole row in the relational database system table is selected at random. A query must then be generated that results in the selected row. This allows logic errors to be identified much more efficiently than analysing individual entries by hand, as has been the practice up until now.
Tool already in use by manufacturers
The ETH computer scientists appear to have stirred up dormant enthusiasm with their methods and the SQLancer tool, which automates them. There has been significant interest among database systems manufacturers. Although the three procedures have only been communicated informally as preprints and via social media so far and are only now being officially presented at the relevant technical conferences, multiple manufacturers have already started using them to test their software. “A variety of manufacturers have already found numerous bugs in their systems and have integrated SQLancer into their testing,” Rigger says, having heard from many direct contacts. “A few have also contributed code to our project, which is open source, or have adapted and expanded the tool according to their specific needs.”
“Pivot query synthesis” was presented to the research community on 5 November at the systems software conference OSDI (Operating Systems Design and Implementation). The optimisation error method will be presented on 13 November at the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), and “query partitioning” on 18 November at the programming languages conference OOPLSA (Object-Oriented Programming, Systems, Languages, and Applications).