The decisive advantage that distinguishes the ETH spin-off DeepCode is that it has developed the first AI system that can learn quickly from billions of lines of program code, enabling AI-based detection of security and reliability issues in code. DeepCode is an excellent example of a modern AI system that can learn from data, program code in this case, yet remain transparent and interpretable for humans.
By joining Snyk, a security tools company valued at 2.6 billion US dollars that enables developers to quickly find code vulnerabilities, DeepCode will have the opportunity to integrate its AI-based capabilities into existing Snyk products, in turn moving closer to its original goal of impacting millions of users worldwide. The financial terms of the agreement have not been disclosed.
Learning from data makes the difference
DeepCode was originally launched with the goal of creating the first AI-powered code analysis platform. The key motivation was that in recent years, developers had produced billions of lines of code, freely available in a number of public repositories, together with corresponding bug reports, fixes and other code-related information. The key idea then was to build an AI system that can learn from this new type of data (termed Big Code) to solve various pressing code quality problems and detect unknown security flaws in programs.
The differentiating factor is that, unlike prior code analysis engines that require manual, brittle, handwritten rules, DeepCode is based on learning from data: it automatically processes all code-related information and builds predictive models that can be used to detect many more flaws, with accuracy beyond the reach of other commercial systems.
Further, DeepCode's models are interpretable, meaning that a human can examine the model and introduce changes if needed, a capability beyond any existing modern deep learning model. This makes DeepCode an instance of a third-generation AI system: it can learn from data (code in this case) yet is human-interpretable. Furthermore, DeepCode has made algorithmic advances that make this AI not only more capable than conventional tools, but also orders of magnitude faster.
From basic research to the market
While DeepCode was founded in 2016, the research area itself was pioneered at ETH Zurich in 2013, when Martin Vechev, Professor at the Secure, Reliable and Intelligent Systems Lab of the Department of Computer Science, and Veselin Raychev, then his doctoral student, laid the groundwork together with collaborators. They built the first prototypes of AI-based systems that could learn from code by showing how to combine data-driven machine learning methods with semantic static code analysis methods based on symbolic reasoning.
Interestingly, at the time, it was not clear how to connect these two types of methods, as they traditionally belong to rather distinct areas of computer science. The observation here was that by finding ways to connect these seemingly separate areas, it was possible to build new types of AI systems that could effectively process code (which poses different challenges than other kinds of popular data, such as images, videos and natural language).
During this period, supported by an ERC Starting Grant awarded to Vechev, the ETH group released several pioneering works and public AI-based programming systems for various software tasks that are still heavily used today by thousands of users (e.g. de-obfuscation, which means "unveiling" program code to make it understandable; cf. jsnice.org).
An AI-based code analysis system for developers
For his pioneering work on learning from Big Code, Veselin Raychev received the ETH medal for an outstanding PhD thesis as well as the prestigious ACM Doctoral Dissertation Award, Honorable Mention (top 3 PhD dissertations in computer science worldwide). This makes him only the third European PhD graduate in the 40-year history of the award and the first PhD graduate from ETH Zurich to receive such an award.
Having pioneered the research area of learning from Big Code, a natural next step was to build an AI-based code analysis system that worked in production and at scale, with the goal of having it used by every developer and every company that creates software. This led to the birth of DeepCode, which was co-founded by Veselin Raychev (CTO), Boris Paskalev (CEO), and ETH Professor Martin Vechev, and which currently protects more than 4 million contributing developers and over 100,000 repositories subscribed to DeepCode’s service.