How to start reviewing code?

Published: 28 Oct 2021

Too often, savvy code reviewers (me included) recommend getting started with code review by “just reading code”, and that is indeed the best way to get started. It is a simple answer, but it lacks one extremely important detail that people who have been reviewing code for a while often forget about:

What code should I start with?

The first step when getting into code review is to find a few good targets. This is critical, as you don’t want to start with something too hard and get discouraged. You want to start with something simple and build up your confidence. Some people may be able to start hard by reviewing Kubernetes’ source code, but for the majority of people (me included), this is just paving the way to failure.

To build your understanding and keep progressing without getting frustrated, it is good to start with small snippets, ideally really small snippets of vulnerable code like the ones PentesterLab provides in the Code Review badge. Start with your favourite language or the one you are most confident in, and build your confidence up.
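To give an idea of the size to aim for, here is a minimal sketch of such a snippet. It is not taken from PentesterLab; the directory and function names are made up for illustration. The first function builds a file path from user input without any check, and the second shows one possible fix (normalise the path, then confirm it is still inside the base directory).

```python
import posixpath

# Hypothetical upload directory, chosen only for this example.
BASE_DIR = "/srv/app/uploads"


def vulnerable_path(filename: str) -> str:
    # User input is joined straight onto the base directory:
    # "../" sequences and absolute paths are not rejected.
    return posixpath.join(BASE_DIR, filename)


def safer_path(filename: str) -> str:
    # Normalise the joined path, then check it still sits under BASE_DIR.
    candidate = posixpath.normpath(posixpath.join(BASE_DIR, filename))
    if posixpath.commonpath([BASE_DIR, candidate]) != BASE_DIR:
        raise ValueError("path escapes the upload directory")
    return candidate
```

When reading a snippet like this, the question to ask is: what can the caller control, and where does that value end up? Here, a filename such as "../../../etc/passwd" walks out of the upload directory in the first function and is rejected by the second.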

Look at patches you can extract from CVEs.

If you don’t have access to PentesterLab snippets (and this is step 2 if you do), a good alternative is to look at patches you can extract from CVEs. You can, for example, follow the security mailing lists of a few open source projects you are using. I personally find the mailing list from the Apache foundation (covering all the Apache projects, not just Apache httpd) and the Ruby-on-Rails security mailing list to be great starting points. The Apache mailing list will provide you with diversity in terms of software, vulnerabilities and languages. The Ruby-on-Rails mailing list will provide you with extremely well-documented issues, making it easy to get started.

Once you are confident (or bored) with reviewing patches, you can move to libraries.

As opposed to complete software, libraries are often small codebases that try to solve only one problem. Another advantage of libraries is that there is often more than one library for the same task in a given language. This allows you to review multiple implementations of the same thing and compare them, making it easier to see what checks may be missing in one implementation. Libraries for things like JWT (you can find a few here: https://jwt.io/libraries), session management, file processing/upload… often make good first targets. You can then move to harder targets like SAML or OAuth2 implementations once your confidence is up.
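As an illustration of what comparing implementations can reveal, here is a minimal sketch of two JWT verification functions. Both are hypothetical and written only for this example; they are not taken from any real library. Implementation A trusts the algorithm named in the token header, while implementation B only accepts the algorithm it expects. Reading them side by side makes the missing check in A stand out.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def b64url_decode(part: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(payload: dict, key: bytes) -> str:
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(payload).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url_encode(sig)}"


def forge_none(payload: dict) -> str:
    # What an attacker without the key can build: "alg" set to "none"
    # and an empty signature.
    header = b64url_encode(json.dumps({"alg": "none"}).encode())
    body = b64url_encode(json.dumps(payload).encode())
    return f"{header}.{body}."


def _check_hs256(header_b64: str, body_b64: str, sig_b64: str, key: bytes) -> None:
    expected = hmac.new(key, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")


def verify_a(token: str, key: bytes) -> dict:
    """Hypothetical implementation A: trusts the header's algorithm."""
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "none":
        _check_hs256(header_b64, body_b64, sig_b64, key)
    # With "alg": "none" the signature is never verified.
    return json.loads(b64url_decode(body_b64))


def verify_b(token: str, key: bytes) -> dict:
    """Hypothetical implementation B: only accepts HS256."""
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    _check_hs256(header_b64, body_b64, sig_b64, key)
    return json.loads(b64url_decode(body_b64))
```

A token built with forge_none({"sub": "admin"}) is accepted by verify_a and rejected by verify_b, while a token from sign_hs256 is accepted by both. Real libraries are rarely this obvious, but the method is the same: list the checks one implementation performs and look for them in the other.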

Once libraries hold no secrets for you and you want to pick harder targets, you can move to classic software. Reviewing the entire codebase of an application may quickly get frustrating; starting with some common features may make things easier as you increase the intensity of your study. For example, review user registration, password reset, password storage, file uploads… instead of reviewing the whole codebase and getting frustrated.

Then, instead of jumping to hard targets (WordPress, phpMyAdmin, Tomcat, Apache httpd…), finding softer, less mature codebases may help you in your study. Codebases in curated lists like “Awesome [Language]” (for example “Awesome Golang”) are often a good place to start. Again, start with the language you feel the most familiar with (as a reviewer). Once you feel competent at isolating vulnerable patterns in this language, you can move to codebases in other languages and apply those patterns.

How to measure success?

Finally, make sure you measure success properly. You cannot measure your progress by whether or not you discover vulnerabilities. There may be nothing to be found. Instead, try to measure your level of understanding of the code, the quantity and complexity of patterns you know about and can isolate, how easy it is for you to approach a new codebase… In the same way, velocity is not a great indicator of progress. You can get fast at reading code, but if you don’t understand it or miss something, you are not really improving. It is all about quality over quantity, especially when you are studying. A lot of skilled code reviewers will tell you that they usually spend hours reading the same 50 lines of code and that the number of lines of code per hour does not matter.

And if you want to speed up your study, make sure you check out PentesterLab PRO and our code review badge!

Written by Louis Nyffenegger
Founder and CEO @PentesterLab