One of the things I like most is parsing code and performing automatic operations on it. For this reason I started contributing to JavaParser and created a couple of related projects: java-symbol-solver and effectivejava.

As a contributor to JavaParser, I keep reading very similar questions about extracting information from Java source code. So I thought I could help by providing some simple examples, just to get started with parsing Java code.

All the source code is available on GitHub: analyze-java-code-examples

Common code

When using JavaParser there are a bunch of operations we typically want to do every time. Often we want to operate on a whole project, so given a directory we want to explore all the Java files in it. This class helps with that:

package me.tomassetti.support;

import java.io.File;

public class DirExplorer {
    public interface FileHandler {
        void handle(int level, String path, File file);
    }

    public interface Filter {
        boolean interested(int level, String path, File file);
    }

    private FileHandler fileHandler;
    private Filter filter;

    public DirExplorer(Filter filter, FileHandler fileHandler) {
        this.filter = filter;
        this.fileHandler = fileHandler;
    }

    public void explore(File root) {
        explore(0, "", root);
    }

    private void explore(int level, String path, File file) {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) { // listFiles() returns null on an I/O error
                for (File child : children) {
                    explore(level + 1, path + "/" + child.getName(), child);
                }
            }
        } else {
            if (filter.interested(level, path, file)) {
                fileHandler.handle(level, path, file);
            }
        }
    }

}
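
If you would rather not carry a helper class around, a similar traversal can be sketched with the JDK's Files.walk. This is only a minimal sketch and not part of the example project: the JavaFileLister class and its listJavaFiles method are hypothetical names, and unlike DirExplorer it does not report the nesting level.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, written only for this illustration.
class JavaFileLister {

    // Returns the paths of all .java files under root, relative to root,
    // with a leading "/" and "/" separators, similar to what DirExplorer passes around.
    static List<String> listJavaFiles(Path root) {
        // Work on an absolute, normalized path so that relativize() behaves predictably.
        Path base = root.toAbsolutePath().normalize();
        try (Stream<Path> paths = Files.walk(base)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> "/" + base.relativize(p).toString().replace('\\', '/'))
                    .filter(p -> p.endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        listJavaFiles(root).forEach(System.out::println);
    }
}
```

The callback style of DirExplorer is handy when you want to process each file as soon as it is found; the list returned here is easier to test or to post-process.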

For each Java file we first want to build an Abstract Syntax Tree (AST) and then navigate it. There are two main strategies to do so:

  1. use a visitor: this is the right strategy when you want to operate on specific types of AST nodes
  2. use a recursive iterator: this lets you process all sorts of nodes

Visitors can be written by extending classes included in JavaParser, while this is a simple node iterator:

package me.tomassetti.support;

import com.github.javaparser.ast.Node;

public class NodeIterator {
    public interface NodeHandler {
        // Return true to visit the node's children, false to skip them.
        boolean handle(Node node);
    }

    private NodeHandler nodeHandler;

    public NodeIterator(NodeHandler nodeHandler) {
        this.nodeHandler = nodeHandler;
    }

    public void explore(Node node) {
        if (nodeHandler.handle(node)) {
            for (Node child : node.getChildrenNodes()) {
                explore(child);
            }
        }
    }
}
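
To see what the boolean returned by the handler is for, here is a minimal sketch that applies the same contract to a toy tree. It does not use JavaParser at all: SimpleNode, PruningWalk and the "stmt" label convention are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical example: a toy tree standing in for an AST.
class PruningWalk {

    // Stand-in for an AST node; JavaParser's Node class is not involved here.
    static class SimpleNode {
        final String label;
        final List<SimpleNode> children;

        SimpleNode(String label, SimpleNode... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Same contract as NodeIterator: descend into the children only if the handler returns true.
    static void explore(SimpleNode node, Predicate<SimpleNode> handler) {
        if (handler.test(node)) {
            for (SimpleNode child : node.children) {
                explore(child, handler);
            }
        }
    }

    // Collects the labels of the outermost nodes whose label starts with "stmt".
    static List<String> topLevelStatements(SimpleNode root) {
        List<String> found = new ArrayList<>();
        explore(root, node -> {
            if (node.label.startsWith("stmt")) {
                found.add(node.label);
                return false; // do not look inside a statement
            }
            return true; // keep descending
        });
        return found;
    }

    public static void main(String[] args) {
        SimpleNode tree = new SimpleNode("class",
                new SimpleNode("method",
                        new SimpleNode("stmt:for",
                                new SimpleNode("stmt:call")),
                        new SimpleNode("stmt:return")));
        System.out.println(topLevelStatements(tree)); // prints [stmt:for, stmt:return]
    }
}
```

The nested stmt:call node is never reported because the handler returned false for its parent, so the walk stopped there.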

Now let’s see how to use this code to solve some questions found on Stack Overflow.

How to extract the name of all classes in a normal String from java class?

Asked on Stack Overflow

This problem can be solved by looking for the ClassOrInterfaceDeclaration nodes. Since we want a specific kind of node, we can use a Visitor. Note that the VoidVisitorAdapter lets us pass an arbitrary argument. In this case we do not need it, so we specify the type Object and simply ignore it in our visit method.

package me.tomassetti.examples;

import com.github.javaparser.JavaParser;
import com.github.javaparser.ParseException;
import com.github.javaparser.ast.body.ClassOrInterfaceDeclaration;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
import com.google.common.base.Strings;
import me.tomassetti.support.DirExplorer;

import java.io.File;
import java.io.IOException;

public class ListClassesExample {

    public static void listClasses(File projectDir) {
        new DirExplorer((level, path, file) -> path.endsWith(".java"), (level, path, file) -> {
            System.out.println(path);
            System.out.println(Strings.repeat("=", path.length()));
            try {
                new VoidVisitorAdapter<Object>() {
                    @Override
                    public void visit(ClassOrInterfaceDeclaration n, Object arg) {
                        super.visit(n, arg);
                        System.out.println(" * " + n.getName());
                    }
                }.visit(JavaParser.parse(file), null);
                System.out.println(); // empty line
            } catch (ParseException | IOException e) {
                throw new RuntimeException(e);
            }
        }).explore(projectDir);
    }

    public static void main(String[] args) {
        File projectDir = new File("source_to_parse/junit-master");
        listClasses(projectDir);
    }
}

We ran the example on the source code of JUnit and got this output:

/src/test/java/org/junit/internal/MethodSorterTest.java
=======================================================
 * DummySortWithoutAnnotation
 * Super
 * Sub
 * DummySortWithDefault
 * DummySortJvm
 * DummySortWithNameAsc
 * MethodSorterTest

/src/test/java/org/junit/internal/matchers/StacktracePrintingMatcherTest.java
=============================================================================
 * StacktracePrintingMatcherTest

/src/test/java/org/junit/internal/matchers/ThrowableCauseMatcherTest.java
=========================================================================
 * ThrowableCauseMatcherTest

... 
... many other lines follow

Is there any parser for Java code that could return the line numbers that compose a statement?

Asked on Stack Overflow

In this case I need to find all sorts of statements. Now, there are several classes extending the Statement base class, so I could use a visitor, but I would need to write the same code in several visit methods, one for each subclass of Statement. In addition, I want to get only the top-level statements, not the statements nested inside them. For example, a for statement could contain several other statements. With our custom NodeIterator we can easily implement this logic: the handler returns false when it finds a statement, so the iterator does not descend into it.

package me.tomassetti.examples;

import com.github.javaparser.JavaParser;
import com.github.javaparser.ParseException;
import com.github.javaparser.ast.Node;
import com.github.javaparser.ast.stmt.Statement;
import com.google.common.base.Strings;
import me.tomassetti.support.DirExplorer;
import me.tomassetti.support.NodeIterator;

import java.io.File;
import java.io.IOException;

public class StatementsLinesExample {

    public static void statementsByLine(File projectDir) {
        new DirExplorer((level, path, file) -> path.endsWith(".java"), (level, path, file) -> {
            System.out.println(path);
            System.out.println(Strings.repeat("=", path.length()));
            try {
                new NodeIterator(new NodeIterator.NodeHandler() {
                    @Override
                    public boolean handle(Node node) {
                        if (node instanceof Statement) {
                            System.out.println(" [Lines " + node.getBeginLine() + " - " + node.getEndLine() + " ] " + node);
                            return false;
                        } else {
                            return true;
                        }
                    }
                }).explore(JavaParser.parse(file));
                System.out.println(); // empty line
            } catch (ParseException | IOException e) {
                throw new RuntimeException(e);
            }
        }).explore(projectDir);
    }

    public static void main(String[] args) {
        File projectDir = new File("source_to_parse/junit-master");
        statementsByLine(projectDir);
    }
}

And this is a portion of the output obtained by running the program on the source code of JUnit.

/src/test/java/org/junit/internal/matchers/ThrowableCauseMatcherTest.java
=========================================================================
 [Lines 12 - 17 ] {
    NullPointerException expectedCause = new NullPointerException("expected");
    Exception actual = new Exception(expectedCause);
    assertThat(actual, hasCause(is(expectedCause)));
}

You may notice that the printed statement spans 5 lines, not the 6 reported (12 to 17 is 6 lines). This is because we are printing a cleaned-up version of the statement: blank lines and comments are removed and the code is reformatted.
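
The arithmetic behind that remark can be checked with a small sketch. LineSpan and its two methods are hypothetical helpers written only for this illustration; the string is the cleaned-up statement from the output above.

```java
// Hypothetical helper, written only for this illustration.
class LineSpan {

    // Number of source lines covered by an inclusive range of line numbers.
    static int sourceLines(int beginLine, int endLine) {
        return endLine - beginLine + 1;
    }

    // Number of lines in a printed snippet.
    static int printedLines(String text) {
        return text.split("\n", -1).length;
    }

    public static void main(String[] args) {
        String printed = "{\n"
                + "    NullPointerException expectedCause = new NullPointerException(\"expected\");\n"
                + "    Exception actual = new Exception(expectedCause);\n"
                + "    assertThat(actual, hasCause(is(expectedCause)));\n"
                + "}";
        System.out.println(sourceLines(12, 17));   // prints 6
        System.out.println(printedLines(printed)); // prints 5
    }
}
```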

Extract methods calls from Java code

Asked on Stack Overflow

To extract method calls we can again use a Visitor, so this is pretty straightforward and fairly similar to the first example we have seen.

package me.tomassetti.examples;

import com.github.javaparser.JavaParser;
import com.github.javaparser.ParseException;
import com.github.javaparser.ast.expr.MethodCallExpr;
import com.github.javaparser.ast.visitor.VoidVisitorAdapter;
import com.google.common.base.Strings;
import me.tomassetti.support.DirExplorer;

import java.io.File;
import java.io.IOException;

public class MethodCallsExample {

    public static void listMethodCalls(File projectDir) {
        new DirExplorer((level, path, file) -> path.endsWith(".java"), (level, path, file) -> {
            System.out.println(path);
            System.out.println(Strings.repeat("=", path.length()));
            try {
                new VoidVisitorAdapter<Object>() {
                    @Override
                    public void visit(MethodCallExpr n, Object arg) {
                        super.visit(n, arg);
                        System.out.println(" [L " + n.getBeginLine() + "] " + n);
                    }
                }.visit(JavaParser.parse(file), null);
                System.out.println(); // empty line
            } catch (ParseException | IOException e) {
                throw new RuntimeException(e);
            }
        }).explore(projectDir);
    }

    public static void main(String[] args) {
        File projectDir = new File("source_to_parse/junit-master");
        listMethodCalls(projectDir);
    }
}

As you can see, the solution is very similar to the one for listing classes. This is a portion of the output obtained on the source code of JUnit:

/src/test/java/org/junit/internal/MethodSorterTest.java
=======================================================
 [L 58] MethodSorter.getDeclaredMethods(clazz)
 [L 64] m.isSynthetic()
 [L 65] m.toString()
 [L 65] clazz.getName()
 [L 65] m.toString().replace(clazz.getName() + '.', "")
 [L 65] names.add(m.toString().replace(clazz.getName() + '.', ""))
 [L 74] Arrays.asList(EPSILON, BETA, ALPHA, DELTA, GAMMA_VOID, GAMMA_BOOLEAN)
 [L 75] getDeclaredMethodNames(DummySortWithoutAnnotation.class)
 [L 76] assertEquals(expected, actual)
 [L 81] Arrays.asList(SUPER_METHOD)
 [L 82] getDeclaredMethodNames(Super.class)
 [L 83] assertEquals(expected, actual)
 [L 88] Arrays.asList(SUB_METHOD)
 [L 89] getDeclaredMethodNames(Sub.class)
 [L 90] assertEquals(expected, actual)
 [L 118] Arrays.asList(EPSILON, BETA, ALPHA, DELTA, GAMMA_VOID, GAMMA_BOOLEAN)
 [L 119] getDeclaredMethodNames(DummySortWithDefault.class)
 [L 120] assertEquals(expected, actual)
 [L 148] DummySortJvm.class.getDeclaredMethods()
 [L 149] MethodSorter.getDeclaredMethods(DummySortJvm.class)
 [L 150] assertArrayEquals(fromJvmWithSynthetics, sorted)
 [L 178] Arrays.asList(ALPHA, BETA, DELTA, EPSILON, GAMMA_VOID, GAMMA_BOOLEAN)
 [L 179] getDeclaredMethodNames(DummySortWithNameAsc.class)
 [L 180] assertEquals(expected, actual)

Next steps

You can answer a lot of questions with the approaches presented here: you navigate the AST, find the nodes you are interested in and get whatever information you are looking for. There are, however, a couple of other things we should look at. First of all, how to transform the code: while extracting information is great, refactoring is even more useful. Then, for more advanced questions, we need to resolve symbols using java-symbol-solver. For example:

  • looking at the AST we can find the name of a class, but not the list of interfaces it implements indirectly
  • when looking at a method invocation we cannot easily find the declaration of that method. In which class or interface was it declared? Which of the overloaded variants are we invoking?
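
To make "implements indirectly" concrete, here is a minimal sketch based on runtime reflection over compiled JDK classes. It is not how java-symbol-solver works (that operates on source code), and IndirectInterfaces and allInterfaces are hypothetical names; the sketch only shows the kind of information that the declaration of a single class does not contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical example using reflection, not JavaParser or java-symbol-solver.
class IndirectInterfaces {

    // Collects every interface a class implements, directly or through
    // its superclasses and super-interfaces.
    static Set<Class<?>> allInterfaces(Class<?> type) {
        Set<Class<?>> result = new LinkedHashSet<>();
        collect(type, result);
        return result;
    }

    private static void collect(Class<?> type, Set<Class<?>> result) {
        if (type == null) {
            return; // reached the top of the hierarchy
        }
        for (Class<?> iface : type.getInterfaces()) {
            if (result.add(iface)) {
                collect(iface, result); // follow super-interfaces
            }
        }
        collect(type.getSuperclass(), result);
    }

    public static void main(String[] args) {
        // Interfaces named in the declaration of ArrayList itself.
        System.out.println(Arrays.asList(ArrayList.class.getInterfaces()));
        // The full set, which also includes Collection and Iterable.
        System.out.println(allInterfaces(ArrayList.class));
    }
}
```

The declaration of ArrayList names List but not Collection or Iterable; those only show up once the declarations of the supertypes are resolved, which is the step the AST alone does not give us.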

We will look into that in the future. Hopefully these examples will help you get started!
