Examining Module Dependencies

The rest of this series will focus on in-project module dependencies. They are both simple to experiment with and complex enough to demonstrate interesting results.

ClassGraph

I’ve been down this road once before and stumbled on the ClassGraph tool. From the description,

ClassGraph is an uber-fast parallelized classpath scanner and module scanner for Java, Scala, Kotlin and other JVM languages.

ClassGraph not only helps us extract the data, but it also generates it in the dot format. Using GraphViz, we can easily get some visuals up and running.

The basic setup looks like this,

This creates a ClassGraph for anything under java.util on the classpath. It then extracts all the dependency information and dumps it to a dot file.

For our uses, we will be changing the basePackage from java.util to whatever a given project uses as a root for its classes.
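ClassGraph builds the dot text for us, so the following is not its code. It is a stdlib-only sketch of what that output amounts to. It assumes the dependencies are already available as a map from each fully qualified class name to the names it uses (a hypothetical input shape). It keeps only edges whose two ends both fall under the chosen base packages, then writes one `"A" -> "B";` line per edge. `DotSketch`, `toDot`, and `inBasePackages` are names made up for this illustration.

```java
import java.util.*;

final class DotSketch {
    // True if the class lives in one of the base packages or any of their subpackages.
    static boolean inBasePackages(String className, List<String> basePackages) {
        for (String pkg : basePackages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    // Emits one directed edge per dependency, skipping any edge with an endpoint
    // outside the base packages. Sorting keeps the output stable between runs.
    static String toDot(Map<String, Set<String>> deps, List<String> basePackages) {
        StringBuilder sb = new StringBuilder("digraph dependencies {\n");
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(deps).entrySet()) {
            String from = e.getKey();
            if (!inBasePackages(from, basePackages)) {
                continue;
            }
            for (String to : new TreeSet<>(e.getValue())) {
                if (inBasePackages(to, basePackages)) {
                    sb.append("  \"").append(from).append("\" -> \"").append(to).append("\";\n");
                }
            }
        }
        return sb.append("}\n").toString();
    }
}
```

Passing a list rather than a single string makes the multi-root case a one-line change, e.g. `DotSketch.toDot(deps, List.of("io.github.classgraph", "nonapi.io.github.classgraph"))`.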

ClassGraph on ClassGraph

As a demonstration, we can run ClassGraph against its own codebase. The code is structured to have two root packages, so we need to update basePackages to include both of them.

Running this with some code that writes the graphDot string to a file called cluster.dot, we can generate an svg.
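The file-writing step can be as small as the sketch below, assuming the scan result is already held in a `String` named `graphDot`. `DotFileWriter` and `writeDot` are hypothetical names. Rendering is then a Graphviz command such as `dot -Tsvg cluster.dot -o cluster.svg`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class DotFileWriter {
    // Writes the dot text to the target file, creating it or overwriting an existing one.
    static Path writeDot(String graphDot, Path target) throws IOException {
        return Files.writeString(target, graphDot, StandardCharsets.UTF_8);
    }
}
```

Usage: `DotFileWriter.writeDot(graphDot, Path.of("cluster.dot"));`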

And then open the resulting svg in a browser:

Dependencies graphed using GraphViz’s dot algorithm

The resulting image is not very useful. I’ve used graphs like this successfully in the past, but I 1) already knew the source code and 2) was looking for very specific connections.

However, we can at least try to draw some conclusions from looking at the image. In no particular order,

Zoom

No matter what level of zoom I pick, it is difficult to find the right balance between seeing the individual classes and seeing the overall structure. Finding that balance is one of the main goals of this exercise, so let’s keep looking.

Organization

Just from looking at this image, I can’t tell you how well ClassGraph is organized. I can tell you that there is some sort of organization, though. Each line represents a dependency, so we can get a sense of the organization from how many lines we see and where they concentrate.

Some areas are dense with lots of lines between the classes. These groups of classes are highly inter-dependent, so they are likely related.
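"Dense" can be put into numbers, although the original analysis does not do this. One common measure is the edge density of a candidate group of n classes with E edges between its members: E / (n(n - 1)), the share of possible directed edges that actually exist. The sketch assumes dependencies as a map from class name to the names it uses. `DensitySketch` is a hypothetical name, and the group would come from whatever classes you see bunched together.

```java
import java.util.*;

final class DensitySketch {
    // Share of possible directed edges between distinct group members that actually exist:
    // 1.0 means every class in the group uses every other one, 0.0 means none do.
    static double density(Map<String, Set<String>> deps, Set<String> group) {
        int n = group.size();
        if (n < 2) {
            return 0.0;
        }
        int edges = 0;
        for (String from : group) {
            for (String to : deps.getOrDefault(from, Set.of())) {
                if (!to.equals(from) && group.contains(to)) {
                    edges++;
                }
            }
        }
        return (double) edges / ((long) n * (n - 1));
    }
}
```

For example, three classes with three edges among them have 3 / (3 × 2) = 0.5.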

Some classes have hundreds of incoming lines, but not many outgoing. These are usually utility classes shared by all the different parts of the code.

After excluding those, the remaining classes are a mix. They might be generic code shared by many of the smaller clusters. Or they might be wrappers that combine the functionality of the smaller clusters.
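The utility-class pattern (many incoming lines, few outgoing) can be checked without eyeballing the image by counting in-degree and out-degree per class. This is a minimal sketch that assumes dependencies are given as a map from class name to the names it uses. The thresholds `minIn` and `maxOut` are hypothetical knobs you would tune per project, and `DegreeSketch` is a made-up name.

```java
import java.util.*;

final class DegreeSketch {
    // Returns classes used by at least minIn others that themselves use at most maxOut classes.
    static SortedSet<String> utilityCandidates(Map<String, Set<String>> deps, int minIn, int maxOut) {
        Map<String, Integer> in = new HashMap<>();
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            out.merge(e.getKey(), e.getValue().size(), Integer::sum);
            for (String target : e.getValue()) {
                in.merge(target, 1, Integer::sum); // one incoming edge per user
            }
        }
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : in.entrySet()) {
            int outDegree = out.getOrDefault(e.getKey(), 0);
            if (e.getValue() >= minIn && outDegree <= maxOut) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Removing the returned classes from the map leaves the "remaining mix" described above for a closer look.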

If there were no organization, classes would depend on each other at random. You wouldn’t have clusters of related code. You wouldn’t have utility classes that are shared across most of the codebase.

Again, this isn’t a measure of how organized ClassGraph is; it simply shows that some part of the organization can be expressed in an image.

Author’s Intentions

As mentioned earlier, the source code of ClassGraph is broken up into two top-level packages: io.github.classgraph and nonapi.io.github.classgraph.

If you search for nonapi in the svg and choose to “Highlight All” matches, you’ll see that the image (and therefore the dependencies) doesn’t really capture that division. There are certainly areas that are mostly nonapi and areas that have no nonapi classes, but the boundary between them is not clear in the image.

There are a few possible reasons why this might be the case.

  1. The division may not exist in the code structure. There is a clear distinction in package names and file locations, but that doesn’t enforce a division in the code. The classes in io.github.classgraph might use classes from nonapi.io.github.classgraph and vice versa.

  2. The division does exist in the code structure, but the image layout algorithm does not make that clear. In this case, dot tries to minimize edge lengths and crossings. But we might be willing to make the image larger if it more effectively showed the divisions that we wanted.
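The first explanation can be tested directly from the dependency data, with no image involved, by counting edges that cross between the two roots in each direction. A minimal sketch, assuming dependencies as a map from class name to the names it uses; `CrossRootSketch` is a hypothetical name. If both counts come back nonzero, the roots reference each other and reason 1 looks likely. If only one direction is populated, the code may well be layered and reason 2 becomes the better explanation.

```java
import java.util.*;

final class CrossRootSketch {
    static final String API_ROOT = "io.github.classgraph.";
    static final String NONAPI_ROOT = "nonapi.io.github.classgraph.";

    // Counts dependency edges that cross between the two root packages, per direction.
    // Note: "nonapi.io..." does not start with "io.github...", so the prefixes never collide.
    static Map<String, Integer> crossEdges(Map<String, Set<String>> deps) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("api -> nonapi", 0);
        counts.put("nonapi -> api", 0);
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            String from = e.getKey();
            for (String to : e.getValue()) {
                if (from.startsWith(API_ROOT) && to.startsWith(NONAPI_ROOT)) {
                    counts.merge("api -> nonapi", 1, Integer::sum);
                } else if (from.startsWith(NONAPI_ROOT) && to.startsWith(API_ROOT)) {
                    counts.merge("nonapi -> api", 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```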

Not Shaped Like the Filesystem

No matter how complex our dependency structure is, we eventually have to map it back to files in the filesystem. We must organize our code in a basic file and folder hierarchy[1].

This image does not easily map back to files in the filesystem. So any information we extract from the image must be translated before we apply it to the filesystem.
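For a single node, the translation back is mechanical under the usual Java conventions. This is a minimal sketch assuming one top-level class per file and that `$` in a binary name only marks a nested class (both are conventions, not guarantees). The resulting path is relative to a source root such as Maven's `src/main/java`. `SourcePathSketch` is a hypothetical name.

```java
import java.nio.file.Path;

final class SourcePathSketch {
    // Maps a binary class name such as "a.b.Outer$Inner" to the conventional
    // source file "a/b/Outer.java"; nested classes live in their outer class's file.
    static Path sourcePath(String binaryClassName) {
        int dollar = binaryClassName.indexOf('$');
        String topLevel = dollar >= 0 ? binaryClassName.substring(0, dollar) : binaryClassName;
        return Path.of(topLevel.replace('.', '/') + ".java");
    }
}
```

The harder part, deciding which folder a class *should* live in given its place in the graph, is exactly what the image does not tell us.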

Preparing for Analysis

Next time, we will try to improve on some of the issues above. But if we are going to look at other projects, we want it to be easy to extract this data.

Maven’s plugin architecture makes it simple to wrap up our ClassGraph code into a mojo that we can run on any Java project with a pom.xml. After wrapping the code above into a plugin, we can generate our dot file with,

To handle the case where ClassGraph has multiple basePackages, we can pass them as parameters,

This will compile the code and dump a cluster.dot for us to start playing with.


  1. While soft links and hard links are an option, I’ve never seen anyone use them to help with code organization. A more flexible filesystem would be an interesting direction to explore.
