Examining Module Dependencies
The rest of this series will focus on in-project module dependencies. They are both simple to experiment with and complex enough to demonstrate interesting results.
ClassGraph
I’ve been down this road once before and stumbled on the ClassGraph tool. From the description,
ClassGraph is an uber-fast parallelized classpath scanner and module scanner for Java, Scala, Kotlin and other JVM languages.
ClassGraph not only helps us extract the data, but it also generates it in the dot format. Using GraphViz, we can easily get some visuals up and running.
The basic setup looks like this,
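import io.github.classgraph.ClassGraph;
import io.github.classgraph.ClassInfoList;
import io.github.classgraph.ScanResult;

// Packages to scan
String[] basePackages = new String[]{"java.util"};
ClassGraph graph = new ClassGraph();
graph.enableAllInfo();
// Record which classes each class refers to
graph.enableInterClassDependencies();
// java.* lives in the system modules, which ClassGraph skips by default
graph.enableSystemJarsAndModules();
graph.acceptPackages(basePackages);
ScanResult result = graph.scan();
ClassInfoList infoList = result.getAllClasses();
// Render the dependency edges as GraphViz dot source
String graphDot = infoList.generateGraphVizDotFileFromInterClassDependencies();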
This creates a ClassGraph scan for anything under java.util on the classpath. It then extracts all the dependency information and renders it as a string in dot format.
For our uses, we will be changing basePackages from java.util to whatever a given project uses as a root for its classes.
ClassGraph on ClassGraph
As a demonstration, we can run ClassGraph against its own codebase. The code is structured to have two root packages, so we need to update basePackages to include both of them.
Running this with some code that writes the graphDot string to a file called cluster.dot, we can generate an svg.
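A minimal sketch of that step, assuming GraphViz's dot executable is installed and on the PATH. The class and method names (DotRenderer, writeDot, renderCommand, render) are illustrative and not part of ClassGraph, and the placeholder graph stands in for the graphDot string produced by the scan.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper; none of these names come from ClassGraph.
class DotRenderer {

    /** Writes the dot source to the given file, replacing any existing content. */
    static Path writeDot(String graphDot, Path dotFile) throws IOException {
        return Files.write(dotFile, graphDot.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the GraphViz command line: dot -Tsvg <input> -o <output>. */
    static List<String> renderCommand(Path dotFile, Path svgFile) {
        return List.of("dot", "-Tsvg", dotFile.toString(), "-o", svgFile.toString());
    }

    /** Runs GraphViz and returns its exit code (0 on success). */
    static int render(Path dotFile, Path svgFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(renderCommand(dotFile, svgFile))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder graph; the real input is the graphDot string from the scan.
        String graphDot = "digraph { \"a.A\" -> \"b.B\" }";
        Path dir = Files.createTempDirectory("classgraph-demo");
        Path dotFile = writeDot(graphDot, dir.resolve("cluster.dot"));
        Path svgFile = dir.resolve("cluster.svg");
        // Print the command; call render(dotFile, svgFile) to actually run it.
        System.out.println(String.join(" ", renderCommand(dotFile, svgFile)));
    }
}
```

The same command can be typed directly into a shell; wrapping it in Java only keeps the scan and the rendering in one program.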
And then open the resulting svg in a browser:
The resulting image is not very useful. I’ve successfully used it in the past, but I 1) knew the source code already and 2) was looking for very specific connections.
However, we can at least try to draw some conclusions from looking at the image. In no particular order,
Zoom
No matter what level of zoom I pick, it is difficult to find the right balance between seeing the individual classes and seeing the overall structure. Finding that balance is one of the main goals of this exercise, so let’s keep looking.
Organization
Just from looking at this image, I can’t tell you how well ClassGraph is organized. I can tell you that there is some sort of organization though. Each line represents a dependency, so we can get a sense of the organization based on how many lines we see.
Some areas are dense with lots of lines between the classes. These groups of classes are highly inter-dependent, so likely relate to each other.
Some classes have hundreds of incoming lines, but not many outgoing. These are usually utility classes shared by all the different parts of the code.
After excluding those, the remaining classes are a mix. They might be generic code shared by many of the smaller clusters. Or they might be wrappers that combine the functionality of the smaller clusters.
If there were no organization, classes would depend on each other at random. You wouldn’t have clusters of related code. You wouldn’t have utility classes that are shared across most of the codebase.
Again, this isn’t a measure of how organized ClassGraph is. It is simply the idea that some part of the organization can be expressed in an image.
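The "many incoming lines, few outgoing" observation can also be checked without squinting at the image. The sketch below counts incoming and outgoing edges per class straight from the dot text. It assumes edges are written as "from.Class" -> "to.Class" with quoted names, which should be verified against the actual file; DegreeCounter and the toy class names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of ClassGraph.
class DegreeCounter {

    // Assumed edge syntax: "from.Class" -> "to.Class", with quoted node names.
    private static final Pattern EDGE =
            Pattern.compile("\"([^\"]+)\"\\s*->\\s*\"([^\"]+)\"");

    /** Returns {incoming, outgoing} edge counts for every class that appears in an edge. */
    static Map<String, int[]> countDegrees(String dot) {
        Map<String, int[]> degrees = new TreeMap<>();
        Matcher m = EDGE.matcher(dot);
        while (m.find()) {
            degrees.computeIfAbsent(m.group(1), k -> new int[2])[1]++; // outgoing from source
            degrees.computeIfAbsent(m.group(2), k -> new int[2])[0]++; // incoming to target
        }
        return degrees;
    }

    public static void main(String[] args) {
        // Toy graph with invented class names.
        String dot = String.join("\n",
                "digraph {",
                "\"app.Scanner\" -> \"util.Log\"",
                "\"app.Parser\" -> \"util.Log\"",
                "\"app.Scanner\" -> \"app.Parser\"",
                "}");
        countDegrees(dot).forEach((name, d) ->
                System.out.println(name + " in=" + d[0] + " out=" + d[1]));
    }
}
```

In the toy graph, util.Log has two incoming edges and none outgoing, which is the signature of a shared utility class described above.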
Author’s Intentions
As mentioned earlier, the source code of ClassGraph is broken up into two top-level packages: io.github.classgraph and nonapi.io.github.classgraph.
If you search for nonapi in the svg and choose to “Highlight All” matches, you’ll see that the image (and therefore the dependencies) doesn’t really capture that division. There are certainly areas that are mainly nonapi and areas that don’t have nonapi, but the division between them is not clear in the image.
There are a few possible reasons why this might be the case.
The division may not exist in the code structure. There is a clear distinction between the package names and file locations, but that doesn’t enforce a division in the code. The classes in io.github.classgraph might use classes from nonapi.io.github.classgraph and vice versa.
The division does exist in the code structure, but the image layout algorithm does not make that clear. In this case, dot tries to minimize line lengths and overlaps. But we might be willing to make the image larger if it more effectively showed the divisions that we wanted.
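One way to tell these two explanations apart is to count how many edges cross the package boundary in each direction. This is a rough sketch under the same assumption about the edge syntax ("from.Class" -> "to.Class" with quoted names); BoundaryCheck and the Foo, Bar and Baz class names are invented for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of ClassGraph.
class BoundaryCheck {

    // Assumed edge syntax: "from.Class" -> "to.Class", with quoted node names.
    private static final Pattern EDGE =
            Pattern.compile("\"([^\"]+)\"\\s*->\\s*\"([^\"]+)\"");

    static boolean isNonApi(String className) {
        return className.startsWith("nonapi.");
    }

    /** Returns edge counts in the order: api->api, api->nonapi, nonapi->api, nonapi->nonapi. */
    static int[] countByDirection(String dot) {
        int[] counts = new int[4];
        Matcher m = EDGE.matcher(dot);
        while (m.find()) {
            int from = isNonApi(m.group(1)) ? 1 : 0;
            int to = isNonApi(m.group(2)) ? 1 : 0;
            counts[from * 2 + to]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy edges with made-up class names, not real ClassGraph dependencies.
        String dot = String.join("\n",
                "\"io.github.classgraph.Foo\" -> \"nonapi.io.github.classgraph.Bar\"",
                "\"nonapi.io.github.classgraph.Bar\" -> \"io.github.classgraph.Foo\"",
                "\"io.github.classgraph.Foo\" -> \"io.github.classgraph.Baz\"");
        System.out.println(Arrays.toString(countByDirection(dot)));
    }
}
```

If the two middle counts are both large, the packages depend on each other freely and the first explanation is the likelier one. If one of them is close to zero, the division exists in the code and the layout is hiding it.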
Not Shaped Like the Filesystem
No matter how complex our dependency structure is, we eventually have to map it back to files in the filesystem. We must organize our code in a basic file and folder hierarchy [1].
This image does not easily map back to files in the filesystem. So any information we extract from the image must be translated before we apply it to the filesystem.
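The simplest piece of that translation is turning a class name from the graph back into a source file. The sketch below assumes the conventional layout of one top-level class per file under a source root such as src/main/java. SourcePaths and toSourceFile are illustrative names, and the Inner class in the example is invented.

```java
import java.nio.file.Path;

// Hypothetical helper; assumes one top-level class per file, named after the class.
class SourcePaths {

    /** Maps a fully qualified class name to its likely source file under sourceRoot. */
    static Path toSourceFile(Path sourceRoot, String className) {
        // Nested classes (Outer$Inner) live in the outer class's file.
        int dollar = className.indexOf('$');
        String outer = dollar >= 0 ? className.substring(0, dollar) : className;
        return sourceRoot.resolve(outer.replace('.', '/') + ".java");
    }

    public static void main(String[] args) {
        Path root = Path.of("src/main/java");
        System.out.println(toSourceFile(root, "io.github.classgraph.ClassGraph"));
        // "Inner" is a made-up nested class used only to show the $ handling.
        System.out.println(toSourceFile(root, "io.github.classgraph.ClassGraph$Inner"));
    }
}
```

This only covers individual classes. Clusters and cross-cutting utility classes have no such direct counterpart in a folder tree, which is the harder part of the translation.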
Preparing for Analysis
Next time, we will try to improve on some of the issues above. But if we are going to look at other projects, we want it to be easy to extract this data.
Maven’s plugin architecture makes it simple to wrap up our ClassGraph code into a mojo that we can run on any Java project with a pom.xml. After wrapping the code above into a plugin, we can generate our dot file with,
To handle the case where a project such as ClassGraph has multiple basePackages, we can pass them as parameters,
This will compile the code and dump a cluster.dot for us to start playing with.
[1] While soft and hard links are an option, I’ve never seen anyone use them to help with code organization. A more flexible filesystem would be an interesting direction to explore.