This functionality is available as a Maven plugin.

To use it, simply run,

This creates a folder named src/main/auto-cluster-maven-plugin1234... that contains your code, automatically organized by the clustering algorithm.

If you would like to use this structure in place of your current one, run with -DdryRun=false, and the plugin will overwrite your current directories.

I highly recommend disabling dryRun only if your code is under version control, so that you can revert the overwrite if you don't like the result.

How to Benefit

This is a proof of concept, but it should be enough to test two hypotheses.

1.) By aggressively exposing the dependency structure of our code, we can identify and resolve mismatches between how we think it works (or want it to work) and how it actually works.

2.) By encoding our organization as an algorithm, we can have concrete discussions on improvements.

Both of these rely on you, the developer, seeing something that you think is wrong. If the new structure of the code doesn’t match what you want, then it is time to investigate.

If you investigate and discover that there are dependencies in the code that are skewing the results, then this plugin has succeeded in its goal. It successfully identified something you don’t want.

If you investigate and discover that the plugin is not producing the output that you want, then we can have a concrete conversation about what exactly you don’t like. The two main configurations we can change are the distance calculation and the clustering algorithm.

Luckily, both of these have already been extensively investigated in the statistics community, so there are options available. We just need to figure out which options capture the relationships we want.
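To make those two configuration points concrete, here is a minimal sketch of how a distance calculation and a clustering algorithm fit together. It is not the plugin's actual implementation: the Jaccard distance, the single-linkage agglomerative clustering, the 0.5 threshold, and all class names in the example are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not the plugin's code: clusters classes by the
// similarity of their dependency sets.
public class DependencyClusteringSketch {

    // Distance calculation (assumed: Jaccard distance).
    // 0.0 means identical dependency sets, 1.0 means nothing in common.
    static double jaccardDistance(Set<String> a, Set<String> b) {
        Set<String> union = new TreeSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // two classes without dependencies are treated as identical
        }
        Set<String> intersection = new TreeSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    // Single-linkage distance between clusters: the distance of the closest pair of members.
    static double linkage(Set<String> c1, Set<String> c2, Map<String, Set<String>> deps) {
        double best = Double.MAX_VALUE;
        for (String x : c1) {
            for (String y : c2) {
                best = Math.min(best, jaccardDistance(deps.get(x), deps.get(y)));
            }
        }
        return best;
    }

    // Clustering algorithm (assumed: agglomerative). Start with one cluster per
    // class and keep merging the closest pair until no pair is within the threshold.
    static List<Set<String>> cluster(Map<String, Set<String>> deps, double threshold) {
        List<Set<String>> clusters = new ArrayList<>();
        for (String name : new TreeSet<>(deps.keySet())) {
            Set<String> single = new TreeSet<>();
            single.add(name);
            clusters.add(single);
        }
        while (clusters.size() > 1) {
            int bestI = -1;
            int bestJ = -1;
            double best = Double.MAX_VALUE;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = linkage(clusters.get(i), clusters.get(j), deps);
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (best > threshold) {
                break; // remaining clusters are too far apart to merge
            }
            // bestI < bestJ, so removing bestJ does not shift the cluster at bestI
            clusters.get(bestI).addAll(clusters.remove(bestJ));
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Hypothetical classes mapped to the types they depend on.
        Map<String, Set<String>> deps = new TreeMap<>();
        deps.put("OrderService", Set.of("Order", "Database"));
        deps.put("OrderRepository", Set.of("Order", "Database"));
        deps.put("HtmlRenderer", Set.of("Template"));
        deps.put("PdfRenderer", Set.of("Template", "Font"));
        System.out.println(cluster(deps, 0.5));
    }
}
```

In this toy input the two order classes share all of their dependencies and the two renderers share half, so they end up in two separate groups. Swapping jaccardDistance for another measure, or single linkage for another merge rule, changes the grouping, which is exactly the kind of concrete disagreement the second hypothesis is meant to surface.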
