Clustering GitHub Commits


Today was exciting! Matt has been extracting statistics from 2,629 GitHub commits, and I’ve been doing feature engineering on the data. Today I ran a K-Means clustering on it, and the result is simply beautiful:

https://wiki.deadmandao.com/index.php/W3HN_Iteration_Two#First_Clustering

The commits come from projects that use Rust and React as their core languages. The algorithm produced 5 clearly meaningful clusters (and one cluster I haven’t figured out yet).
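For anyone who hasn’t met K-Means before, here is a minimal sketch of the idea in plain Python. It is not our actual pipeline: the three features (files changed, lines added, lines deleted), the toy data, and k=2 are all made up for illustration, and in practice a library implementation would be the usual choice.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-commit features: [files_changed, lines_added, lines_deleted]
commits = [
    [1, 5, 2],
    [2, 12, 3],
    [1, 8, 1],
    [40, 900, 300],
    [35, 1200, 450],
    [50, 1000, 380],
]

centroids, labels = kmeans(standardize(commits), k=2)
print(labels)
```

On this toy data the three small commits land in one cluster and the three large ones in the other. Which cluster gets which number depends on the random start, so the numbers themselves carry no meaning.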

We will use the clusters as an aid for manually labeling some example commits, which will then serve as training data for a classifier.
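One way the clusters could guide labeling is to pull a few commits from each cluster for a human to review, so every group is represented in the labeled set. The sketch below assumes that approach; the function name `sample_for_labeling`, the short commit hashes, and the cluster assignments are hypothetical.

```python
import random
from collections import defaultdict


def sample_for_labeling(commit_ids, labels, per_cluster=3, seed=0):
    """Pick up to `per_cluster` commits from each cluster for manual review."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for commit_id, label in zip(commit_ids, labels):
        by_cluster[label].append(commit_id)
    # Small clusters contribute everything they have.
    return {
        label: rng.sample(members, min(per_cluster, len(members)))
        for label, members in sorted(by_cluster.items())
    }


# Hypothetical commit hashes and their cluster assignments
commit_ids = ["a1f3", "b27c", "c9d0", "d441", "e5aa", "f60b", "0b7e"]
labels = [0, 0, 1, 1, 1, 2, 0]

batch = sample_for_labeling(commit_ids, labels, per_cluster=2)
for label, picks in batch.items():
    print(f"cluster {label}: review {picks}")
```

Sampling evenly per cluster keeps a large cluster from crowding out the small ones in the labeled examples.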

