Export the GitHub statistics of your organization contributors

Maxime Beaudry
Mar 26, 2020

GitHub contributor statistics are often helpful for getting the pulse of a GitHub repository. They give you a breakdown of added and removed lines of code and of the number of commits per contributor. Quite handy!

But when you are trying to get a higher level view and get the pulse of your entire GitHub organization, it becomes more complicated:

  • There’s no easy way to export the data.
  • You have to visit each repository one at a time.

In my case, I work for a company that has more than 150 repositories. So, it is just not an option to go through all of them one by one.

This article explains how to programmatically export the GitHub statistics for an entire organization in CSV format. Once the data is in a CSV file, you can massage it however you like in Excel.

Get ready

GitHub exposes a REST API to access these statistics. Instead of using this API directly, we will use the PowerShellForGitHub PowerShell module.

Don’t worry if you don’t have PowerShell installed. We will run the code using Docker. So there is no need for you to install PowerShell.

Let’s first start our PowerShell Docker container. Note that we mount a local directory inside the container so that the csv written inside the container ends up on your machine. You will need to replace SOME_LOCAL_DIRECTORY with a path on your local machine where the csv will be written.

docker run --rm -it -v SOME_LOCAL_DIRECTORY:/tmp/export/ mcr.microsoft.com/powershell

This will open a PowerShell Core console on Linux.

We will now issue a few commands to set up our shell. Pay attention to the comments above the Set-GitHubAuthentication call. When prompted, you will need to enter a valid access token to get access to your GitHub organization. You can generate a personal access token from the developer settings of your GitHub account.
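As a minimal sketch of that setup, assuming the container can reach the PowerShell Gallery, the commands could look like the following. The telemetry line is optional and is my own addition, not something the steps above require.

```powershell
# Install the module from the PowerShell Gallery.
# Answer "Y" if you are asked whether to trust the PSGallery repository.
Install-Module -Name PowerShellForGitHub

# You will be prompted for credentials:
#   - User name: it is ignored by the module, so type anything.
#   - Password:  paste your GitHub personal access token.
Set-GitHubAuthentication

# Optional (assumption: you prefer not to send usage telemetry).
Set-GitHubConfiguration -DisableTelemetry
```

The token only needs enough scope to read the repositories of your organization.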

Get all the repositories of your organization

To get the statistics of all the contributors of an organization, we first need to get the list of the repositories of this organization. We do this through these commands. Be sure to properly set the value of the organizationName variable.

Note that I filter out some repositories that are too old. It is up to you to decide which repositories to include.
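Here is a sketch of how the listing and filtering could be done. Select-RecentRepository is a hypothetical helper of mine, not part of PowerShellForGitHub, and it assumes that each repository object exposes the pushed_at field returned by the GitHub REST API.

```powershell
# Hypothetical helper (not part of PowerShellForGitHub): keeps only the
# repositories that received a push after the given date.
function Select-RecentRepository {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Repository,
        [Parameter(Mandatory)] [datetime] $Since
    )
    process {
        # pushed_at is the time of the last push as reported by the GitHub API.
        # Repositories that were never pushed to have no value and are skipped.
        if ($Repository.pushed_at -and ([datetime]$Repository.pushed_at -gt $Since)) {
            $Repository
        }
    }
}
```

With the helper defined, and after setting `$organizationName = 'my-organization'` (a placeholder), the list can be built with `$repos = Get-GitHubRepository -OrganizationName $organizationName | Select-RecentRepository -Since (Get-Date).AddYears(-1)`. The one-year cutoff is arbitrary; pick whatever suits your organization.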

Get the contributors statistics

Now that we have the list of repositories, we can get the statistics of all the contributors of these repositories.
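The sketch below shows one way to do it. It assumes that Get-GitHubRepositoryContributor with -IncludeStatistics returns objects shaped like the REST API payload: an author with a login, and a weeks array whose entries carry w (week start as a Unix timestamp), a (additions), d (deletions) and c (commits). Both function names are hypothetical helpers, and skipping weeks without commits is my own choice to keep the output small.

```powershell
# Hypothetical helper: flattens one contributor-statistics object into one
# row per week in which the contributor made at least one commit.
function ConvertTo-WeeklyStatRow {
    param(
        [Parameter(Mandatory)] [string] $RepositoryName,
        [Parameter(Mandatory, ValueFromPipeline)] $ContributorStat
    )
    process {
        foreach ($week in $ContributorStat.weeks) {
            # Skip weeks without any commit.
            if ($week.c -eq 0) { continue }
            [pscustomobject]@{
                Repository = $RepositoryName
                Author     = $ContributorStat.author.login
                # "w" is the start of the week as a Unix timestamp in seconds.
                Date       = [DateTimeOffset]::FromUnixTimeSeconds($week.w).UtcDateTime
                Additions  = $week.a
                Deletions  = $week.d
                Commits    = $week.c
            }
        }
    }
}

# Hypothetical wrapper: queries every repository and returns the flattened rows.
function Get-OrganizationContributorStat {
    param(
        [Parameter(Mandatory)] [string] $OrganizationName,
        [Parameter(Mandatory)] [object[]] $Repository
    )
    foreach ($repo in $Repository) {
        Get-GitHubRepositoryContributor -OwnerName $OrganizationName -RepositoryName $repo.name -IncludeStatistics |
            ConvertTo-WeeklyStatRow -RepositoryName $repo.name
    }
}
```

Assuming `$organizationName` and `$repos` hold the organization name and the repository list, the statistics can be collected with `$stats = Get-OrganizationContributorStat -OrganizationName $organizationName -Repository $repos`. With many repositories this can take a while, since each one requires its own API call.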

Export the statistics in csv

And finally, we will export these statistics to a csv file. Note that we change the format of the Date property from DateTime to string so that it is more Excel-friendly.
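A possible sketch of the export follows. Export-StatCsv is a hypothetical helper, and it assumes each row has Repository, Author, Date, Additions, Deletions and Commits properties, with Date being a DateTime. The yyyy-MM-dd format is my choice; any format Excel recognizes as a date would do.

```powershell
# Hypothetical helper: writes the rows to a csv file, with the Date column
# converted from DateTime to a yyyy-MM-dd string.
function Export-StatCsv {
    param(
        [Parameter(Mandatory)] [object[]] $Stat,
        [Parameter(Mandatory)] [string] $Path
    )
    $Stat |
        Select-Object Repository, Author,
            @{ Name = 'Date'; Expression = { $_.Date.ToString('yyyy-MM-dd') } },
            Additions, Deletions, Commits |
        Export-Csv -Path $Path -NoTypeInformation
}
```

Assuming the rows are stored in `$stats`, calling `Export-StatCsv -Stat $stats -Path /tmp/export/github_stats.csv` writes the file to the directory that was mounted when the container was started.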

Have fun with Excel

At this point, you have a file named github_stats.csv on your local machine (outside of Docker). Open it in Excel and pivot it in multiple ways to answer questions like:

  • what are the most active repositories?
  • who are the most active contributors?
  • how frequently is code being changed when you are approaching a code freeze?
  • is your team slowing down over time?

Obviously, all these statistics must be analyzed with judgment. Someone could have very good statistics simply because they are using a code generator. On the other hand, another contributor could have bad statistics because they are working hard and meticulously on your legacy hydra code that reacts very badly to changes. So be sure to take these statistics with a grain of salt.

Happy stats!
