How to Automate Git Change Log Report Creation with Python

0

Introduction

We software developers aim to automate work of other industries. But we often overlook how much of our own activities can be automated. There are few programmers who go to the extreme if automating any repetitive work that they need to do more than 3 times.

Though I will not suggest you to go to that extreme level, I would recommend automating any non programming related repetitive tasks as much as you can. Reporting is one of those tasks. As we developers become a bit senior, number of reports that we need to produce to our supervisors increases. These defects can be related to progress if the project/sprint, monitoring/approving code changes, monitoring defect movements etc.

In this article we will show you how we can create a formatted excel report with multiple sheets from Git command line output which is readable and presentable. Python is a powerful language with countless utility packages that can be used to automate almost any repeatable job of any nature.

Automation is good, so long as you know exactly where to put the machine.
– Eliyahu Goldratt

Before going to the code details, I will briefly explain the overall design of the program. This will help you better understand the steps followed to reach our objective.

If you are really interested in automating tasks using Python, I would highly recommend reading ‘Automate the Boring Stuff with Python‘ by Al Sweigart. This is the book that got my started with Python.

Design Board

Our goal is to create a one click automated excel report using Git command line tool to produce a report which helps us understand how many commits were made to the code base within a specified period, who changed which files and why. Following are the steps we are going to perform to reach our goal in creating the report:

  • Import necessary packages required for the program
  • Declare variables and constants
  • Dynamically generate report file name with appending current timestamp so that every time we run the program a new file is created without overwriting existing file
  • Change current directory to Git repository parent folder
  • Execute System command to invoke Git command to find all commits made from a certain input data till current date
  • Process git command line output stream to extract required information for our report
  • Write commit list to excel worksheet
  • For each commit, execute git command line to get list of changed files
  • Write all changed files for each commit to another sheet along with author, commit message etc.
  • Create a summary sheet to write file wise summary of changes with a commentary on who changed the file, when and why

Detailed Steps

Now that we have outlined high level steps required to achieve our goal, we will deep dive into each step with code examples on how this will work.

Import Necessary Packages

First step will be to import necessary packages. Python is so powerful because of the availability of so many utility packages just about anything you need. You just need to import the required packages by using Python package manager pip.

For our program we will use following packages:

os : we will use this package to change current directory

subprocess : we will use this package to invoke Git command line

datetime : this package will be used to append current timestamp with report file name so that a unique file name is generated every time we run the program.

openpyxl : we will this package to create the excel report.

Only openpyxl is a third party package which you will need to install separately. Other three packages are part of standard Python installation.

Declare Variable and Constants

Now we will declare the variables used in our program. This part is straight forward and self explanatory. Following are the variables used in our program:

#! Python3
import os,subprocess,datetime
from openpyxl import Workbook

#Declare Variables

GIT_REPO_ROOT_FOLDER = '/Users/macbookair/Documents/Developement/Python Projects/Automation'
REPORT_FOLDER = '/Users/macbookair/Documents/Developement/Source Code Change Log/'
COMMIT_LIST_FILE_NAME = 'Git_Commit_List_Python_Automation_'
GIT_CHANGE_LOG_FILE_NAME = 'Git_Commit_List_Python_Automation_'
COMMIT_LIST_DICT = {}
FILE_VS_COMMIT_MSG_DICT = {}

Dynamically Generate Unique File Name

Next step is to dynamically generate a new file name each time the program is run. We will do this by appending report folder, static part of the file name and current system timestamp. Following is the code snippet for this.

GIT_CHANGE_LOG_FILE_PATH = REPORT_FOLDER + GIT_CHANGE_LOG_FILE_NAME + datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '.xlsx'

Notice that we have used datetime package to get current timestamp and then formatted the output in ‘Year+Month+Date+Hour+Minute+Second’ format. After running the program I will show how the output looks like.

Set Current Directory

Now we will change current directory to point to Git repository folder for our project. This is required to execute Git commands to get commit list and changed file for each commit id.

#Change Directory to Git repository root folder
os.chdir(GIT_REPO_ROOT_FOLDER)

Prepare Excel Workbook

Next step is to prepare the format of excel workbook. Here we will create the worksheets in the workbook with sheet names, remove the default sheet created when workbook is created and also define column header for each sheet.

We have used Workbook class from openpyxl package for working with excel workbook. Please check the code example below. Python is a very readable language. You will be able to understand what each line of code does by simply going through the code.

#Create the excel workbook for report output
output_workbook = Workbook()
commit_list_sheet = output_workbook.create_sheet('Commit List')
detail_sheet = output_workbook.create_sheet('Detail')
file_wise_summary_sheet = output_workbook.create_sheet('File Wise Summary')
output_workbook.remove(output_workbook['Sheet'])

#Write excel sheet header for 'Commit List' sheet. This sheet will contain list of commits performed on the repository by all the developers.
commit_list_sheet.cell(row=1,column=1).value = 'Commit Id'
commit_list_sheet.cell(row=1,column=2).value = 'Author'
commit_list_sheet.cell(row=1,column=3).value = 'Date Modified'
commit_list_sheet.cell(row=1,column=4).value = 'Commit Message'
commit_list_sheet.cell(row=1,column=5).value = 'Change Category'

#Write excel sheet header for 'Detail' sheet. This sheet will also contain list of files modified against each commit performed.
detail_sheet.cell(row=1,column=1).value = 'Commit Id'
detail_sheet.cell(row=1,column=2).value = 'Author'
detail_sheet.cell(row=1,column=3).value = 'Date Modified'
detail_sheet.cell(row=1,column=4).value = 'Commit Message'
detail_sheet.cell(row=1,column=5).value = 'Changed File'
detail_sheet.cell(row=1,column=6).value = 'Change Category'

#Write excel sheet header for 'Summary' sheet. This sheet will contain change summary for each file changed.
file_wise_summary_sheet.cell(row=1,column=1).value = 'File Name'
file_wise_summary_sheet.cell(row=1,column=2).value = 'Change Summary'

We have created three sheets ‘Commit List’, ‘Detail’ and ‘File Wise Summary’ by calling create_sheet() method on Workbook class. Also we have removed the sheet created with name ‘Sheet’ by invoking remove() method.

First row of each sheet will contain column headers for the report. We have defined the column headers as you can see in the code example. We have used sheet.cell(row=ROW_NUMBER,column=COLUMN_NUMBER).value method to set value for a cell.

Invoke Git Command Line

Now that our excel work book is ready with three work sheets and columns headers, we need to invoke Git command line to get list of commits and list of changed files for each commit. For that we will use subprocess package provided by Python. This is similar to executing the commands from command line in Windows and terminal in Mac/Linus systems. Only difference being instead of getting the output on the screen we will get a character stream that we can process.

#Write the report based on command line output
commit_list_rows = 1

#Invoke a subprocess to execute Git command line to get change log/commit history
proc = subprocess.Popen(["git","--no-pager","log","--after=\"2019-12-6\"","--no-merges","--pretty=format:\"%H,%an,%cd,%s\""],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
stdout,stderr = proc.communicate()

communicate() method of subprocess class returns two streams. First is the command line output produced as a result of command execution. Second is the error stream if execution of the command is not successful and an error is encountered during execution.

Git command line is quite robust. Unless you are providing incorrect inputs, we will get proper output. The Git command we have executed above is to get a log of code changes made after 6th December 2019.

We have specified “—no-merges” option to exclude commit points automatically created by Git due to merge from other branches. We are only interested about actual code changes done for any feature improvement or bug fix.

We have also formatted the output using “—pretty=format” option so that only the information we need is part of command line output. Here %H indicates full commit hash, %an indicates author name and %cd indicates change date. If you are interested about more formatting options, please visit official Git documentation in below link

https://git-scm.com/docs/pretty-formats

Process Command Line Output and Write Report Data

Now that we have the list of commit ids with author name and change date, we can proceed to write the data in our report. We will need to execute another Git command later in the program to get the list of files modified as part of each commit made. Please check the code snippet below. I will explain how the code works.

#Read Git commit command output and write to 'Commit List' sheet
for line in stdout.decode('ascii').splitlines():
    commit_list_rows = commit_list_rows + 1
    cleaned_line = line.replace('"','')

    #Split fields of the line by ',' seperator
    fields = cleaned_line.split(sep=",")
    column = 0
    for field in fields:
        column = column + 1
        commit_list_sheet.cell(row=commit_list_rows,column=column).value = field

    #Store Git command output for a commit messages in a Dictionary where commit id is the 'Key'
    COMMIT_LIST_DICT[fields[0]] = fields

#For each commit id, execute Git command to get list of modified files corresponding to the commit id
detail_row = 1

for commit_key in COMMIT_LIST_DICT.keys():
    #Execute Git command to get list of files modified corresponding to a commit Id
    file_list_proc = subprocess.Popen(["git","diff-tree","--no-commit-id","--name-only","-r",COMMIT_LIST_DICT[commit_key][0]],stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    file_list_stdout,file_list_stderr = file_list_proc.communicate()

    for file_list_line in file_list_stdout.decode('ascii').splitlines():
        detail_row = detail_row + 1
        file_list_cleaned_line = file_list_line.replace('"','')

        #Put file name in dictionary along with commit message
        if file_list_cleaned_line in FILE_VS_COMMIT_MSG_DICT:
            message = FILE_VS_COMMIT_MSG_DICT[file_list_cleaned_line]
            message = message + "\n" + 'Change: ' + COMMIT_LIST_DICT[commit_key][3] + ' , Author: ' + COMMIT_LIST_DICT[commit_key][1] + ' , Date: ' + COMMIT_LIST_DICT[commit_key][2]
            FILE_VS_COMMIT_MSG_DICT[file_list_cleaned_line] = message
        else:
            message = 'Change: ' + COMMIT_LIST_DICT[commit_key][3] + ' , Author: ' + COMMIT_LIST_DICT[commit_key][1] + ' , Date: ' + COMMIT_LIST_DICT[commit_key][2]
            FILE_VS_COMMIT_MSG_DICT[file_list_cleaned_line] = message

        #Write information to detail sheet
        detail_sheet.cell(row=detail_row, column=1).value = COMMIT_LIST_DICT[commit_key][0]
        detail_sheet.cell(row=detail_row, column=2).value = COMMIT_LIST_DICT[commit_key][1]
        detail_sheet.cell(row=detail_row, column=3).value = COMMIT_LIST_DICT[commit_key][2]
        detail_sheet.cell(row=detail_row, column=4).value = COMMIT_LIST_DICT[commit_key][3]
        detail_sheet.cell(row=detail_row, column=5).value = file_list_cleaned_line


#Create summary sheet
summary_row = 1
for key in FILE_VS_COMMIT_MSG_DICT.keys():
    summary_row = summary_row + 1
    file_wise_summary_sheet.cell(row=summary_row, column=1).value = key
    file_wise_summary_sheet.cell(row=summary_row, column=2).value = FILE_VS_COMMIT_MSG_DICT[key]


#Save the workbook to report folder
output_workbook.save(GIT_CHANGE_LOG_FILE_PATH)
  • file_list_stdout returned from Popen.communicate() method call returns a byte stream. We have converted the byte stream to a character stream by invoking decode() method on the output stream.
  • The output character stream contains multiple lines one for each commit Id. We need to loop through each commit detail to do further processing. So, we have called splitlines() method on the character stream to get an array of strings from the list of commit messages each line containing details of one commit.
  • When the program is run on Windows, we get every line surrounded by double quotes. We don’t need this. So I have replaced the quote with blank space in file_list_line.replace(‘”‘,”)
  • For each line we have four values separated by comma; commit hash, author, commit date and commit message. We have to put these values in four columns in ‘Commit List’ sheet. So we have used cleaned_line.split(sep=“,”) method call to get comma separated fields in an array.
  • Now we have looped through the fields and set them to four columns in the loop. That way, for each line we are writing to a new row with four comma separated attributes from Git command line output to four columns.
  • Also we have added the array containing details for each commit id into a dictionary where key of the dictionary is the commit id which is unique for each commit and the array containing details for each commit. We will find this dictionary handy in the next section when we will create ‘Detail’ and ‘Summary’ sheet.
  • Now we will add data to the ‘Detail’ sheet. While list of commits done on a code base with author, time and brief message describing the summary of change is useful information, the report will be more informative if we can also provide which files were modified on each each commit performed. We will just do that in ‘Detail’ sheet.
  • One commit may contain one or multiple file changes. So, ‘Detail’ sheet will contain all the four columns of ‘Commit List’ sheet and an additional column for changed file name.
  • To get the list of files changed for a commit we have used “git diff-tree —no-commit-id —name-only -r COMMIT_HASH” Git command. Notice that we have provided attributes of Git command as an array to subprocess.Popen() method call.
  • Again we have decoded the output to ‘ascii’ and splitted the lines into an array with each element of the array containing one file within the commit id.
  • So from the ‘Detail’ sheet we will be able to find list of changes done on a file across multiple commits by simply filtering by file name.
  • When we are looping through the file list corresponding to a commit message, we are also creating the summary of changes made to a file by concatenating commit message, author and change date. Multiple changes to a file are put in separate lines.
  • We have achieved this by creating a dictionary where the key is the file name and the value is the commit message summary. When we are looping through the files changed in a commit id, we are checking if the file name is present in the dictionary. If present, we are getting message already stored in the dictionary corresponding to the file name. Then we are appending new commit summary to that value. If the file is not present in the dictionary then we are adding an entry with key as file name and value as commit summary.
  • After we have finished looping through all the commit ids, we will have commit summary for each file ready in the dictionary. We have simply looped through the keys in the FILE_VS_COMMIT_MSG_DICT dictionary and populated data in summary sheet.
  • Now we have written data in all three work sheets. So we have called output_workbook.save(GIT_CHANGE_LOG_FILE_PATH) method to save the file. We are done with our report.

Following are the screen shots of three sheets of the excel report after running the program.

Commit List

Commit List Sheet

Detail

Detail Sheet

Summary

Summary Sheet

Conclusion

This code example illustrates how powerful Python is and how easily we can create value by automating repetitive tasks. I hope this article will inspire you to learn Python and automate boring and repetitive tasks. Let me know in the comment section if you are interested in more similar articles.

NO COMMENTS

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Positive SSL
Exit mobile version