
MergeScenarioMiner

A mining tool that collects merge scenarios from Git repositories. In three-way merging, each merge scenario consists of the two versions to be merged (called ours and theirs, respectively) and their nearest common ancestor in the commit history (called base).
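To make "base" concrete, here is a conceptual sketch (not MergeScenarioMiner's own code) that finds the nearest common ancestor(s) of ours and theirs in a toy commit graph. The graph is represented as a hypothetical dict that maps each commit id to its parent ids. In practice Git computes this with `git merge-base`, which GitPython exposes as `Repo.merge_base`.

```python
from typing import Dict, List, Set

# Toy commit graph (hypothetical representation): commit id -> parent ids, first parent first.
Dag = Dict[str, List[str]]


def ancestors(dag: Dag, commit: str) -> Set[str]:
    """Return `commit` itself plus every commit reachable from it via parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dag.get(current, []))
    return seen


def merge_bases(dag: Dag, ours: str, theirs: str) -> List[str]:
    """Best common ancestors of ours and theirs (compare `git merge-base --all`)."""
    common = ancestors(dag, ours) & ancestors(dag, theirs)
    best = set(common)
    for candidate in common:
        # A common ancestor reachable from another common ancestor is not the nearest one.
        best -= ancestors(dag, candidate) - {candidate}
    return sorted(best)


if __name__ == "__main__":
    # A <- B <- C <- E (ours) and B <- D (theirs): the base is B.
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"]}
    print(merge_bases(history, "E", "D"))  # ['B']
```

The sketch returns every best common ancestor, not just one, because after criss-cross merges there can be more than one candidate base.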

How to tell which version is ours and which is theirs?

See the following example:

$ git branch
  develop
* master
$ git merge develop
$ git log
commit 5aa63defd7d552544348deaad88a22d212c43038 (HEAD -> master)
Merge: 011eeae bdd631b
Author: Symbolk <symbolk@163.com>
Date:   Sat Jul 6 17:12:25 2019 +0800

    Merge branch 'develop'

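The current branch, master, is ours: the merge commit is created on it. The branch named in the git merge command, develop, is theirs: it is not modified by the merge. The Merge: line of the resulting merge commit lists the parents in that order, so 011eeae (the first parent) is the HEAD of ours and bdd631b (the second parent) is the HEAD of theirs.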
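The parent order in the Merge: line is what distinguishes ours from theirs. As a small illustration, the following sketch extracts both ids from a single `git log` entry such as the one above. The helper `ours_and_theirs` is hypothetical and is not part of MergeScenarioMiner.

```python
import re
from typing import Optional, Tuple

# `git log` prints "Merge: <first parent> <second parent>" for merge commits;
# the first parent is the HEAD of ours and the second is the HEAD of theirs.
_MERGE_HEADER = re.compile(r"^Merge:\s+(\S+)\s+(\S+)", re.MULTILINE)


def ours_and_theirs(log_entry: str) -> Optional[Tuple[str, str]]:
    """Return (ours, theirs) abbreviated parent ids, or None if not a merge commit."""
    match = _MERGE_HEADER.search(log_entry)
    if match is None:
        return None
    # Octopus merges list more than two parents; only the first two are kept here.
    return match.group(1), match.group(2)


if __name__ == "__main__":
    entry = (
        "commit 5aa63defd7d552544348deaad88a22d212c43038 (HEAD -> master)\n"
        "Merge: 011eeae bdd631b\n"
        "Author: Symbolk <symbolk@163.com>\n"
    )
    print(ours_and_theirs(entry))  # ('011eeae', 'bdd631b')
```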


Getting Started

Requirements

  • Windows/Linux/macOS
  • Python 3.7
  • Git 2.18.0
  • PyCharm

Installation

  1. Open the cloned repository as a project in PyCharm.
  2. From the root directory of the cloned repository, run the following command in the terminal:
pip install -r requirements.txt
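Optionally, you can check the installed Python and Git against the versions listed under Requirements before mining. The following is a minimal sketch that assumes the usual `git version X.Y.Z...` output of `git --version`; Windows builds append a suffix such as `.windows.1`. Treating the listed versions as minimums is itself an assumption.

```python
import re
import subprocess
import sys
from typing import Tuple

# Versions listed under Requirements; treating them as minimums is an assumption.
MIN_PYTHON = (3, 7)
MIN_GIT = (2, 18, 0)


def parse_git_version(output: str) -> Tuple[int, int, int]:
    """Turn e.g. 'git version 2.18.0.windows.1' into (2, 18, 0)."""
    match = re.search(r"git version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized `git --version` output: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_requirements() -> None:
    """Print a warning for each listed requirement that appears to be unmet."""
    if sys.version_info[:2] < MIN_PYTHON:
        print(f"Python {sys.version.split()[0]} is older than 3.7")
    output = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    ).stdout
    if parse_git_version(output) < MIN_GIT:
        print(f"Git is older than 2.18.0: {output.strip()}")


if __name__ == "__main__":
    check_requirements()
```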

Usage

Usage 1: Collect all merge scenarios with merge conflict(s)

Collect Java files involved in merge scenarios that contain merge conflict(s) from the whole commit history.

Input

A Git repository and the name of its target branch (usually master).

Edit main.py to set the necessary variables, then run it:

import os

if __name__ == "__main__":
    # assumption: the data paths below are relative to the user's home directory
    home = os.path.expanduser("~")
    repo_name = "cassandra"
    # the target branch to mine (cassandra's default branch is trunk)
    branch_name = "trunk"
    # if the repo is not present in repo_dir, it will be cloned, but it is better to clone it in advance
    git_url = "https://github.com/apache/cassandra"
    repo_dir = os.path.join(home, "coding/data/repos", repo_name)
    result_dir = os.path.join(home, "coding/data/merges", repo_name)

    # Usage 1: collect Java files involved in merge scenarios that contain merge conflict(s) from the whole commit history
    statistic_path = os.path.join(result_dir, "statistics.csv")
    # GitService is defined in this repository (its import is not shown here)
    git_service = GitService(repo_name, git_url, repo_dir, branch_name, result_dir)
    git_service.collect_from_repo(statistic_path)

Output

During mining, a brief summary of each merge scenario with merge conflict(s) is printed in the PyCharm Run console (the sample below is from a run on javaparser):

Cloning into 'D:\github\rep\javaparser'...
POST git-upload-pack (gzip 7425 to 3775 bytes)
remote: Enumerating objects: 47, done.        
remote: Counting objects: 100% (47/47), done.        
remote: Compressing objects: 100% (21/21), done.        
remote: Total 92708 (delta 10), reused 43 (delta 10), pack-reused 92661        
Receiving objects: 100% (92708/92708), 21.11 MiB | 830.00 KiB/s, done.
Resolving deltas: 100% (49120/49120), done.
Checking out files: 100% (2028/2028), done.
Ready to process repo: javaparser at branch: master
Commit: e6063bb10d6d41cb2b258540bb47edbd18b4646b, #Unmerged_blobs: 4, #Conflict java files: 2, #Conflict blocks: 2
Commit: 0258e273bfd2dca550a27d3204cf22227a41e772, #Unmerged_blobs: 5, #Conflict java files: 4, #Conflict blocks: 4
Commit: 25c4bbf796034c987e7517d4d3c596026a0142e3, #Unmerged_blobs: 16, #Conflict java files: 2, #Conflict blocks: 3
...
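If you save the console log to a file, you can parse the per-scenario lines back into records for quick analysis. This sketch assumes the exact line format shown above; the `ScenarioSummary` type and its field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches the per-scenario lines printed during mining (format taken from the sample above).
_SUMMARY = re.compile(
    r"Commit: (?P<commit>[0-9a-f]+), "
    r"#Unmerged_blobs: (?P<unmerged_blobs>\d+), "
    r"#Conflict java files: (?P<conflict_java_files>\d+), "
    r"#Conflict blocks: (?P<conflict_blocks>\d+)"
)


@dataclass
class ScenarioSummary:
    commit: str
    unmerged_blobs: int
    conflict_java_files: int
    conflict_blocks: int


def parse_summary_line(line: str) -> Optional[ScenarioSummary]:
    """Return the fields of one summary line, or None for any other console output."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    return ScenarioSummary(
        commit=match.group("commit"),
        unmerged_blobs=int(match.group("unmerged_blobs")),
        conflict_java_files=int(match.group("conflict_java_files")),
        conflict_blocks=int(match.group("conflict_blocks")),
    )


def parse_log(lines: List[str]) -> List[ScenarioSummary]:
    """Keep only the lines that are per-scenario summaries."""
    return [s for s in map(parse_summary_line, lines) if s is not None]
```

For example, `sum(s.conflict_blocks for s in parse_log(lines))` gives the total number of conflict blocks across the parsed scenarios.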

The collected data is saved in result_dir, which contains:

  1. Sub-folders named after merge commit ids, each containing the conflicting Java files of that merge scenario.


  2. A csv file (statistics.csv at statistic_path in Usage 1) that summarizes each merge scenario: 4 commit ids (merge commit, HEAD of ours, HEAD of theirs, base commit), the number of conflicting Java files and their paths, and the number of conflict blocks.


    In the #conflict blocks column, each number gives the number of conflict blocks in one conflicting Java file. For example, the first row lists 5 conflicting Java files, and the first of them, javaparser-core-serialization/src/main/java/com/github/javaparser/serialization/JavaParserJsonSerializer.java, contains 1 conflict block.
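A conflict block is the region Git writes between `<<<<<<<` and `>>>>>>>` marker lines in a file it could not merge automatically. The following is a minimal sketch for counting such blocks in a conflicting file. It also handles the `|||||||` base section written in diff3 conflict style. The tool's own counting code may differ.

```python
def count_conflict_blocks(text: str) -> int:
    """Count regions opened by a '<<<<<<<' line and closed by a '>>>>>>>' line."""
    count = 0
    inside = False
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            inside = True  # start of ours
        elif line.startswith(">>>>>>>") and inside:
            count += 1  # end of theirs closes one block
            inside = False
        # '|||||||' (diff3 base) and '=======' lines only occur inside a block
    return count


if __name__ == "__main__":
    sample = (
        "class A {\n"
        "<<<<<<< HEAD\n"
        "    int x = 1;\n"
        "=======\n"
        "    int x = 2;\n"
        ">>>>>>> develop\n"
        "}\n"
    )
    print(count_conflict_blocks(sample))  # 1
```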

Usage 2: Collect only merge scenarios with refactoring-related conflict(s)

Collect Java files involved in merge scenarios that contain refactoring-related merge conflict(s), as listed in the csv file generated by https://github.com/Symbolk/RefConfMiner.git (Python scripts that analyze the MySQL data generated by https://github.com/ualberta-smr/RefactoringsInMergeCommits).

Input

  1. A Git repository with the name of the target branch (usually the main branch, like master).
  2. The csv file that records refactoring-related merge commit ids, generated by RefConfMiner from the data of RefactoringsInMergeCommits (https://github.com/ualberta-smr/RefactoringsInMergeCommits).
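If you want to inspect which merge commits the csv file selects, you can load the ids with the standard `csv` module. The column layout of RefConfMiner's output is not described here, so this sketch takes the column name as a parameter. The name `merge_commit` used in the usage note is a hypothetical placeholder.

```python
import csv
from typing import Set, TextIO


def load_commit_ids(csv_file: TextIO, column: str) -> Set[str]:
    """Read the non-empty values of `column` (assumed to hold merge commit ids)."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in csv header")
    ids: Set[str] = set()
    for row in reader:
        value = (row.get(column) or "").strip()  # short rows yield None; treat as empty
        if value:
            ids.add(value)
    return ids
```

Usage: `with open(csv_file, newline="") as f: ids = load_commit_ids(f, "merge_commit")`, then keep only the merge commits whose id is in `ids`.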

Edit main.py to set the necessary variables, then run it:

import os

if __name__ == "__main__":
    # assumption: the data paths below are relative to the user's home directory
    home = os.path.expanduser("~")
    repo_name = "cassandra"
    # the target branch to mine (cassandra's default branch is trunk)
    branch_name = "trunk"
    # if the repo is not present in repo_dir, it will be cloned, but it is better to clone it in advance
    git_url = "https://github.com/apache/cassandra"
    repo_dir = os.path.join(home, "coding/data/repos", repo_name)

    # Usage 2: collect Java files involved in merge scenarios that contain refactoring-related merge conflict(s)
    # from the csv file generated by https://github.com/Symbolk/RefConfMiner.git
    result_dir = os.path.join(home, "coding/data/ref_conflicts", repo_name)
    # csv_file = "merge_scenarios_involved_refactorings_test.csv"
    csv_file = os.path.join(home, "coding/data/merge_scenarios_involved_refactorings", repo_name + ".csv")
    # GitService is defined in this repository; constructed with the same arguments as in Usage 1
    git_service = GitService(repo_name, git_url, repo_dir, branch_name, result_dir)
    git_service.collect_from_csv(csv_file)

Output

The output is essentially the same as in Usage 1, and it is saved in the result_dir set for Usage 2.