
I have *.test.* files inside my src folder and I want to find the authors who contributed the most. I came up with the command below:

find ./src/ -name "*.test.*" -print0 | xargs git --no-pager blame -0 -p | grep "^author " | uniq -c

Unfortunately, for some reason it works only for the first file. How can I run that command for all files found by find? Or do I have to somehow "collect" the results of each git invocation?

I read that xargs has an -L option, but it didn't help. What am I doing wrong?

The expected result should look like the output below:

2 author Cat Tom
1 author Mouse Jerry
1 author Sponge Bob
  • Have a look at git blame -h: the help tells you that blame accepts only one filename. Commented May 25, 2023 at 11:47
  • You got your contribution calculation wrong. See the UPDATE section in my answer. Commented May 25, 2023 at 14:57

2 Answers

find ./src/ -name "*.test.*" -print0 |
    xargs -0 -n1 git --no-pager blame -p |
    grep "^author " | uniq -c

-0 is an option of xargs, so it must be passed to xargs, not to git blame.

git blame works on one file at a time, so use -n1 to make xargs pass a single file per invocation.
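
The effect of -0 and -n1 can be seen without git at all. This is a minimal sketch, assuming an xargs that supports -0 (GNU and BSD both do); count_args and the two file names are hypothetical and exist only for this illustration:

```shell
# count_args is a hypothetical helper: every invocation started by xargs
# reports how many arguments it received.
count_args() {
    xargs -0 "$@" sh -c 'echo "$# arg(s): $*"' _
}

# Without -n1, xargs packs both names into a single invocation.
printf '%s\0' 'a b.test.js' 'c.test.js' | count_args

# With -n1, there is one invocation per name, and the space in
# 'a b.test.js' survives because the names are NUL-separated.
printf '%s\0' 'a b.test.js' 'c.test.js' | count_args -n1
```

In the real pipeline, git blame takes the place of the sh -c command, so each file gets its own git blame run.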


I made some modifications:

  • xargs -n1 runs git blame once per file
  • we don't need --no-pager, since git doesn't start the pager when its output is piped
  • stripped the "author " prefix with grep -Po "(?<=^author ).*"
  • added sort before uniq, because uniq counts only consecutive lines
  • added sort -nr after uniq to list contributors in descending order
  • added line numbers with nl to show each author's rank
  • finally, with 400 files it took 8 seconds, so I added xargs -P0 to run git for the files in parallel, which brought it down to 1 second
$ time find ./src/ -name "*.twig" -print0 | xargs -0 -n1 -P0 git blame -p | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:     543 Twist
     2:     273 chuzhaikinadv
     3:     239 Anton
     4:     225 zayceva
     5:     204 Natalia Baganova
     6:     113 Nastie Deminka
     7:     103 Lakhaev Andrey
     8:      79 sergey.ivanov
     9:      72 alnidok
    10:      70 Kalchenko Ilia
    11:      41 Andrey Klimenko
    12:      38 a.dadaev
    13:      30 dilya
    14:      20 a.kaledin
    15:      17 George
    16:      16 Svetoslav Onosov
    17:      14 Andrey Smirnov
    18:      13 Mustafaeva Dilya
    19:      13 George Barlukov
    20:      10 andrewsmirnov
    21:       7 silentmantra
    22:       6 Sergey
    23:       6 Alexander Nenashev
    24:       5 Задорожний Александр
    25:       5 Dilya
    26:       4 over_ilaj
    27:       4 egoprimary
    28:       4 Dilya Mustafaeva
    29:       3 Zharova Yaroslava
    30:       3 Nastie
    31:       3 Irina Demchenko
    32:       2 Александра Храпкова
    33:       2 Vadim
    34:       2 Sharipova
    35:       2 LaFut
    36:       2 Kozyreva
    37:       2 Andrey Lakhaev
    38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1
    39:       1 DESKTOP-0N4O88L\dev
    40:       1 Andrey Kuznetsov
    41:       1 Alexander Zadoroznyi

real    0m0.917s
user    0m10.946s
sys     0m3.047s
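
Why sort has to come before uniq -c can be checked on a toy input. This is a minimal sketch; count_naive and count_sorted are hypothetical helper names, and the author names are borrowed from the question's expected output:

```shell
# Hypothetical helpers, named only for this illustration.
count_naive()  { uniq -c; }                     # counts runs of adjacent equal lines
count_sorted() { sort | uniq -c | sort -nr; }   # groups equal lines first, then ranks

# "Cat Tom" is listed twice here, each time with a count of 1.
printf 'Cat Tom\nMouse Jerry\nCat Tom\n' | count_naive

# "Cat Tom" is listed once here, with a count of 2, and ranked first.
printf 'Cat Tom\nMouse Jerry\nCat Tom\n' | count_sorted
```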

UPDATE
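git blame -p prints the author header only once per commit in each file, so the command above effectively counts commits per author. IMHO that's not real contribution. Since we are using git blame, I suppose we are interested in changed LINES. Luckily, git blame has --line-porcelain, which repeats the full commit information for every line and is exactly what we need to count lines per author.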
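
The difference between the two formats can be reproduced in a throwaway repository. This is a minimal sketch, assuming git is installed; the file name, author name and e-mail address are made up for the demo:

```shell
# Throwaway repository with one three-line file added in a single commit.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'one\ntwo\nthree\n' > "$repo/demo.test.js"
git -C "$repo" add demo.test.js
git -C "$repo" -c user.name='Cat Tom' -c user.email='tom@example.com' \
    commit -q -m 'add demo file'

# -p prints the "author" header once per commit,
# --line-porcelain repeats it for every blamed line.
commits=$(git -C "$repo" blame -p demo.test.js | grep -c '^author ')
lines=$(git -C "$repo" blame --line-porcelain demo.test.js | grep -c '^author ')
echo "with -p: $commits, with --line-porcelain: $lines"

rm -rf "$repo"
```

With one commit touching three lines, the first count is the number of commits and the second is the number of lines.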
OK, but in my case xargs -P failed to deliver that much data consistently: every run produced a different result. Oops, that means we have a problem even with counting commits! I believe this line from the first xargs example is wrong:

38:       1 e757c3db6bfaf171b1c6aa51b3d9798c605a51a8 1 1 1

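I guess the problem is that the parallel git processes started by xargs -P all write to the same pipe with no per-job buffering, so their output gets interleaved and grep receives broken lines. The problem is described in "xargs output buffering -P parallel", and the solution doesn't seem elegant at first glance.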
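
One way to keep parallel jobs from mixing their output is to give each job its own file and merge the files afterwards. This is only a sketch of that idea and not necessarily the solution from the linked question; printf stands in for git blame, and the job.XXXXXX template and the *.twig names are hypothetical:

```shell
# Each parallel job writes to its own temporary file, so no two jobs
# ever share an output stream.
out_dir=$(mktemp -d)

printf '%s\0' alpha.twig beta.twig gamma.twig |
    xargs -0 -n1 -P4 sh -c '
        f=$(mktemp "$1/job.XXXXXX") &&
        printf "author %s\n" "$2" > "$f"   # stand-in for the real per-file command
    ' _ "$out_dir"

# Merge only after every job has finished (xargs waits for its children).
merged=$(cat "$out_dir"/job.* | sort)
echo "$merged"

rm -rf "$out_dir"
```

In the real pipeline, the names would come from find -print0 and the printf line would be replaced by git blame --line-porcelain "$2".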

I installed GNU Parallel and ran the command with it: the results are consistent and it's fast enough!
So first: here goes a vote against xargs in favor of GNU Parallel...

And the results. As you can see, Natalia Baganova takes 2nd place, although counting commits alone put her only 5th. So this is real contribution, as opposed to just counting commits:

$ time find ./src/ -name "*.twig" -print0 | parallel -0 git blame --line-porcelain | grep -Po "(?<=^author ).*" | sort | uniq -c | sort -nr | nl -s': '
     1:    4996 Twist
     2:    4121 Natalia Baganova
     3:    2771 zayceva
     4:    2405 Anton
     5:    2361 chuzhaikinadv
     6:    1113 George
     7:    1081 Nastie Deminka
     8:     750 Kalchenko Ilia
     9:     712 alnidok
    10:     516 Lakhaev Andrey
    11:     383 Andrey Smirnov
    12:     365 dilya
    13:     325 a.dadaev
    14:     301 Andrey Klimenko
    15:     291 sergey.ivanov
    16:     134 Задорожний Александр
    17:     124 George Barlukov
    18:     116 Sergey
    19:      59 egoprimary
    20:      43 a.kaledin
    21:      42 Mustafaeva Dilya
    22:      41 Svetoslav Onosov
    23:      38 Alexander Nenashev
    24:      26 andrewsmirnov
    25:      25 Nastie
    26:      20 silentmantra
    27:      18 Zharova Yaroslava
    28:      16 Dilya Mustafaeva
    29:      15 Andrey Kuznetsov
    30:       9 over_ilaj
    31:       9 Dilya
    32:       6 Sharipova
    33:       6 LaFut
    34:       3 Александра Храпкова
    35:       3 Irina Demchenko
    36:       3 Andrey Lakhaev
    37:       2 Vadim
    38:       2 Kozyreva
    39:       2 DESKTOP-0N4O88L\dev
    40:       2 Alexander Zadoroznyi

real    0m1.334s
user    0m10.418s
sys     0m4.043s
