I'm trying to iteratively remove the 2nd, 3rd, ... nth authors from citations in a document, where the citations (after some cleanup steps) have the form:
Straat, Ark, Sijtsma 2013, 75-99.
Nardulli, Peyton, Bajjalieh 2013, 139-192.
My strategy is, taking citations of the form AUTHOR1... AUTHORn-1 AUTHORn YEAR:
1) match AUTHORn-1 AUTHORn YEAR,
2) using a group replace, replace the matched substring with AUTHORn-1 YEAR, so that the overall citation becomes AUTHOR1... AUTHORn-1 YEAR.
3) Then loop around and do it again until all that remains is AUTHOR1 YEAR. I've got ten iterations in here because I know there are no multi-author citations with more than ten people.
My code is as follows:
import re

def multiAuthor(citestring):
    # group 1 = second-to-last author, group 2 = year; the last author in between is dropped
    longcite = r'([\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*),[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]?( \(?\d\d\d\d[a-z]?[\s.,)])'
    for x in range(0, 10):
        newstring = re.sub(longcite, r'\g<1>\g<2>', citestring)
    return(newstring)
This is called on a string of footnotes separated by newlines, and it works on the first iteration. For my two sample matches given above, it correctly returns:
Straat, Ark 2013, 75-99.
Nardulli, Peyton 2013, 139-192.
But that's it. It does not successfully carry out replacement on any loop beyond the first, and accordingly fails to strip away the second author.
I've been debugging with regex101, but am officially stumped. The first iteration of the expression: https://www.regex101.com/r/jM2fF4/3 --- then after running the replacement, the regex on the second loop also matches, and ought to replace again: https://regex101.com/r/fZ1pX7/4
So I think my regex is right. Am I just missing something dumb and obvious? (I'm pretty new to python-land, but I've double and triple-checked my loop syntax, and I think it's right.)
Using Python 3.
If you want to see it in action for yourself, I've also put a minimal runnable example (with spaces instead of newlines, but no diff) here: https://github.com/paultopia/stray-cites/blob/master/minimal-test.py
Save me, StackObi Wan, you're my only hope...?
Edit: I indeed was missing something obvious, see my self-answer below; leaving this up because it's probably a common oopsie.
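For anyone landing here with the same symptom: the loop above runs re.sub on the original citestring on every pass, so all ten passes produce the same one-step result. Below is a minimal sketch of one way to fix that, by feeding each pass's output into the next. The names multi_author, max_passes and sample are just illustrative, the early exit via re.subn is my own addition, and the sample assumes every citation (including the first) is preceded by a newline, since the pattern requires whitespace or "(" before an author name.

```python
import re

# Pattern copied unchanged from the question.
LONGCITE = r'([\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*),[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]?( \(?\d\d\d\d[a-z]?[\s.,)])'

def multi_author(citestring, max_passes=10):
    """Strip trailing authors one per pass until only the first author remains."""
    result = citestring
    for _ in range(max_passes):
        # Run the substitution on the previous pass's output, not on the original input.
        result, count = re.subn(LONGCITE, r'\g<1>\g<2>', result)
        if count == 0:
            # Nothing left to strip, so further passes would change nothing.
            break
    return result

# Leading newline is an assumption: the pattern needs whitespace before each author.
sample = "\nStraat, Ark, Sijtsma 2013, 75-99.\nNardulli, Peyton, Bajjalieh 2013, 139-192."
print(multi_author(sample))
```

On that sample, both citations should reduce to the first author plus year and page range. Without the leading whitespace, the first author at the very start of the string cannot match the opening `[\s(]`, so that citation would stop one step short.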