I'm trying to remove line breaks with Python from wikitext templates of the form:
{{cite web
|title=Testing
|url=Testing
|editor=Testing
}}
The following should be obtained with re.sub:
{{cite web|title=Testing|url=Testing|editor=Testing}}
I've been trying with Python regex for hours, yet haven't succeeded at it. For example I've tried:
while(re.search(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}')):
textmodif=re.sub(r'\{cite web(.*?)([\r\n]+)(.*?)\}\}', r'{cite web\1\3}}', textmodif,re.DOTALL)
But it doesn't work as expected (even without the while loop, it's not working for the first line break).
I found this similar question, but it didn't help: Regex for MediaWiki wikitext templates. I'm quite new to Python, so please don't be too hard on me :-)
Thank you in advance.
You need to switch on newline matching for '.' with the re.DOTALL flag; it does not match a newline otherwise.
You have multiple newlines spread throughout the text you want to match, so matching just one set of consecutive newlines is not enough.
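A minimal sketch of both points, using the pattern from the question (the names text, pattern, no_flag, with_flag and once are illustrative assumptions). Note also that the fourth positional argument of re.sub() is count, not flags, so the flag is passed by keyword here:

```python
import re

# Sample wikitext from the question; `text` is an illustrative name.
text = "{{cite web\n|title=Testing\n|url=Testing\n|editor=Testing\n}}"

# Pattern taken from the question.
pattern = r'\{cite web(.*?)([\r\n]+)(.*?)\}\}'

# Without re.DOTALL, '.' stops at a newline, so the pattern cannot
# reach the closing braces and nothing matches.
no_flag = re.search(pattern, text)

# With re.DOTALL, '.' also matches newlines and the template is found.
with_flag = re.search(pattern, text, re.DOTALL)

# Pass the flag as flags=...; the fourth positional argument is count.
# A single substitution only drops the first run of newlines.
once = re.sub(pattern, r'{cite web\1\3}}', text, flags=re.DOTALL)

print(no_flag)     # None
print(repr(once))  # only the first line break is gone
```

The remaining line breaks are still present in once, which is why a single pass with this pattern is not enough.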
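From the re.DOTALL documentation: the flag makes '.' match any character at all, including a newline; without it, '.' matches anything except a newline.
You could use one re.sub() call to remove all newlines within the cite stanza in one go, without a loop: pass a function as the replacement, and have it run a nested regular expression that removes all whitespace with at least one newline in it from the matched text.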
Demo:
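A sketch of the single-call approach, run on the sample from the question. The helper name join_cite_lines and the surrounding sample text are assumptions made for illustration, and the exact pattern is one possible way to write it. It assumes the template contains no nested {{...}} templates, since the non-greedy match stops at the first closing braces.

```python
import re

# Sample input: the template from the question with some text around it.
text = (
    "Intro text.\n"
    "{{cite web\n"
    "|title=Testing\n"
    "|url=Testing\n"
    "|editor=Testing\n"
    "}}\n"
    "Closing text.\n"
)

def join_cite_lines(wikitext):
    """Collapse line breaks inside each {{cite web ...}} template.

    Hypothetical helper for illustration only.
    """
    return re.sub(
        # Outer pattern: one whole cite web template, newlines included.
        r'\{\{cite web.*?\}\}',
        # Inner pattern: drop every whitespace run containing a newline.
        lambda m: re.sub(r'\s*[\r\n]\s*', '', m.group(0)),
        wikitext,
        flags=re.DOTALL,
    )

result = join_cite_lines(text)
print(result)
# Intro text.
# {{cite web|title=Testing|url=Testing|editor=Testing}}
# Closing text.
```

Line breaks outside the template are left alone, because the inner substitution only sees the text matched by the outer pattern.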