I need a way to tag references to publications in text. We've been doing this with regular expressions, but they don't work for these new patterns.
Some examples (the language is German):
Herzog (August 2012), Einkommensteuerskriptum Band 1, S 8
Achatz/Bieber in Achatz/Kirchmayr, Körperschaftsteuergesetz (2011)
Heinrich in Quantschnigg/Renner/Schellmann/Stöger, Die Körperschaftsteuer (2013) § 7 Rz 32
Raab/Renner in Quantschnigg/Renner/Schellmann/Stöger/Vock, Die Körperschaftsteuer, 24. Lfg., § 8 Tz 292,293
Quantschnigg/Renner/Schellmann/Stöger/Vock (Hrsg), KStG23 (2013) § 13 Rz 67
So it mostly starts out with author names and the title of the publication, but after that it becomes pretty diverse. It might not look that bad in these examples, but I could give plenty more that look different again.
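One common way to frame this is as sequence labelling: split each citation into tokens and give every token a label, so the varying tail becomes a question of which labels tend to follow which. The sketch below only shows how hand-annotated examples could be turned into training data. It assumes a hypothetical inline annotation format (`[LABEL text]`), a made-up label set (AUTH, EDITOR, TITLE, YEAR, LOC) and the BIO scheme (B- for the first token of a span, I- for the rest, O for everything outside a span); none of this comes from an existing tool.

```python
import re

# Hypothetical annotation format (not an existing standard): each labelled span
# is written as [LABEL text]; everything outside brackets is labelled "O".
SPAN = re.compile(r"\[([A-Z]+) ([^\]]+)\]")
# Words/numbers (Unicode-aware, so "Körperschaftsteuer" stays whole),
# or single punctuation marks such as "§", "/" or ",".
TOKEN = re.compile(r"\w+|[^\w\s]")


def to_bio(annotated: str) -> list[tuple[str, str]]:
    """Turn an annotated citation into (token, BIO label) pairs."""
    pairs: list[tuple[str, str]] = []
    pos = 0
    for match in SPAN.finditer(annotated):
        # Unlabelled text between spans gets the outside label "O".
        for tok in TOKEN.findall(annotated[pos:match.start()]):
            pairs.append((tok, "O"))
        label, text = match.group(1), match.group(2)
        for i, tok in enumerate(TOKEN.findall(text)):
            pairs.append((tok, ("B-" if i == 0 else "I-") + label))
        pos = match.end()
    # Trailing unlabelled text, e.g. a closing parenthesis.
    for tok in TOKEN.findall(annotated[pos:]):
        pairs.append((tok, "O"))
    return pairs
```

For example, annotating the second citation above as `[AUTH Achatz/Bieber] in [EDITOR Achatz/Kirchmayr], [TITLE Körperschaftsteuergesetz] ([YEAR 2011])` gives 12 tokens, starting with `("Achatz", "B-AUTH")`, `("/", "I-AUTH")`, `("Bieber", "I-AUTH")`, `("in", "O")`.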
So I thought this might be a task for machine learning. However, having very little experience in that field, I find it hard to pick the right technique.
I found POS tagging, but that doesn't seem to be the way to go here. I also stumbled upon CRFs (conditional random fields), but there is little material on them that would get a beginner like me started.
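For this kind of task a CRF is usually a linear-chain model: every token gets a score per label from hand-written features, every pair of adjacent labels gets a transition score, and Viterbi decoding picks the highest-scoring label sequence. The sketch below uses that same model and decoder but trains the weights with a structured perceptron instead of the CRF's likelihood objective, simply because that fits in a few dozen lines of plain Python; it is not a CRF implementation. The feature set, the label set and the two hand-labelled training citations are illustrative assumptions.

```python
import re
from collections import defaultdict

START = "<s>"
# Hypothetical label set; plain span labels (no BIO prefixes) keep the state space small.
LABELS = ["O", "AUTH", "EDITOR", "TITLE", "YEAR", "LOC"]


def token_features(tokens, i):
    """Hand-written features for tokens[i]; this feature set is an illustrative assumption."""
    tok = tokens[i]
    shape = re.sub(r"[A-ZÄÖÜ]", "X", tok)
    shape = re.sub(r"[a-zäöüß]", "x", shape)
    shape = re.sub(r"\d", "9", shape)
    return [
        "w=" + tok.lower(),
        "shape=" + re.sub(r"(.)\1+", r"\1", shape),  # collapse runs: "Herzog" -> "Xx", "2013" -> "9"
        "prev=" + (tokens[i - 1].lower() if i > 0 else START),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
    ]


def phi(tokens, tags):
    """Global feature counts: (feature, label) emissions plus ("T", prev, label) transitions."""
    counts = defaultdict(float)
    prev = START
    for i, y in enumerate(tags):
        for f in token_features(tokens, i):
            counts[(f, y)] += 1
        counts[("T", prev, y)] += 1
        prev = y
    return counts


def viterbi(tokens, weights, labels=LABELS):
    """Exact best label sequence for a first-order linear-chain model (non-empty input)."""
    feats = [token_features(tokens, i) for i in range(len(tokens))]

    def emit(i, y):
        return sum(weights[(f, y)] for f in feats[i])

    # best[i][y] = (score of the best path ending with label y at position i, backpointer)
    best = [{y: (weights[("T", START, y)] + emit(0, y), None) for y in labels}]
    for i in range(1, len(tokens)):
        column = {}
        for y in labels:
            p = max(labels, key=lambda q: best[i - 1][q][0] + weights[("T", q, y)])
            column[y] = (best[i - 1][p][0] + weights[("T", p, y)] + emit(i, y), p)
        best.append(column)
    # Follow the backpointers from the best final label.
    y = max(labels, key=lambda q: best[-1][q][0])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = best[i][y][1]
        path.append(y)
    return path[::-1]


def train(data, labels=LABELS, max_epochs=200):
    """Structured perceptron: on a mistake, add gold features and subtract predicted ones."""
    weights = defaultdict(float)
    for _ in range(max_epochs):
        mistakes = 0
        for tokens, gold in data:
            pred = viterbi(tokens, weights, labels)
            if pred != gold:
                mistakes += 1
                for k, v in phi(tokens, gold).items():
                    weights[k] += v
                for k, v in phi(tokens, pred).items():
                    weights[k] -= v
        if mistakes == 0:  # every training citation is now labelled correctly
            break
    return weights


TRAIN = [
    (["Herzog", "(", "August", "2012", ")", ",", "Einkommensteuerskriptum", "Band", "1", ",", "S", "8"],
     ["AUTH", "O", "YEAR", "YEAR", "O", "O", "TITLE", "TITLE", "TITLE", "O", "LOC", "LOC"]),
    (["Heinrich", "in", "Quantschnigg", "/", "Renner", "/", "Schellmann", "/", "Stöger", ",",
      "Die", "Körperschaftsteuer", "(", "2013", ")", "§", "7", "Rz", "32"],
     ["AUTH", "O"] + ["EDITOR"] * 7 + ["O", "TITLE", "TITLE", "O", "YEAR", "O"] + ["LOC"] * 4),
]

weights = train(TRAIN)
```

Two training examples will not generalise to new citations; the point is the shape of the approach. In practice you would annotate a few hundred citations and hand the per-token features to a real CRF library (for instance sklearn-crfsuite, which follows the scikit-learn `fit`/`predict` style) rather than use this sketch.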
I've done some classification and regression in sklearn, but that's about it.
Could anyone point me in the right direction?