Tagging references/citations in text


I need to find a way to tag references to publications in text. We've been doing this via regex, but that won't work for these new patterns.

Some examples (the language is German):

Herzog (August 2012), Einkommensteuerskriptum Band 1, S 8

Achatz/Bieber in Achatz/Kirchmayr, Körperschaftsteuergesetz (2011)

Heinrich in Quantschnigg/Renner/Schellmann/Stöger, Die Körperschaftsteuer (2013) § 7 Rz 32

Raab/Renner in Quantschnigg/Renner/Schellmann/Stöger/Vock, Die Körperschaftsteuer, 24. Lfg., § 8 Tz 292,293

Quantschnigg/Renner/Schellmann/Stöger/Vock (Hrsg), KStG23 (2013) § 13 Rz 67

A reference mostly starts with the author names and the title of the publication, but after that the format becomes pretty diverse. It might not look that bad in these examples, but I could give a bunch more that look different again.
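
To illustrate the problem, here is a simplified pattern of the kind we have been using. It is made up for this question and is not our production regex; the names `SIMPLE_CITATION` and `tag_citations` and the `<ref>` markers are placeholders. It tags the first example but leaves the second one untouched:

```python
import re

# Hypothetical pattern for illustration only.
# Expected shape: Author[/Author...] ([Month] Year), Title, S <page>
SIMPLE_CITATION = re.compile(
    r"(?P<authors>[A-ZÄÖÜ]\w+(?:/[A-ZÄÖÜ]\w+)*)"   # one or more names joined by "/"
    r"\s+\((?:\w+\s+)?(?P<year>\d{4})\)"            # "(2012)" or "(August 2012)"
    r",\s+(?P<title>[^,]+)"                         # title up to the next comma
    r",\s+S\s+(?P<page>\d+)"                        # page reference "S 8"
)


def tag_citations(text):
    """Wrap every match of the pattern in <ref>...</ref> markers."""
    return SIMPLE_CITATION.sub(lambda m: "<ref>" + m.group(0) + "</ref>", text)


simple = "Siehe Herzog (August 2012), Einkommensteuerskriptum Band 1, S 8 für Details."
harder = "Achatz/Bieber in Achatz/Kirchmayr, Körperschaftsteuergesetz (2011)"

print(tag_citations(simple))   # the citation is wrapped in <ref>...</ref>
print(tag_citations(harder))   # printed unchanged: "X in Y, Title (Year)" does not fit
```

Every new layout ("X in Y", "(Hrsg)", "24. Lfg.", "§ 7 Rz 32", "Tz 292,293") needs either another alternative in the pattern or a separate pattern, which is why this no longer scales.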

So I thought this might be a task for machine learning. However, having very little experience in that field, I find it hard to find the right technique.

I found POS tagging, but that doesn't seem to be the way to go here. I also stumbled upon CRFs (conditional random fields), but there is little material on them that would get a beginner like myself started.
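
As far as I understand it, a CRF would treat this as token-level sequence labeling: split the text into tokens, describe each token with features, and label each token as beginning, inside, or outside a reference (BIO scheme). Below is a rough sketch of how I imagine the training input would look. The feature set, the `B-REF`/`I-REF`/`O` label names, and the helper functions are my own guesses, not something taken from a tutorial, and no model is trained here:

```python
import re

TOKEN_RE = re.compile(r"\w+|[^\w\s]")          # words, or single punctuation marks
MARKERS = {"S", "Rz", "Tz", "§", "Hrsg", "Lfg"}  # abbreviations seen in the examples


def tokenize(text):
    return TOKEN_RE.findall(text)


def token_features(tokens, i):
    """Feature dict for token i, including its immediate neighbours."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "is_year": bool(re.fullmatch(r"(19|20)\d{2}", tok)),
        "is_marker": tok in MARKERS,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def bio_labels(tokens, start, end):
    """BIO labels for a single reference covering tokens[start:end]."""
    labels = ["O"] * len(tokens)
    for j in range(start, end):
        labels[j] = "B-REF" if j == start else "I-REF"
    return labels


sentence = ("Vgl Heinrich in Quantschnigg/Renner/Schellmann/Stöger, "
            "Die Körperschaftsteuer (2013) § 7 Rz 32")
tokens = tokenize(sentence)
features = [token_features(tokens, i) for i in range(len(tokens))]
labels = bio_labels(tokens, 1, len(tokens))   # everything after "Vgl" is the reference

for tok, lab in zip(tokens, labels):
    print(tok, lab)
```

If I read the documentation correctly, one list of feature dicts per sentence plus one list of labels per sentence is the input shape that CRF libraries such as sklearn-crfsuite expect, but I have not verified that this framing works well for references like mine.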

I've done some classification and regression in sklearn but that's about it.

Could anyone point me in the right direction?

