I use the Audiveris application to convert PDFs and images to MusicXML.
It gives me some results. For example, after OMR I get this element:
<credit-words font-family="serif" font-size="23" default-x="407" default-y="1489">
Polonaise in F major
</credit-words>
It contains the attributes default-x and default-y. The problem is that these values are not in pixels. What unit are they in, and how can I convert them to pixels?
Identifying exactly where on the page a musical element occurs can be extremely difficult in MusicXML. The layout.py module of my music21 Python toolkit (shameless plug) can do it down to the measure level; getting to the note or credit level should not be too hard after that. The code is LGPL-licensed, so you could use it as a basis for a parser in another language.
See http://web.mit.edu/music21/doc/moduleReference/moduleLayout.html#music21.layout.divideByPages
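On the unit question itself: MusicXML default-x and default-y values are in tenths, where 10 tenths equal one interline staff space. The `<scaling>` element inside `<defaults>` states how many millimeters correspond to a given number of tenths. For credits, the position is measured from the bottom-left corner of the page. Below is a minimal sketch (not part of music21) that converts a credit position to pixels at a chosen DPI. It assumes the file contains `<defaults>/<scaling>` and `<defaults>/<page-layout>/<page-height>`. The scaling and page-height values in `SAMPLE` are hypothetical, chosen so the arithmetic is easy to check. The helper names `tenths_to_pixels` and `credit_position_pixels` are illustrative.

```python
import xml.etree.ElementTree as ET


def tenths_to_pixels(tenths, mm_per_scaling, tenths_per_scaling, dpi):
    """Convert a MusicXML length in tenths to pixels at the given DPI."""
    millimeters = tenths * mm_per_scaling / tenths_per_scaling
    inches = millimeters / 25.4
    return inches * dpi


def credit_position_pixels(musicxml_text, dpi):
    """Return (text, x_px, y_px) for each <credit-words>, with y counted from the top."""
    root = ET.fromstring(musicxml_text)
    # Assumption: the file declares <scaling> and a page height in <defaults>.
    mm = float(root.findtext("defaults/scaling/millimeters"))
    tenths = float(root.findtext("defaults/scaling/tenths"))
    page_height = float(root.findtext("defaults/page-layout/page-height"))
    results = []
    for words in root.iter("credit-words"):
        x = float(words.get("default-x"))
        y = float(words.get("default-y"))
        # Credit positions are measured up from the bottom of the page, while
        # image pixel rows usually count down from the top, so flip y.
        x_px = tenths_to_pixels(x, mm, tenths, dpi)
        y_px = tenths_to_pixels(page_height - y, mm, tenths, dpi)
        results.append(((words.text or "").strip(), x_px, y_px))
    return results


# Hypothetical values: 100 tenths = 25.4 mm (1 inch), page height 1700 tenths.
SAMPLE = """<score-partwise version="3.0">
  <defaults>
    <scaling><millimeters>25.4</millimeters><tenths>100</tenths></scaling>
    <page-layout><page-height>1700</page-height><page-width>1200</page-width></page-layout>
  </defaults>
  <credit page="1">
    <credit-words font-family="serif" font-size="23" default-x="407" default-y="1489">
Polonaise in F major
</credit-words>
  </credit>
</score-partwise>"""

if __name__ == "__main__":
    for text, x_px, y_px in credit_position_pixels(SAMPLE, dpi=300):
        print(f"{text}: x={x_px:.1f} px, y={y_px:.1f} px from top")
```

With these sample values, one tenth is 3 px at 300 DPI, so the credit lands at about (1221, 633) px. To match pixels in the original scan rather than an arbitrary rendering, use the resolution the page was scanned at as `dpi`. This only works if the exported scaling reflects the scan's physical page size.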