IRI validation, unexpected fail with encoded <> symbols

I'm working with Abdera in my project, and it fails when parsing an IRI from content that contains already-encoded < and > symbols (&gt; and &lt;).
The exception is: "org.apache.abdera.i18n.text.InvalidCharacterException: Invalid Character 0x3c(<)"

I'm confused, since as far as I know these symbols (&gt; &lt;) are allowed in the IRI format.

Could you please advise?

EDIT: I'm using the getHref() method of the class org.apache.abdera.model.Link, and the link looks something like: http://blabla.com?xxx&gt;yyy&lt;zzz

Answer by Paul Sweatte

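The href is read from XML, so the parser decodes the &gt; and &lt; entities into literal > and < characters before Abdera builds the IRI. Those characters are not allowed in an IRI, which is why the exception reports 0x3c (<). There are two ways around it: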

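  • Percent-encode the < and > characters (as %3C and %3E) in the href value before Abdera parses it. If you use java.net.URLEncoder, encode only the query part, not the whole address (otherwise ":" and "/" get encoded too):

    String href = "http://blabla.com?" + URLEncoder.encode("xxx>yyy<zzz", "UTF-8");
    // href is now http://blabla.com?xxx%3Eyyy%3Czzz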
    
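  • Read the href through a method other than getHref(), which is where the IRI is built and validated. For example, getAttributeValue("href") returns the raw attribute string, which you can clean up yourself before turning it into an IRI.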
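
A minimal, self-contained sketch of the first option, assuming < and > are the only offending characters. It does not use Abdera: the JDK's DOM parser shows that the entities are already decoded by the time the attribute is read, and java.net.URI stands in for Abdera's IRI validation (an assumption; the two parsers are similar but not identical). encodeAngleBrackets is a hypothetical helper, not an Abdera API.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class HrefEncodingDemo {

    // Hypothetical helper (not part of Abdera): percent-encodes '<' and '>',
    // which are not allowed anywhere in a URI or IRI.
    static String encodeAngleBrackets(String href) {
        StringBuilder out = new StringBuilder(href.length());
        for (char c : href.toCharArray()) {
            switch (c) {
                case '<': out.append("%3C"); break;
                case '>': out.append("%3E"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Reads the href attribute the way any XML parser does: &lt; and &gt;
    // come back as literal '<' and '>' characters.
    static String rawHref(String xml) throws Exception {
        Element link = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return link.getAttribute("href");
    }

    // Stand-in validation using java.net.URI instead of Abdera's IRI class.
    static boolean parsesAsUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<link href=\"http://blabla.com?xxx&gt;yyy&lt;zzz\"/>";

        String decoded = rawHref(xml);
        System.out.println(decoded);              // http://blabla.com?xxx>yyy<zzz
        System.out.println(parsesAsUri(decoded)); // false

        String encoded = encodeAngleBrackets(decoded);
        System.out.println(encoded);              // http://blabla.com?xxx%3Eyyy%3Czzz
        System.out.println(parsesAsUri(encoded)); // true
    }
}
```

Encoding character by character, instead of running URLEncoder over the whole string, leaves the scheme, host and "?" separator untouched, so the result is still a usable address.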
