Find all HTML sibling elements in iOS

I have a huge HTML document, and at a certain level it contains 10 article elements. I need them.

<article class="box-product-big box-product-full clearfix" >
    <div class="list-left">

        <div class="cover">
            <a id="book_cover_3100529" href="/film/fritz_lang.m-egy-varos-keresi-a-gyilkost-dvd.html">
                <img src="http://s06.static.libri.hu/cover/d4/3/1090228_3.jpg" alt="Fritz Lang - M- Egy város keresi a gyilkost - DVD"/>
            </a>
        </div>
        <div class="desc">
            <a class="book-title" href="/film/fritz_lang.m-egy-varos-keresi-a-gyilkost-dvd.html">

..

</article>

I try to get them with the following pattern, but zero matches are returned:

var error: NSError?
let pattern = "<article class=\"box-product-big box-product-full clearfix\">[\\S\\s]*?</article>"
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.CaseInsensitive, error: &error)!
if error != nil {
    println(error)
}
let a = regex.matchesInString(str, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, count(str)))

Any idea what is wrong?

Data comes from here: http://www.libri.hu/talalati_lista/?text=m


I tried different escaping, but got an error. The Swift documentation says:

String literals can include the following special characters: the escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quote) and \' (single quote).

There is 1 answer

Federico Piazza (best answer)

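You are using a forward slash (/), which the documentation lists as a special character, so you have to escape it with a backslash. In the regex that is \/, and inside a Swift string literal the backslash itself must be doubled, giving \\/: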

let pattern = "<article class=\"box-product-big box-product-full clearfix\">[\\S\\s]*?<\\/article>"
                                                  Escape slash with backslash ---------^^
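
Inside a Swift string literal a lone \/ is not one of the accepted escape sequences, so the regex escape has to be spelled with a doubled backslash. A small sketch in current Swift, using an illustrative constant name (`closing`) that is not part of the original code, shows what the regex engine actually receives:

```swift
// "\\" in a Swift string literal produces a single backslash character,
// so this literal holds the regex text <\/article>
let closing = "<\\/article>"

// The same text written as a raw string (Swift 5+), where backslashes are literal.
let closingRaw = #"<\/article>"#

print(closing)               // prints <\/article>
print(closing == closingRaw) // prints true
```

Raw strings are only a convenience here; on a toolchain without them, the doubled backslash form is the one to use.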

Quoting the documentation:

Regular Expression Metacharacters

Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /
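
When the literal text comes from somewhere else, Foundation can add the quoting itself through `NSRegularExpression.escapedPattern(for:)`. The following is a minimal sketch in current Swift, not code from the question; the helper name `countLiteral(_:in:)` is hypothetical and only there to illustrate the call:

```swift
import Foundation

// Hypothetical helper: counts exact occurrences of `literal` in `text`.
func countLiteral(_ literal: String, in text: String) throws -> Int {
    // Foundation adds the backslashes needed to treat `literal` as plain text.
    let escaped = NSRegularExpression.escapedPattern(for: literal)
    let regex = try NSRegularExpression(pattern: escaped, options: [])
    // NSRange is measured in UTF-16 code units, so build it from the string's own range.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.numberOfMatches(in: text, options: [], range: fullRange)
}
```

For example, `try countLiteral("</article>", in: html)` would report how many closing tags a page contains, without any hand-written escapes.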

Btw, you can shorten your regex like this:

<article[\S\s]*?<\/article>

Code

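var error: NSError?
// In a Swift string literal every regex backslash must be doubled, so \/ is written \\/
let pattern = "<article[\\S\\s]*?<\\/article>"
var regex = NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions.CaseInsensitive, error: &error)!
if error != nil {
    println(error)
}
// NSRange counts UTF-16 units, so take the length from NSString rather than count(str)
let a = regex.matchesInString(str, options: NSMatchingOptions.ReportCompletion, range: NSMakeRange(0, (str as NSString).length))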

Also, you can use capturing groups to capture the content:

(<article[\S\s]*?<\/article>)
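
For readers on current Swift, the capturing-group idea can be sketched as follows. This is an illustration under stated assumptions, not the original poster's code: the helper name `articleBlocks(in:)` is hypothetical, and it uses the throwing `NSRegularExpression(pattern:options:)` initializer with `matches(in:options:range:)`. The sample markup in the question has a space before the `>` of the opening tag, so the sketch matches on the tag name only rather than on the exact class attribute.

```swift
import Foundation

// Hypothetical helper: returns every <article ...>...</article> block found in `html`.
func articleBlocks(in html: String) throws -> [String] {
    // Each regex backslash is written as "\\" in the Swift literal.
    // \b keeps the match from starting at a longer tag name.
    let pattern = "(<article\\b[\\S\\s]*?<\\/article>)"
    let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    // NSRange counts UTF-16 code units, so derive it from the string's full range.
    let fullRange = NSRange(html.startIndex..<html.endIndex, in: html)
    return regex.matches(in: html, options: [], range: fullRange).compactMap { match in
        // Capture group 1 is the parenthesised part of the pattern.
        guard let range = Range(match.range(at: 1), in: html) else { return nil }
        return String(html[range])
    }
}
```

Calling `try articleBlocks(in: str)` on the downloaded page should then give one string per article element. For anything beyond simple extraction, an HTML parser is usually a more robust choice than a regex.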