How to parse web result elements that don't have their own exclusive containers using Clojure Enlive?

I am trying to parse an event list using Enlive.

Normally, each event's data is isolated in its own div (here, one with class "result"):

<div class="result">
  <h3>Event 1 title</h3>
  <a href="http://the_site.com/event1">Event 1 page</a>
  <p>Event 1 location</p>
</div>
<div class="result">
  <h3>Event 2 title</h3>
  <a href="http://the_site.com/event2">Event 2 page</a>
  <p>Event 2 location</p>
</div>

So I created a variable that holds the parsing logic for each event site:

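;; One config map per event site, collected in a vector.
(def parsing-config
  [{:source "The Site"
    :results-url ["http://the_site.com"]
    :parsing
    {;; Enlive selectors are vectors of keywords, e.g. [:div.result :h3]
     :title    {:selector [:div.result :h3]
                :trim-fn  (comp first :content)}
     :url      {:selector [:div.result :a]
                ;; read the :href attribute from the node's :attrs map
                :trim-fn  (comp :href :attrs)}
     :location {:selector [:div.result :p]
                :trim-fn  (comp first :content)}}}
   {:source "Other event site"
    ...}])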

But for a specific site, I have divs that contain more than one event, like this:

<div class="September">
  <h3>Event 1 title</h3>
  <a href="http://other_site.com/event1">Event 1 page</a>
  <p>Event 1 location</p>
  <h3>Event 2 title</h3>
  <a href="http://other_site.com/event2">Event 2 page</a>
  <p>Event 2 location</p>
</div>
<div class="October">
  <h3>Event 3 title</h3>
  <a href="http://other_site.com/event3">Event 3 page</a>
  <p>Event 3 location</p>
  <h3>Event 4 title</h3>
  <a href="http://other_site.com/event4">Event 4 page</a>
  <p>Event 4 location</p>
</div>

How can I parse each event for this last site, while only changing the parsing-config variable and not the function that I use to parse (not shown here...)?

Thanks.

Note: The :trim-fn functions may not be accurate.
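
One direction that might work, sketched under assumptions: if the parsing function (not shown) applies each field's selector to the whole page and pairs the results by position, the events do not need a wrapper of their own, and the selectors can simply target the children of any div. The snippet below uses plain maps shaped like Enlive nodes so that it runs without Enlive. The names other-site-config, selected and zip-events are hypothetical, and the selectors assume Enlive's :> child combinator.

```clojure
;; Hypothetical config entry for the second site: match direct children
;; of any div instead of relying on a per-event container.
(def other-site-config
  {:source "Other event site"
   :parsing {:title    {:selector [:div :> :h3]
                        :trim-fn  (comp first :content)}
             :url      {:selector [:div :> :a]
                        :trim-fn  (comp :href :attrs)}
             :location {:selector [:div :> :p]
                        :trim-fn  (comp first :content)}}})

;; Stand-ins for the nodes each selector might return, in document order.
;; Real nodes would come from Enlive; these are hand-written samples.
(def selected
  {:title    [{:tag :h3 :attrs nil :content ["Event 1 title"]}
              {:tag :h3 :attrs nil :content ["Event 2 title"]}]
   :url      [{:tag :a :attrs {:href "http://other_site.com/event1"}
               :content ["Event 1 page"]}
              {:tag :a :attrs {:href "http://other_site.com/event2"}
               :content ["Event 2 page"]}]
   :location [{:tag :p :attrs nil :content ["Event 1 location"]}
              {:tag :p :attrs nil :content ["Event 2 location"]}]})

(defn zip-events
  "Hypothetical helper: applies each field's :trim-fn to its nodes, then
  pairs the fields by position into one map per event."
  [parsing selected]
  (let [fields  (keys parsing)
        columns (for [f fields]
                  (map (get-in parsing [f :trim-fn]) (get selected f)))]
    (apply map (fn [& vals] (zipmap fields vals)) columns)))

(zip-events (:parsing other-site-config) selected)
```

Pairing by position only holds if every event has exactly one h3, one a and one p; an event with a missing field would shift all later events, so this is a starting point rather than a robust solution.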

There are 0 answers