Should I use XPath or regex for this?


I'm no expert in programming languages and have little knowledge of them. I'm pulling data from a website that is partly dynamic.

For example, I need two columns, "Advising on a home purchase plan - Customer Type" and "Advising on a home purchase plan - Investment Type", listing the customer types and investment types (there can be several of each). The values can go into one cell as long as they are separated by a divider such as ";".
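The target layout (one column per activity and condition, several values joined into one cell with ";") can be sketched in Python as below. The nested dictionary shape and the function name `to_columns` are hypothetical, chosen only for illustration; the values are the ones visible in the HTML sample further down.

```python
from typing import Dict, List

def to_columns(permissions: Dict[str, Dict[str, List[str]]], sep: str = "; ") -> Dict[str, str]:
    """Flatten activity -> condition -> values into 'Activity - Condition' columns.

    Hypothetical helper: multiple values for one condition are joined into a
    single cell using `sep` as the divider.
    """
    row: Dict[str, str] = {}
    for activity, conditions in permissions.items():
        for condition, values in conditions.items():
            row[f"{activity} - {condition}"] = sep.join(values)
    return row

# Values taken from the question's HTML sample.
scraped = {
    "Advising on a home purchase plan": {
        "Customer Type": ["Customer"],
        "Investment Type": ["Home purchase plans"],
    }
}
print(to_columns(scraped))
```

Keeping the scraped data nested until the very end makes the divider a single parameter, so switching from ";" to another separator does not touch the extraction code.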

Here is what the table looks like:

[Image: how the table appears]

Here is what the HTML looks like:

Advising on a home purchase plan

                <div id="a2Nb000000035ohEAA" class="collapse DisciplineDetails PassportDetails PermDesc">
                  <h3 class="PermissionsListHeader">Advising on a home purchase plan</h3>
                  <br>
                  <br>
                </div>

                <ul class="PermissionConditionsList">
                  <li class="PermissionsConditionsItem">
                    Customer Type 

                    <ul class="PermCondsLimitationsList">
                      <li style="list-style: none"><span id="j_id0:j_id1:j_id110:regActTable:0:j_id531:0:j_id533:0:j_id535:0:j_id538"></span></li>

                      <li class="PermCondsLimitationsItem Popover">Customer</li>
                    </ul>
                  </li>
                </ul>

                <ul class="PermissionConditionsList">
                  <li class="PermissionsConditionsItem">
                    Investment Type 

                    <ul class="PermCondsLimitationsList">
                      <li style="list-style: none"><span id="j_id0:j_id1:j_id110:regActTable:0:j_id531:1:j_id533:0:j_id535:0:j_id538"></span></li>

                      <li class="PermCondsLimitationsItem Popover">Home purchase plans</li>
                    </ul>
                  </li>
                </ul>
              </div>

There are 2 answers.

Accepted answer, by LukStorms:

This XPath works as long as no other lists on the page use those classes; if some do and should be ignored, this expression will pick them up as well.

//ul[@class='PermCondsLimitationsList']/li[@class='PermCondsLimitationsItem Popover']/(text()|span/text())[normalize-space(.)]

Tested here

To get just the titles:

//ul[@class='PermissionConditionsList']/li[@class='PermissionsConditionsItem']/text()[normalize-space(.)]
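The `[normalize-space(.)]` predicate keeps only text nodes that contain something other than whitespace: `normalize-space()` trims the string and collapses internal whitespace runs, and an empty result counts as false. In the sample, each `PermissionsConditionsItem` has two direct text nodes, the label before the nested `<ul>` and a whitespace-only tail after it, so only the label survives. A rough Python approximation of that filter is sketched below; the helper names are hypothetical, and `str.split()` treats a few more Unicode characters as whitespace than XPath 1.0 does (space, tab, CR, LF).

```python
def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse whitespace runs."""
    return " ".join(s.split())

def keep_non_blank(text_nodes):
    """Mimic text()[normalize-space(.)]: drop whitespace-only nodes.

    Unlike XPath, which returns the original nodes, this returns the
    normalized strings, which is usually what you want in a spreadsheet cell.
    """
    return [normalize_space(t) for t in text_nodes if normalize_space(t)]

# Direct text nodes of one <li class="PermissionsConditionsItem">:
# the label, then the whitespace-only tail after the nested <ul>.
nodes = ["\n    Customer Type \n\n    ", "\n  "]
print(keep_non_blank(nodes))  # ['Customer Type']
```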

Combined:

//ul[@class='PermissionConditionsList']/li[@class='PermissionsConditionsItem']/(text()|ul[@class='PermCondsLimitationsList']/li[@class='PermCondsLimitationsItem Popover']/(text()|span/text()))[normalize-space(.)]

To get both titles and values in a specific output format, XSLT would probably be more useful.
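XSLT isn't the only route: if the scraping is done in Python, the standard library's `xml.etree.ElementTree` can do the same grouping by looping over each condition item and reading its values, instead of evaluating one flat combined expression. This is a swapped-in technique, not the answer's XPath. Note that the expressions above use XPath 2.0 syntax (a parenthesized step such as `/(text()|span/text())`), which XPath 1.0 engines like browsers and ElementTree do not accept; ElementTree supports only a small path subset with `[@attr='value']` predicates, so text is read via `.text` / `itertext()` and normalized in Python. ElementTree also requires well-formed XML, so the fragment below is a cleaned-up copy of the question's HTML (`<br/>` self-closed, span ids omitted, the stray `</div>` dropped, one wrapping root). It assumes one activity heading per fragment; `extract_row` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Well-formed copy of the question's fragment (assumption: one activity per fragment).
FRAGMENT = """
<div>
  <div id="a2Nb000000035ohEAA" class="collapse DisciplineDetails PassportDetails PermDesc">
    <h3 class="PermissionsListHeader">Advising on a home purchase plan</h3>
    <br/><br/>
  </div>
  <ul class="PermissionConditionsList">
    <li class="PermissionsConditionsItem">
      Customer Type
      <ul class="PermCondsLimitationsList">
        <li style="list-style: none"><span/></li>
        <li class="PermCondsLimitationsItem Popover">Customer</li>
      </ul>
    </li>
  </ul>
  <ul class="PermissionConditionsList">
    <li class="PermissionsConditionsItem">
      Investment Type
      <ul class="PermCondsLimitationsList">
        <li style="list-style: none"><span/></li>
        <li class="PermCondsLimitationsItem Popover">Home purchase plans</li>
      </ul>
    </li>
  </ul>
</div>
"""

CONDITIONS = ".//ul[@class='PermissionConditionsList']/li[@class='PermissionsConditionsItem']"
VALUES = "ul[@class='PermCondsLimitationsList']/li[@class='PermCondsLimitationsItem Popover']"

def extract_row(xml_text: str, sep: str = "; ") -> Dict[str, str]:
    """Build 'Activity - Condition' -> 'value1; value2' from one activity block."""
    root = ET.fromstring(xml_text)
    activity = root.find(".//h3[@class='PermissionsListHeader']").text.strip()
    row: Dict[str, str] = {}
    for cond in root.findall(CONDITIONS):
        # The label is the condition item's own leading text, before the nested <ul>.
        title = " ".join((cond.text or "").split())
        # Each value item may contain nested markup, so gather all of its text.
        values = [" ".join("".join(item.itertext()).split()) for item in cond.findall(VALUES)]
        row[f"{activity} - {title}"] = sep.join(values)
    return row

print(extract_row(FRAGMENT))
```

Because each condition's values are collected relative to that condition's `<li>`, titles and values stay paired, which is exactly the information a flat combined XPath result loses. For real, not-well-formed HTML, an HTML-tolerant parser such as `lxml.html` would typically be needed.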

Answer by wizardzz:

If you use Chrome, you can get an element's XPath by right-clicking the area you want and choosing Inspect. The relevant part of the page source will be highlighted. Then right-click the highlighted code and choose Copy -> Copy XPath.