I want to extract multiple instances of alt text with regex but am not sure how


I was using (?<=alt)[\w\s\,\/\(\)\.]* to extract the first alt text. This works well, but there are multiple alt texts that I would like to extract. I am using regex inside Visual Web Ripper.

The HTML I am extracting from is:

<DIV id=ctl00_ContentRightColumn_CustomFunctionalityFieldControl1_ctl00_ctl00_woodFeatures class="woodFeaturesPanel woodFeaturesPanelSingle" sizcache="23614" sizset="0"><H2>Features:</H2>  <DIV sizcache="23614" sizset="0">  <UL sizcache="23614" sizset="0">  <LI sizcache="23386" sizset="0"><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif">  <LI sizcache="20558" sizset="0"><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif">  <LI sizcache="23614" sizset="0"><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">  <LI sizcache="17694" sizset="0"><IMG alt="Is woodland creation site" src="/PublishingImages/icon_woodlandcreation.gif">  <LI sizcache="21680" sizset="0"><IMG alt="Mainly broadleaved woodland" src="/PublishingImages/icon_mainlybroadleaved.gif">  <LI sizcache="20704" sizset="0"><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif">  <LI>  <LI></LI></UL></DIV></DIV>

There is 1 answer

Ja͢ck (best answer)

Without knowing the language it is difficult to say exactly, but using memory patterns (capture groups) you can capture what you need:

/alt=(\w\S*|"([^"]*)")/

Using PHP's preg_match_all(), it gives the following results:

Array
(
    [0] => Array
        (
            [0] => alt="Information board at site"
            [1] => alt="Parking nearby"
            [2] => alt=Grassland
            [3] => alt="Is woodland creation site"
            [4] => alt="Mainly broadleaved woodland"
            [5] => alt="Mainly young woodland"
        )

    [1] => Array
        (
            [0] => "Information board at site"
            [1] => "Parking nearby"
            [2] => Grassland
            [3] => "Is woodland creation site"
            [4] => "Mainly broadleaved woodland"
            [5] => "Mainly young woodland"
        )

    [2] => Array
        (
            [0] => Information board at site
            [1] => Parking nearby
            [2] =>
            [3] => Is woodland creation site
            [4] => Mainly broadleaved woodland
            [5] => Mainly young woodland
        )

)

The second memory pattern (index 2 above) holds the contents of double-quoted values. If it is empty, as for alt=Grassland, the value was unquoted and you should use the first memory pattern (index 1) instead.
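
The fallback between the two memory patterns can be sketched in code. The question does not name a language, so Python's re module is used here as an assumption, standing in for preg_match_all(). The helper name extract_alt_texts and the shortened sample string are illustrative only. The pattern itself is the one given above.

```python
import re

# Pattern from the answer: group 1 is the whole value (quoted or not),
# group 2 is the text inside double quotes when the value is quoted.
ALT_PATTERN = re.compile(r'alt=(\w\S*|"([^"]*)")')


def extract_alt_texts(html: str) -> list[str]:
    """Return every alt value, preferring the quoted contents (group 2)."""
    texts = []
    for match in ALT_PATTERN.finditer(html):
        whole, quoted = match.group(1), match.group(2)
        # Group 2 is None when the value was unquoted, e.g. alt=Grassland,
        # so fall back to group 1 in that case.
        texts.append(quoted if quoted is not None else whole)
    return texts


# Shortened sample based on the HTML in the question.
sample = (
    '<LI><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif">'
    '<LI><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">'
    '<LI><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif">'
)

print(extract_alt_texts(sample))
# ['Information board at site', 'Grassland', 'Parking nearby']
```

Note that the pattern is case-sensitive and only handles double quotes, so ALT=... or alt='...' would need the pattern to be adjusted.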