I was using this regex:
(?<=alt)[\w\s\,\/\(\)\.]*
to extract the first alt text. This works well, but there are multiple alt texts that I would like to extract.
I am using the regex inside Visual Web Ripper.
The code I am extracting from is:
<DIV id=ctl00_ContentRightColumn_CustomFunctionalityFieldControl1_ctl00_ctl00_woodFeatures class="woodFeaturesPanel woodFeaturesPanelSingle" sizcache="23614" sizset="0"><H2>Features:</H2> <DIV sizcache="23614" sizset="0"> <UL sizcache="23614" sizset="0"> <LI sizcache="23386" sizset="0"><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> <LI sizcache="20558" sizset="0"><IMG alt="Parking nearby" src="/PublishingImages/icon_carparknear.gif"> <LI sizcache="23614" sizset="0"><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF"> <LI sizcache="17694" sizset="0"><IMG alt="Is woodland creation site" src="/PublishingImages/icon_woodlandcreation.gif"> <LI sizcache="21680" sizset="0"><IMG alt="Mainly broadleaved woodland" src="/PublishingImages/icon_mainlybroadleaved.gif"> <LI sizcache="20704" sizset="0"><IMG alt="Mainly young woodland" src="/PublishingImages/icon_mainlyyoung.gif"> <LI> <LI></LI></UL></DIV></DIV>
Without knowing the language this is difficult to say, but using memory patterns (capture groups) you can capture what you need:
Using preg_match_all() it gives the following results:
The second memory pattern is for double-quote-enclosed strings; if it is empty, you should look at the first memory pattern instead.
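As a rough illustration of the two-group idea, here is a minimal sketch in Python rather than PHP, with re.findall() standing in for preg_match_all(). The pattern itself is an assumption, not necessarily the one originally posted: group 1 catches unquoted values such as alt=Grassland, group 2 catches double-quoted values, and the helper name extract_alt_texts is made up for the example.

```python
import re

# Assumed pattern (not the original answer's): group 1 = unquoted value,
# group 2 = double-quoted value. Only one of the two is non-empty per match.
ALT_PATTERN = re.compile(r'\balt=(?:([^"\s>]+)|"([^"]*)")', re.IGNORECASE)


def extract_alt_texts(html):
    """Return every alt value found in the markup, in document order."""
    results = []
    for bare, quoted in ALT_PATTERN.findall(html):
        # Use the quoted group; if it is empty, fall back to the unquoted one.
        results.append(quoted or bare)
    return results


# Two items taken from the markup in the question.
sample = (
    '<LI><IMG alt="Information board at site" src="/PublishingImages/icon_infoboard.gif"> '
    '<LI><IMG alt=Grassland src="/PublishingImages/icon_grassland.giF">'
)

alt_texts = extract_alt_texts(sample)
print(alt_texts)  # ['Information board at site', 'Grassland']
```

In PHP the same pattern should work with preg_match_all() as long as you check both groups for each match. Whether Visual Web Ripper lets you pick a specific group or return all matches depends on how its regex option is configured, so treat this only as a starting point.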