Extract addresses and names from a listing in an HTML page

Okay, so I have loaded a page's HTML into a String, and I want to grab a certain piece of text from it. The problem is that the text is different every time I load the page. Example:

Blah
blah
blah

1. name
   address
   phone number

2. name
   address
   phone number

blah
blah

There can be anywhere from 1 to 10 listings.
All I'm interested in is the name and address of each listing.

I did try:

public static String removeNonDigits(final String str) {
    if (str == null || str.isEmpty()) {
        return "";
    }
    // Delete every run of non-digit characters, keeping only 0-9.
    return str.replaceAll("\\D+", "");
}

But to no avail.
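
For context on why that attempt cannot work: `replaceAll("\\D+", "")` deletes every non-digit character, so the names and addresses disappear entirely and only the list numbers, plus any digits inside addresses or phone numbers, survive. A quick check on a made-up listing (the class name `RemoveNonDigitsDemo` and the sample text are hypothetical):

```java
public class RemoveNonDigitsDemo {

    // Same helper as above: strips every non-digit character.
    public static String removeNonDigits(final String str) {
        if (str == null || str.isEmpty()) {
            return "";
        }
        return str.replaceAll("\\D+", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like one listing from the question.
        String listing = "1. Jane Doe\n   12 Main Street\n   555-0100\n";
        // Only digits remain: list number "1", house number "12", phone "5550100".
        System.out.println(removeNonDigits(listing)); // prints 1125550100
    }
}
```

The name is gone completely, which is why a different approach is needed to pull out whole lines of text.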

Answer by maraca:

You might have to adjust this a little, since I don't know exactly where the whitespace is:

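import java.util.regex.Matcher;
import java.util.regex.Pattern;

// str is the page text. Group 1 = text after "1." .. "10." (the name),
// group 2 = the following line (the address).
Pattern pattern = Pattern.compile("\\n *(?:[1-9]|10)\\. +(.+?) *\\n *(.+?) *\\n");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
  System.out.println("name: " + matcher.group(1));
  System.out.println("address: " + matcher.group(2));
  System.out.println(matcher.group()); // the whole match
}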
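
A self-contained sketch of the same regex idea, under a few assumptions that go beyond the answer: the page text may use `\r\n` line endings, the first listing may sit at the very start of the text, and indentation may be spaces or tabs. To handle that, it uses `(?m)` with `^`/`$` line anchors and `\R` (Java 8+) for the line break instead of literal `\n`, and collects the results into a list. The class `ListingExtractor`, its nested `Listing` type and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingExtractor {

    /** Hypothetical value type for one extracted entry. */
    public static final class Listing {
        public final String name;
        public final String address;

        Listing(String name, String address) {
            this.name = name;
            this.address = address;
        }

        @Override
        public String toString() {
            return "name: " + name + ", address: " + address;
        }
    }

    // (?m) makes ^ and $ match at every line boundary.
    // ^[ \t]*(?:10|[1-9])\.[ \t]+   list marker "1." to "10." at the start of a line
    // (.+?)[ \t]*$                   group 1: rest of that line (the name), trailing blanks dropped
    // \R[ \t]*(.+?)[ \t]*$           group 2: the next line (the address), surrounding blanks dropped
    // '.' never matches a line terminator, so each group stays on one line.
    private static final Pattern LISTING = Pattern.compile(
            "(?m)^[ \\t]*(?:10|[1-9])\\.[ \\t]+(.+?)[ \\t]*$\\R[ \\t]*(.+?)[ \\t]*$");

    public static List<Listing> extract(String text) {
        List<Listing> listings = new ArrayList<>();
        Matcher matcher = LISTING.matcher(text);
        while (matcher.find()) {
            listings.add(new Listing(matcher.group(1), matcher.group(2)));
        }
        return listings;
    }

    public static void main(String[] args) {
        // Made-up page text with Windows line endings, shaped like the question's example.
        String page = "Blah\r\nblah\r\n\r\n"
                + "1. Jane Doe\r\n   12 Main Street\r\n   555-0100\r\n\r\n"
                + "2. John Roe\r\n   34 Oak Avenue\r\n   555-0199\r\n\r\n"
                + "blah\r\n";
        for (Listing listing : extract(page)) {
            System.out.println(listing);
        }
    }
}
```

Running `main` prints `name: Jane Doe, address: 12 Main Street` and then `name: John Roe, address: 34 Oak Avenue`. Like the original pattern, this will also pick up any unrelated line that happens to start with a number from 1 to 10 followed by a period, so it may need tightening for the real page.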