Determine phone number prefix with Trie in Java

1.8k views Asked by At

I am trying to create a fast search function to determine the prefix of a phone number.

I am loading prefix data from database into memory as TreeMap, where key is prefix and value is an object containing information about this prefix(country etc).

This is how TreeMap is populated:

private static TreeMap<String, PrefixData> prefixMap = new TreeMap<>();
//EXAMPLE of DB query
    try {
        Class.forName("org.postgresql.Driver");
        Connection connection = DriverManager.getConnection(dbURL, dbUser, dbPass);
        connection.setAutoCommit(false);
        Statement stmt = connection.createStatement();

        ResultSet rs = stmt.executeQuery("SELECT * FROM countries_prefixes");

        //Looping resultset
        while (rs.next()) {
            //TODO Review fields that must be stored in memory
            String country = rs.getString("name");

            //Populating map with data object (keeping nr prefix as key and object as value)
            prefixMap.put(rs.getString("country_prefix"), new PrefixData(country));
        }

        rs.close();
        stmt.close();
        connection.close();

    } catch (ClassNotFoundException | SQLException e) {
        System.out.println(e.toString());
    }

Lets say I have phone numbers I want to check: 37251845632; 35844021546; 34651478966 etc ...

Some prefixes are 1 digit long, some 2 digits long, some 3 digits long and so on...

So I created a loop, that works:

//TODO Try some other search methods (Tries)
    //array for prefix length priority
    int[] sequence = {3, 4, 2, 1, 5, 6};

    //Performing search from the map
    for (int i = 0; i < sequence.length; i++) {

        //Extracting prefix from phone nr
        String prefix = phoneNr.substring(1, sequence[i] + 1);

        if (prefixMap.containsKey(prefix)) {
            PrefixData pdata = prefixMap.get(prefix);
            System.out.println(String.format("Found matching key [%s] in TreeMap", prefix));
            System.out.println(String.format("[NAME: %s] [REGEX: %s] ", pdata.getCountryName(), pdata.getNrRegex()));

            //Validate number format with regex
            if (pdata.getNrRegex().trim() != null && !pdata.getNrRegex().trim().isEmpty()) {
                System.out.println("Regex for number validation is present!");
                if (phoneNr.matches(pdata.getNrRegex().replaceAll("^/|/$", ""))) {
                    System.out.println("NUMBER IS VALID!");
                } else {
                    System.out.println("INVALID NUMBER!");
                }
            }

            return pdata;

        }
    }
    return null;
    }

Now the loop works well, but it is slow. I've heard something about Tries, which is faster, but I don't understand how to implement this in my scenario.

Any help is appreciated!

1

There are 1 answers

0
lkallas On BEST ANSWER

As I said, the loop works, but this is not a nice way to achieve my goal.

So I did a little bit of research and came up with solution that is using prefix tree (Trie) implementation.

Little reading what Trie is can be found here.

And now the Trie implementation part. I knew that it would be faster to find a code that is already written and tested, so I found Google implementation here. And Vladimir Kroz's implementation here.

Made some minor modifications and this is the solution. I will provide both solutions:

Prefixmap interface

package tiesImpl;

/**
 * Maps string prefixes to values. For example, if you {@code put("foo", 1)},
 * {@code get("foobar")} returns {@code 1}. Prohibits null values.
 *
 * <p>Use instead of iterating over a series of string prefixes calling
 * {@code String.startsWith(prefix)}.
 *
 * @author [email protected] (Bob Lee)
 * @param <T>
 */

public interface PrefixMap<T> {
  /**
   * Maps prefix to value.
   *
     * @param prefix
     * @param value
   * @return The previous value stored for this prefix, or null if none.
   * @throws IllegalArgumentException if prefix is an empty string.
   */
  T put(CharSequence prefix, T value);

  /**
   * Finds a prefix that matches {@code s} and returns the mapped value.
   *
   * If multiple prefixes in the map match {@code s}, the longest match wins.
   *
     * @param s
   * @return value for prefix matching {@code s} or {@code null} if none match.
   */
  T get(CharSequence s);
}

PrefixTrie class

package tiesImpl;

/*
 * Copyright (C) 2007 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Trie implementation supporting CharSequences as prefixes.
 *
 * Prefixes are sequences of characters, and the set of allowed characters is
 * specified as a range of sequential characters. By default, any seven-bit
 * character may appear in a prefix, and so the trie is a 128-ary tree.
 *
 * @author [email protected] (Bob Lee)
 * @author [email protected] (Matthew Harris)
 * @param <T>
 */
public class PrefixTrie<T> implements PrefixMap<T> {
  // The set of allowed characters in prefixes is given by a range of
    // consecutive characters.  rangeOffset denotes the beginning of the range,
    // and rangeSize gives the number of characters in the range, which is used as
    // the number of children of each node.

private final char rangeOffset;
private final int rangeSize;

private final Node<T> root;

/**
 * Constructs a trie for holding strings of seven-bit characters.
 */
public PrefixTrie() {
    rangeOffset = '\0';
    rangeSize = 128;
    root = new Node<>(rangeSize);
}

/**
 * Constructs a trie for holding strings of characters.
 *
 * The set of characters allowed in prefixes is given by the range
 * [rangeOffset, lastCharInRange], inclusive.
 *
 * @param firstCharInRange
 * @param lastCharInRange
 */
public PrefixTrie(char firstCharInRange, char lastCharInRange) {
    this.rangeOffset = firstCharInRange;
    this.rangeSize = lastCharInRange - firstCharInRange + 1;

    if (rangeSize <= 0) {
        throw new IllegalArgumentException("Char range must include some chars");
    }

    root = new Node<>(rangeSize);
}

/**
 * {@inheritDoc}
 *
 * @param prefix
 * @param value
 * @throws IllegalArgumentException if prefix contains a character outside
 * the range of legal prefix characters.
 */
@Override
public T put(CharSequence prefix, T value) {
    if (value == null) {
        throw new NullPointerException();
    }

    Node<T> current = root;
    for (int i = 0; i < prefix.length(); i++) {
        int nodeIndex = prefix.charAt(i) - rangeOffset;
        try {
            Node<T> next = current.next[nodeIndex];
            if (next == null) {
                next = current.next[nodeIndex] = new Node<>(rangeSize);
            }
            current = next;
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IllegalArgumentException(
                    "'" + prefix.charAt(i) + "' is not a legal prefix character.");
        }
    }
    T oldValue = current.value;
    current.value = value;
    return oldValue;
}

/**
 * {@inheritDoc}
 * @param s
 */
@Override
public T get(CharSequence s) {
    Node<T> deepestWithValue = root;
    Node<T> current = root;
    for (int i = 0; i < s.length(); i++) {
        int nodeIndex = s.charAt(i) - rangeOffset;
        if (nodeIndex < 0 || rangeSize <= nodeIndex) {
            return null;
        }
        current = current.next[nodeIndex];
        if (current == null) {
            break;
        }
        if (current.value != null) {
            deepestWithValue = current;
        }
    }
    return deepestWithValue.value;
}

/**
 * Returns a Map containing the same data as this structure.
 *
 * This implementation constructs and populates an entirely new map rather
 * than providing a map view on the trie, so this is mostly useful for
 * debugging.
 *
 * @return A Map mapping each prefix to its corresponding value.
 */
public Map<String, T> toMap() {
    Map<String, T> map = newLinkedHashMap();
    addEntries(root, new StringBuilder(), map);
    return map;
}

/**
 * Adds to the given map all entries at or below the given node.
 *
 * @param node
 * @param builder A StringBuilder containing the prefix for the given node.
 * @param map
 */
private void addEntries(Node<T> node,
        StringBuilder builder,
        Map<String, T> map) {
    if (node.value != null) {
        map.put(builder.toString(), node.value);
    }

    for (int i = 0; i < node.next.length; i++) {
        Node<T> next = node.next[i];
        if (next != null) {
            builder.append((char) (i + rangeOffset));
            addEntries(next, builder, map);
            builder.deleteCharAt(builder.length() - 1);
        }
    }
}

private static class Node<T> {

    T value;
    final Node<T>[] next;

    @SuppressWarnings("unchecked")
    Node(int numChildren) {
        next = new Node[numChildren];
    }
}

/**
 * Creates a {@code LinkedHashMap} instance.
 *
 * @param <K>
 * @param <V>
 * @return a newly-created, initially-empty {@code LinkedHashMap}
 */
public static <K, V> LinkedHashMap<K, V> newLinkedHashMap() {
    return new LinkedHashMap<>();
}

}

Vladimir Kroz implementation: Trie class

package tiesImpl;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Prefix table based on Trie structure. Allows to perform incremental lookup
 * and match based on search key prefixes (classic example - determine phone
 * area code for given phone number)
 *
 * @param <V> a type of value object to be stored along with prefix (e.g when
 * key is a country name, the value could be a name of the country)
 *
 * @author Vladimir Kroz
 * https://vkroz.wordpress.com/2012/03/23/prefix-table-trie-implementation-in-java/
 */

public class Trie<V> {

Entry<V> entry;
char key;
Map<Character, Trie<V>> children;

public Trie() {
    this.children = new HashMap<>(10);
    entry = new Entry<>();
}

/**
 * non-public, used by _put()
 */
Trie(char key) {
    this.children = new HashMap<>(10);
    this.key = key;
    entry = new Entry<>();
}

public void put(String key, V value) {
    _put(new StringBuffer(key), new StringBuffer(""), value);
}

void _put(StringBuffer remainder, StringBuffer prefix, V value) {
    if (remainder.length() > 0) {
        char keyElement = remainder.charAt(0);
        Trie<V> t = null;
        try {
            t = children.get(keyElement);
        } catch (IndexOutOfBoundsException e) {
        }
        if (t == null) {
            t = new Trie<>(keyElement);
            children.put(keyElement, t);
        }
        prefix.append(remainder.charAt(0));
        t._put(remainder.deleteCharAt(0), prefix, value);
    } else {
        this.entry.value = value;
        this.entry.prefix = prefix.toString();
    }

}

/**
 * Retrieves element from prefix table matching as a prefix to provided
 * key. E.g. if key is "37251656565" and prefix table has node "372" then
 * this call will return the value of "372"
 *
 * @param key a string which starts with prefix to be searched in the table
 * (e.g. phone number)
 * @return an Object associated with matching prefix (i.e if key is a phone
 * number it may return a corresponding country name)
 */
public V get(String key) {
    return _get(new StringBuffer(key), 0);
}

/**
 * Returns true if key has matching prefix in the table
 *
 * @param key
 * @return
 */
public boolean hasPrefix(String key) {
    return this.get(key) != null;
}

V _get(StringBuffer key, int level) {
    if (key.length() > 0) {
        Trie<V> t = children.get(key.charAt(0));
        if (t != null) {
            //FYI: modified code to return deepest with value instead of returning null if prefix doesn't have corresponding value.
            V result = t._get(key.deleteCharAt(0), ++level);
            return result == null ? entry.value : result;

        } else {
            return (level > 0) ? entry.value : null;
        }

    } else {
        return entry.value;
    }
}

@Override
//For debugging
public String toString() {

    Iterator<Character> it = children.keySet().iterator();
    StringBuffer childs = new StringBuffer();

    while (it.hasNext()) {
        Character _key = it.next();
        childs.append(String.format("\n%s\n",
                //Adding a tab to the beginning of every line to create a visual tree
                String.format("%s: %s", _key, children.get(_key)).replaceAll("(?m)(^)", "\t")));
    }

    return String.format("Trie [entry=%s, children=%s]", entry, childs);
}

static public class Entry<V> {

    String prefix;
    V value;

    public Entry() {
    }

    public Entry(String p, V v) {
        prefix = p;
        value = v;
    }

    public String prefix() {
        return prefix;
    }

    public V value() {
        return value;
    }

    @Override
    public String toString() {
        return "Entry [prefix=" + prefix + ", value=" + value + "]";
    }

}
}

And finally the Testing part

package tiesImpl;

/**
 * Class for testing different implementations of prefix tree (Trie).
 *
 * @author lkallas
 */

public class TriesTest {

private static final PrefixTrie<String> googleTrie = new PrefixTrie<>();
private static final Trie<String> krozTrie = new Trie<>();

public static void main(String[] args) {

    //Inserting prefixes to Google implementation of Trie
    googleTrie.put("7", "Russia");
    googleTrie.put("77", "Abkhazia");
    googleTrie.put("746", "Some unknown country");

    //Inserting prefixes to Vladimir Kroz implementation of Trie
    krozTrie.put("7", "Russia");
    krozTrie.put("77", "Abkhazia");
    krozTrie.put("746", "Some unknown country");

    System.out.println("Google version of get: " + googleTrie.get("745878787"));
    System.out.println("Vladimir Kroz version of get: " + krozTrie.get("745878787"));

}

}

Hope that this answer is somewhat helpful to others also! Cheers!