Deduplicating HashMap Values


I'm wondering if anyone knows a good way to remove duplicate values in a LinkedHashMap. I have a LinkedHashMap with pairs of String and List<String>, and I'd like to remove duplicates across the ArrayLists. This is to improve some downstream processing.

The only thing I can think of is keeping a log of the processed values as I iterate over the HashMap and through each ArrayList, checking whether I've encountered a value previously. This approach seems like it would degrade in performance as the lists grow. Is there a way to pre-process the HashMap to remove duplicates from the ArrayList values?

To illustrate: if I have String1 -> List1 (a, b, c) and String2 -> List2 (c, d, e), I would want to remove "c" so there are no duplicates across the lists within the HashMap.


There are 6 answers

Answer by AudioBubble:

One approach: copy the entries into a second structure that can be sorted by value (alphabetically or numerically), then do a single sweep through the sorted list, comparing each element to the next one. If they are equal, remove the next one and leave the index unchanged, so the sweep stays at the same position in the sorted list.

Alternatively, when you are adding values, you can check whether the map already contains the value.
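
A minimal sketch of the sort-then-sweep idea above; the class name and sample data are illustrative, not part of the original answer:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortSweepDedup {
    public static void main(String[] args) {
        List<String> values = new ArrayList<String>(Arrays.asList("c", "a", "b", "c", "a"));
        Collections.sort(values);                 // duplicates become adjacent
        for (int i = 0; i < values.size() - 1; ) {
            if (values.get(i).equals(values.get(i + 1))) {
                values.remove(i + 1);             // drop the duplicate, index stays put
            } else {
                i++;                              // advance only when neighbours differ
            }
        }
        System.out.println(values);               // prints [a, b, c]
    }
}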

Answer by smp7d:

I'm assuming you need unique elements (contained in your Lists) and not unique Lists.

If you need no association between the Map's key and elements in its associated List, just add all of the elements individually to a Set.

If you add the Lists themselves to a Set, it will contain the unique List objects, not the unique elements of the Lists, so you have to add the elements individually.

(you can, of course, use addAll to make this easier)
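
For example, a minimal sketch (the map contents are taken from the question; the class name is illustrative):

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FlattenToSet {
    public static void main(String[] args) {
        Map<String, List<String>> map = new LinkedHashMap<String, List<String>>();
        map.put("String1", Arrays.asList("a", "b", "c"));
        map.put("String2", Arrays.asList("c", "d", "e"));

        // addAll copies the elements individually, so the Set ends up
        // holding unique elements rather than unique List objects
        Set<String> uniqueElements = new LinkedHashSet<String>();
        for (List<String> list : map.values()) {
            uniqueElements.addAll(list);
        }
        System.out.println(uniqueElements);   // prints [a, b, c, d, e]
    }
}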

Answer by Louis Wasserman:

Using Guava:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;
import com.google.common.collect.Multimaps;

Map<V, K> uniques = new LinkedHashMap<V, K>();
for (Map.Entry<K, List<V>> entry : mapWithDups.entrySet()) {
  for (V v : entry.getValue()) {
    uniques.put(v, entry.getKey());  // a duplicate value overwrites the earlier key
  }
}
ListMultimap<K, V> uniqueLists = Multimaps.invertFrom(Multimaps.forMap(uniques),
  ArrayListMultimap.<K, V>create());
Map<K, List<V>> uniqueListsMap = (Map) uniqueLists.asMap(); // only if necessary

which should preserve the ordering of the values and keep them unique. If you can use a ListMultimap<K, V> for your result -- which you probably can -- then go for it; otherwise you can probably just cast uniqueLists.asMap() to a Map<K, List<V>> (with some abuse of generics, but with guaranteed type safety).
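
Note, tracing this with the question's example data ({String1=[a, b, c], String2=[c, d, e]}): the second uniques.put("c", ...) overwrites the first, so a duplicate value ends up in the last list that contained it, giving {String1=[a, b], String2=[c, d, e]}. If you want duplicates to stay with the first key instead, skip the put when uniques already contains the value.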

Answer by The Real Baumann:

So, to clarify... you essentially have K, [V1...Vn] and you want the values to be unique across all of the lists?

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public void add( Map<String, List<String>> map, Map<String, String> listObjects,
                 String key, List<String> values )
{
    List<String> uniqueValues = new ArrayList<String>();
    for( int i = 0; i < values.size(); i++ )
    {
        if( !listObjects.containsKey( values.get(i) ) )
        {
            // remember which list this value first appeared in
            listObjects.put( values.get(i), key );
            uniqueValues.add( values.get(i) );
        }
    }
    map.put( key, uniqueValues );
}

Essentially, we keep another HashMap that records the values we have already seen, and we drop the non-unique ones when adding a list to the map. This also gives you the added benefit of knowing which list a value first occurred in.
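
A hypothetical usage of the add(...) helper above, with the question's data (assumes java.util imports; names are illustrative):

Map<String, List<String>> map = new LinkedHashMap<String, List<String>>();
Map<String, String> listObjects = new HashMap<String, String>();
add(map, listObjects, "String1", Arrays.asList("a", "b", "c"));
add(map, listObjects, "String2", Arrays.asList("c", "d", "e"));
// map is now {String1=[a, b, c], String2=[d, e]}
// listObjects.get("c") returns "String1", the list where "c" first appeared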

Answer by millimoose:

Given your clarification, you want something like this:

import java.util.*;

class KeyValue {
    public String key;
    public Object value;

    KeyValue(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        // only the value field participates in equality
        if (this == o) return true;
        if (!(o instanceof KeyValue)) return false;
        return value.equals(((KeyValue) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

public void deduplicate(Map<String, List<Object>> items) {
    // LinkedHashSet keeps the values in first-seen order
    Set<KeyValue> kvs = new LinkedHashSet<KeyValue>();

    for (Map.Entry<String, List<Object>> entry : items.entrySet()) {
        String key = entry.getKey();
        List<Object> values = entry.getValue();
        for (Object value : values) {
            kvs.add(new KeyValue(key, value));
        }
        values.clear();
    }

    for (KeyValue kv : kvs) {
        items.get(kv.key).add(kv.value);
    }
}

Using a set will remove the duplicate values, and the KeyValue wrapper lets us preserve the original map key while doing so (the LinkedHashSet keeps the values in first-seen order, so each value ends up in the list where it first occurred). Add getters and setters or generics as needed. This also modifies the original map and the lists in it in place. The performance should be O(n).
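
A hypothetical call, assuming the amended signature above that takes the map as a parameter:

Map<String, List<Object>> items = new LinkedHashMap<String, List<Object>>();
items.put("String1", new ArrayList<Object>(Arrays.asList("a", "b", "c")));
items.put("String2", new ArrayList<Object>(Arrays.asList("c", "d", "e")));
deduplicate(items);
// items is now {String1=[a, b, c], String2=[d, e]}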

Answer by user949300:

As others have noted, you could check the value as you add, but if you have to do it after the fact:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public static void removeDups(Map<String, List<String>> in) {
    // collect every value from every list
    ArrayList<String> allValues = new ArrayList<String>();
    for (List<String> inValue : in.values())
        allValues.addAll(inValue);
    HashSet<String> uniqueSet = new HashSet<String>(allValues);

    // remove one occurrence of each distinct value...
    for (String unique : uniqueSet)
        allValues.remove(unique);

    // ...so anything left over was a duplicate
    HashSet<String> nonUniqueSet = new HashSet<String>(allValues);

    // strip every occurrence of the duplicated values from every list
    for (List<String> inValue : in.values())
        inValue.removeAll(nonUniqueSet);
}

public static void main(String[] args) {
    Map<String, List<String>> map = new HashMap<String, List<String>>();
    map.put("1", new ArrayList<String>(Arrays.asList("a", "b", "c", "a")));
    map.put("2", new ArrayList<String>(Arrays.asList("d", "e", "f")));
    map.put("3", new ArrayList<String>(Arrays.asList("a", "e")));

    System.out.println("Before");
    System.out.println(map);

    removeDups(map);
    System.out.println("After");
    System.out.println(map);
}

which generates the following output:

Before
{3=[a, e], 2=[d, e, f], 1=[a, b, c, a]}
After
{3=[], 2=[d, f], 1=[b, c]}