So I am building a huffman tree and I am in need of taking a String as input, and then create 2 arrays containing each letter and the number of occurrences of that letter in the original string, like this:
String s = "mississippi"
Should result in:
char[] charArr = {'m','i', 's', 'p'};
int[] count = {1,4,4,2};
There are alot of question regarding this and alot of examples on how to solve this, especially here on stackoverflow but the only one that I managed to get working was this:
private void findOccurences(String s) {
List<Character> original = new ArrayList<Character>(s.length());
List<Character> duplicateRemoved;
for (int i = 0; i < s.length(); i++) {
original.add(s.charAt(i));
}
duplicateRemoved = new ArrayList<Character>(original);
// Remove duplicates from second list.
Set<Character> hs = new HashSet<Character>();
hs.addAll(duplicateRemoved);
duplicateRemoved.clear();
duplicateRemoved.addAll(hs);
charFreqs = new int[duplicateRemoved.size()];
charArr = new char[duplicateRemoved.size()];
for (int i = 0; i < charArr.length; i++) {
char c = duplicateRemoved.get(i);
int count = Collections.frequency(original, c);
charArr[i] = c;
charFreqs[i] = count;
}
}
But it feels very cumberstone and it also scrambles the order of the letters in the array. If I use this my resulting array is as follows:
char[] charArr = {'p','s', 'i', 'm'};
Are there any better way to do what I want?
Using Java 8 :
chars()
method of String returns us surrogate code point of each character in StringUsing mapToObject map those code point to character
c -> (char) c
using Collectors.groupingBy and Collectors.counting() we will calculate occurrences of each character in string
After counting occurrences we want to remove duplicate characters.collectingAndThen help us to do some computation on the final result, It accepts first parameter as Collector so we will provide groupingBy and second parameter as a Function which accepts result generated by groupingBy and returns new result
Function<Map<Character, Long>, Map<Character, Long>> This function will drop characters which are duplicate and only allow character which has occurred only once
filter(e -> e.getValue() == 1)
*If you don't want to preserve order of character then you can remove
LinkedHashMap
, So it will be liketoMap(Map.Entry::getKey, Map.Entry::getValue)
andgroupingBy(c -> c, counting())
Demo