How (if possible) to sort a BTreeMap by value in Rust?

7.1k views Asked by At

I am following a course on Software Security for which one of the assignments is to write some basic programs in Rust. For one of these assignments I need to analyze a text-file and generate several statistics. One of these is a generated list of the ten most used words in the text.

I have written this program that performs all tasks in the assignment except for the word frequency statistic mentioned above, the program compiles and executes the way I expect:

extern crate regex;

use std::error::Error;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
use std::io::BufReader;
use std::collections::BTreeMap;
use regex::Regex;

fn main() {
    // Create a path to the desired file
    let path = Path::new("text.txt");
    let display = path.display();

    let file = match File::open(&path) {
        Err(why) => panic!("couldn't open {}: {}", display,
                           why.description()),
        Ok(file) => file,
    };

    let mut wordcount = 0;
    let mut averagesize = 0;
    let mut wordsize = BTreeMap::new();
    let mut words = BTreeMap::new();

    for line in (BufReader::new(file)).lines() {
        let re = Regex::new(r"([A-Za-z]+[-_]*[A-Za-z]+)+").unwrap();
        for cap in re.captures_iter(&line.unwrap()) {
            let word = cap.at(1).unwrap_or("");
            let lower = word.to_lowercase();
            let s = lower.len();

            wordcount += 1;
            averagesize += s;

            *words.entry(lower).or_insert(0) += 1;
            *wordsize.entry(s).or_insert(0) += 1;
        }
    }

    averagesize = averagesize / wordcount;

    println!("This file contains {} words with an average of {} letters per word.", wordcount, averagesize);

    println!("\nThe number of times a word of a certain length was found.");

    for (size, count) in wordsize.iter() {
        println!("There are {} words of size {}.", count, size);
    }

    println!("\nThe ten most used words.");

    let mut popwords = BTreeMap::new();
    for (word, count) in words.iter() {
        if !popwords.contains_key(count) {
            popwords.insert(count, "");
        }

        let newstring = format!("{} {}", popwords.get(count), word);
        let mut e = popwords.get_mut(count);
    }

    let mut i = 0;
    for (count, words) in popwords.iter() {
        i += 1;
        if i > 10 {
            break;
        }
        println!("{} times: {}", count, words);
    }
}

I have a BTreeMap (that I chose with these instructions), words, that stores each word as key and its associated frequency in the text as value. This functionality works as I expect, but there I am stuck. I have been trying to find ways to sort the BTreemap by value or find another data structure in Rust that is natively sorted by value.

I am looking for the correct way to achieve this data structure (a list of words with their frequency, sorted by frequency) in Rust. Any pointers are greatly appreciated!

1

There are 1 answers

7
Lukas Kalbertodt On BEST ANSWER

If you only need to analyze a static dataset, the easiest way is to just convert your BTreeMap into a Vec<T> in the end and sort the latter (Playground):

use std::iter::FromIterator;

let mut v = Vec::from_iter(map);
v.sort_by(|&(_, a), &(_, b)| b.cmp(&a));

The vector contains the (key, value) pairs as tuple. To sort the vector, we have to use sort_by() or sort_by_key(). To sort the vector in decreasing order, I used b.cmp(&a) (as opposed to a.cmp(&b), which would be the natural order). But there are other possibilities to reverse the order of a sort.


However, if you really need some data structure such that you have a streaming calculation, it's getting more complicated. There are many possibilities in that case, but I guess using some kind of priority queue could work out.