I have two files. One file contains a set of URLs which need to be matched against the second file, which also contains a set of URLs. Currently I use a foreach loop to do the matching. Since there are more than 95,000 URLs, performance has become slow.

I need a way to improve the performance of the application. I would be happy to know of any way to avoid this slow matching.

Thanks.

2 Answers

Tim Biegeleisen

A suitable data structure to use here is a HashSet, because it has (amortized) constant-time lookup. You can parse the URLs from the first file and insert them into a HashSet, then parse the second file and check whether each of its URLs exists in the set.

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

Set<String> urls = new HashSet<>();

// parse the first file and add each URL to the set
try (BufferedReader br = Files.newBufferedReader(Paths.get("firstURLs.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        urls.add(line);
    }
}
catch (IOException e) {
    System.err.format("IOException: %s%n", e);
}

// parse the second file, checking each URL against the set
try (BufferedReader br = Files.newBufferedReader(Paths.get("secondURLs.txt"))) {
    String line;
    while ((line = br.readLine()) != null) {
        if (urls.contains(line)) {
            System.out.println("MATCH: " + line);
        }
    }
}
catch (IOException e) {
    System.err.format("IOException: %s%n", e);
}

The advantage of this approach is that it runs in time linear in the combined size of the two files.

Surabhi Mundra

You can try a radix tree (a compressed trie) for storing the URLs of the second file and searching it: https://en.wikipedia.org/wiki/Trie. Lookup cost is proportional to the length of the URL being looked up rather than the number of stored URLs, and common prefixes such as "https://" are stored only once. A sketch follows.
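For illustration, here is a minimal sketch of a plain (uncompressed) character-level trie; a true radix tree would additionally merge chains of single-child nodes, which this sketch omits. The UrlTrie class and its method names are invented for this example, not taken from any library.

import java.util.HashMap;
import java.util.Map;

// Hypothetical example class; not part of the JDK or any library.
public class UrlTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if an inserted URL ends at this node
    }

    private final Node root = new Node();

    // Walk the URL character by character, creating nodes as needed.
    public void insert(String url) {
        Node node = root;
        for (char c : url.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    // Follow the URL's characters; it matches only if we end on a
    // node that was marked as the end of an inserted URL.
    public boolean contains(String url) {
        Node node = root;
        for (char c : url.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false;
            }
        }
        return node.terminal;
    }

    public static void main(String[] args) {
        UrlTrie trie = new UrlTrie();
        trie.insert("https://example.com/a");
        System.out.println(trie.contains("https://example.com/a")); // true
        System.out.println(trie.contains("https://example.com/b")); // false
    }
}

That said, for exact whole-URL matching a HashSet (as in the other answer) is usually simpler and at least as fast; a trie mainly pays off if you also need prefix queries, or if the URLs share long common prefixes and memory is tight.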