cuDF for string comparison boosting

184 views Asked by Wang Hao At 28 September 2020 at 02:58

I am trying to find matches between two large CSV files. I use the function below to compute the similarity between two strings. If the resulting ratio is greater than a predefined threshold, I accept the pair as a match.

from difflib import SequenceMatcher
def similar(a, b): return SequenceMatcher(None, a, b).ratio()
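For illustration, here is a minimal sketch of how the ratio is turned into a match decision. The threshold value of 0.8 and the names `THRESHOLD` and `is_match` are assumptions made for this example; they are not part of my actual script.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff, chosen only for illustration
THRESHOLD = 0.8


def similar(a, b):
    # Similarity ratio between 0.0 (nothing in common) and 1.0 (identical)
    return SequenceMatcher(None, a, b).ratio()


def is_match(a, b, threshold=THRESHOLD):
    # Accept the pair only when the ratio exceeds the threshold
    return similar(a, b) > threshold
```

With this cutoff, identical strings are accepted and strings with no characters in common are rejected.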

Because I need to compare every line of one file against every line of the other, the time complexity is O(n^2). I have considered using a hash to reduce the time complexity to O(n), but that would restrict me to exact matches, with no flexibility. The pairwise approach, however, would take several days to run on the CPU of my local computer. I am therefore wondering whether there is a way to use cuDF to speed up the operation on a GPU.
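To make the trade-off concrete, the two approaches can be sketched roughly as follows. The function names `fuzzy_matches` and `exact_matches`, the sample rows, and the 0.8 threshold are all hypothetical; this is a simplified CPU-only sketch, not my full script.

```python
from difflib import SequenceMatcher


def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(left, right, threshold=0.8):
    # Pairwise approach: len(left) * len(right) similarity calls
    return [(a, b) for a in left for b in right if similar(a, b) > threshold]


def exact_matches(left, right):
    # Hash approach: roughly linear time, but only identical strings match
    right_set = set(right)
    return [a for a in left if a in right_set]


# Hypothetical sample rows standing in for the two CSV files
left = ["apple pie", "banana"]
right = ["apple pies", "cherry"]

print(fuzzy_matches(left, right))
print(exact_matches(left, right))
```

The fuzzy version pairs "apple pie" with "apple pies", while the hash version finds nothing, which is exactly the flexibility I would lose.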

Also, when I tried cuDF's applymap function, it reported that the string dtype is not supported. Is there any other way I can implement this with cuDF? Thank you!


There are 0 answers
