Trying to understand LSH through sample Python code


The concise Python code I am studying is here.

Question A @ line 8

I do not really understand what the syntax "res = res << 1" means, or what it does in "get_signature".

Question B @ line 49 (SOLVED BY myself through another Q&A)

"xor = r1^r2" does not make sense to me; the author later uses "(d-nnz(xor))" to calculate "hash_sim" (see line 50).

Question C @ about hash_sim in general

This question is more about understanding LSH: what is the variable "d" (line 38) doing in the sample code? It is later used to calculate hash_sim on line 50.

Question D @ lines 20 and 24 -- syntax for "&"

I have trouble understanding the syntax "num = num & (num-1)", and I am also unsure what the function "nnz" does in the context of the hash similarity calculation. This may relate to Question B, where the author passes "xor" into "nnz"; the expression for "xor" also looks odd to me.

p.s.

Both my Python and my LSH understanding are at beginner level, and I keep going around in circles on this problem. Thanks for taking the time to go through my confusion as well as the code.

There are 1 answers

Klaus D. (best answer)

a. It's a left shift: https://docs.python.org/2/reference/expressions.html#shifting-operations It shifts the bits of res one position to the left, which doubles the value and leaves the lowest bit as 0.
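
To illustrate why a shift would appear in a signature builder, here is a minimal sketch. It assumes the signature is assembled one bit at a time; the helper name `pack_bits` is hypothetical and is not taken from the code in the question.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer (first bit ends up most significant)."""
    res = 0
    for b in bits:
        res = res << 1   # move every existing bit one place left, freeing the lowest bit
        if b:
            res |= 1     # set the freed lowest bit when the current value is 1
    return res


print(bin(pack_bits([1, 0, 1, 1])))  # 0b1011
```

Each pass through the loop makes room for one more bit, so after n iterations the integer holds an n-bit signature.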

b. Note that in Python ^ is not "to the power of" (that is **) but "bitwise XOR": each bit of the result is 1 exactly where the two operands differ.
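
A small illustration with made-up 4-bit values (the numbers are for demonstration only, not from the code in the question):

```python
r1 = 0b1011
r2 = 0b1101

xor = r1 ^ r2          # bits are 1 only where r1 and r2 disagree
print(bin(xor))        # 0b110

# ** is exponentiation, ^ is bitwise XOR
print(2 ** 3, 2 ^ 3)   # 8 1
```

So XOR-ing two signatures marks the positions in which they differ.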

c. As the comment states: it defines the "number of bits per signature" as 2**10 (1024).
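
A rough sketch of how d could enter the similarity, assuming hash_sim is the fraction of signature bits that match, i.e. (d - nnz(xor)) / d as the question describes. The function name `hash_similarity` is hypothetical, `bin(...).count("1")` stands in for nnz, and a small d is used to keep the example readable.

```python
def hash_similarity(r1, r2, d):
    """Fraction of the d signature bits on which r1 and r2 agree."""
    xor = r1 ^ r2                     # 1 bits mark positions where the signatures differ
    differing = bin(xor).count("1")   # stand-in for nnz(xor)
    return (d - differing) / float(d)


# Two made-up 8-bit signatures that differ in 2 positions
print(hash_similarity(0b10110010, 0b10010011, 8))  # 0.75
```

With d bits per signature, d minus the number of differing bits is the number of matching bits, and dividing by d normalizes it to the range 0 to 1.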

d. The lines calculate the bitwise AND of num and num - 1, which clears the lowest set bit of num. The purpose of the function is documented in the comment above: "# get number of '1's in binary"
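
A minimal sketch of the bit-counting idea, written from the description above rather than copied from the code in the question:

```python
def nnz(num):
    """Count the 1 bits in a non-negative integer."""
    count = 0
    while num:
        num = num & (num - 1)  # clears the lowest set bit, e.g. 0b1011000 -> 0b1010000
        count += 1
    return count


print(nnz(0b1011000))  # 3
```

The loop runs once per set bit, so applied to the XOR of two signatures it returns the number of positions in which they differ.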