I have a data set (D) of (nxd) where n=number of rows and d= number of dimensions, I create a similarity matrix (S)(nxn) by comparing each row of the data set (D) and then convert it into a sparse matrix (tx3) where t is the number of non-zero elements of the symmetric similarity matrix (S)
The time complexity of creating the similarity matrix is o(n^2d) where d is some constant operation. The time complexity of converting a sparse matrix is theta(n^2)
My question is: While creating the similarity matrix if I perform a check that "if the similarity value is "zero" then proceed (continue) else put it into the sparse matrix". Assuming this can I say that the cost of computing the sparse matrix from the dataset (D) is O(n^2 d).
For Example:
Creating Similarity Matrix:
for i in range(0,n):
for j in range(0,n):
find similarity_value of D[i] and D[j]
insert into similarity_matrix: S[i,j]= similarity_value
The above runs in O(n^2 d)
n^2 for the loops
d for finding the similarity between D[i] and D[j]
Sparse Matrix creation form Simiarity matrix
for i in range(0,n):
for j in range(0,n):
if S[i,j]==0:
continue
else
insert into sparse_matrix [i, j, S[i,j]]
The above runs in O(n^2)
n^2 for the loops
Performing both the operation would require O(n^2 d) +O(n^2) if done one after another.
Since we require only the sparse_matrix, we create the sparse matrix directly without creating the similarity matrix.
Creating Sparse matrix directly without creating the similarity matrix:
for i in range(0,n):
for j in range(0,n):
find similarity_val of D[i] and D[j]
if similarity_val==0:
continue
else
insert into sparse_matrix [i,j,similarity_val]
My question is:
Wouldn't the above run in only O(n^2 d), since I am directly inserting into sparse matrix
n^2 for the two loops
d for finding the similarity_val of D[i] and D[j]
Please let me know if I am missing something or my understanding of something is wrong.
For every (i, j) pair (there are n^2 in total), you reach the inner part of the loop where you find the similarity and then conditionally add elements to your sparse matrix. Finding the similarity takes "d" operations (because you need to loop through each of your dimensions) and conditionally adding an element takes a constant number of operations (either 1 comparison operation in the case where the value is 0 and 1 comparison operation plus one insertion operation in the case where the value is non-zero). Since you need to do "d" plus a constant number of operations each time you reach the inside of this double loop, in total you perform O(n^2 d) operations.
Note that this asymptotic operation count will not change if you limit the inner loop to values of j that are no less than i (aka replace
for j in range(0, n)
withfor j in range(i, n)
). This is because you will reach the inside of the loop n*(n+1)/2 times and perform "d" plus a constant number of operations, which is still O(n^2 d) total computation.