I have two comma-delimited strings containing embeddings. Each element fits in a float, and each embedding is 128 elements long. I am following this linear algebra intro in the ojAlgo library. I'd like to convert the two strings to ojAlgo matrices, normalize them, and then compute their cosine similarity. I am testing with a single matrix first: the cosine similarity of a vector with itself should be 1.0.
PhysicalStore.Factory<Double, Primitive32Store> storeFactory = Primitive32Store.FACTORY;
String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
.mapToDouble(Double::parseDouble)
.toArray());
System.out.println("Primitive32Store : " + matrixR032);
matrixR032.modifyAny(DataProcessors.STANDARD_SCORE);
System.out.println("Primitive32Store - normalized : " + matrixR032);
System.out.println(matrixR032);
System.out.println("matrixR032 " + matrixR032.multiply(storeFactory.make(matrixR032.transpose())));
[java] Primitive32Store : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { 0.47058823704719543, 0.5647059082984924, 0.5490196347236633, 0.545098066329956, 0.5490196347236633 } }
[java] Primitive32Store - normalized : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { NaN, NaN, NaN, NaN, NaN } }
[java] org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { NaN, NaN, NaN, NaN, NaN } }
[java] matrixR032 org.ojalgo.matrix.store.Primitive32Store < 1 x 1 >
[java] { { NaN } }
[java]
However, my normalization results in NaN, and the input numbers are given additional digits I did not specify.
- How can I ensure that when I convert from string->double[]->Primitive32Store additional digits are not added?
- How can I normalize my vector and compute its cosine similarity?
Update: when I switch to MatrixR064, the numbers no longer have seemingly random digits added to the end.
That is a somewhat messy way to do this; you don't really see what's going on. How about this way:
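One way to make every step visible is to write the arithmetic out by hand. The following is a minimal sketch in plain Java, deliberately without any ojAlgo calls; the class name CosineSketch and its helper methods are made up for illustration and are not part of the library. It parses the string into a double[] and computes the cosine similarity as the dot product divided by the product of the two Euclidean norms.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of ojAlgo.
class CosineSketch {

    // Parse a comma-delimited string into 64-bit doubles.
    static double[] parse(String csv) {
        return Arrays.stream(csv.split(","))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Plain dot product of two equally long vectors.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // cos(a, b) = (a . b) / (|a| * |b|)
    static double cosineSimilarity(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        double[] v = parse("0.47058824,0.5647059,0.54901963,0.54509807,0.54901963");
        // A vector compared with itself should give 1.0 up to floating-point rounding.
        System.out.println("cosine(v, v) = " + cosineSimilarity(v, v));
    }
}
```

Dividing by the norms is the only normalization cosine similarity needs, so no standard-score step is involved in this sketch.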
I assume the "additional digits" are representation errors. The 32 in the class name Primitive32Store indicates that it uses 32-bit floats.
The DataProcessors class assumes data is stored in columns; in your case that would be 1 column with 5 rows. You did the opposite (transposed).
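Both effects can be reproduced without ojAlgo. The sketch below is plain Java with made-up names (RepresentationSketch, viaFloat, standardScore), and it assumes the standard score is (x - mean) / stdDev using the population standard deviation, which may differ in detail from what DataProcessors.STANDARD_SCORE does. It shows that rounding a parsed value to a 32-bit float and widening it back changes the printed digits, and that a "column" holding a single value has zero standard deviation, so its standard score is 0/0, which is NaN.

```java
// Hypothetical helper class for illustration; not part of ojAlgo.
class RepresentationSketch {

    // Parse as double, round to 32-bit float, then widen back to double.
    static double viaFloat(String text) {
        return (double) (float) Double.parseDouble(text);
    }

    // Standard score of one column: (x - mean) / stdDev.
    // Assumption: population standard deviation (divide by n).
    static double[] standardScore(double[] column) {
        double mean = 0.0;
        for (double x : column) {
            mean += x;
        }
        mean /= column.length;

        double variance = 0.0;
        for (double x : column) {
            variance += (x - mean) * (x - mean);
        }
        double stdDev = Math.sqrt(variance / column.length);

        double[] scores = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            scores[i] = (column[i] - mean) / stdDev;
        }
        return scores;
    }

    public static void main(String[] args) {
        // The 64-bit value and the float-rounded value print differently.
        System.out.println("as double:    " + Double.parseDouble("0.47058824"));
        System.out.println("via float:    " + viaFloat("0.47058824"));

        // 1 x 5 layout: each column holds one value, so stdDev is 0 and the score is 0/0.
        double[] single = standardScore(new double[] {0.47058824});
        System.out.println("one value:    " + single[0]);

        // 5 x 1 layout: one column with five values gives finite scores.
        double[] column = standardScore(
                new double[] {0.47058824, 0.5647059, 0.54901963, 0.54509807, 0.54901963});
        System.out.println("five values:  " + java.util.Arrays.toString(column));
    }
}
```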