ojAlgo MatrixR032 creation from string, normalization, and cosine similarity calculation

Question

59 views Asked by 219CID At 13 December 2023 at 22:54

I have two comma-delimited strings containing embeddings. Each is 128 elements long, and every element fits in a float. I am following this linear algebra intro in the ojAlgo library. I'd like to convert the two strings to ojAlgo matrices, normalize them, and then compute their cosine similarity. I am testing with a single matrix first: its cosine similarity with itself should be 1.0.

    PhysicalStore.Factory<Double, Primitive32Store> storeFactory = Primitive32Store.FACTORY;
    String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
    Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
            .mapToDouble(Double::parseDouble)
            .toArray());
    System.out.println("Primitive32Store : " + matrixR032);
    matrixR032.modifyAny(DataProcessors.STANDARD_SCORE);
    System.out.println("Primitive32Store - normalized : " + matrixR032);
    System.out.println(matrixR032);
    System.out.println("matrixR032 " + matrixR032.multiply(storeFactory.make(matrixR032.transpose())));

 [java] Primitive32Store : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { 0.47058823704719543,    0.5647059082984924, 0.5490196347236633, 0.545098066329956,  0.5490196347236633 } }
 [java] Primitive32Store - normalized : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { NaN,    NaN,    NaN,    NaN,    NaN } }
 [java] org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { NaN,    NaN,    NaN,    NaN,    NaN } }
 [java] matrixR032 org.ojalgo.matrix.store.Primitive32Store < 1 x 1 >
 [java] { { NaN } }
 [java]

However, my normalization results in NaN, and the input numbers are printed with additional digits that I did not specify.

How can I ensure that when I convert from string->double[]->Primitive32Store additional digits are not added?
How can I normalize my vector and compute its cosine similarity?

Update: when I switch to MatrixR064, the numbers no longer have seemingly random digits added to the end.

There is 1 answer

**apete** · Accepted Answer · 2023-12-14T08:32:42+00:00

    Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
            .mapToDouble(Double::parseDouble)
            .toArray());

is a somewhat messy way to do this; you don't really see what's going on. How about this way:

    String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
    String[] values = dummyMatrixValues.split(",");

    PhysicalStore.Factory<Double, Primitive32Store> factory = Primitive32Store.FACTORY;

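    // Column vector: one row per value, a single column
    Primitive32Store vector = factory.make(values.length, 1);

    for (int i = 0; i < values.length; i++) {
        vector.set(i, 0, Double.parseDouble(values[i]));
    }

    // Standardise the column in place
    vector.modifyAny(DataProcessors.STANDARD_SCORE);

    // Cosine similarity of the vector with itself: dot / (norm * norm)
    double norm = vector.norm();
    double dotp = vector.dot(vector);
    double similarity = dotp / (norm * norm);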

    System.out.println("norm: " + norm);
    System.out.println("dotp: " + dotp);
    System.out.println("similarity: " + similarity);
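
For two different embeddings the same formula applies: the dot product divided by the product of the two norms. The following is a minimal plain-Java sketch of that calculation without ojAlgo; the class name `CosineSketch` and its helper methods are made up for illustration and are not part of any library. Because the formula already divides by both norms, the inputs do not need to be scaled to unit length beforehand.

```java
import java.util.Arrays;

final class CosineSketch {

    // Hypothetical helper: parses a comma-delimited string such as "0.1,0.2" into a double[].
    static double[] parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] v = parse("0.47058824,0.5647059,0.54901963,0.54509807,0.54901963");
        // A vector compared with itself should give a value very close to 1.0.
        System.out.println("similarity: " + cosine(v, v));
    }
}
```

If either vector is all zeros, the denominator is zero and the result is NaN, so real code would likely want to check for that case.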

I assume the "additional digits" are representation errors. The 32 in the class name Primitive32Store indicates that it uses 32-bit floats.
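
The effect can be reproduced in plain Java, independent of ojAlgo. This is only a sketch with a made-up class name (`FloatWideningSketch`): it parses a value as a 32-bit float and then widens it to a double, which is presumably what happens when a 32-bit store is printed with double precision.

```java
final class FloatWideningSketch {

    // Rounds the decimal text to the nearest 32-bit float, then widens it to a double.
    static double throughFloat(String text) {
        float narrowed = Float.parseFloat(text);
        return narrowed; // implicit float -> double widening, exact but shows the float's true value
    }

    public static void main(String[] args) {
        String text = "0.47058824";
        // Parsed directly as a double, the value prints as written.
        System.out.println("as double:       " + Double.parseDouble(text));
        // Routed through a float, the extra digits of the nearest float become visible.
        System.out.println("via 32-bit float: " + throughFloat(text));
    }
}
```

The widened value is not less accurate than the float it came from; it simply exposes digits that a float-aware formatter would have hidden. Switching to a 64-bit store avoids the rounding step in the first place.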

The DataProcessors class assumes data is stored in columns; in your case that means 1 column and 5 rows. You did the opposite (transposed).
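
To see why the orientation matters, here is a plain-Java sketch of a column-wise standard score. It is not ojAlgo's implementation: the class `StandardScoreSketch` is hypothetical, and dividing by n - 1 (sample standard deviation) is an assumption. With a 1 x 5 layout every column holds a single sample, so its deviation from the mean is zero and the division yields NaN. Dividing by n instead would also give NaN, since the standard deviation would be zero.

```java
final class StandardScoreSketch {

    // Standardises each column of data[row][col] in place: (x - mean) / stddev.
    static void standardScoreColumns(double[][] data) {
        int rows = data.length;
        int cols = data[0].length;
        for (int j = 0; j < cols; j++) {
            double mean = 0.0;
            for (int i = 0; i < rows; i++) {
                mean += data[i][j];
            }
            mean /= rows;

            double sumSquares = 0.0;
            for (int i = 0; i < rows; i++) {
                double d = data[i][j] - mean;
                sumSquares += d * d;
            }
            // Assumption: sample standard deviation. With one row this is 0.0 / 0, which is NaN.
            double std = Math.sqrt(sumSquares / (rows - 1));

            for (int i = 0; i < rows; i++) {
                data[i][j] = (data[i][j] - mean) / std;
            }
        }
    }

    public static void main(String[] args) {
        double[][] asRow = {{0.47058824, 0.5647059, 0.54901963, 0.54509807, 0.54901963}};
        standardScoreColumns(asRow); // five columns with one sample each
        System.out.println(java.util.Arrays.toString(asRow[0]));

        double[][] asColumn = {{0.47058824}, {0.5647059}, {0.54901963}, {0.54509807}, {0.54901963}};
        standardScoreColumns(asColumn); // one column with five samples
        System.out.println(java.util.Arrays.deepToString(asColumn));
    }
}
```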
