ojAlgo MatrixR032 creation from string, normalization, and cosine similarity calculation


I have two comma-delimited strings containing embeddings. Each string has 128 elements, and every element fits in a float. I am following this linear algebra intro in the ojAlgo library. I'd like to convert the two strings to ojAlgo matrices, normalize them, and then compute their cosine similarity. I am testing with a single matrix first: I expect its cosine similarity with itself to be 1.0.

    PhysicalStore.Factory<Double, Primitive32Store> storeFactory = Primitive32Store.FACTORY;
    String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
    Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
            .mapToDouble(Double::parseDouble)
            .toArray());
    System.out.println("Primitive32Store : " + matrixR032);
    matrixR032.modifyAny(DataProcessors.STANDARD_SCORE);
    System.out.println("Primitive32Store - normalized : " + matrixR032);
    System.out.println(matrixR032);
    System.out.println("matrixR032 " + matrixR032.multiply(storeFactory.make(matrixR032.transpose())));

 [java] Primitive32Store : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { 0.47058823704719543,    0.5647059082984924, 0.5490196347236633, 0.545098066329956,  0.5490196347236633 } }
 [java] Primitive32Store - normalized : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { NaN,    NaN,    NaN,    NaN,    NaN } }
 [java] org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
 [java] { { NaN,    NaN,    NaN,    NaN,    NaN } }
 [java] matrixR032 org.ojalgo.matrix.store.Primitive32Store < 1 x 1 >
 [java] { { NaN } }
 [java] 

However, my normalization results in NaN, and the input numbers are printed with additional digits that I did not specify.

  1. How can I ensure that when I convert from string->double[]->Primitive32Store additional digits are not added?
  2. How can I normalize my vector and compute its cosine similarity?

Update: when I switch to MatrixR064, the numbers no longer have seemingly random digits added to the end.

apete On BEST ANSWER
    Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
            .mapToDouble(Double::parseDouble)
            .toArray());

is a somewhat messy way to do this: you don't really see what's going on. How about this instead:

    String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
    String[] values = dummyMatrixValues.split(",");

    PhysicalStore.Factory<Double, Primitive32Store> factory = Primitive32Store.FACTORY;

    Primitive32Store vector = factory.make(values.length, 1);

    for (int i = 0; i < values.length; i++) {
        vector.set(i, 0, Double.parseDouble(values[i]));
    }

    vector.modifyAny(DataProcessors.STANDARD_SCORE);

    double norm = vector.norm();
    double dotp = vector.dot(vector);
    double similarity = dotp / (norm * norm);

    System.out.println("norm: " + norm);
    System.out.println("dotp: " + dotp);
    System.out.println("similarity: " + similarity);
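For two different embeddings the same formula applies, with one norm per vector. The following is a minimal plain-Java sketch of that calculation, deliberately not using ojAlgo so that it runs on its own; the class and method names (`CosineSketch`, `parse`, `dot`, `cosine`) and the second test vector are hypothetical and only serve as illustration.

```java
import java.util.Arrays;

public class CosineSketch {

    // Parse a comma-delimited string of numbers into a double[].
    static double[] parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    // Dot product of two vectors of equal length.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        double[] a = parse("0.47058824,0.5647059,0.54901963,0.54509807,0.54901963");
        double[] b = parse("1,0,0,0,0"); // hypothetical second vector
        System.out.println("a vs a: " + cosine(a, a));
        System.out.println("a vs b: " + cosine(a, b));
    }
}
```

Comparing a vector with itself should give a value equal to 1.0 up to floating-point rounding, which makes it a convenient sanity check before comparing two real embeddings.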

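I assume the "additional digits" are representation errors. The 32 in the class name Primitive32Store indicates that it uses 32-bit floats. Your values are parsed as double, stored as float, and presumably widened back to double when printed, which exposes the digits the float could not represent exactly.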
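The effect can be reproduced without ojAlgo. The sketch below uses only the JDK; the class and method names (`FloatWidening`, `roundTripThroughFloat`) are hypothetical. It narrows a parsed double to a float and widens it again, which is assumed to be roughly what happens when a 32-bit store is printed through a double-based accessor.

```java
public class FloatWidening {

    // Narrow a double to 32 bits, then widen it back to 64 bits.
    static double roundTripThroughFloat(double value) {
        return (double) (float) value;
    }

    public static void main(String[] args) {
        double parsed = Double.parseDouble("0.47058824");
        float stored = (float) parsed;   // what a 32-bit store can hold
        double widened = stored;         // what you see when it is read back as a double

        System.out.println("parsed as double : " + parsed);
        System.out.println("stored as float  : " + stored);
        System.out.println("widened to double: " + widened);
    }
}
```

The widened value differs from the parsed one only far beyond float precision, so nothing is being "added": the extra digits are the exact value of the nearest float, printed with double precision.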

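The DataProcessors class assumes data is stored in columns: in your case 1 column and 5 rows. You did the opposite (transposed), so each of your 5 columns held a single value. A single value has no spread, so its standard score is presumably computed as 0/0, which would explain the NaN results.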
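To illustrate why the orientation matters, here is a plain-Java sketch of a standard score over one column of samples. It is not ojAlgo's implementation: the class and method names (`StandardScoreSketch`, `standardize`) are hypothetical, and the use of the population standard deviation is an assumption (ojAlgo's exact definition may differ, but a single sample gives NaN either way).

```java
public class StandardScoreSketch {

    // Standard score of each sample: (x - mean) / standard deviation.
    // Assumption: population standard deviation (divide by n).
    static double[] standardize(double[] column) {
        int n = column.length;

        double mean = 0.0;
        for (double x : column) {
            mean += x;
        }
        mean /= n;

        double sumSq = 0.0;
        for (double x : column) {
            sumSq += (x - mean) * (x - mean);
        }
        double stdDev = Math.sqrt(sumSq / n);

        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            // With a single sample this is 0.0 / 0.0, which is NaN.
            result[i] = (column[i] - mean) / stdDev;
        }
        return result;
    }

    public static void main(String[] args) {
        // One column holding all 5 samples: well-defined scores.
        double[] fiveSamples = {0.47058824, 0.5647059, 0.54901963, 0.54509807, 0.54901963};
        System.out.println(java.util.Arrays.toString(standardize(fiveSamples)));

        // One sample per column, as in the 1 x 5 layout: NaN.
        System.out.println(java.util.Arrays.toString(standardize(new double[] {0.47058824})));
    }
}
```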