FlinkML 0.10.1 Multiple Linear Regression with Sparse Vectors for Training

Question

Asked by sxsnyc on 03 February 2016 at 19:28

All,

I'm trying to test out Flink ML 0.10.1 by doing a linear regression as described here:

https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html

I'm using SparseVector instead of DenseVector, but I hit this exception when trying to train the model:

java.lang.IllegalArgumentException: axpy only supports adding to a dense vector but got type class org.apache.flink.ml.math.SparseVector.
    at org.apache.flink.ml.math.BLAS$.axpy(BLAS.scala:60)
    at org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$SGDStep$2.apply(GradientDescent.scala:181)
    at org.apache.flink.ml.optimization.GradientDescent$$anonfun$org$apache$flink$ml$optimization$GradientDescent$$SGDStep$2.apply(GradientDescent.scala:177)
    at org.apache.flink.api.scala.DataSet$$anon$7.reduce(DataSet.scala:583)
    at org.apache.flink.runtime.operators.chaining.ChainedAllReduceDriver.collect(ChainedAllReduceDriver.java:93)
    at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:97)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:489)
    at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:144)
    at org.apache.flink.runtime.iterative.task.IterationIntermediateTask.run(IterationIntermediateTask.java:92)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:354)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
    at java.lang.Thread.run(Thread.java:745)

Does FlinkML's multiple linear regression (MLR) not support SparseVectors?

There are 2 answers

**Till Rohrmann** · Answer 1 · 2016-02-04T15:46:35+00:00

The problem is that the GradientDescent implementation expects the sum of the gradient vectors to be dense. This is not a strong limitation, because the sum of a set of sparse vectors is not necessarily sparse itself. Furthermore, it is often more efficient to convert the first gradient vector into a dense vector and then add the following sparse gradient vectors to it, instead of adding two sparse vectors every time.
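To illustrate the idea of accumulating into a dense vector, here is a minimal sketch in plain Scala. It does not use FlinkML: the `Sparse` case class, the `DenseAccumulation` object, and its `toDense`, `axpy`, and `sumGradients` helpers are hypothetical names made up for this example, and the actual GradientDescent code may be organized differently.

```scala
// Hypothetical minimal sparse vector: parallel arrays of indices and values.
// This is NOT org.apache.flink.ml.math.SparseVector, only an illustration.
final case class Sparse(size: Int, indices: Array[Int], values: Array[Double])

object DenseAccumulation {
  // y += a * x, with a sparse x and a dense y. Only the non-zero
  // entries of x are visited, so each call costs O(nnz(x)).
  def axpy(a: Double, x: Sparse, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    var i = 0
    while (i < x.indices.length) {
      y(x.indices(i)) += a * x.values(i)
      i += 1
    }
  }

  // Copy a sparse vector into a freshly allocated dense array.
  def toDense(x: Sparse): Array[Double] = {
    val dense = new Array[Double](x.size)
    axpy(1.0, x, dense)
    dense
  }

  // Densify the first gradient, then add the remaining sparse ones to it.
  def sumGradients(grads: Seq[Sparse]): Array[Double] = {
    require(grads.nonEmpty, "need at least one gradient")
    val acc = toDense(grads.head)
    grads.tail.foreach(g => axpy(1.0, g, acc))
    acc
  }
}
```

With this layout, each addition only touches the non-zero entries of the incoming sparse gradient, and no index merging between two sparse vectors is needed.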

I've opened a pull request to fix this issue. It should be merged in the next few days.

**Chobeat** · Answer 2 · 2016-02-03T21:11:04+00:00

I checked the source and it looks that way. There's an explicit type check there, and the case where the left vector (the one being added to) is sparse raises that error. The code is really ugly, so it's probably just a temporary version that will be improved over time. You should point it out on the mailing list or open an issue on JIRA.
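For readers who have not looked at the source, the kind of type check described here can be sketched roughly as follows. This is not the actual `BLAS.scala` code: `Vec`, `DenseVec`, `SparseVec`, and `AxpySketch` are hypothetical names for illustration, and only the wording of the error message is modeled on the stack trace in the question.

```scala
// Hypothetical vector hierarchy, standing in for FlinkML's vector types.
sealed trait Vec { def size: Int }
final case class DenseVec(data: Array[Double]) extends Vec {
  def size: Int = data.length
}
final case class SparseVec(size: Int, indices: Array[Int], data: Array[Double]) extends Vec

object AxpySketch {
  // y += a * x. The target y must be dense; a sparse y is rejected.
  def axpy(a: Double, x: Vec, y: Vec): Unit = {
    require(x.size == y.size, "dimension mismatch")
    y match {
      case dy: DenseVec =>
        x match {
          case dx: DenseVec =>
            var i = 0
            while (i < dx.size) { dy.data(i) += a * dx.data(i); i += 1 }
          case sx: SparseVec =>
            var i = 0
            while (i < sx.indices.length) {
              dy.data(sx.indices(i)) += a * sx.data(i)
              i += 1
            }
        }
      case other =>
        // Mirrors the shape of the exception reported in the question.
        throw new IllegalArgumentException(
          s"axpy only supports adding to a dense vector but got type ${other.getClass}.")
    }
  }
}
```

In this sketch, adding a sparse vector to a dense one works, while passing a sparse vector as the target fails with an `IllegalArgumentException`, which matches the behavior seen during training.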
