We have a performance issue where an expression is first converted to a Boost.uBLAS vector and then evaluated. It makes a difference to performance if the vector creation is skipped and the vector_expression is used directly. I couldn't find anything in the Boost.uBLAS documentation that says whether this is allowed. The examples in the documentation all use the container classes rather than expressions directly. The documentation only mentions that Boost.uBLAS uses expression templates, which in theory should make this case work. The norm_2 function accepts a vector_expression as its argument, which could be a second clue.
A simplified case, which sums the norms of the differences between the first row of a matrix and each of the other rows, looks like this:
#include <cstddef>
#include <boost/numeric/ublas/assignment.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/matrix_proxy.hpp>
#include <boost/numeric/ublas/vector.hpp>

int main()
{
    namespace ublas = boost::numeric::ublas;

    // 3x4 matrix filled row by row
    ublas::matrix<double> m{3, 4};
    m <<= 0, 1, 2, 3,
          3, 4, 5, 6,
          6, 7, 8, 9;

    double d = 0;
    for (std::size_t n = 1; n != m.size1(); ++n)
    {
        // Row proxies: views into m, no element copies
        const ublas::matrix_row<ublas::matrix<double>> row1{m, 0};
        const ublas::matrix_row<ublas::matrix<double>> row2{m, n};
#if 1
        const auto e = row1 - row2; // creates an expression
        d += ublas::norm_2(e);      // uses the expression
#else
        const ublas::vector<double> v = row1 - row2; // creates a vector (performance issue)
        d += ublas::norm_2(v);
#endif
    }
    return 0;
}
Does anyone know if this is allowed?
uBLAS provides expression templates (ET), which avoid the creation of temporary objects. This is very beneficial if you have multiple elementwise operations in one expression. In such cases, ET has a positive impact on runtime performance, although I have to say that it depends on the vector and matrix sizes. Have a look at this paper.
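To make the idea concrete, here is a rough sketch of how an expression template can defer a subtraction until a reduction reads it. This is not uBLAS's actual implementation: Diff, lazy_diff and norm2 are hypothetical names chosen for illustration, and the sketch assumes both operands have the same size and outlive the expression object.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical lazy node for "lhs - rhs". It stores references to the
// operands and computes each element only when it is indexed.
template <class L, class R>
struct Diff {
    const L& lhs;
    const R& rhs;

    std::size_t size() const { return lhs.size(); }
    double operator[](std::size_t i) const { return lhs[i] - rhs[i]; }
};

// Hypothetical helper: building the node copies no elements and
// allocates no memory. The operands must outlive the returned node.
template <class L, class R>
Diff<L, R> lazy_diff(const L& a, const R& b) {
    return Diff<L, R>{a, b};
}

// Euclidean norm of anything that offers size() and operator[],
// whether it is a container or a lazy expression node.
template <class E>
double norm2(const E& e) {
    double sum = 0.0;
    for (std::size_t i = 0; i != e.size(); ++i) {
        const double x = e[i]; // the subtraction happens here, per element
        sum += x * x;
    }
    return std::sqrt(sum);
}
```

With two std::vector<double> operands holding 0, 1, 2, 3 and 3, 4, 5, 6, calling norm2(lazy_diff(a, b)) evaluates to 6 in a single pass, without building an intermediate vector. Because the node only holds references, it should not be kept around after its operands are destroyed.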
It is allowed; otherwise the copy constructors would have been protected or private. See the description of the matrix_expression class.
True. We need to provide more examples. However, the uBLAS documentation shows that the binary vector operation creates an expression_type which holds a reference to the operation and operands. You can use such an instance.
Yes, have a look at the generated assembler in Compiler Explorer. No memory is allocated inside the for loop. So your approach is valid.
You can also write