A very common operation in my program is taking a vector and producing from it a vector in the same direction whose magnitude is the inverse of the original magnitude squared (v -> vNorm/|v|^2, i.e. v/|v|^3, where vNorm is the unit vector in the direction of v). I'm using the Eigen library for handling vectors, since it's supposed to be quite fast.
So far I've computed this vector like so:
Vector2d directedInverseSquare(const Vector2d &diff) {
    return diff / (diff.norm() * diff.squaredNorm());
}
This has worked fine, but it's not very efficient, since (as far as I know) it computes norm() and squaredNorm() separately.
So I've been thinking for a while about modifying the Eigen source code to add a new kind of norm that avoids this double computation.
Important note: caching squaredNorm() in a local variable sNorm and computing std::sqrt(sNorm)*sNorm always comes up slower than the implementation above.
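For reference, the cached variant I mean looks roughly like this (a simplified sketch, not my exact code; the helper name is just for illustration):

#include <cmath>
#include <Eigen/Dense>

using Eigen::Vector2d;

// Cache |v|^2 once and build |v|^3 from it: sqrt(sNorm) * sNorm == |v| * |v|^2.
Vector2d directedInverseSquareCached(const Vector2d &diff) {
    const double sNorm = diff.squaredNorm();
    return diff / (std::sqrt(sNorm) * sNorm);
}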
I'm nowhere near good enough at C++ to really understand how the Eigen source code works, but after looking around, I found what I'm fairly confident is the right place to implement my norm.
In the Dot.h file of Eigen/src/Core, I found this definition:
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
MatrixBase<Derived>::normalized() const
{
  typedef typename internal::nested_eval<Derived,2>::type _Nested;
  _Nested n(derived());
  RealScalar z = n.squaredNorm();
  // NOTE: after extensive benchmarking, this conditional does not impact performance, at least on recent x86 CPU
  if(z>RealScalar(0))
    return n / numext::sqrt(z);
  else
    return n;
}
I copied it, changed the line return n / numext::sqrt(z); to return n / (z*numext::sqrt(z));, renamed it newtonianNormalized (owing to its origins), and added a declaration in an earlier part of the file. Since z is the squared norm, z*numext::sqrt(z) equals |v|^3, so this returns v/|v|^3 directly, which is exactly what directedInverseSquare computes. I did the same with the definition of norm():
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::norm() const
{
  return numext::sqrt(squaredNorm());
}
Here I added the line RealScalar squared = squaredNorm(); and changed the return statement to return numext::sqrt(squared)*squared;.
In both cases the change needed for the newtonian norm costs only one extra multiplication, which is cheap. No other new computations are performed.
After benchmarking, the results are very strange. My benchmark runs large loops of the same operation over and over and sums up all the results, to keep the compiler from optimizing the loop away entirely. I use v.setRandom() to get random vectors, and I ran the loops and took measurements over ~20 runs of the program.
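A simplified sketch of one such loop (not my exact code; in the real benchmark the vector setup and the measured expression vary per test, and timing uses std::chrono as described below):

#include <chrono>
#include <iostream>
#include <Eigen/Dense>

using Eigen::Vector2d;

int main() {
    Vector2d v;
    v.setRandom();                                   // random test vector

    Vector2d sum = Vector2d::Zero();                 // accumulating results keeps the compiler from removing the loop
    const auto start = std::chrono::steady_clock::now();
    for (long long i = 0; i < 100000000; ++i)
        sum += v / (v.norm() * v.squaredNorm());     // swap in v.newtonianNormalized() etc. for the other tests
    const auto end = std::chrono::steady_clock::now();

    std::cout << sum.transpose() << "\n"
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
    return 0;
}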
The results, presented with very basic bar charts:

[Chart 1] compares the different methods for calculating the "normed" vectors; the results are in milliseconds, over a loop of 100,000,000 iterations.
[Chart 2] compares the different methods for calculating the "norm" itself; the results are again in milliseconds, over a loop of 100,000,000 iterations.
Note that the charts' axes start nowhere close to 0, so it's hard to see at first glance that the results barely differ at all.
I really expected a major performance boost: I cut the number of calculations roughly in half and reuse the already-computed quantities instead. How could the timings possibly be basically the same?
Edit: I compile with MSYS2 MinGW64, GCC 13.2.0, and CLion + CMake, with the following command line:
C:\msys64\mingw64\bin\c++.exe -IC:/Users/.../MINREP/include/eigen-3.4.0 -isystem C:/Users/.../MINREP/include/SFML-2.6.1/include -O3 -DNDEBUG -march=native -std=gnu++23 -fdiagnostics-color=always -MD -MT CMakeFiles/MINREP.dir/main.cpp.obj -MF CMakeFiles\MINREP.dir\main.cpp.obj.d -o CMakeFiles/MINREP.dir/main.cpp.obj -c C:/Users/.../MINREP/main.cpp
I record the time before and after each loop with the <chrono> library. Please tell me if any other info is needed.
Edit: Here are both functions after modification:
template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE typename NumTraits<typename internal::traits<Derived>::Scalar>::Real MatrixBase<Derived>::newtonianNorm() const
{
  RealScalar squared = squaredNorm();
  return numext::sqrt(squared)*squared;
}

template<typename Derived>
EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE const typename MatrixBase<Derived>::PlainObject
MatrixBase<Derived>::newtonianNormalized() const
{
  typedef typename internal::nested_eval<Derived,2>::type _Nested;
  _Nested n(derived());
  RealScalar z = n.squaredNorm();
  // NOTE: after extensive benchmarking, this conditional does not impact performance, at least on recent x86 CPU
  if(z>RealScalar(0))
    return n / (z*numext::sqrt(z));
  else
    return n;
}
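For completeness, with these in place the helper from the top of the post reduces to a single call (assuming the declarations I added to MatrixBase compile as intended):

Vector2d directedInverseSquare(const Vector2d &diff) {
    return diff.newtonianNormalized();   // same value as diff / (diff.norm() * diff.squaredNorm())
}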
