Why might this application of the @turbo macro fail and fall back to inbounds and simd? We're trying to do mul!(y, A, x[:, col]), but without copying the column of x.
using LoopVectorization

function matmul_column!(y, A, x, col)
    @turbo for i in axes(A, 1)
        yi = 0.
        for j in axes(A, 2)
            yi += A[i, j] * x[j, col]
        end
        y[i] = yi
    end
end
What are the element types?
Note that @view(x[:,col]) should also let you avoid copying the column of x.
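For reference, a minimal sketch of the view-based call next to the copying one. The small Float64 matrices here are made-up illustration data, not the arrays from the question:

```julia
using LinearAlgebra  # standard library, provides mul!

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3×2, illustrative values only
x = [1.0 10.0; 2.0 20.0]          # 2×2, illustrative values only
col = 2

# Copying version: x[:, col] allocates a new Vector before the multiply.
y_copy = zeros(3)
mul!(y_copy, A, x[:, col])

# View version: the SubArray refers to x's memory, so no column copy is made.
y_view = zeros(3)
mul!(y_view, A, @view(x[:, col]))

println(y_copy == y_view)
```

Both calls write the same result into their output vector; the only difference is whether the column is materialized first.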
Turns out the element types were SVectors, no wonder it wasn't working. Reshaping things to be a 3d SArray rather than a SMatrix{SVector} fixed it.
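A rough sketch of the same idea using only Base, in case it helps someone hitting this later. It assumes (as the fix above suggests) that @turbo wants arrays whose element type is a plain number. A matrix of 3-tuples stands in for the SMatrix{SVector} here, and the names A_nested and A_flat are made up for the example:

```julia
# Stand-in for a matrix of SVectors: a 4×5 Matrix whose elements are 3-tuples.
A_nested = [(Float64(i), Float64(j), Float64(i + j)) for i in 1:4, j in 1:5]

# The element type is a tuple, not a number, which is the thing to check first.
println(eltype(A_nested))

# Reinterpret the same memory as a 3×4×5 array of Float64 (needs Julia 1.6+).
# The inner tuple index becomes the first dimension.
A_flat = reinterpret(reshape, Float64, A_nested)

println(eltype(A_flat))
println(size(A_flat))

# Component k of element (i, j) now lives at A_flat[k, i, j].
println(A_flat[3, 2, 4] == A_nested[2, 4][3])
```

With StaticArrays the details differ, but the shape of the fix is the same: one array with an extra dimension and a numeric eltype instead of an array of small vectors.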
The matrix x is tiny enough that I don't want the overhead of the @view, but that's something I should measure too.
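One quick way to measure it, sketched with Base's @allocated on plain Matrix{Float64} inputs (an assumption; the real arrays in this thread are static arrays, so the numbers may look different). The helper names copy_col! and view_col! are hypothetical:

```julia
using LinearAlgebra  # standard library, provides mul!

# Hypothetical helpers wrapping the two variants being compared.
copy_col!(y, A, x, col) = mul!(y, A, x[:, col])
view_col!(y, A, x, col) = mul!(y, A, @view(x[:, col]))

A = rand(3, 3)
x = rand(3, 2)
y1 = zeros(3)
y2 = zeros(3)

# Call each once first so compilation is not counted in the measurement.
copy_col!(y1, A, x, 1)
view_col!(y2, A, x, 1)

bytes_copy = @allocated copy_col!(y1, A, x, 1)
bytes_view = @allocated view_col!(y2, A, x, 1)

println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

For actual timings at this size, @btime from BenchmarkTools.jl is likely more trustworthy than a single @elapsed call.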
Last updated: Nov 22 2024 at 04:41 UTC