
When using sparse matrices, it's easy to find non-zero entries quickly (as these are the only elements stored). However, what's the best way to find the first ZERO entry? Approaches like find(X==0,1) or find(~X,1) tend to be slow because they evaluate the comparison (or negation) over the whole matrix before searching. It doesn't feel like that should be necessary -- is there a better way?
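One way to see where that cost comes from: for a mostly-empty sparse array, x==0 is itself a sparse logical array whose stored entries are exactly the zeros of x, so building it has to touch every zero before find even starts. A minimal sketch, using a deterministic 10% fill pattern chosen only for illustration:

```matlab
% Build a 5000x1 sparse vector with every 10th element set (10% filled)
x = sparse(5000,1);
x(1:10:end) = 1;

z = (x == 0);          % comparison result is a sparse logical array
storedInX = nnz(x)     % 500 stored non-zeros in x
storedInZ = nnz(z)     % every zero of x becomes a stored entry of z
```

So the temporary created by find(x==0,1) stores numel(x) - nnz(x) entries, which for a sparse array is most of the array.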


For instance, naively iterating over the array seems to be slightly faster than using find(X==0,1):

% Create a sparse array [10% filled]
x = sparse(5000,1);
x(randsample(5000,500)) = 1;
nRuns = 1e5; 
% Find the first zero element, nRuns (1e5) times
idx = zeros(nRuns,1);
tic
for n=1:nRuns
    idx(n) = find(x==0,1);
end
toc


%%
% Create a sparse array [10% filled]
x = sparse(5000,1);
x(randsample(5000,500)) = 1;
nRuns = 1e5; 
% Find the first zero element, nRuns (1e5) times
idx = zeros(nRuns,1);
tic
for n=1:nRuns
    for kk = 1:numel(x)
        [ii,jj] = ind2sub(size(x), kk);
        if x(ii,jj)==0; idx(n) = ii + (jj-1)*n; break; end
    end
end
toc

But what is the best way to do this?

  • Sparse arrays store non-zero elements in order. Just look through those until you find a missing element. Commented Feb 18 at 0:12
  • i.e. the naive loop I have above (in the second part of the snippet) -- or do you mean something else? Is that really the fastest way? Commented Feb 18 at 0:46
  • find(X==0,1) compares the whole matrix to zero (maybe even producing a full matrix?), then looks for the first non-zero element. In the loop you don’t touch most of the matrix. And it being a sparse matrix, you likely have mostly zero elements (if not, don’t use a sparse matrix), so the loop should terminate really quickly. Note that idx(n) = ii + (jj-1)*n is the same as idx(n) = kk. And x(ii,jj)==0 is the same as x(kk)==0. So removing the call to ind2sub should simplify and hopefully speed up your code. Commented Feb 18 at 1:37
  • But I was thinking of looking through the data as stored: a sparse matrix stores indices to non-zero elements and their values. You should be able to iterate faster over just the indices of the non-zero elements. Except I don’t know how to get that data in MATLAB. In a MEX-file this would be simple: mathworks.com/help/matlab/apiref/mxgetir.html — indexing into a sparse array is more expensive than indexing into a full array. Commented Feb 18 at 1:42
  • How big is your actual array? Because 5000x1 is tiny and it's not worth using a sparse array for. You need much, much larger arrays to make the sparse array overhead worthwhile. Commented Feb 18 at 6:07
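The first comment's idea (stored non-zeros are kept in column-major order, so look for the first missing index) does not necessarily need a MEX file: find(x) on a sparse array returns the linear indices of the non-zeros already sorted, so the first gap in that sequence is the first zero. A minimal sketch on a small hypothetical example; the variable names are illustrative:

```matlab
% Small example: first zero is at linear index 3
x = sparse([1 1 0 1 0 0 1]);

nzIdx = reshape(find(x), 1, []);   % linear indices of the non-zeros, ascending, as a row
k = numel(nzIdx);

% Append a sentinel one past the end and compare against 1:k+1.
% Up to the first mismatch the non-zeros occupy 1,2,3,..., so the first
% position where the two sequences differ is the first index with no
% non-zero, i.e. the first zero. The result is empty if x has no zeros.
firstZero = find([nzIdx, numel(x) + 1] ~= 1:(k + 1), 1)
```

Unlike the loop, this does not stop early: it always scans all nnz(x) stored indices, but it never builds an array with one entry per zero.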

2 Answers


For non-negative arrays, min is probably the fastest solution, since the first occurrence of the minimum (zero) is the first zero entry:

x = sparse(5000, 1);
x(randsample(5000, 500)) = 1; 
[~, idx] = min(x);

ismember can also be used:

[~, ind] = ismember(0, x);
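Because min returns the index of the first occurrence of the minimum, the same trick extends to a matrix once it is flattened with x(:). One caveat: if x contains no zero at all, min silently returns the position of the smallest non-zero instead. A minimal sketch that guards against that case, assuming non-negative data (the example matrix is hypothetical):

```matlab
% Small non-negative example matrix
x = sparse([3 0 2 0 5; 1 4 0 6 7]);

[m, idx] = min(x(:));   % x(:) is column-major order; min returns the first minimum
if full(m) ~= 0
    idx = [];           % minimum is not zero, so x has no zero entries
end
idx
```

For arrays that may also contain negative values, min(abs(x(:))) works the same way, at the cost of the extra abs pass discussed in the comments.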

4 Comments

Not sure what the overhead would be of using min(abs(x)) to make this work with any real array.
logical(x) is another way.
Indeed, I did some quick benchmarks, and it looks like abs adds enough overhead to make the loop (without ind2sub, as suggested by Cris in the comments) faster for a generic array, while min might be quicker for positive arrays.
Thanks all. So far min is the fastest. I have tested it on 1000x1000 sparse logical matrices, but will eventually be using matrices up to 30k x 30k.
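For reference, the loop without ind2sub that Cris suggested in the question comments reduces to plain linear indexing with an early break. A minimal sketch on a small hypothetical example:

```matlab
% Small example: first zero is at linear index 3
x = sparse([1 1 0 1 0 0 1]);

firstZero = [];                % stays empty if x has no zero entries
for kk = 1:numel(x)
    if x(kk) == 0              % linear index, column-major order
        firstZero = kk;
        break;                 % stop scanning at the first zero
    end
end
firstZero
```

On a genuinely sparse array the loop usually exits after a few iterations, although each x(kk) access into a sparse array costs more than indexing a full one.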

I appended a third approach that catches zeros with while loops; in the run below it took more than 10x less time than the other two.

x = sparse(5000,1);                     % Create sparse array [10% filled]
x(randsample(5000,500)) = 1;
nRuns = 1e5; 

idx = zeros(nRuns,1);                   % Find first zero element, nRuns (1e5) times
tag1=tic;
for n=1:nRuns
    idx(n) = find(x==0,1);
end
t1=toc(tag1)


%%
% x = sparse(5000,1);                     % Create a sparse array [10% filled]
% x(randsample(5000,500)) = 1;
% nRuns = 1e5; 

idx = zeros(nRuns,1);                  % Find the first zero element, nRuns (1e5) times
tag2=tic;
for n=1:nRuns
    for kk = 1:numel(x)
        [ii,jj] = ind2sub(size(x), kk);
        if x(ii,jj)==0; 
            idx(n) = ii + (jj-1)*n; 
            break; 
        end
    end
end
t2=toc(tag2)

%%

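tag3=tic;
idx=[];
nx_max=prod(size(x));                   % total number of elements
nx=1;
while nx<nx_max                         % single pass over x (not repeated nRuns times)
    while ~x(nx)                        % x(nx) is zero
        nx=nx+1;
        idx=[idx nx];                   % appends the index after the zero; keeps scanning
        if nx>=nx_max
            break;
        end
    end
    nx=nx+1;
end

t3=toc(tag3)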

Results:

t1 =
   0.563633000000000
t2 =
   0.420241800000000
t3 =
   0.017343400000000

When comparing timings there's no need to generate a different sparse matrix x for each case. On the contrary, using the same matrix x makes the comparison fair, because every method runs on exactly the same data.
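As written, the third block executes once rather than nRuns times, and it scans the whole array instead of stopping at the first zero, so t3 is not directly comparable with t1 and t2. A minimal sketch of a while-loop search that stops at the first zero and is timed over the same nRuns iterations; tag4 and t4 are new illustrative names, and randperm(5000,500) stands in for randsample so the snippet does not need the Statistics and Machine Learning Toolbox. No timings are claimed here:

```matlab
% Create a sparse array [10% filled]; randperm replaces randsample
% so that no toolbox is required
x = sparse(5000,1);
x(randperm(5000,500)) = 1;
nRuns = 1e5;

idx = zeros(nRuns,1);                   % first zero for each run (0 if none)
nTotal = numel(x);
tag4 = tic;
for n = 1:nRuns
    nx = 1;
    while nx <= nTotal
        if x(nx) == 0
            idx(n) = nx;                % linear index of the first zero
            break;                      % stop at the first zero
        end
        nx = nx + 1;                    % skip the non-zero entry
    end
end
t4 = toc(tag4)
```

Running it alongside the first two blocks on the same x gives a like-for-like comparison.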
