Can I have a faster nested loop just lowering the algorithm complexity?

Question

I have a newbie question guys.

Let's suppose that I have the following simple nested loop, where m and n are not necessarily the same, but are really big numbers:

x = 0;

for (i=0; i<m; i++)
{
   for (j=0; j<n; j++)
   {
      delta = CalculateDelta(i,j);
      x = x + j + i + delta;
   }
}

And now I have this:

x = 0;

for (i=0; i<m; i++)
{
   for (j=0; j<n; j++)
   {
      delta = CalculateDelta(i,j);
      x = x + j + i + delta;

      j++;

      delta = CalculateDelta(i,j);
      x = x + j + i + delta;
   }
}

Rule: I do need to go through all the elements of the loop, because of this delta calculation.

My questions are:

1) Is the second algorithm faster than the first one, or is it the same? I have this doubt because for me the first algorithm has a complexity of O(m * n) and the second one is O(m * n/2). Or does lower complexity not necessary makes it faster?

2) Is there any other way to make this faster without something like a Parallel. For?

3) If I make usage of a Parallel. For, would it really make it faster since I would probably need to do a synchronization lock on the x variable?

Thanks!

It can be parallelized, for sure. x can be computed as a function of i and j and the sum of all deltas. The sum can be performed in parallel with a typical "collect" pattern. (Assuming CalculateDelta is a pure function -- depends only on its arguments and has no side effects.) And, no, unrolling likely does barely anything for you. In fact, the way you've done it just causes a bug when n is odd. — Wyck
– Wyck, Commented May 16, 2020 at 16:55
And you didn't reduce the complexity. It's still proportional to n. Even if you unroll a million iterations inside the loop. The number of times it visits the loop is still proportional to n, so the complexity hasn't changed. — Wyck
– Wyck, Commented May 16, 2020 at 16:56
The second one is not the same as the first as it might calculate a delta for a j value equal to n. Basically if n is an odd number like 3 the first loop does j=0 and j=1 then the next loop does j=2 and j=3 where as the first one would stop at j=2 you can "fix" that with a if(j>=n) break; after the j++ but this still does n operations either way. — juharr
– juharr, Commented May 16, 2020 at 17:13

AlanK · Accepted Answer · 2020-05-27 15:53:04Z

4

Definitely not, since time complexity is presumably dominated by the number of times CalculateDelta() is called, it doesn't matter whether you make the calls inline, within a single loop or any number of nested loops, the call gets made m*n times.

And now you have a bug (which is the reason I decided to add an answer after @Peter-Duniho had already done so quite comprehensively)

If n is odd, you do more iterations than intended - almost certainly getting the wrong answer or crashing your program...

edited May 27, 2020 at 15:53

answered May 16, 2020 at 17:29

AlanK

2,00215 silver badges17 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Peter Duniho · Accepted Answer · 2020-05-16 17:03:30Z

Asking three questions in a single post is pushing the limits of "too broad". But in your case, the questions are reasonably simple, so…

1) Does the second algorithm is faster then the first one, or is it the same? I have this doubt because for me the first algorithm have a complexity of O(m * n) and the second one is O(m * n/2). Or does lower complexity not necessary makes it faster?

Complexity ignores coefficients like 1/2. So there's no difference between O(m * n) and O(m * n/2). The latter should have been reduced to O(m * n), which is obviously the same as the former.

And the second isn't really O(m * n/2) anyway, because you didn't really remove work. You just partially unrolled your loop. These kinds of meaningless transformations are one of the reasons we ignore coefficients in big-O notation in the first place. It's too easy to fiddle with the coefficients without really changing the actual computational work.

2) Is there any other way to make this faster without something like a Parallel.For?

That's definitely too broad a question. "Any other way"? Probably. But you haven't provided enough context.

The only obvious potential improvement I can see in the code you posted is that you are computing j + i repeatedly, when you could instead just observe that that component of the whole computation increases by 1 with each iteration and so you could keep a separate incrementing variable instead of adding i and j each time. But a) it's far from clear making that change would speed anything up (whether it would, would depend a lot on specifics in the CPU's own optimization logic), and b) if it did so reliably, it's possible that a good optimizing JIT compiler would make that transformation to the code for you.

But beyond that, the CalculateDelta() method is a complete unknown here. It could be a simple on-liner that the compiler inlines for you, or it could be some enormous computation that dominates the whole loop.

There's no way for any of us to tell you if there is "any other way" to make the loop faster. For that matter, it's not even clear that the change you made makes the loop faster. Maybe it did, maybe it didn't.

3) If I make usage of a Parallel.For, would it really make it faster since I would probably need to do a syncronization lock on the x variable?

That at least depends on what CalculateDelta() is doing. If it's expensive enough, then the synchronization on x might not matter. The bigger issue is that each calculation of x depends on the previous one. It's impossible to parallelize the computation, because it's inherently a serial computation.

What you could do is compute all the deltas in parallel, since they don't depend on x (at least, they don't in the code you posted). The other element of the sum is constant (i + j for known m and n), so in the end it's just the sum of the deltas and that constant. Again, whether this is worth doing depends somewhat on how costly CalculateDelta() is. The less costly that method is, the less likely you're going to see much if any improvement by parallelizing execution of it.

Wyck · Accepted Answer · 2020-05-16 17:31:08Z

2

One advantageous transformation is to extract the arithmetic sum of the contributions of the i and j terms using the double arithmetic series formula. This saves quite a bit of work, reducing the complexity of that portion of the calculation to O(1) from O(m*n).

x = 0;

for (i=0; i<m; i++)
{
   for (j=0; j<n; j++)
   {
      delta = CalculateDelta(i,j);
      x = x + j + i + delta;
   }
}

can become

x = n * m * (n + m - 2) / 2;

for (i=0; i<m; i++)
{
   for (j=0; j<n; j++)
   {
      x += CalculateDelta(i,j);
   }
}

Optimizing further depends entirely on what CalculateDelta does, which you have not disclosed. If it has side effects, then that's a problem. But if it's a pure function (where its result is dependent only on the inputs i and j) then there's a good chance it can be computed directly as well.

edited May 16, 2020 at 17:31

answered May 16, 2020 at 17:20

Wyck

12.1k8 gold badges50 silver badges93 bronze badges

1 Comment

Wyck Over a year ago

This assumes that CalculateDelta has no direct access to x, which was also not disclosed in the question.

naji mohamad · Accepted Answer · 2020-05-16 17:00:30Z

0

the first for() will send you to the second for() the second for() will loop till jn) and the second mn/2

answered May 16, 2020 at 17:00

naji mohamad

11 bronze badge

2 Comments

Rob Over a year ago

@Zartch This is the Answer section. He should not be asking questions here at all but I don't think he is.

Zartch Over a year ago

@Rob my bad o_o. I thought I was reviewing a question

Collectives™ on Stack Overflow

Can I have a faster nested loop just lowering the algorithm complexity?

4 Answers 4

Comments

Comments

1 Comment

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

1 Comment

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related