3

Let's say I have a byte sequence of some size n (which could be 1..4 elements in the "real" code), with n = 3 for the sake of this example:

char source[n] = { 'a', 'b', 'c' };

And I have a memory range of sufficient size to hold m copies of this sequence:

char * dest = new char[m*n];

(And yes, I know std::vector, and yes, it's generally to be preferred over new'ing your own memory, and no, it's not an option for the code I am currently working on -- and anyway the problem would still be the same.)

Now I want to initialize dest with those m copies of source. There are various ways to do m copies of a single value, but apparently none for doing m copies of a sequence of values. Sure, I could use a nested loop:

for ( unsigned i1 = 0; i1 < m; ++i1 )
{
    for ( unsigned i2 = 0; i2 < n; ++i2 )
    {
        dest[ i1 * n + i2 ] = source[ i2 ];
    }
}

But somehow this lacks all the finesse that usually tells me that I got the "right" solution for a problem.

Does C++ offer a more efficient way to do this?

14
  • Any reason you don't use std::vector? Commented Jan 30, 2016 at 21:05
  • @CaptainObvlious: Yes, several. Not part of the question. std::vector is not an option in the larger scheme this problem is a part of. Commented Jan 30, 2016 at 21:08
  • @DevSolar Try using memcpy to avoid nested loop. Commented Jan 30, 2016 at 21:09
  • n would be 4 for char source[n] = "abc". Commented Jan 30, 2016 at 21:10
  • Write C++ that clearly expresses intent. Use standard containers and algorithms when possible (almost always). The optimiser will produce efficient code. Commented Jan 30, 2016 at 21:11

4 Answers

2

Would this give you the right feeling?

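// std::copy returns the position just past the bytes it wrote, so each pass
// appends one more copy of source until that position reaches the end of dest.
// Assumes m >= 1: with m == 0 the first copy already writes past the end of dest.
auto it = dest;
while ((it = std::copy(source, source + n, it))
       != dest + m * n);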

1

Zero-initialising is the most efficient option. Particularly where m is large and access is sparse, the OS may allocate the requested memory lazily: for example, it can map every page copy-on-write to the same zeroed page and use soft page faults to provide real memory only when a page is actually used.

Now, if you XOR every store to and load from the byte array with the corresponding byte from source, you change the meaning of the all-zero bit pattern: a byte that was never written reads back as the matching byte of source.

dest[ i] = c1 ^ source[ i % n];  // store update
c2 = dest[ j] ^ source[ j % n];  // load, if dest[ j] is 0 it was never updated

On modern out-of-order CPUs, such operations are cheap compared to cache misses.

What you do need for this technique is to allocate the byte array in an OS-specific way that guarantees it is zeroed, e.g. mmap under Linux.
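
As a rough illustration of the XOR idea, here is a minimal sketch. The class name PatternBuffer and its load/store members are hypothetical, and std::calloc stands in for an OS-specific zeroed allocation such as mmap; whether the pages really are provided lazily depends on the platform. The caller is assumed to keep the pattern alive for as long as the buffer exists.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: a zero-filled buffer that reads as repeated copies
// of a pattern until individual bytes are overwritten.
class PatternBuffer
{
public:
    // pattern must outlive the buffer; n is the pattern length in bytes.
    PatternBuffer(const char* pattern, std::size_t n, std::size_t size)
        : pattern_(pattern), n_(n),
          data_(static_cast<char*>(std::calloc(size, 1)))  // zeroed memory
    {
        assert(pattern_ != nullptr && n_ > 0);
        assert(data_ != nullptr || size == 0);
    }

    ~PatternBuffer() { std::free(data_); }

    PatternBuffer(const PatternBuffer&) = delete;
    PatternBuffer& operator=(const PatternBuffer&) = delete;

    // Load: a stored 0 XORed with the pattern byte yields the pattern byte.
    char load(std::size_t i) const
    {
        return static_cast<char>(data_[i] ^ pattern_[i % n_]);
    }

    // Store: XOR with the pattern byte so that load() returns c again.
    void store(std::size_t i, char c)
    {
        data_[i] = static_cast<char>(c ^ pattern_[i % n_]);
    }

private:
    const char* pattern_;
    std::size_t n_;
    char* data_;
};
```

With the pattern { 'a', 'b', 'c' }, load(4) returns 'b' on a fresh buffer without any byte of the buffer having been written; after store(4, 'Z') it returns 'Z'.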


1

I would use std::copy or std::copy_n in a for loop:

for( int i = 0; i < m; ++i )
   std::copy_n(source, n, dest+n*i );

for( int i = 0; i < m; ++i )
   std::copy(source, source+n, dest+n*i );


0

If m is large, and your memcpy is faster than a byte-based loop, it may be worthwhile to do an 'expanding binary fill' (I don't know what it's called because I just made it up). The idea is this:

  • Prime the destination array by copying n bytes from source to dest
  • Copy n bytes from dest to dest+n
  • Copy 2*n bytes from dest to dest+2*n
  • Copy 4*n bytes from dest to dest+4*n
  • Copy 8*n bytes from dest to dest+8*n
  • ...

until you have filled at least half the destination array. Now you can fill the rest of dest with one more copy.

This might actually be slower than a naive version, depending on things like caching. So if you do try this, you will have to run some performance tests.
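
The doubling steps described above could be sketched roughly as follows. The function name fill_repeated and its parameter order are made up for illustration, and it assumes dest has room for m * n bytes and does not overlap source. It has not been benchmarked against the plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: fills dest[0 .. m*n) with m back-to-back copies of
// source[0 .. n) by repeatedly doubling the initialized prefix.
void fill_repeated(char* dest, const char* source, std::size_t n, std::size_t m)
{
    const std::size_t total = m * n;
    if (total == 0)
        return;
    assert(dest != nullptr && source != nullptr);

    // Prime the destination with one copy of the pattern.
    std::memcpy(dest, source, n);
    std::size_t filled = n;

    // Copy the prefix onto the region right behind it while a full
    // doubling still fits; the two regions never overlap.
    while (filled <= total - filled)
    {
        std::memcpy(dest + filled, dest, filled);
        filled *= 2;
    }

    // At least half is filled now, so one more copy covers the rest.
    std::memcpy(dest + filled, dest, total - filled);
}
```

For the example in the question this would be called as fill_repeated(dest, source, n, m). Because the filled prefix is always a whole number of patterns, the final partial copy continues the sequence correctly.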

