I am doing some computation that produces strings containing byte data (the strings serve as byte arrays). This data needs to be sent to another program that expects all of it concatenated. From what I have read, the best way to concatenate appears to be collecting the pieces in a list and then calling ''.join(lst), but it seems to me that creating the list might incur a memory overhead.

Is there any way to enjoy the benefits of ''.join(lst) without creating a long list?

It is not hard to approximate how big the complete string is going to be. Is there a way to allocate that space and just pour the data inside? For instance with something like numpy? Then convert it into a huge string?
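The "allocate and pour" idea from the question could look roughly like the sketch below, which uses a plain bytearray rather than numpy. The function name pack_chunks and the parameter estimated_size are made up for illustration, and the sketch assumes the pieces are bytes objects.

```python
def pack_chunks(chunks, estimated_size):
    """Copy byte chunks into one preallocated buffer (hypothetical helper)."""
    buf = bytearray(estimated_size)  # zero-filled, allocated once up front
    pos = 0
    for chunk in chunks:
        end = pos + len(chunk)
        # Overwrites in place while the chunk fits; if the estimate was too
        # small, the slice is clamped and the bytearray grows to hold the rest.
        buf[pos:end] = chunk
        pos = end
    del buf[pos:]  # trim whatever part of the estimate was not used
    return buf
```

Calling bytes(buf) on the result makes a second full copy, so if the consumer accepts any bytes-like object (a binary file's write() or socket.sendall(), for example), passing the bytearray directly avoids that.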

  • 1
    lst can be a generator expression rather than an actual list. Commented Jul 8, 2015 at 16:39
  • 1
    @martineau, but Python will construct a list anyway. If you pass a generator, Python first builds a list from it, because join has to make two passes over the data. Commented Jul 8, 2015 at 16:40
  • 1
    How is the data being sent to another program? Via a socket? If so, perhaps you could send the total size and then the pieces through the socket as they are generated. Commented Jul 8, 2015 at 16:47
  • 2
    What about using io.StringIO, "an in-memory stream for text I/O"? Use write() to append each string, then getvalue() to get the finished product. Disclaimer: I don't actually have a clue whether this is a good idea. docs.python.org/3/library/io.html#io.StringIO Commented Jul 8, 2015 at 16:48
  • 1
    @zehelvion: If the big string is going to be written to a file, couldn't you then just write the smaller strings to the file sequentially without joining them first? Commented Jul 9, 2015 at 17:06
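The last two suggestions in the comments can be sketched as follows. Since the data here is bytes, the sketch uses io.BytesIO instead of io.StringIO; the function names join_via_buffer and stream_chunks are hypothetical, and out stands for any object with a write() method, such as a file opened in binary mode.

```python
import io


def join_via_buffer(chunks):
    """Accumulate byte chunks in an in-memory stream, then return them as one bytes object."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()


def stream_chunks(chunks, out):
    """Write each chunk to the sink as it is produced, never building the joined result."""
    total = 0
    for chunk in chunks:
        out.write(chunk)
        total += len(chunk)
    return total  # number of bytes written
```

If the receiving program reads from a file or pipe, the streaming variant sidesteps the concatenation question entirely, because the full result never has to exist in memory at once.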

1 Answer

-1

str.join() does not actually need a list; it accepts any kind of iterable. Therefore you could work with generators, serving string after string:

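def other_function_returning_string_data():
    # Placeholder so the example runs; stands in for any function returning a string.
    return "Baz"

def calculate_something():
    # do something
    data = "Foobar"  # str, not bytes: iterating bytes yields ints, which ''.join() rejects
    yield from data
    # do something else
    yield from other_function_returning_string_data()

final_results = ''.join(calculate_something())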

The yield from syntax was added in Python 3.3. If you are using an earlier version, for c in data: yield c should work as well.


1 Comment

Doing it like this (yield from) is effectively breaking each of the strings up into individual characters and then yielding each character of each one separately.
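A variant that avoids the per-character splitting might look like the sketch below: each piece is yielded whole, and yield from is used only to delegate to another generator. The names produce_chunks and more_chunks are made up for illustration, and the pieces are assumed to be bytes, so the join uses b"" as the separator.

```python
def more_chunks():
    """Hypothetical second producer; yields complete byte chunks."""
    yield b"Baz"
    yield b"Qux"


def produce_chunks():
    """Yield whole chunks instead of iterating over their contents."""
    data = b"Foobar"
    yield data                    # one item: the entire chunk
    yield from more_chunks()      # delegate to a generator of chunks


pieces = list(produce_chunks())   # three items, not one per byte
final_result = b"".join(produce_chunks())
```

As noted in the comments on the question, join still collects the generator's items into a sequence internally, so this reduces the number of items to one per chunk but does not remove the intermediate sequence.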
