
This question has been asked before, but the fast answers that I have seen also strip the leading and trailing spaces, which I don't want.

"   a     bc    "

should become

" a bc "

I have

text = re.sub(' +', " ", text)

but am hoping for something faster. The suggestion that I have seen (and which won't work) is

' '.join(text.split())

Note that I will be doing this to lots of smaller texts so just checking for a trailing space won't be so great.
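To make the difference concrete, here is a minimal sketch that runs both snippets from the question on the example string. The variable names `collapsed` and `stripped` are only for illustration.

```python
import re

text = "   a     bc    "

# The regex from the question collapses each run of spaces to one space,
# including the runs at the start and end of the string.
collapsed = re.sub(' +', ' ', text)

# split() with no arguments discards leading and trailing whitespace,
# so joining the pieces loses the edge spaces.
stripped = ' '.join(text.split())

print(repr(collapsed))  # ' a bc '
print(repr(stripped))   # 'a bc'
```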

  • If you want to really optimize stuff like this, use C, not python. Try cython, that is pretty much Python syntax but fast as C. Commented Jun 13, 2013 at 15:13
  • You could try ''.join((text[0],' '.join(text[1:-1].split()),text[-1])) but that is probably not faster than the regex (you'd need to timeit), and it's definitely not easier to read. Commented Jun 13, 2013 at 15:14
  • Have you checked that this is really the thing slowing down your program? My (very uninformed) guess is that it is not. First profile, and then if performance really is an issue, then optimise (and the easiest way to do that might be to rewrite the critical bits in C). Commented Jun 13, 2013 at 15:16
  • Why do you want something faster? I doubt it's really affecting your program. Commented Jun 13, 2013 at 15:18
  • See stackoverflow.com/questions/1546226/…. The winner seems to be while '  ' in s: s = s.replace('  ', ' ') Commented Jun 13, 2013 at 15:19
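
The loop from the last comment looks for two consecutive spaces and replaces each pair with a single space until none remain. A minimal sketch, wrapped in a function whose name (`collapse_spaces`) is made up for this illustration:

```python
def collapse_spaces(s):
    """Collapse every run of spaces to one space, keeping edge spaces."""
    # Each pass roughly halves the length of every run of spaces,
    # so the loop ends after a few passes even for long runs.
    while '  ' in s:
        s = s.replace('  ', ' ')
    return s


print(repr(collapse_spaces("   a     bc    ")))  # ' a bc '
```

Because it never strips anything, a leading or trailing run of spaces is reduced to one space rather than removed.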

3 Answers

3

FWIW, some timings

$  python -m timeit -s 's="   a     bc    "' 't=s[:]' "while '  ' in t: t=t.replace('  ', ' ')"
1000000 loops, best of 3: 1.05 usec per loop

$ python -m timeit -s 'import re;s="   a     bc    "'  "re.sub(' +', ' ', s)"
100000 loops, best of 3: 2.27 usec per loop

$ python -m timeit -s 's=" a bc "' "''.join((s[0],' '.join(s[1:-1].split()),s[-1]))"
1000000 loops, best of 3: 0.592 usec per loop

$ python -m timeit -s 'import re;s="   a     bc    "'  "re.sub(' {2,}', ' ', s)"
100000 loops, best of 3: 2.34 usec per loop

$ python -m timeit -s 's="   a     bc    "' '" "+" ".join(s.split())+" "'
1000000 loops, best of 3: 0.387 usec per loop
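
If you would rather rerun the comparison from a script than from the shell, something like the following sketch should work. The function names are mine, the regex is precompiled (which the command-line runs above do not do), and the numbers will vary by machine and Python version, so no results are shown. Note that `with_split` always adds one space at each end, so it is only correct for input known to start and end with a space.

```python
import re
import timeit


def with_replace(s):
    # Repeatedly replace double spaces until none are left
    while '  ' in s:
        s = s.replace('  ', ' ')
    return s


def with_regex(s, _pattern=re.compile(' {2,}')):
    # Precompiled pattern that matches runs of two or more spaces
    return _pattern.sub(' ', s)


def with_split(s):
    # Unconditionally re-adds one space at each end
    return ' ' + ' '.join(s.split()) + ' '


def compare(text, number=100000):
    """Return total seconds per approach for `number` calls on `text`."""
    results = {}
    for func in (with_replace, with_regex, with_split):
        results[func.__name__] = timeit.timeit(lambda: func(text), number=number)
    return results


if __name__ == '__main__':
    for name, seconds in compare("   a     bc    ").items():
        print(name, seconds)
```

Passing a longer or more space-heavy `text` to `compare` is a quick way to see whether the ranking changes for your own data.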

12 Comments

re.sub(' {2,}', ... would be a fairer test. There's no point in matching a single space.
@Aya -- Good suggestion, for me, that does about 30% better for this simple test.
I also timed my suggestion ... It comes in between the other two on my desktop: python -m timeit -s 's=" a bc "' "s = ''.join((s[0],' '.join(s[1:-1].split()),s[-1]))"
@lcfseth It would depend on the length of the string, and the number of multi-space instances. For longer strings with many multi-space instances, the regex would out-perform the str.replace approach.
With this trivial string the while-approach beats the re even with s = "..."*10000
2

If you want to really optimize stuff like this, use C, not Python.

Try Cython, which is pretty much Python syntax but as fast as C.

Here is some stuff you can time:

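# Python 2 only: the 'c' typecode is not available in Python 3
import array
buf = array.array('c')
input = "   a     bc    "
space = False  # True when the previous character was a space
for c in input:
    # Skip a space that directly follows another space
    if not space or c != ' ':
        buf.append(c)
    space = (c == ' ')
result = buf.tostring()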

Also try using cStringIO:

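# Python 2 only: the cStringIO module does not exist in Python 3
import cStringIO
buf = cStringIO.StringIO()
input = "   a     bc    "
space = False  # True when the previous character was a space
for c in input:
    # Skip a space that directly follows another space
    if not space or c != ' ':
        buf.write(c)
    space = (c == ' ')
result = buf.getvalue()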

But again, if you want to make such things really fast, don't do it in Python. Use Cython. The two approaches I gave here will likely be slower, just because they put much more work on the Python interpreter. If you want these things to be fast, do as little as possible in Python. The for c in input loop likely already kills any theoretical performance gain of the above approaches.
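
Both snippets above target Python 2. On Python 3, a rough equivalent of the same character loop could use `io.StringIO`. This is only a sketch for timing purposes (the function name `collapse_with_buffer` is made up), and the same caveat about per-character work in the interpreter applies.

```python
import io


def collapse_with_buffer(text):
    """Copy text into a buffer, skipping a space that follows a space."""
    buf = io.StringIO()
    prev_space = False  # True when the previous character was a space
    for c in text:
        if not prev_space or c != ' ':
            buf.write(c)
        prev_space = (c == ' ')
    return buf.getvalue()


print(repr(collapse_with_buffer("   a     bc    ")))  # ' a bc '
```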


0

This is just a small rewrite of the split/join suggestion from the question. A small fault in an approach doesn't mean you should assume it can't be made to work.

You could easily do something like:

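# startswith/endswith also work on an empty string, where x[0] would raise IndexError
front_space = lambda x: x.startswith(" ")
trailing_space = lambda x: x.endswith(" ")
" " * front_space(text) + ' '.join(text.split()) + " " * trailing_space(text)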

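One edge case to watch with the expression above: for input made up only of spaces, the split yields nothing and the result is two spaces instead of one. A hedged sketch that covers this case, assuming spaces are the only whitespace in the input (the function name `collapse_keep_edges` is made up):

```python
def collapse_keep_edges(text):
    """Collapse runs of spaces but keep one leading/trailing space if present."""
    inner = ' '.join(text.split())
    if not inner:
        # Empty input stays empty; all-space input becomes a single space
        return ' ' if text else ''
    lead = ' ' if text.startswith(' ') else ''
    trail = ' ' if text.endswith(' ') else ''
    return lead + inner + trail


print(repr(collapse_keep_edges("   a     bc    ")))  # ' a bc '
```

Keep in mind that `split()` with no arguments also splits on tabs and newlines, so on text containing those the result will differ from the `' +'` regex.
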