Read all possible sequential substrings in Python

Question

If I have a list of letters, such as:
word = ['W','I','N','E']
and need to get every possible sequence of substrings, of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E etc.
What is the most efficient way to go about this?

Right now, I have:

word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)

This just gives me the below, rather than the sub-sequences:

W
I
W I
N
I N
W I N
E
N E
I N E

I just can't figure out how to create every possible sequence.

Isn't what you are looking for just "WINE" with every possible placement of spaces inside it? — Stuart
– Stuart, Commented Nov 6, 2014 at 23:43
@Reut - the output would be long. There are some examples in the question — Adam_G
– Adam_G, Commented Nov 6, 2014 at 23:48
IIUC, you want something like this but with at least one split, so you don't get "WINE" as an output. Is that correct? — DSM
– DSM, Commented Nov 6, 2014 at 23:54

irrelephant · Accepted Answer · 2014-11-07 00:13:52Z

3

Try this recursive algorithm:

def segment(word):
  def sub(w):
    if len(w) == 0:
      yield []
    for i in xrange(1, min(4, len(w) + 1)):
      for s in sub(w[i:]):
        yield [''.join(w[:i])] + s
  return list(sub(word))

# And if you want a list of strings:
def str_segment(word):
  return [' '.join(w) for w in segment(word)]

Output:

>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]

>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']

edited Nov 7, 2014 at 0:13

answered Nov 6, 2014 at 23:48

irrelephant

4,1112 gold badges28 silver badges41 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

irrelephant Over a year ago

@Adam_G I improved the answer -- you should use this version instead!

irrelephant Over a year ago

The previous version made unnecessary recursive calls to segment instead of sub -- pretty much a typo.

Stuart · Accepted Answer · 2014-11-07 11:09:08Z

2

As there can either be a space or not in each of three positions (after W, after I and after N), you can think of this as similar to bits being 1 or 0 in a binary representation of a number ranging from 1 to 2^3 - 1.

input_word = "WINE"
for variation_number in xrange(1, 2 ** (len(input_word) - 1)):  
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        if variation_number >> position & 1:
            output += ' '
    print output

Edit: To include only variations with sequences of 3 characters or less (in the general case where input_word may be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number in order to exclude the cases which would have 000 at the beginning.)

for variation_number in xrange(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):  
    if not '000' in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print output

edited Nov 7, 2014 at 11:09

answered Nov 7, 2014 at 0:13

Stuart

9,8771 gold badge24 silver badges34 bronze badges

6 Comments

irrelephant Over a year ago

Unfortunately, this algorithm doesn't generalize to longer input words as it prints substrings of length 4 and above. Try it with input_word = "SWINE" to see what I mean.

Stuart Over a year ago

@irrelephant it works for me with SWINE. Not sure what you mean about substring length.

irrelephant Over a year ago

When I try it, I get the line SWIN E -- the SWIN isn't length 3 or less. Part of the OP's problem restricts the substrings to a max length of 3.

Reut Sharabani Over a year ago

The comment to my post makes me think he didn't really mean the code should limit the sub-string's length. This is my favourite solution as it abstracts the problem nicely.

Stuart Over a year ago

@irrelephant ok I see what you mean. It generalises to list all substrings of input_word, regardless of length, as per my exchange with the OP in the comments.

|

Reut Sharabani · Accepted Answer · 2014-11-07 00:23:11Z

1

My implementation for this problem.

#!/usr/bin/env python

# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools

word = 'WINE'

# this loop generates all possible partitions COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):

        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in xrange(len(partitions)-1):
            diffs[i+1] -= partitions[i]

        # first, the whole word is up for taking by partitions
        splits = [word]

        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])

        # print the result
        print splits

edited Nov 7, 2014 at 0:23

answered Nov 6, 2014 at 23:33

Reut Sharabani

31.5k7 gold badges76 silver badges95 bronze badges

2 Comments

Adam_G Over a year ago

I'm not sure this is correct. This gives permutations, but I'm looking for all substrings. In other words, every possible way to chop up the word.

Stuart Over a year ago

This does give all the substrings and returns the same results as my answer (except as a list instead of a string).

Kasravnd · Accepted Answer · 2014-11-07 00:32:14Z

As an alternative answer , you can do it with itertools module and use groupby function for grouping your list and also i use combination to create a list of pair index for grouping key : (i<=word.index(x)<=j) and at last use set for get a unique list .

Also note that you can got a unique combination of pair index at first by this method that when you have pairs like (i1,j1) and (i2,j2) if i1==0 and j2==3 and j1==i2 like (0,2) and (2,3) it mean that those slices result are same you need to remove one of them.

All in one list comprehension :

subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])

Demo in details :

>>> cl=list(combinations(range(len(word)),2))
>>> cl
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
>>> new_l
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
>>> last=[[''.join(i) for i in j] for j in new_l]
>>> last
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
>>> set([' '.join(j) for j in last])
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
>>> for i in set([' '.join(j) for j in last]):
...  print i
... 
WIN E
W IN E
W INE
WI NE
WINE
>>>

This is not working - there should be 7 possible output combinations.

Wang Guanqingwang · Accepted Answer · 2017-09-09 17:03:33Z

0

i think it can be like this: word = "ABCDE" myList = []

for i in range(1, len(word)+1,1):
    myList.append(word[:i])

    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]),1):
        myList.append(word[j:i])

print(myList)
print(sorted(set(myList), key=myList.index))
return myList

answered Sep 9, 2017 at 17:03

Wang Guanqingwang

1

1 Comment

stealththeninja Over a year ago

Formatting suggestion: I'm guessing word and myList are supposed to be in the code block. Also, could you explain what this answer gives that is not found in the other responses?

Collectives™ on Stack Overflow

Read all possible sequential substrings in Python

5 Answers 5

2 Comments

6 Comments

2 Comments

1 Comment

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

6 Comments

2 Comments

1 Comment

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related