2

I have a list of strings that is already cleaned (split(',') can be used safely) and sorted by number. As a small example:

l = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

What I'm trying to achieve is to split it into sublists that each start and end with a single string (one without a comma), that is:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]

I tried logic like the following, but I'm not sure I'm on the right track:

tl = []

for i in l:
    
    # just get the variable
    val = i
    
    tl.append(val)
    
    # split by ,
    val_split = len(i.split(','))  
    
    # check if the value is the first element of the list (C1)
    if val == l[0]:
        print(1, val)
    # check if the string splits into more than one part (C1,C2)
    elif val_split > 1:
        print(2, val)
    # check if the string splits into exactly one part (C4)
    elif val_split == 1:
        # here the code should compare the value with the last value of the nested list; if they are equal, move on to the next value (C5)
        if val != tl[-1]:
            print(3, val)
        else:
            print(4, val)
2
  • Depending on your source data, it might be impossible to achieve your objective - i.e., if there is never an adjacent pair of "single strings". Commented Feb 24 at 10:36
  • @Adon Bilivit The data are cleaned; if they are not, the source list is rejected. So the list will always start with a single string, and adjacent single strings will always appear as in the example (C4, C5). Commented Feb 24 at 10:40
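
The precondition discussed in these comments can be checked before any splitting is attempted. The following is only a minimal sketch: is_well_formed is a hypothetical helper name that does not appear in the thread, and it assumes a "single string" means any string without a comma.

```python
def is_well_formed(items):
    """Return True if items can be cut into groups that each start
    and end with a comma-free string (hypothetical helper)."""
    open_group = False  # True while a group is waiting for its closing single string
    for item in items:
        single = "," not in item
        if not open_group:
            if not single:
                return False  # a group must start with a single string
            open_group = True
        elif single:
            open_group = False  # this single string closes the current group
    return not open_group  # the last group must have been closed


items = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8']
print(is_well_formed(items))
print(is_well_formed(['C1', 'C1,C2']))
```

A list that fails this check could be rejected up front, as the asker describes.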

6 Answers

1

If the input list is guaranteed to start and end with a single string and if there will always be at least one adjacent pair of single strings then:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
result = [[]]
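for e in lst:
    result[-1].append(e)
    # A comma-free string closes the current sublist, unless it is the one that opened it
    if "," not in e and len(result[-1]) > 1:
        result.append([])
result.pop()  # drop the empty list left over after the last sublist closes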
print(result)

Output:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]

1

Here is my take on this, using regular expressions. We can join your starting list with a distinct separator, say |, then use re.findall to find each run that starts with a single C item, continues with one or more comma-separated items, and ends with a single C item.

import re

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
x = '|'.join(inp)
parts = re.findall(r'(?<![^|])C\d+(?:\|(?:C\d+(?:,C\d+)+)+)+\|C\d+(?![^|])', x)
output = [p.split('|') for p in parts] 
print(output)

This prints:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]
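
To make the pattern easier to follow, here is a sketch of it written with re.VERBOSE and one comment per part. It is a slightly simplified variant (the redundant inner repetition is dropped), not the exact expression above, and the variable names are illustrative only. Note that it requires at least one comma-separated item between the two single items.

```python
import re

pattern = re.compile(r"""
    (?<![^|])               # at the start of the string or right after a separator
    C\d+                    # opening single item, e.g. C1
    (?:\|C\d+(?:,C\d+)+)+   # one or more separated items that contain commas
    \|C\d+                  # closing single item, e.g. C4
    (?![^|])                # at the end of the string or right before a separator
""", re.VERBOSE)

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
       'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
joined = '|'.join(inp)
# The pattern has no capturing groups, so findall returns whole matches.
output = [match.split('|') for match in pattern.findall(joined)]
print(output)
```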

1 Comment

Is it not reasonable to assume that the OP implicitly defines a "single string" as being any string that does not contain a comma? If that's the case, then this rather esoteric approach isn't helpful.
1

Given data s such as:

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

you can try itertools along with numpy

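import numpy as np
import itertools
# Group number = ceil(running count of comma-free strings / 2)
grp = np.ceil(np.cumsum(np.char.count(s, ',') == 0) / 2)
# Pair each string with its group number; looking it up via s.index(i) would break on duplicate strings
[[x for _, x in g] for _, g in itertools.groupby(zip(grp, s), key=lambda p: p[0])]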

or without numpy

from itertools import accumulate, groupby
from math import ceil

grp = [ceil(x/2) for x in accumulate(map(lambda x: int(x.count(',')==0), s))]
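# Pair each string with its group number instead of calling s.index(i), which would break on duplicate strings
[[x for _, x in g] for _, g in groupby(zip(grp, s), key=lambda p: p[0])]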

which gives

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
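
The grouping key works because every sublist contains exactly two comma-free strings (its first and its last), so halving the running count of comma-free strings and rounding up gives the sublist number. A small sketch that prints the intermediate values; the names is_single and running are illustrative and not part of the answer:

```python
from itertools import accumulate
from math import ceil

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
     'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

is_single = [int(',' not in x) for x in s]   # 1 for a comma-free string, else 0
running = list(accumulate(is_single))        # running count of comma-free strings
grp = [ceil(r / 2) for r in running]         # sublist number for each element

print(is_single)  # [1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
print(running)    # [1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6]
print(grp)        # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```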

1 Comment

Great use of numpy for a base Python use case.
1

With split_when from more-itertools:

from more_itertools import split_when

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = list(split_when(lst, lambda s, t: ',' not in s+t))

print(result)

Or with plain Python:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = []
it = iter(lst)
for s in it:
    sub = [s]
    for t in it:
        sub.append(t)
        if ',' not in t:
            break
    result.append(sub)

print(result)
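
The split_when predicate fires between two adjacent strings when neither contains a comma. As a rough sketch of the same idea without the dependency, the cut positions can be computed first and the list sliced afterwards. The names cuts and bounds are illustrative, and a non-empty, well-formed input is assumed:

```python
lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
       'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# A new sublist starts at position i + 1 whenever items i and i + 1 are both comma-free.
cuts = [i + 1 for i, (a, b) in enumerate(zip(lst, lst[1:])) if ',' not in a + b]

# Turn the cut positions into (start, end) slice boundaries.
bounds = [0, *cuts, len(lst)]
result = [lst[start:end] for start, end in zip(bounds, bounds[1:])]

print(cuts)    # [5, 10]
print(result)
```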


1

You can use a generator to produce items after the first item of each sublist until an item with no comma is found:

def until_no_comma(seq):
    for i in seq:
        yield i
        if ',' not in i:
            return
seq = iter(l)
print([[i, *until_no_comma(seq)] for i in seq])

This outputs:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]

Demo: https://ideone.com/VJ4fnW


0

Alternatively, we can throw groupby from itertools at this problem:

from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

groups = []

for key, group in groupby(lst, lambda x: ',' in x):

    if key:
        groups[-1].extend(group)
    else:
        a, *b = group

        if b:
            groups[-1].append(a)
            groups.append(b)
        else:
            if groups:
                groups[-1].append(a)
            else:
                groups.append([a])

print(groups)

This assumes the input is already in the proper order and only needs to be regrouped.

OUTPUT

% python3 test.py
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
% 
