Create nested lists based on split of characters

Question

I have a list of strings that is already cleaned (split(',') can be used safely) and correctly sorted by number. A small example:

l = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

What I'm trying to achieve is to split it into sublists that each start and end with a single string (one without a comma), that is:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]

I thought of adding some logic like the following, but I'm not sure I'm on the right track:

tl = []

for i in l:
    
    # just get the variable
    val = i
    
    tl.append(val)
    
    # number of parts after splitting by ","
    val_split = len(i.split(','))

    # check if the value is the first element of the list (C1)
    if val == l[0]:
        print(1, val)
    # check if the split produced more than one part (C1,C2)
    elif val_split > 1:
        print(2, val)
    # check if the split produced exactly one part (C4)
    elif val_split == 1:
        # here the code should check whether the value equals the last value of the nested list; if so, move on to the next value (C5)
        if val != tl[-1]:
            print(3, val)
        else:
            print(4, val)

Depending on your source data, it might be impossible to achieve your objective, i.e., if there is never an adjacent pair of "single strings".
– jackal, Commented Feb 24 at 10:36
@Adon Bilivit The data are cleaned; if not, the source list will be rejected. So the list will always start with a single string, and adjacent pairs will always look like the one in the example (C4, C5).
– matteo, Commented Feb 24 at 10:40

jackal · Accepted Answer · 2025-02-24 12:14:19Z

1

If the input list is guaranteed to start and end with a single string and if there will always be at least one adjacent pair of single strings then:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
result = [[]]
for e in lst:
    result[-1].append(e)
    # a comma-free string closes the sublist unless it has just opened it
    if "," not in e and len(result[-1]) > 1:
        result.append([])
result.pop()
print(result)

Output:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
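The same loop can be wrapped in a function that reports an unfinished final group instead of silently relying on the guarantee. This is only a minimal sketch: the name split_groups and the choice to raise ValueError are assumptions made for illustration and are not part of the answer above.

```python
def split_groups(items):
    """Split items into sublists that start and end with a comma-free string."""
    result = []
    current = []
    for e in items:
        current.append(e)
        # A comma-free string closes the group unless it has just opened it.
        if "," not in e and len(current) > 1:
            result.append(current)
            current = []
    if current:
        # Leftover items mean the last group never reached its closing string.
        raise ValueError(f"unterminated group: {current}")
    return result


lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8']
print(split_groups(lst))
```

Collecting into a separate current list also removes the need for the trailing pop().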

edited Feb 24 at 12:14

answered Feb 24 at 10:46

jackal


Tim Biegeleisen · Accepted Answer · 2025-02-24 10:49:26Z

1

Here is my take on this, using regular expressions. We can join your starting list with some distinct separator, say |, then use re.findall to match each run that starts with a single C item, continues with one or more comma-separated items, and ends with a single C item.

import re

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
x = '|'.join(inp)
parts = re.findall(r'(?<![^|])C\d+(?:\|(?:C\d+(?:,C\d+)+)+)+\|C\d+(?![^|])', x)
output = [p.split('|') for p in parts] 
print(output)

This prints:

[
    ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'],
    ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'],
    ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']
]
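To make the pattern easier to read, it can be laid out with re.VERBOSE, which ignores whitespace in the pattern and allows comments. The sketch below uses the same pattern as above, only spread over several lines; the variable names pattern, joined and groups are chosen here for illustration.

```python
import re

pattern = re.compile(r"""
    (?<![^|])              # at the start of the text or right after a '|'
    C\d+                   # opening single item, e.g. C1
    (?:                    # one or more '|'-separated comma items
        \|
        (?:C\d+(?:,C\d+)+)+    # e.g. C1,C2
    )+
    \|C\d+                 # closing single item, e.g. C4
    (?![^|])               # at the end of the text or right before a '|'
""", re.VERBOSE)

inp = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8']
joined = '|'.join(inp)
groups = [m.split('|') for m in pattern.findall(joined)]
print(groups)
```

Because every group in the pattern is non-capturing, findall returns the full text of each match, which is then split back on the separator.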

answered Feb 24 at 10:49

Tim Biegeleisen


1 Comment

jackal Feb 24 at 11:59

Is it not reasonable to assume that the OP implicitly defines a "single string" as being any string that does not contain a comma? If that's the case, then this rather esoteric approach isn't helpful.

ThomasIsCoding · Accepted Answer · 2025-02-24 11:24:32Z

1

Given data s like below

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

you can try itertools along with numpy

import numpy as np
import itertools
grp = np.ceil(np.cumsum(np.char.count(s, ',')==0)/2)
[list(g) for k, g in itertools.groupby(s, lambda i: grp[s.index(i)])]

or without numpy

from itertools import accumulate, groupby
from math import ceil

grp = [ceil(x/2) for x in accumulate(map(lambda x: int(x.count(',')==0), s))]
[list(g) for k, g in groupby(s, lambda i: grp[s.index(i)])]

such that you will obtain

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
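The grouping key works because the running count of comma-free strings is 1 or 2 inside the first sublist, 3 or 4 inside the second, and so on, so halving and rounding up gives the sublist number. Note that s.index(i) returns the first position of a value, so the lookups above assume every string in s is unique. A possible variation, sketched here with the illustrative names labels and result, pairs each label with its item via zip so no lookup is needed:

```python
from itertools import accumulate, groupby

s = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8',
     'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

# Running count of comma-free strings seen so far: 1, 1, 1, 1, 2, 3, 3, ...
singles = accumulate(int(',' not in x) for x in s)
# Counts 1-2 map to label 1, counts 3-4 to label 2, and so on (integer ceil of n/2).
labels = [(n + 1) // 2 for n in singles]
# Group (label, item) pairs by label, then keep only the items.
result = [[item for _, item in g]
          for _, g in groupby(zip(labels, s), key=lambda pair: pair[0])]
print(result)
```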

edited Feb 24 at 11:24

answered Feb 24 at 10:45

ThomasIsCoding


1 Comment

Tim Biegeleisen Feb 24 at 11:17

Great use of numpy for a base Python use case.

no comment · Accepted Answer · 2025-02-24 11:41:27Z

1

With split_when from more-itertools:

from more_itertools import split_when

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = list(split_when(lst, lambda s, t: ',' not in s+t))

print(result)

Or just basic:

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

result = []
it = iter(lst)
for s in it:
    sub = [s]
    for t in it:
        sub.append(t)
        if ',' not in t:
            break
    result.append(sub)

print(result)
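For readers without more-itertools installed, the idea behind split_when can be approximated in a few lines: the predicate receives each pair of consecutive items and a new sublist starts whenever it returns true. The helper below, split_when_sketch, is a hypothetical stand-in written for illustration, not the library's actual implementation.

```python
def split_when_sketch(items, pred):
    """Yield sublists, starting a new one whenever pred(previous, current) is true."""
    it = iter(items)
    try:
        current = [next(it)]
    except StopIteration:
        return  # empty input yields nothing
    for item in it:
        if pred(current[-1], item):
            yield current
            current = []
        current.append(item)
    yield current


lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8']
# Split between two neighbours only when neither of them contains a comma.
result = list(split_when_sketch(lst, lambda s, t: ',' not in s + t))
print(result)
```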

edited Feb 24 at 11:41

answered Feb 24 at 11:32

no comment


blhsing · Accepted Answer · 2025-02-25 07:49:19Z

1

You can use a generator to produce items after the first item of each sublist until an item with no comma is found:

def until_no_comma(seq):
    for i in seq:
        yield i
        if ',' not in i:
            return
seq = iter(l)  # l is the list from the question
print([[i, *until_no_comma(seq)] for i in seq])

This outputs:

[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]

Demo: https://ideone.com/VJ4fnW

edited Feb 25 at 7:49

answered Feb 25 at 7:10

blhsing


cdlane · Accepted Answer · 2025-02-26 18:36:04Z

Alternatively, we can throw groupby from itertools at this problem:

from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8', 'C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']

groups = []

for key, group in groupby(lst, lambda x: ',' in x):

    if key:
        groups[-1].extend(group)
    else:
        a, *b = group

        if b:
            groups[-1].append(a)
            groups.append(b)
        else:
            if groups:
                groups[-1].append(a)
            else:
                groups.append([a])

print(groups)

Assumes input is in the proper order, just needs to be reformatted.

OUTPUT

% python3 test.py
[['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4'], ['C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8'], ['C10', 'C10,C11', 'C11,C12', 'C12,C13', 'C13']]
%
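To see why the else branch has to unpack with a, *b = group, it can help to print the runs that groupby produces for this key. In the short sketch below (runs is a name chosen for illustration), the closing string of one sublist and the opening string of the next arrive together in a single comma-free run, such as ['C4', 'C5'].

```python
from itertools import groupby

lst = ['C1', 'C1,C2', 'C2,C3', 'C3,C4', 'C4', 'C5', 'C5,C6', 'C6,C7', 'C7,C8', 'C8']

# Materialize each run, because groupby's group iterators are consumed lazily.
runs = [(key, list(group)) for key, group in groupby(lst, lambda x: ',' in x)]
for key, items in runs:
    print(key, items)
```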
