3

I have got a list of strings like this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', 
        '<dialog xyz', 'string', 'more string', 'even more string etc']

I need to split the list into sublists of strings, starting a new sublist at every string that begins with the '<' character, so that every sublist begins with '<dialog xyz'. Sample output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'],
 ['<dialog xyz', 'string', 'more string', 'even more string etc']]

I already tried list comprehension but it does not work (returns the same org_list):

divided_list = [s.split(',') for s in ','.join(org_list).split('<')]

I know it is possible with itertools (saw it in some answers) but I am still a beginner, don't understand them much and would like to solve this with what I do understand, if possible.

8 Answers

1

First we can create a list of indexes referring to the positions in org_list where the string at that position starts with a '<'.

We can then iterate through these in a list-comp taking slices between each pair of indexes.

However, the last slice must run to the end of org_list, so we append len(org_list), the index one past the last element, to the list of indexes.

Hopefully you can see how that description translates into the following code.

inds = [i for i, s in enumerate(org_list) if s.startswith('<')] + [len(org_list)]
div_l = [org_list[inds[i]:inds[i+1]] for i in range(len(inds)-1)]

which gives the desired output of:

[['<dialog xyz', 'string', 'more string', 'even more string etc'],
 ['<dialog xyz', 'string', 'more string', 'even more string etc']]
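
The same slicing idea can be packaged as a small helper that pairs each start index with the next one via zip instead of indexing into the list of indexes. This is only a sketch: the name split_at_prefix and its prefix parameter are made up for illustration, and it assumes every item is a string.

```python
def split_at_prefix(items, prefix='<'):
    """Split items into sublists, each starting at an item that begins with prefix."""
    # Positions of the items that open a new sublist.
    starts = [i for i, s in enumerate(items) if s.startswith(prefix)]
    # Add the index one past the end so the last slice has an upper bound.
    bounds = starts + [len(items)]
    # Pair each start with the next bound and slice between them.
    return [items[a:b] for a, b in zip(bounds, bounds[1:])]


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']
div_l = split_at_prefix(org_list)
print(div_l)
```

Anything before the first '<' item (here the leading empty string) is dropped, and an input with no '<' item gives an empty list.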

1

How about something simple like this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', '<dialog xyz', 'string', 'more string', 'even more string etc']
split_lists = [] 
for s in org_list:
  if s == '':
    continue
  if s.startswith('<') or len(split_lists) == 0: 
    split_lists.append([s])
    continue
  split_lists[-1].append(s)

print(split_lists)

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]

4 Comments

Do not use if s is ''
is is for object identity, not equality
That's new for me. Personally, I feel is is more readable though.
Well, it is simply incorrect. Readability may count, but correctness is more important. is is only interchangeable with == if you are working with singletons, which are pretty rare in Python, but could include None, module objects, and class objects.
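
To illustrate the difference these comments describe, here is a small sketch (the variable names are arbitrary). It uses lists for the identity check, where the outcome is guaranteed; for strings, whether two equal values happen to be the same object depends on interpreter details, which is why == is the right comparison.

```python
# Two equal lists built separately are distinct objects.
first = ['<dialog xyz']
second = ['<dialog xyz']
equal = first == second       # compares contents
identical = first is second   # compares object identity

# A string assembled at runtime still compares equal to a literal with ==.
built = ''.join(['<', 'dialog'])
same_text = built == '<dialog'

# None is a singleton, so an identity check is appropriate there.
missing = None
is_none = missing is None

print(equal, identical, same_text, is_none)
```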
0

This should work:

split_lists = []
for s in org_list:
    if s.startswith('<') or len(split_lists) == 0:
        split_lists.append([])
    split_lists[-1].append(s)

Here is the result for your input:

>>> split_lists
[[''], ['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]

If you want to ignore all the strings before the first string that starts with '<', like the empty string that is the first element in your org_list, then use this:

split_lists = []
for s in org_list:
    if s.startswith('<'):
        split_lists.append([])
    if len(split_lists) == 0:
        continue
    split_lists[-1].append(s)


0
org_list = ['', '<dialog xyz', 'ztring', 'more ztring', 'even more string etc', '<dialog xyz', 'string', 'more string', 'even more string etc']

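orig = []       # finished sublists
new = []        # sublist currently being built
start = False   # becomes True at the first '<dialog xyz'

for item in org_list:
    if item == '<dialog xyz':
        # A new marker closes the sublist collected so far.
        if new:
            orig.append(new)
        new = []
        start = True
    if start:
        new.append(item)

# No later marker closes the final sublist, so append it here.
if new:
    orig.append(new)

print(orig)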

This gives me the output that you want.


0

This might help

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
        '<dialog xyz', 'string', 'more string', 'even more string etc']

result = [i.split("|") if i.startswith("<") else ("<"+i).split("|") for i in "|".join(filter(None, org_list)).split("|<")]
print(result)

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]


0

You can use itertools.groupby:

import itertools
import re
org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', 
    '<dialog xyz', 'string', 'more string', 'even more string etc']
new_list = [list(b) for a, b in itertools.groupby(filter(None, org_list), key=lambda x: bool(re.match(r'<dialog', x)))]
final_list = [new_list[i]+new_list[i+1] for i in range(0, len(new_list), 2)]

Output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]
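
To see why concatenating neighbouring groups works, it may help to look at what groupby produces before the pairing step. The sketch below prints each key with its group; it uses startswith as the key instead of a regular expression, which is a simplification on my part, and the names items and groups are arbitrary.

```python
import itertools

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', 'more string', 'even more string etc']

# Drop empty strings, then group consecutive items by whether they open a dialog.
items = [s for s in org_list if s]
groups = [(is_start, list(group))
          for is_start, group in itertools.groupby(
              items, key=lambda s: s.startswith('<dialog'))]

for is_start, group in groups:
    print(is_start, group)
```

The pairing step relies on the keys alternating strictly between True and False. Two '<dialog' items in a row would land in a single True group, and a trailing '<dialog' item with nothing after it would leave an unpaired group, so the approach is best kept for input shaped like the example.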


0

This looks like a competition over who can write the most complicated and slowest function. Keep it simple, it is Python.

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc', 
        '<dialog xyz', 'string', '', 'even more string etc' , '<dialog xyz', 'string', 'more string',]

def slicelist(pred, iterable):
    element = []
    alw = False  # stays False until the first item that starts with pred
    for s in iterable:
        if s.startswith(pred):
            element.append([])  # open a new sublist
            alw = True
        if alw:
            element[-1].append(s)
    return element

print(slicelist('<', org_list))

If you want a generator (iterator) instead, swapping return for yield is not quite enough, because that would yield the whole list of sublists in one go. The function needs to yield each sublist once it is complete, and the caller then wraps the call in list(...) before printing.
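
A generator variant could look roughly like the sketch below. The name iter_slices is made up for illustration; it yields each sublist as soon as the next '<' item (or the end of the input) shows that the sublist is complete.

```python
def iter_slices(prefix, iterable):
    """Yield one sublist per item that starts with prefix."""
    current = None  # no sublist is open until the first matching item
    for s in iterable:
        if s.startswith(prefix):
            if current is not None:
                yield current  # the previous sublist is complete
            current = [s]
        elif current is not None:
            current.append(s)
    if current is not None:
        yield current  # nothing follows the last sublist to close it


org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
            '<dialog xyz', 'string', '', 'even more string etc',
            '<dialog xyz', 'string', 'more string']
result = list(iter_slices('<', org_list))
print(result)
```

Because the sublists are produced lazily, the caller can also loop over iter_slices(...) directly instead of building the full list first.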


0

You can do something like this:

org_list = ['', '<dialog xyz', 'string', 'more string', 'even more string etc',
        '<dialog xyz', 'string', 'more string', 'even more string etc']



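flag = True          # True until the first marker has been seen
sub_list = []
final_list = []
text = '<dialog xyz'
for i in org_list:
    if i.startswith(text):
        flag = False
        if sub_list:
            # Close the sublist collected so far, putting the marker back in front.
            sub_list.insert(0, text)
            final_list.append(sub_list)
            sub_list = []
    elif not flag:
        sub_list.append(i)
# Close the last sublist after the loop ends.
sub_list.insert(0, text)
final_list.append(sub_list)
print(final_list)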

output:

[['<dialog xyz', 'string', 'more string', 'even more string etc'], ['<dialog xyz', 'string', 'more string', 'even more string etc']]
