0

I have a list with many elements that I extracted from an HTML page using Beautiful Soup. Many of the elements share the same substring, and I would like to remove every element that contains it.

My list looks like:

[
u'File:Saddam Hussein (107).jpg',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'Template:Fn (page does not exist)',
u'File:AlBakr.jpg',
... (and so on) ...
]

And I would like to delete any element that contains the string "(page does not exist)".

Any thoughts on how I could do this?

2 Answers

2

Use a list comprehension:

>>> lis = [u'File:Saddam Hussein (107).jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'File:AlBakr.jpg', u'Template:Fn (page does not exist)', u'File:Chiracsaddam.jpg', u'File:Donald saddam.jpg', u'Template:Fn (page does not exist)', u'File:SaddamandCuellar.jpg.jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'File:SaddamBaghdadwalkabout.jpg', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fn (page does not exist)', u'Kurdish Patriotic Front (page does not exist)', u'File:TrialSaddam.jpg', u'Mohammad Rashdan (page does not exist)', u'Emmanuel Ludot (page does not exist)', u'Marc Henzelin (page does not exist)', u'Adnan Khairallah Tuffah (page does not exist)', u'Nidal al-Hamdani (page does not exist)', u'Ali Hussein (page does not exist)', u'File:SaddamandRana.jpg.jpg', u'Saddam Kamel Majid (page does not exist)', u'Template:Fn (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)', u'Template:Fnb (page does not exist)']

If you want to modify the original list:

>>> lis[:] = [item for item in lis if "(page does not exist)" not in item]

Or to create a new list:

new_lis = [item for item in lis if "(page does not exist)" not in item]
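The difference between the two forms is easiest to see when a second name refers to the same list. The following is only a minimal sketch: the four-item list is shortened sample data, and `alias` is a hypothetical name added for illustration.

```python
# Shortened sample data based on the list in the question.
lis = [
    u'File:Saddam Hussein (107).jpg',
    u'Template:Fn (page does not exist)',
    u'File:AlBakr.jpg',
    u'Template:Fnb (page does not exist)',
]
alias = lis  # hypothetical second name bound to the same list object

# Building a new list leaves the original object (and its alias) untouched.
new_lis = [item for item in lis if "(page does not exist)" not in item]

# Slice assignment replaces the contents of the existing list object,
# so every name that refers to it sees the filtered result.
lis[:] = [item for item in lis if "(page does not exist)" not in item]
```

After this runs, `alias` still refers to the same object as `lis` and holds only the two `File:` entries, while `new_lis` is a separate list with equal contents.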

5 Comments

Why the copy, [:]? I'm pretty sure that's unnecessary.
@johnthexiii lis[:] is not a copy, see stackoverflow.com/questions/11297774/…
@AshwiniChaudhary, perhaps a better question is why keep the original reference? I am not implying that it is a bad thing, I'm just curious.
@johnthexiii OP mentioned "would like to delete", so I provided both alternatives. That's the only reason.
@johnthexiii: sometimes the change should be made in place; e.g., os.walk() lets you control which directories are visited by modifying its dirs list in place.
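To illustrate the os.walk() point, here is a rough sketch that prunes a directory from the walk with slice assignment. The helper name `walk_skipping` and the temporary directory layout are made up for the example; only `os.walk`, `os.makedirs`, and `tempfile.TemporaryDirectory` are real standard-library calls.

```python
import os
import tempfile

def walk_skipping(root, skip_name):
    """Return the directories os.walk visits, pruning any named skip_name."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk (top-down, the default) consults this same list object
        # when deciding where to descend, so it must be changed in place.
        dirnames[:] = [d for d in dirnames if d != skip_name]
        visited.append(os.path.relpath(dirpath, root))
    return visited

# Hypothetical layout: root/keep/inner and root/skip/hidden
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep", "inner"))
    os.makedirs(os.path.join(root, "skip", "hidden"))
    visited = walk_skipping(root, "skip")
```

Writing `dirnames = [...]` instead would only rebind the local name, and the walk would still descend into `skip`.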
0
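>>> # Iterate backwards so deletions don't shift the indices still to be visited.
>>> for i in range(len(l)-1, -1, -1):  # stop value -1 so index 0 is checked too
...    if '(page does not exist)' in l[i]:
...       del l[i]
...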
>>> l
[u'File:Saddam Hussein (107).jpg']
>>>

2 Comments

del l[i] - you don't need the parentheses. Also, L is a better variable name than l.
Note that del and pop are expensive operations on lists: removing an item from anywhere but the end shifts every later element. (pop is slightly faster than del.)
