I've been writing a function that scrapes posts from the website www.meh.ro. I want it to pull a random post from a random page, but the way I've built it, it scrapes ALL posts by iterating over the HTML with a for loop, and I only need the output from a single post. I've been searching around and racking my brain for a simple solution, but I suppose I've got writer's block. I was hoping someone might have a brilliant idea I'm missing.

My code:

from random import randint
from urllib import urlopen
# from urllib import urlretrieve
from bs4 import BeautifulSoup


hit = False
while not hit:
    link = 'http://www.meh.ro/page/' + str(randint(1, 1000))
    print link, '\n---\n\n'

    try:
        source = urlopen(link).read()
        soup = BeautifulSoup(source)

        for tag in soup.find_all('div'):
            try:
                if tag['class'][1] == 'post':
                    # print tag.prettify('utf-8'), '\n\n'
                    title = tag.h2.a.string
                    imageURL = tag.p.a['href']
                    sourceURL = tag.div.a['href'].split('#')[0]

                    print title
                    print imageURL
                    print sourceURL
                    print '\n'
                    hit = True

            except (IndexError, KeyError):
                # Divs that lack a second class or a 'class' attribute are not posts; skip them.
                pass
            except Exception, e:
                print 'try2: ', type(e), '\n', e

    except Exception, e:
        print 'try1: ', type(e), '\n', e

I considered reusing an idea from elsewhere in my code, where I weight how likely a specific entry is to be chosen by adding each element n times to a list, which raises or lowers its chance of being pulled from it:

def content_image():
    l = []
    l.extend(['imgur()' for i in range(90)])
    l.extend(['explosm()' for i in range(10)])

    return eval(l[randint(0, len(l)-1)])

It would work, but I'm asking around regardless because I'm sure someone more experienced than me can work out a better solution.
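For comparison, here is a rough sketch of the same 90/10 weighting without padding a list or calling eval(). It assumes Python 3.6+ (for random.choices, which is not available in the Python 2 used above), and the imgur() and explosm() bodies are hypothetical stand-ins for the real scrapers:

```python
import random


def imgur():
    # Hypothetical stand-in for the real imgur() scraper.
    return 'imgur'


def explosm():
    # Hypothetical stand-in for the real explosm() scraper.
    return 'explosm'


def content_image(rng=random):
    # Store the function objects themselves, each listed once with a
    # relative weight (90:10, matching the padded-list version above).
    sources = [imgur, explosm]
    weights = [90, 10]
    chosen = rng.choices(sources, weights=weights, k=1)[0]
    return chosen()
```

Passing the functions directly means a typo in a name fails at definition time instead of inside eval(), and changing a weight is a one-number edit.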

1 Answer

To pick one post at random, you still have to loop through all of them and collect them in a list:

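import random

# `soup` is the BeautifulSoup object built in the question's code.
posts = []
for tag in soup.find_all('div', class_='post'):
    title = tag.h2.a.string
    imageURL = tag.p.a['href']
    # Drop the '#...' fragment from the source link.
    sourceURL = tag.div.a['href'].split('#', 1)[0]

    posts.append((title, imageURL, sourceURL))

# random.choice() raises IndexError if the page had no posts.
title, imageURL, sourceURL = random.choice(posts)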

This code collects all posts (title, image URL, source URL) into a list, then uses random.choice() to pick a random entry from that list.
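One thing to watch for: a randomly chosen page may contain no posts at all, and random.choice() raises IndexError on an empty list. A minimal sketch of a retry loop around that case, written for Python 3, might look like the following. The names pick_random_post, fetch_posts, max_page, and max_attempts are my own assumptions and not part of the code above; fetch_posts stands for whatever function downloads a page and returns its list of (title, imageURL, sourceURL) tuples:

```python
import random


def pick_random_post(fetch_posts, max_page=1000, max_attempts=10, rng=random):
    """Return one post tuple from a random page, or None if every attempt came up empty."""
    for _ in range(max_attempts):
        page = rng.randint(1, max_page)
        # fetch_posts is a hypothetical callable: page number -> list of tuples.
        posts = fetch_posts(page)
        if posts:
            # Only call choice() on a non-empty list.
            return rng.choice(posts)
    return None
```

Capping the number of attempts keeps the loop from running forever if the site is down or the page range is wrong, which the original `while` loop would do.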

1 Comment

Yeah, I figured as much. Didn't know about random.choice though, that makes things a lot cleaner than how I previously solved it. Thanks!
