
I am using a function (movies_from_url) to read a total of 256 movies from a webpage. Each page lists 50 movies, so I have to read the first 6 pages (5 pages for 250 movies and the 6th page for the remaining 6).

First URL:

http://www.imdb.com/search/title?at=0&sort=user_rating&start=1&title_type=feature&year=2005,2014

Here is my vague idea:

def read_m_by_rating(first_year=2005, last_year=2015, top_number=256):
    current_index = 1   # start number of the current results page
    final_list = []
    for _ in xrange(6):
        # pseudocode: current_index should be substituted into start=
        url = http://www.imdb.com/search/title?at=0&sort=user_rating&start=current_index&title_type=feature&year=2005,2014
        if top_number==300:
            lis = movies_from_url(url, top_number - current_index + 1)
        else:
            lis = movies_from_url(url, 50)

        final_list.append(lis)
        current_index += 50
    return final_list
  • Which difficulty are you having? Strange code, btw. Try yourself and then ask. We're not here to write full programs for you. Commented Feb 9, 2015 at 16:39
  • @ForceBru, with creating each URL. Commented Feb 9, 2015 at 16:42
  • Are you talking about using the for loop here to create the URL? Commented Feb 9, 2015 at 16:43
  • I think it's a good question. He did provide pseudocode that proves he did some thinking. My suggestion to you is to try and break this into challenges, one by one. For now, just try to master for loops. You may want to google "loop comprehension". (Leave aside the specifics of dynamic-content crawling for now.) Commented Feb 9, 2015 at 16:46
  • Just loop over start like this: for o in xrange(20): a_url="http://url.com/?bla=23&start="+str(o)+"&blabla=32" and then use a_url. Commented Feb 9, 2015 at 16:47

1 Answer

Just using a simple loop over current_index should work.

while current_index<256:
    url = "http://www.imdb.com/search/title?at=0&sort=user_rating&start="\
    +str(current_index)+"&title_type=feature&year=2005,2014"
    ...
    ...
    current_index+=50
return final_list
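
To make the loop above concrete, here is a minimal sketch of how the page URLs and per-page counts could be generated. It assumes movies_from_url(url, count) returns a list of movies, as the question implies; the names BASE_URL, page_requests and the fetch parameter are hypothetical and only used for illustration. It also uses extend rather than append, so the result is one flat list instead of a list of per-page lists.

```python
# URL template taken from the question; only start and the years vary.
BASE_URL = ("http://www.imdb.com/search/title?at=0&sort=user_rating"
            "&start={start}&title_type=feature&year={first},{last}")


def page_requests(top_number=256, page_size=50,
                  first_year=2005, last_year=2014):
    """Yield (url, count) pairs that together cover the first top_number results."""
    start = 1
    while start <= top_number:
        # The last page may need fewer than page_size items (6 for 256).
        count = min(page_size, top_number - start + 1)
        url = BASE_URL.format(start=start, first=first_year, last=last_year)
        yield url, count
        start += page_size


def read_m_by_rating(fetch, top_number=256, page_size=50,
                     first_year=2005, last_year=2014):
    """Collect movies page by page.

    fetch is assumed to behave like movies_from_url(url, count)
    and to return a list.
    """
    final_list = []
    for url, count in page_requests(top_number, page_size,
                                    first_year, last_year):
        final_list.extend(fetch(url, count))
    return final_list
```

Calling read_m_by_rating(movies_from_url) would then request start=1, 51, 101, 151, 201 and 251, asking for 50 movies on each of the first five pages and 6 on the last.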

2 Comments

@Ayush Gupta, this thread stackoverflow.com/questions/818828/… talks about doing something x times.
You need to modify your movies_from_url() so that you get the number of items in "lis" and end the while loop on that basis.
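
One way to read that suggestion is sketched below: stop as soon as a page returns fewer items than were asked for. The function name collect_until_short_page and the fetch_page(start, wanted) callback are hypothetical; fetch_page stands in for a wrapper that builds the URL for the given start index and calls movies_from_url, which is assumed to return a list.

```python
def collect_until_short_page(fetch_page, top_number=256, page_size=50):
    """Fetch pages until top_number items are collected or a page comes back short.

    fetch_page(start, wanted) is a hypothetical callback returning a list
    of at most wanted items, beginning at result number start.
    """
    final_list = []
    start = 1
    while len(final_list) < top_number:
        wanted = min(page_size, top_number - len(final_list))
        lis = fetch_page(start, wanted)
        final_list.extend(lis)
        if len(lis) < wanted:
            # The site has no more results, so there is nothing left to request.
            break
        start += page_size
    return final_list
```

With this shape the loop does not need to know the number of pages in advance; the length of each returned list decides whether another request is made.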
