
I am trying to move to the next page for as long as the 'next' button exists at this link: https://www.cbp.gov/contact/find-broker-by-port/4901?page=1. I realized that the requests response doesn't contain the button, so BeautifulSoup cannot find it. I tried adding headers/a user-agent to requests, but the element still doesn't appear. As far as I can tell, there is no JavaScript generating content on this page. Here is the code. What am I missing?

import requests
from csv import reader
from bs4 import BeautifulSoup

def second_links(second_links_list=None, page2_num=0):
  # A mutable default argument ([]) is shared across calls; use None instead.
  if second_links_list is None:
    second_links_list = []
  try:
    with open('port.csv', 'r') as read_obj:
      csv_reader = reader(read_obj)
      for row in csv_reader:
        row = row[-1]
        page2 = requests.get(row.format(page2_num))
        soup2 = BeautifulSoup(page2.content, 'html.parser')  # 'html' is not a valid parser name
        results2 = soup2.find(id='region-content')
        table2cells = results2.find_all('td', class_='views-field views-field-title views-align-center')
        for cell in table2cells:
          cell2link = cell.find('a', href=True)
          second_links_list.append('https://www.cbp.gov' + cell2link['href'])

      next2_page = results2.find('li', class_='pager-next')
      if next2_page:
        page2_num += 1
        second_links(second_links_list, page2_num)
    return second_links_list
  except requests.exceptions.ConnectionError:
    # page2 may not be defined here; return what was collected so far.
    return second_links_list

1 Answer

import requests
import pandas as pd


def main(url):
    with requests.Session() as req:
        allin = []
        for item in range(3):  # pages 0, 1 and 2 of the listing
            r = req.get(url.format(item))
            # read_html parses every <table> on the page; the brokers table is the first
            df = pd.read_html(r.content)[0]
            allin.append(df)
        new = pd.concat(allin)
        print(new)
        new.to_csv("data.csv", index=False)


main("https://www.cbp.gov/contact/find-broker-by-port/4901?page={}")
                                         Broker Name Broker Filer Code
0                                    AXIOM TRADE INC               BTL
1                      DE LA CRUZ CUSTOMS BROKER INC               ENM
2                          ECI CUSTOMS BROKERAGE INC               BGZ
3                                   EDWIN SEDA PEREZ               9JD
4                 EXPEDITORS INT'L (PUERTO RICO) INC               ES9
5                                     GRISEL PADILLA               MU8
6                    INTEGRITY CUSTOMS BROKERAGE LLC               9QB
7                    INTER-WORLD CUSTOMS BROKERS INC               N35
8                               JAIME MADURO SANTANA               ALA
9                                      JOSE G FLORES               256
0                                JOSE M RAMOS GARCIA               97Q
1                                    JOSE R BERMUDEZ               9HD
2                                        JUAN GARCIA               9ST
3                   JULIO CACERES DBA TRADEWORKS INC               97D
4                          JULIO RODRIGUEZ USCB CORP               EWV
5                                     MANUEL A RAMOS               G68
6                            MANUEL RAMOS-GANDIA INC               CDX
7                                   NESTOR REYES INC               508
8                               NORBERTO DAVID COLON               BLC
9                  P R INTERNATIONAL CUSTOMS BROKERS               D05
0                                      PANALPINA INC               554
1                                PEDRO L CARMONA INC               BWV
2                           PEDRO L SITIRICHE-TORRES               E9T
3  RADIX GROUP INTERNATIONAL INC DBA DHL GLOBAL F...               336
4                   RANK SHIPPING OF PUERTO RICO INC               D84
5                           RENE ORTIZ-VILLAFANE INC               438
6                         ROSA MARINA FLORES-ALVAREZ               NZ5
7                     UPS SUPPLY CHAIN SOLUTIONS INC               UPS
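The repeating 0–9 index in the printed output comes from pd.concat keeping each page's original row index; passing ignore_index=True yields a continuous one. A minimal illustration with toy frames (the broker names are just sample values from the output above):

```python
import pandas as pd

a = pd.DataFrame({'Broker Name': ['AXIOM TRADE INC', 'GRISEL PADILLA']})
b = pd.DataFrame({'Broker Name': ['JUAN GARCIA']})

kept = pd.concat([a, b])                      # index: 0, 1, 0
fixed = pd.concat([a, b], ignore_index=True)  # index: 0, 1, 2
print(list(kept.index), list(fixed.index))
```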

1 Comment

Well, this works, so I will accept it. However, in this case I have to know in advance how many 'next' pages there are before I can run the loop. The previous page has 200 links, each leading to a page like this one, which may have only 1 page or up to 25 in some cases. That's the reason I did not use Selenium (it would have taken forever). I realize the question doesn't mention all this. Thanks for your answer.
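A sketch addressing this, not part of the accepted answer: reuse the question's own 'pager-next' check to decide when to stop, so the page count never needs to be known up front. The names has_next_page and scrape_all are my own; the URL template, Session usage, and read_html call follow the answer above.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

def has_next_page(html):
    # The question's code looks for <li class="pager-next"> to decide
    # whether another page exists; reuse that test here.
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find('li', class_='pager-next') is not None

def scrape_all(url):
    # Fetch page 0, 1, 2, ... until the pager no longer offers a 'next' link.
    frames = []
    with requests.Session() as req:
        page = 0
        while True:
            r = req.get(url.format(page))
            frames.append(pd.read_html(r.content)[0])
            if not has_next_page(r.content):
                break
            page += 1
    return pd.concat(frames, ignore_index=True)
```

Calling scrape_all("https://www.cbp.gov/contact/find-broker-by-port/4901?page={}") would then return one DataFrame regardless of how many pages a given port has.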
