Pandas, BeautifulSoup - iterating and writing multiple pages to excel

Question

I am scraping several pages of NCAA soccer stats and writing them to an Excel spreadsheet. The win/loss/tie (WLT) data spans multiple pages, so I iterate through them, but only the last page of the iteration (4 schools out of 204) ends up in Excel. How can I get all 5 pages into the "WLT" sheet? Thanks for your help.


    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    import re
    import xlsxwriter
    import numpy as np
    import urllib.request


    shutouts = "https://www.ncaa.com/stats/soccer-men/d1/current/team/31"
    shutouts = pd.read_html(shutouts)[0] 

    SOG = 'https://www.ncaa.com/stats/soccer-men/d1/current/team/977'
    SOG = pd.read_html(SOG)[0]

    # players stats
    shutouts_p = 'https://www.ncaa.com/stats/soccer-men/d1/current/individual/1170'
    shutouts_p = pd.read_html(shutouts_p)[0]

    #Win Loss Tie data
    max_page_num = 6
    for i in range(1,max_page_num):  
        print('page:', i)
        page_num = str(i)
        source = "https://www.ncaa.com/stats/soccer-men/d1/current/team/33/p" + page_num
        WLT = pd.read_html(source)
        WLT = WLT[0]


    with pd.ExcelWriter('ncaastats.xlsx') as writer:  
        shutouts.to_excel(writer, sheet_name='shutouts')
        shutouts_p.to_excel(writer, sheet_name='shutouts_p')
        SOG.to_excel(writer, sheet_name='SOG')
        WLT.to_excel(writer, sheet_name='WLT')

KunduK · Accepted Answer · 2020-01-12 09:57:29Z

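To get all 204 records from the 5 pages into a single pandas DataFrame, you need to accumulate the table read in each iteration instead of overwriting it. In the question's loop, WLT is reassigned on every pass, so only the last page survives.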

Code:

import pandas as pd

# Collect one DataFrame per page here
frames = []
# Win Loss Tie data
max_page_num = 6
for i in range(1, max_page_num):
    print('page:', i)
    page_num = str(i)
    source = "https://www.ncaa.com/stats/soccer-men/d1/current/team/33/p" + page_num
    # read_html returns a list of tables; the stats table is the first one
    frames.append(pd.read_html(source)[0])

# DataFrame.append was removed in pandas 2.0, so combine the pages with concat
df = pd.concat(frames, ignore_index=True)

print(df)

Output:

page: 1
page: 2
page: 3
page: 4
page: 5
    Rank                Team  Won  Loss  Tied   Pct.
0      1        Missouri St.   18     1     1  0.925
1      2          Georgetown   20     1     3  0.896
2      -            Virginia   21     2     1  0.896
3      4   Saint Mary's (CA)   16     2     0  0.889
4      5                 SMU   18     2     1  0.881
5      6             Clemson   18     2     2  0.864
6      7       New Hampshire   15     2     3  0.825
7      8            Campbell   17     3     2  0.818
8      9          Washington   17     4     0  0.810
9     10                 UCF   15     3     2  0.800
10    11            Marshall   16     3     3  0.795
11    12           Seattle U   16     3     4  0.783
12    13                Yale   13     3     2  0.778
13    14             Indiana   15     3     4  0.773
14    15        Oral Roberts   13     4     0  0.765
15    16            Stanford   14     3     5  0.750
16    17         Wake Forest   16     5     2  0.739
17    18        Rhode Island   14     4     3  0.738
18    19                Navy   12     4     1  0.735
19    20     St. John's (NY)   14     5     1  0.725
20    21                 UIC   13     5     0  0.722
21    22            Penn St.   12     4     3  0.711
22    23    UC Santa Barbara   15     5     4  0.708
23    24            UC Davis   13     5     2  0.700
24     -           Charlotte   12     4     4  0.700
25     -         Georgia St.   12     4     4  0.700
26    27          Providence   16     7     0  0.696
27    28           San Diego   12     5     1  0.694
28     -                 FIU   10     3     5  0.694
29    30                Iona   14     6     1  0.690
..   ...                 ...  ...   ...   ...    ...
174  175            Delaware    3     9     3  0.300
175  176         USC Upstate    5    12     0  0.294
176    -       Robert Morris    4    11     2  0.294
177    -         Stony Brook    4    11     2  0.294
178    -                 UIW    5    12     0  0.294
179  180        Western Ill.    5    13     1  0.289
180  181           Wisconsin    3    11     4  0.278
181    -             Liberty    5    13     0  0.278
182    -       San Diego St.    4    12     2  0.278
183  184           Boston U.    4    12     1  0.265
184    -       UNC Asheville    4    12     1  0.265
185  186             Wofford    4    13     1  0.250
186    -          Valparaiso    4    13     1  0.250
187    -            American    3    11     2  0.250
188    -        George Mason    4    13     1  0.250
189    -            Davidson    3    11     2  0.250
190    -        Michigan St.    3    12     3  0.250
191  192            Monmouth    3    12     2  0.235
192    -                 UAB    3    12     2  0.235
193  194        Old Dominion    3    11     1  0.233
194  195        Sacred Heart    2    11     3  0.219
195  196  Col. of Charleston    2    12     2  0.188
196  197          Holy Cross    3    15     0  0.167
197    -   Purdue Fort Wayne    3    15     0  0.167
198  199       San Francisco    2    14     1  0.147
199  200          Evansville    2    15     1  0.139
200  201            Canisius    2    15     0  0.118
201  202   Central Conn. St.    1    13     1  0.100
202  203                 VMI    1    16     0  0.059
203  204             Harvard    0    14     1  0.033

[204 rows x 6 columns]
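
To get the combined table into the "WLT" sheet, the accumulated DataFrame can be passed to the same ExcelWriter block used in the question. The following is a minimal sketch, not a definitive implementation: the helper names combine_pages, fetch_wlt and write_sheets are hypothetical, and it assumes an Excel engine such as xlsxwriter or openpyxl is installed. Passing index=False is a choice made here to drop the row-number column; the question's code keeps the default.

```python
import pandas as pd


def combine_pages(frames):
    """Stack the per-page tables into one DataFrame with a fresh 0..n-1 index."""
    return pd.concat(frames, ignore_index=True)


def fetch_wlt(base_url, last_page):
    """Read the first table from pages 1..last_page (inclusive) and combine them."""
    frames = [
        pd.read_html(f"{base_url}/p{page}")[0]
        for page in range(1, last_page + 1)
    ]
    return combine_pages(frames)


def write_sheets(path, sheets):
    """Write each DataFrame in the dict `sheets` to its own worksheet."""
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            # index=False is an assumption: it omits the row-number column
            frame.to_excel(writer, sheet_name=name, index=False)
```

With these helpers, the question's workflow might look like this:

    WLT = fetch_wlt("https://www.ncaa.com/stats/soccer-men/d1/current/team/33", 5)
    write_sheets("ncaastats.xlsx", {"shutouts": shutouts, "shutouts_p": shutouts_p, "SOG": SOG, "WLT": WLT})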
