I appended information from several Excel files into a single data frame. Each Excel file has the same structure but corresponds to a different city. The city name is always located in the same cell (C2).
How can I extract the city name in each file so that it appears as a column for the corresponding rows in my newly created data frame?
My appended data frame looks like this:
Col1 Col2
40 34
104 108
23 1
43 21
Hence, I can't tell which rows belong to file X or file Y. Ideally, I'd like to have a data frame such as:
Col1 Col2 Col3
City A 40 34
City A 104 108
City B 23 1
City B 43 21
I'm not sure whether I should write the city column directly into the Excel files before appending them, or whether I should do this afterwards or while appending to my data frame.
Any guidance would be great.
Edit: This is my best attempt at reproducing the structure of my Excel sheets. Note that column A and rows 5, 6 and 7 are blank. The city name is located in row 2, column C (cell C2).
I want to extract the information in rows 8 through 11 and add the city name from cell C2 as a column next to these rows.
       ColA  ColB      ColC  ColD  ColE  ColF  ColG
Row1         Type      XYZ
Row2         CityName  XXX
Row3         CityCode  10
Row4         RYear     13
Row5
Row6
Row7
Row8         Rank      Cat.  88    89    90    91
Row9         11        A     111   106   102   101
Row10        12        B     121   144   126   121
Row11        13        C     100   107   100   101
Edit2: Following ALollz's advice, I tried the following code unsuccessfully. I get an error " 'DataFrame' object has no attribute 'ColC' ". Note that files_xlsx is a list that includes all Excel files.
all_data = pd.DataFrame()
for f in files_xlsx:
    city_name = pd.read_excel(f, "SheetA", nrows=2).ColC[1]
    data = pd.read_excel(f, "SheetA", parse_cols="B:J")
    data['col_city'] = city_name
    all_data = all_data.append(data, ignore_index=True)
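A likely reason for the error above: with the default header handling, read_excel takes the first row it reads as the column labels, so the frame has no column literally named ColC. The sketch below uses a small hand-built frame as a stand-in for that result (the frame and the names `top` and `has_colc` are illustrative, not read from a real file) and shows a positional lookup instead.

```python
import pandas as pd

# Simplified stand-in for the top of the sheet as read with the default header:
# row 1 ("Type", "XYZ") has become the column labels.
top = pd.DataFrame({"Type": ["CityName", "CityCode"], "XYZ": ["XXX", 10]})

# Attribute access only works for existing labels, and "ColC" is not one of them.
has_colc = hasattr(top, "ColC")

# A positional lookup does not depend on the labels:
# first data row, second column of this stand-in frame.
city_name = top.iat[0, 1]
```

Passing header=None to read_excel avoids the first row being consumed as labels, which makes positional lookups easier to reason about.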
Edit3: Kept trying and finally found something that works. The only issue is that cityname is only set to one row and not the entire column, which is what I want. Any help?
df = pd.DataFrame()
for f in files_xlsx:
    city_name = pd.read_excel(f, "Sheet1", nrows=2, parse_cols="C", header=None, skiprows=1, skip_footer=264)
    data = pd.read_excel(f, "Sheet1", parse_cols="B:J", header=None, skiprows=8)
    data['City'] = city_name
    df = df.append(data)
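The one-row result in the attempt above is probably because city_name is itself a DataFrame: assigning it to a column aligns on the index, so only the matching row gets a value. A minimal sketch with hand-built stand-in frames (`city_frame`, `data`, `aligned` and `fixed` are illustrative names, not from the original files) shows the difference between assigning the frame and assigning the scalar inside it.

```python
import pandas as pd

# Stand-in for the one-cell frame that read_excel returns for the city name.
city_frame = pd.DataFrame([["XXX"]])

# Stand-in for the data rows read from the same sheet.
data = pd.DataFrame({"Rank": [11, 12, 13], "Cat.": ["A", "B", "C"]})

# Assigning the frame aligns on the index, so only index 0 receives a value.
aligned = data.copy()
aligned["City"] = city_frame

# Extracting the scalar first lets pandas broadcast it to every row.
city_name = city_frame.iat[0, 0]
fixed = data.copy()
fixed["City"] = city_name
```

In the loop above, the equivalent change would be to take `.iat[0, 0]` (or `.iloc[0, 0]`) of the city frame before assigning it to data['City'].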
city_name = pd.read_excel('your_file', nrows=2).ColC[1]
Then you can read the file again, skipping the first 8 rows, and assign that value to a column.
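Putting the pieces together, here is a hedged sketch of the whole loop. It assumes the layout described in the question (city name in C2, column headers in row 8, data in columns B to G) and a sheet called "Sheet1". The function name `load_city_files` is made up for illustration, and the call does not limit how many rows are read below the header, so any trailing rows in the real files would still need to be trimmed (for example with nrows or skipfooter).

```python
import pandas as pd


def load_city_files(files, sheet_name="Sheet1"):
    """Stack the data block of each workbook, tagging rows with the city in C2."""
    frames = []
    for path in files:
        # Read only cell C2: skip row 1, take one row, keep column C.
        city_cell = pd.read_excel(
            path, sheet_name=sheet_name, header=None,
            usecols="C", skiprows=1, nrows=1,
        )
        city_name = city_cell.iat[0, 0]  # a scalar, not a DataFrame

        # Row 8 holds the headers (Rank, Cat., 88, ...), so skip the 7 rows above it.
        data = pd.read_excel(
            path, sheet_name=sheet_name, usecols="B:G", skiprows=7,
        )
        data.insert(0, "City", city_name)  # the scalar is broadcast to every row
        frames.append(data)

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
    return pd.concat(frames, ignore_index=True)
```

It would be called as `all_data = load_city_files(files_xlsx)`. Collecting the frames in a list and concatenating once also avoids copying the growing frame on every iteration.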