I want to get data row by row in a python list from an excel file. For example, my excel file contains multiple rows of data and the first element of my python list will be a list that will contain all the information of the first row, the second element of the list will be a list that will contain the information of the second row of the excel file and so on. Can anybody teach me the easiest way of doing that? Thank you :)
4 Answers
If you are already using pandas, then this is relatively straightforward:
import pandas as pd
df = pd.read_excel('book1.xlsx',engine='openpyxl',dtype=object,header=None)
print(df.head())
l = df.values.tolist()
print(l)
NB. You may have to pip install openpyxl if it is not already in your packages.
Pandas read_excel documentation
EDIT: You don't really need the engine and dtype parameters: pandas defaults to openpyxl if you specify ".xlsx", and you can let pandas handle the types in most circumstances.
The header=None is important though, otherwise pandas will interpret the first row of your Excel sheet as the dataframe column names.
Comments
The easiest way would be saving that excel file into a CSV and then loading it. The proprietary excel files would be hard to decode. Use Save as... and the option CSV file.
Then, do this:
filename = ...
rows = []
with open(filename, "r") as f:
for line in f:
rows.append(line.split(","))
print(rows)
The advantage of this approach is that you need no external libraries to do this. It uses basic Python only.
2 Comments
This will give you list of list -
import pandas as pd
df = pd.read_excel('filename.xlsx')
l1 = []
for index, row in df.iterrows():
l1.append(row.to_list())
2 Comments
Reading a given column of a sheet in a Workbook can be done like this:
# required packages: pandas, openpyxl
#--- import section -------------------------------------------------
import pandas
#--- create file variable: ------------------------------------------
my_excel_file = "my_file.xlsx"
#--- create lists: --------------------------------------------------
df = pandas.read_excel(my_excel_file, sheet_name='my_sheet')
# with header:
my_list = (list)(df["my_header"].values)
# without header:
my_second_list = (list)(df[df.columns[1]].values)
#--- process list: --------------------------------------------------
print(len(my_list))
for my_item in my_list:
print(my_item, end = "; ")