I want to read an Excel file row by row into a Python list. For example, if my Excel file contains multiple rows of data, the first element of the list should be a list containing all the values of the first row, the second element should be a list containing the values of the second row, and so on. Can anybody show me the easiest way of doing that? Thank you :)

4 Answers

3

If you are already using pandas, then this is relatively straightforward:

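import pandas as pd

# header=None keeps the first sheet row as data instead of column names
df = pd.read_excel('book1.xlsx', engine='openpyxl', dtype=object, header=None)
print(df.head())

# Convert the DataFrame to a list of row lists
rows = df.values.tolist()
print(rows)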

NB: You may have to run pip install openpyxl if it is not already installed.

Pandas read_excel documentation

EDIT: You don't really need the engine and dtype parameters: pandas defaults to openpyxl if you specify ".xlsx", and you can let pandas handle the types in most circumstances.

header=None is important, though: without it, pandas will interpret the first row of your Excel sheet as the dataframe column names.
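To make the effect of header=None concrete, here is a minimal sketch. It writes a small throwaway workbook to a temporary directory (the sample rows and the temporary path are made up for illustration) and reads it back both ways. It assumes pandas and openpyxl are installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the sheet's contents.
sample_rows = [["alice", 30], ["bob", 25], ["carol", 41]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book1.xlsx")
    # Write the rows with no header row and no index column.
    pd.DataFrame(sample_rows).to_excel(path, header=False, index=False)

    # header=None: every sheet row is treated as data.
    all_rows = pd.read_excel(path, header=None).values.tolist()

    # Default (header=0): the first sheet row becomes the column names.
    df_default = pd.read_excel(path)
    column_names = df_default.columns.tolist()
    data_rows = df_default.values.tolist()

print(all_rows)      # all three sheet rows
print(column_names)  # the first sheet row, consumed as column names
print(data_rows)     # only the remaining two rows
```

With the default setting the first sheet row is not lost, but it ends up in df.columns rather than in the list of rows.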

1

The easiest way would be to save the Excel file as CSV and then load that. Native Excel files are hard to decode by hand (an .xlsx file is a zip archive of XML documents). In Excel, use Save As... and choose the CSV file type.

Then, do this:

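import csv

filename = ...
rows = []
# newline="" lets the csv module handle line endings itself
with open(filename, "r", newline="") as f:
    # csv.reader copes with quoted cells that contain commas
    for row in csv.reader(f):
        rows.append(row)
print(rows)
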

The advantage of this approach is that you need no external libraries to do this. It uses basic Python only.
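One caveat when parsing CSV: splitting each line on commas by hand goes wrong when a cell itself contains a comma, and it leaves the trailing newline attached to the last value. The standard-library csv module handles both. A small sketch with made-up in-memory data (the CSV text below is hypothetical, not taken from any real file):

```python
import csv
import io

# Hypothetical CSV text; the second row has a comma inside a quoted cell.
csv_text = 'name,city\n"Doe, Jane",Paris\n'

# Naive approach: split every line on commas.
naive_rows = [line.split(",") for line in io.StringIO(csv_text)]

# csv.reader understands quoting and drops the line endings.
parsed_rows = list(csv.reader(io.StringIO(csv_text, newline="")))

print(naive_rows)   # the quoted cell is cut in two and "\n" is kept
print(parsed_rows)  # two clean rows of two values each
```

Either way, every value comes back as a string, so numeric cells need an explicit int() or float() conversion.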

2 Comments

Thank you but I am restricted to the excel file.
@MR.JD Unlucky. Pandas would be your best option then I guess.
1

This will give you a list of lists:

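import pandas as pd

# header=None so the first sheet row is kept as data, not used as column names
df = pd.read_excel('filename.xlsx', header=None)

# Collect each row as a plain Python list
l1 = []
for index, row in df.iterrows():
    l1.append(row.to_list())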
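The explicit loop should give the same result as a single call to df.values.tolist(), which is usually shorter and faster on large sheets. A minimal sketch, using a hypothetical in-memory DataFrame in place of one loaded from a file:

```python
import pandas as pd

# Hypothetical frame standing in for the result of pd.read_excel(...).
df = pd.DataFrame([["alice", 30], ["bob", 25]])

# Row-by-row loop, as in the answer.
rows_loop = [row.to_list() for _, row in df.iterrows()]

# One-call equivalent.
rows_direct = df.values.tolist()

print(rows_loop == rows_direct)
```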

2 Comments

What is 'test' here?
@MR.JD It is most likely df.iterrows()
0

Reading a given column of a sheet in a Workbook can be done like this:

    # required packages: pandas, openpyxl
    import pandas

    # path to the workbook
    my_excel_file = "my_file.xlsx"

    # load one sheet; the first row is used as the column headers
    df = pandas.read_excel(my_excel_file, sheet_name='my_sheet')

    # select a column by its header name
    my_list = df["my_header"].tolist()

    # select a column by position (here the second column)
    my_second_list = df[df.columns[1]].tolist()

    # process the list
    print(len(my_list))
    for my_item in my_list:
        print(my_item, end="; ")
