How to format/parse a text file into a CSV using Python (pandas)

Question

I want to read a text file that contains test results in a single column (one test case per line) and convert it to a CSV file with multiple columns, one per person who took the test, with each person's results in their own column.

The column headers in the CSV file will be: "Matt Test, Mark Test, John Test, Mike Test"

Under each person's column, the results should be listed from slowest to fastest time. For example, "Matt Test" will have 3 rows of trl_matt_test and then six rows of get_trl_time, "Mark Test" will have 2 rows of trl_mark_test and 3 rows of get_trl_time, etc. The number of results differs on every run, so I can't hard-code the number of rows.

testdata.txt (this is the text file data that I am reading):

trl_matt_test: 15s
trl_matt_test: 10s
trl_matt_test: 12s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mark_test: 13s
trl_mark_test: 20s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_john_test: 20s
trl_john_test: 25s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
get_trl_time: 1s
trl_mike_test: 2s
get_trl_time: 1s
get_trl_time: 1s

# I want to use pandas and a DataFrame if possible
import pandas as pd

# These are the headers I want to use for the columns in the CSV
header_list = ['Matt Test', 'Mark Test', 'John Test', 'Mike Test']

# I want to use a substring of the test name as a delimiter marking where to split
delimiter_list = ['mark', 'john', 'mike']

# I want to store the row number where each delimiter first appears, to know
# how many rows of data each person has
delimiter_row_nums = []

# The idea is that I would then know that Matt's tests are in rows 0-15 and that
# Mark's tests are in rows 16-20, etc. (these numbers are just an example), so I
# can create a list of Matt's data [0:15], then a list of Mark's data [16:20], etc.

# read the file with pandas; each line becomes one row of a single 'Results' column
data_file = pd.read_csv("testdata.txt", header=None, names=['Results'])

# for each element in the delimiter list
for delim in delimiter_list:
    # for each row number and line in the file
    for count, row in enumerate(data_file['Results']):
        # if the delimiter is a substring of the line, record the row number
        # of this first match and move on to the next delimiter
        if row.find(delim) != -1:
            delimiter_row_nums.append(count)
            break

# take the new lists and sort them, then place them under their respective headers

Jonathan Leon · Accepted Answer · 2021-05-14 17:32:22Z

It's not terribly clear what you are looking for, but this may get you started. I created a txt file with the data you provided.

import pandas as pd

# header=None: the file has no header row, so the first line is kept as data
df = pd.read_csv('testdata.txt', header=None, names=['Results'])

# map the tester to the data
dd = df.Results.str.split('_', n=1).str[1].str.split(':').str[0]
cmap = {'matt_test': 'Matt Test', 'mark_test': 'Mark Test', 'john_test': 'John Test', 'mike_test': 'Mike Test'}
df['Tester'] = dd.map(cmap).ffill() # not sure here if you want forward or back fill

# re-orient the data
df_pivot = df.pivot(columns='Tester')

                       Results
Tester           John Test           Mark Test           Matt Test          Mike Test
0                      NaN                 NaN  trl_matt_test: 15s                NaN
1                      NaN                 NaN  trl_matt_test: 10s                NaN
2                      NaN                 NaN  trl_matt_test: 12s                NaN
3                      NaN                 NaN    get_trl_time: 1s                NaN
4                      NaN                 NaN    get_trl_time: 1s                NaN
5                      NaN                 NaN    get_trl_time: 1s                NaN
6                      NaN                 NaN    get_trl_time: 1s                NaN
7                      NaN                 NaN    get_trl_time: 1s                NaN
8                      NaN                 NaN    get_trl_time: 1s                NaN
9                      NaN  trl_mark_test: 13s                 NaN                NaN
10                     NaN  trl_mark_test: 20s                 NaN                NaN
11                     NaN    get_trl_time: 1s                 NaN                NaN
12                     NaN    get_trl_time: 1s                 NaN                NaN
13                     NaN    get_trl_time: 1s                 NaN                NaN
14      trl_john_test: 20s                 NaN                 NaN                NaN
15      trl_john_test: 25s                 NaN                 NaN                NaN
16        get_trl_time: 1s                 NaN                 NaN                NaN
17        get_trl_time: 1s                 NaN                 NaN                NaN
18        get_trl_time: 1s                 NaN                 NaN                NaN
19        get_trl_time: 1s                 NaN                 NaN                NaN
20        get_trl_time: 1s                 NaN                 NaN                NaN
21        get_trl_time: 1s                 NaN                 NaN                NaN
22        get_trl_time: 1s                 NaN                 NaN                NaN
23        get_trl_time: 1s                 NaN                 NaN                NaN
24        get_trl_time: 1s                 NaN                 NaN                NaN
25        get_trl_time: 1s                 NaN                 NaN                NaN
26                     NaN                 NaN                 NaN  trl_mike_test: 2s
27                     NaN                 NaN                 NaN   get_trl_time: 1s
28                     NaN                 NaN                 NaN   get_trl_time: 1s

# do a count
df_pivot.count()

         Tester
Results  John Test    12
         Mark Test     5
         Matt Test     9
         Mike Test     3
dtype: int64
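
The pivot above leaves each tester's rows at their original positions, so the columns are padded with NaN and nothing is written to disk yet. As a possible next step toward the CSV described in the question, here is a minimal sketch that top-aligns each tester's column, orders it from slowest to fastest, and produces CSV text. It is not part of the original answer. It assumes that every line ends in a whole number of seconds followed by "s", that each block of get_trl_time lines belongs to the trl_<name>_test lines just before it, and that headers can be derived from the name in the test label. The shortened in-memory sample and the variable names (sample, wide, csv_text) are made up for illustration.

```python
import io
import pandas as pd

# Shortened sample in the same format as testdata.txt (illustrative assumption)
sample = io.StringIO(
    "trl_matt_test: 15s\n"
    "trl_matt_test: 10s\n"
    "trl_matt_test: 12s\n"
    "get_trl_time: 1s\n"
    "trl_mark_test: 13s\n"
    "trl_mark_test: 20s\n"
    "get_trl_time: 1s\n"
)

# header=None keeps the first line as data instead of using it as a header
df = pd.read_csv(sample, header=None, names=["Results"])

# 'trl_<name>_test' lines start a new tester; the 'get_trl_time' lines that
# follow do not match the pattern and inherit the name via forward fill
name = df["Results"].str.extract(r"^trl_(\w+)_test:", expand=False).ffill()
df["Tester"] = name.str.capitalize() + " Test"

# Numeric time in seconds, used only for sorting
df["Seconds"] = df["Results"].str.extract(r"(\d+)s\s*$", expand=False).astype(int)

# One column per tester, in order of first appearance. Each column is sorted
# slowest to fastest (stable, so ties keep file order) and renumbered from 0
# so that all columns start at the top row.
columns = {
    tester: group.sort_values("Seconds", ascending=False, kind="stable")["Results"]
                 .reset_index(drop=True)
    for tester, group in df.groupby("Tester", sort=False)
}
wide = pd.DataFrame(columns)

# Shorter columns are padded with NaN, which to_csv writes as empty cells
csv_text = wide.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a path such as wide.to_csv("results.csv", index=False); the file name here is only an example. Because the column headers come from the data rather than a fixed list, the same sketch should cope with a different number of testers or rows per run, as long as the assumptions above hold.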

This helps a lot, I'm going to look at this more in detail and will get back to you. Thanks for the help!
