
I am trying to write an algorithm that reads in a CSV file, gets a list of names from column 0 and a list of hours worked from column 6, then iterates through the list of staff names. If a name is equal to the current name, it grabs the relevant hours and adds them to a total. Once the name has been checked against the whole file, it is removed from the list and the next name gets checked.

main.py

from modules import gui_module as gui
from modules import csv_operations as csvcalc

if __name__ == "__main__":

    programWindow = gui.programGUI("CSV Calculator", 200, 150)
    programWindow.run()

    finalStaffHours = csvcalc.calculate_csv_data(gui.csv_file_path, 'staff', 'shift time ex. break (hr)') # Send the CSV file path to the iteration module.
    print (finalStaffHours)

csv_operations.py

import pandas as pd
import csv

def calculate_csv_data(csv_file_path, name_column, hours_column):
    """ Take CSV File, remove header row and find the length of said file."""

    dataFrame = pd.read_csv(csv_file_path)
    file_length = len(dataFrame) # Find length of file

    uncheckedNames = get_name_column(dataFrame, 'staff') # Get list of staff names - contains duplicates
    allHoursWorked = get_hours_column(dataFrame, 'shift time ex. break (hr)')

    finalStaffHours = calculate_staff_hours(allHoursWorked, uncheckedNames, int(file_length))
    return finalStaffHours

def calculate_staff_hours(all_hours_worked, unchecked_names, file_length):
    """ Calculates each staff members hours then returns the list to main."""
    staffHoursAndName = []
    currentHourTotal = 0.0
    uncheckedNamesRemovedDupe = list(set(unchecked_names))
    while uncheckedNamesRemovedDupe:
        currentNameToCheck = uncheckedNamesRemovedDupe[0]
        for i in range(file_length):
            if i == (file_length + 1):
                hourAndNameCombined = currentNameToCheck + str(currentHourTotal)
                staffHoursAndName.append(hourAndNameCombined)
                currentHourTotal = 0.0
                del uncheckedNamesRemovedDupe[0]
            if currentNameToCheck != unchecked_names[i]:
                 continue
            if currentNameToCheck == unchecked_names[i]:
                 currentHourTotal += all_hours_worked[i]
                 print(currentHourTotal)
    if not uncheckedNamesRemovedDupe:
        return staffHoursAndName

def get_hours_column(csv_file_path, hours_column):
    """ Pull all of the hours worked from column 6 and return that. """
    if hours_column in csv_file_path.columns:
        return csv_file_path[hours_column].tolist()
    
def get_name_column(csv_file_path, name_column):
    """ Pull all of the staff names from column 0 CSV and return them. """
    if name_column in csv_file_path.columns:
        return csv_file_path[name_column].tolist()

I can read in the CSV, get the list from column 0 and column 6 - the problem lies in the calculate_staff_hours function in csv_operations.py

It should add the numbers together for one person, once done then remove that person from the list, move on to the next and so on.

Instead, it adds the numbers from the CSV file in a never-ending loop.

  • If this is not a homework question, pandas can read in your CSV and sum up the hours worked for each name using .groupby() in a couple of lines of code. Commented Sep 18 at 6:43
  • It's not a homework question, I didn't know this about pandas and it has blown my mind! Thank you, saved hours of work Commented Sep 18 at 8:39
  • I've written a solution with Pandas :) Commented Sep 18 at 9:27
  • Maybe first use print() (and print(type(...)), print(len(...)), etc.) to see which part of code is executed and what you really have in variables. It is called "print debugging" and it helps to see what code is really doing. Commented Sep 18 at 12:15
  • Some of your code seems too complicated. Why create get_name_column and get_hours_column just to get a column? They do the same thing. Besides, if the column doesn't exist they return None, but you never check whether you got a column or None. You could run csv_file_path['staff'].tolist() directly, without the functions. It would make the code shorter and more readable. Commented Sep 18 at 12:18

3 Answers


i == (file_length + 1) will never be True. Therefore the first element of uncheckedNamesRemovedDupe will never be deleted.

Thus, provided uncheckedNamesRemovedDupe was a non-empty list before entering the while loop, your code will run ad infinitum.


1 Comment

I don't know how I didn't notice that it will never reach file_length +1 - thank you for that.

(As discussed in the comments) you can do this really easily using Pandas, avoiding complicated logic.

Use .read_csv() to read in your file, and .groupby() to sum up the hours:

import pandas as pd
df = pd.read_csv('filename.csv')
df.groupby('Name Column')['Hours'].sum()

As an example:

df = pd.DataFrame({
    'Name Column': ['Jim', 'Sarah', 'Jim', 'Judy', 'Jim', 'Sarah', 'Sarah', 'Judy'],
    'Hours': [5, 7, 3, 2, 6, 8, 8, 4],
})

  Name Column  Hours
0         Jim      5
1       Sarah      7
2         Jim      3
3        Judy      2
4         Jim      6
5       Sarah      8
6       Sarah      8
7        Judy      4

df.groupby('Name Column')['Hours'].sum()

Name Column
Jim      14
Judy      6
Sarah    23
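
If the totals are needed outside pandas, the grouped result can be converted. This is a small sketch using the same example data; the variable names totals, totals_by_name and totals_table are just illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name Column': ['Jim', 'Sarah', 'Jim', 'Judy', 'Jim', 'Sarah', 'Sarah', 'Judy'],
    'Hours': [5, 7, 3, 2, 6, 8, 8, 4],
})

# groupby(...).sum() returns a Series indexed by name
totals = df.groupby('Name Column')['Hours'].sum()

# As a plain dict mapping each name to its total
totals_by_name = totals.to_dict()
print(totals_by_name)

# Or as a two-column DataFrame, with the names back in a regular column
totals_table = totals.reset_index()
print(totals_table)
```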

2 Comments

Thank you for the solution, after the previous comment I did some reading and came to the same sort of answer! Thank you, groupby() has saved 30+ lines of writing!
If this answer helped you, you could mark it as accepted. That also tells others that the problem was solved.

This was how I solved this!

import pandas as pd

def calculate_csv_data(csv_file_path, name_column, hours_column):
    """ Take CSV file, staff column and hours column; return each staff member with their total hours. """
    dataFrame = pd.read_csv(csv_file_path)
    groupedData = dataFrame.groupby(['staff','shift time ex. break (hr)']).sum()
    finalData = dataFrame.groupby(['staff'], sort = True)['shift time ex. break (hr)'].sum()
    return finalData

1 Comment

you create groupedData but later you never use it - so you could remove this line.
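
For reference, a possible tidied variant of the function above, offered as a sketch rather than the author's final code: it drops the unused groupedData line, uses the name_column and hours_column parameters instead of hard-coded column names, and raises an error when a column is missing. The KeyError check and the in-memory sample CSV are assumptions added for illustration.

```python
import io

import pandas as pd


def calculate_csv_data(csv_file_path, name_column, hours_column):
    """Return each staff member's total hours as a Series indexed by name."""
    data_frame = pd.read_csv(csv_file_path)
    # Fail loudly instead of silently returning None for a missing column
    missing = [c for c in (name_column, hours_column) if c not in data_frame.columns]
    if missing:
        raise KeyError(f"Missing column(s): {missing}")
    return data_frame.groupby(name_column, sort=True)[hours_column].sum()


# Hypothetical sample data; read_csv accepts a file-like object as well as a path
sample_csv = io.StringIO(
    "staff,shift time ex. break (hr)\n"
    "Jim,5\n"
    "Sarah,7\n"
    "Jim,3\n"
)
result = calculate_csv_data(sample_csv, 'staff', 'shift time ex. break (hr)')
print(result)
```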
