Calculating column totals of an array - Python

Question

This is the code I have now:

csv_final = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
             ['appel', 3.0, 4.0, 0.0, 0.0],
             ['gaar', 2.0, 2.0, 0.0, 1.0],
             ['schotel', 2.0, 4.0, 0.0, 0.0],
             ['amandel', 0.0, 0.0, 4.0, 0.0],
             ['deeg', 1.0, 0.0, 2.0, 5.0],
             ['brood', 0.0, 0.0, 0.0, 1.0],
             ['suiker', 0.0, 2.0, 2.0, 0.0]]

query = ["appel", "deeg"]

def up_part(query, csv_final, document_vector_lijst):
    matrix = []
    for j in range(1, len(csv_final)):
        product = 0
        if csv_final[j][0] in query:
            for i in range(1, len(csv_final[0])):
                product += csv_final[j][i]
        matrix.append(product)

    return matrix

The output I need is the total of each column, but only for the rows that are in the query. The expected output:

[4.0, 4.0, 2.0, 5.0]

The output I get right now:

[7.0, 0, 0, 0, 8.0, 0, 0]

Does anyone have an idea how to fix this? I'm lost. We are not allowed to use libraries like NumPy for this.

Masklinn · Accepted Answer · 2020-01-29 15:01:26Z


The problem is that you're iterating over the rows of your CSV and appending one entry to matrix per row. So what you're computing is the sum of the values in each row rather than in each column.
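To make the diagnosis concrete, here is a minimal sketch that reproduces the question's output by computing one total per row, with 0 for rows whose label is not in the query. It uses only the data from the question. The name row_totals is illustrative.

```python
csv_final = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
             ['appel', 3.0, 4.0, 0.0, 0.0],
             ['gaar', 2.0, 2.0, 0.0, 1.0],
             ['schotel', 2.0, 4.0, 0.0, 0.0],
             ['amandel', 0.0, 0.0, 4.0, 0.0],
             ['deeg', 1.0, 0.0, 2.0, 5.0],
             ['brood', 0.0, 0.0, 0.0, 1.0],
             ['suiker', 0.0, 2.0, 2.0, 0.0]]
query = ["appel", "deeg"]

# One total per *row* (what the question's loop computes):
# the sum of the numeric cells if the row label is in the query, else 0.
row_totals = [
    sum(row[1:]) if row[0] in query else 0
    for row in csv_final[1:]  # skip the header row
]
print(row_totals)  # [7.0, 0, 0, 0, 8.0, 0, 0]
```

The 7.0 and 8.0 are the row sums for appel and deeg. Summing those two rows column by column instead gives the expected [4.0, 4.0, 2.0, 5.0].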

What you need to do is create a results list of the proper width with all cells initialised to 0, then increment each column's total in-place:

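def up_part(query, csv_final, document_vector_lijst):
    # One running total per data column (column 0 holds the row labels)
    results = [0]*(len(csv_final[0])-1)
    for row in csv_final[1:]:  # skip the header row
        if row[0] not in query:
            continue
        # row[1:] are the numeric cells; row[0] is the label string
        for i, cell in enumerate(row[1:]):
            results[i] += cell
    return results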

You could also use a less imperative approach, though Python isn't especially well suited to that style:

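import functools
import operator

def up_part(query, csv_final, document_vector_lijst):
    # Add the selected rows together element-wise, starting from a row of zeros
    # (the initial value also avoids a TypeError when no row matches the query)
    return list(functools.reduce(
        lambda x, y: map(operator.add, x, y),
        (row[1:] for row in csv_final[1:] if row[0] in query),
        [0] * (len(csv_final[0]) - 1),
    ))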
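Another option, not taken from the answer above, is to transpose the selected rows with the built-in zip(*rows) and sum each resulting column. This is a minimal sketch that assumes the question's layout: a header row first, with the row label in column 0. column_totals is a hypothetical helper name.

```python
csv_final = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
             ['appel', 3.0, 4.0, 0.0, 0.0],
             ['gaar', 2.0, 2.0, 0.0, 1.0],
             ['schotel', 2.0, 4.0, 0.0, 0.0],
             ['amandel', 0.0, 0.0, 4.0, 0.0],
             ['deeg', 1.0, 0.0, 2.0, 5.0],
             ['brood', 0.0, 0.0, 0.0, 1.0],
             ['suiker', 0.0, 2.0, 2.0, 0.0]]
query = ["appel", "deeg"]

def column_totals(query, csv_final):
    """Hypothetical helper: sum each numeric column over the rows in query."""
    width = len(csv_final[0]) - 1  # number of numeric columns
    selected = [row[1:] for row in csv_final[1:] if row[0] in query]
    if not selected:
        return [0] * width  # zip(*[]) would yield no columns at all
    # zip(*selected) turns the selected rows into tuples of column values
    return [sum(column) for column in zip(*selected)]

print(column_totals(query, csv_final))  # [4.0, 4.0, 2.0, 5.0]
```

With this approach, no index bookkeeping is needed: zip pairs up the cells of each column, and sum adds them.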

edited Jan 29, 2020 at 15:01

answered Jan 29, 2020 at 14:17

Masklinn



Tom Burness · Accepted Answer · 2020-01-29 14:59:54Z


I agree with everything Masklinn said, and they probably have a better programmatic solution. Nevertheless, here is my solution:

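def up_part(query, csv_final, document_vector_lijst):
    results = [0]*4  # hard-coded: the example data has 4 numeric columns
    for row in csv_final[1:]:  # skip the header row
        i = 0
        if row[0] in query:
            for column in row[1:]:
                results[i] += column  # add this cell to its column's total
                i += 1
    return results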

This solution is less flexible (the number of columns is hard-coded as 4) and could probably be refactored.

edited Jan 29, 2020 at 14:59

answered Jan 29, 2020 at 14:48

Tom Burness


5 Comments

Arthur De Vries Over a year ago

Do you have any idea how to make the length of results dynamic? results = [0]*len(csv_final[0])-1 gives a TypeError.

Tom Burness Over a year ago

Try this, though it's still not perfect: results = [0]*(len(csv_final[0])-1)

Masklinn Over a year ago

@ArthurDeVries It's missing a pair of parentheses; it should be [0] * (len(csv_final[0]) - 1). That creates a list with as many entries as the first row, minus one. The original code would instead create a list with as many entries as the first row and then try to subtract 1 from the list, which makes no sense. Sorry about that.
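The precedence issue from this comment can be shown in isolation. This is a small sketch; row_width stands in for len(csv_final[0]) and is a hypothetical name.

```python
row_width = 5  # stands in for len(csv_final[0]) with the question's header row

# Without parentheses, * binds tighter than -, so Python evaluates
# ([0] * 5) - 1, i.e. list minus int, which raises a TypeError.
raised = False
try:
    results = [0] * row_width - 1
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# With parentheses, the subtraction happens first: [0] * 4
results = [0] * (row_width - 1)
print(results)  # [0, 0, 0, 0]
```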

Tom Burness Over a year ago

@Masklinn, thanks for that. Initially this made little sense to me but after getting to a similar solution it all made sense. Simple mistake :)

Arthur De Vries Over a year ago

@Masklinn and Tom Burness Thanks both, it works now!
