This is the code I have now

csv_final = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
             ['appel', 3.0, 4.0, 0.0, 0.0],
             ['gaar', 2.0, 2.0, 0.0, 1.0],
             ['schotel', 2.0, 4.0, 0.0, 0.0],
             ['amandel', 0.0, 0.0, 4.0, 0.0],
             ['deeg', 1.0, 0.0, 2.0, 5.0],
             ['brood', 0.0, 0.0, 0.0, 1.0],
             ['suiker', 0.0, 2.0, 2.0, 0.0]]

query = ["appel", "deeg"]

def up_part(query, csv_final, document_vector_lijst):
    matrix = []
    for j in range(1, len(csv_final)):
        product = 0
        if csv_final[j][0] in query:
            for i in range(1, len(csv_final[0])):
                product += csv_final[j][i]
        matrix.append(product)

    return matrix

The output I need is the total of each column, but only for the rows that are in the query. The expected output:

[4.0, 4.0, 2.0, 5.0]

The output I get right now:

[7.0, 0, 0, 0, 8.0, 0, 0]

Does anyone have an idea how to fix this? I am lost. We are not allowed to use libraries like NumPy to do this.

2 Answers

The problem is that you're iterating over the rows of your CSV and appending one entry to matrix for each row. So what you're computing is the sum of values per row rather than per column.

What you need to do is create a results list of the proper width with all cells initialised to 0, then increment each column's total in-place:

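def up_part(query, csv_final, document_vector_lijst):
    # One running total per data column (the first column holds the row label).
    results = [0]*(len(csv_final[0])-1)
    for row in csv_final[1:]:  # skip the header row
        if row[0] not in query:
            continue
        # Add this row's numeric cells to the matching column totals.
        for i, cell in enumerate(row[1:]):
            results[i] += cell
    return results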
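As a quick check of the per-column approach against the data from the question, here is a minimal self-contained sketch. It assumes the inner loop walks `row[1:]` (the numeric cells after the label) and drops the unused `document_vector_lijst` parameter for brevity; `column_totals` is just a name chosen for this illustration.

```python
csv_final = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
             ['appel', 3.0, 4.0, 0.0, 0.0],
             ['gaar', 2.0, 2.0, 0.0, 1.0],
             ['schotel', 2.0, 4.0, 0.0, 0.0],
             ['amandel', 0.0, 0.0, 4.0, 0.0],
             ['deeg', 1.0, 0.0, 2.0, 5.0],
             ['brood', 0.0, 0.0, 0.0, 1.0],
             ['suiker', 0.0, 2.0, 2.0, 0.0]]


def column_totals(query, table):
    # One total per data column; the first column only holds labels.
    totals = [0] * (len(table[0]) - 1)
    for row in table[1:]:          # skip the header row
        if row[0] in query:        # only rows named in the query count
            for i, cell in enumerate(row[1:]):
                totals[i] += cell
    return totals


totals = column_totals(["appel", "deeg"], csv_final)
print(totals)  # [4.0, 4.0, 2.0, 5.0]
```

Only the 'appel' and 'deeg' rows contribute, so each total is the sum of those two rows' cells in that column.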

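You could also use a less imperative approach, although Python is not particularly well suited to that style:

import functools
import operator

def up_part(query, csv_final, document_vector_lijst):
    # Element-wise sum of the matching rows; list() is needed because
    # map() returns a lazy iterator in Python 3.
    # Note: reduce() raises TypeError if no row matches the query.
    return list(functools.reduce(
        lambda x, y: map(operator.add, x, y), (
        row[1:] for row in csv_final[1:]
        if row[0] in query
    )))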
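Another library-free way to express the same per-column sum, offered only as a sketch and not as part of the original answer, is to transpose the selected rows with the built-in `zip` and call `sum` on each column. The function name `column_totals_zip` and the trimmed table below are assumptions made for this example.

```python
def column_totals_zip(query, table):
    # Numeric cells of every row whose label appears in the query.
    selected = [row[1:] for row in table[1:] if row[0] in query]
    if not selected:
        # No matching rows: return a zero for each data column.
        return [0] * (len(table[0]) - 1)
    # zip(*selected) groups the cells column by column.
    return [sum(column) for column in zip(*selected)]


# Trimmed copy of the question's table, for illustration only.
table = [['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood'],
         ['appel', 3.0, 4.0, 0.0, 0.0],
         ['gaar', 2.0, 2.0, 0.0, 1.0],
         ['deeg', 1.0, 0.0, 2.0, 5.0]]

zip_totals = column_totals_zip(["appel", "deeg"], table)
print(zip_totals)  # [4.0, 4.0, 2.0, 5.0]
```

Unlike a bare `reduce`, this version handles a query that matches no rows by returning zeros.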

I agree with everything Masklinn said, and their answer is probably the better programmatic solution. Nevertheless, here is my solution:

def up_part(query, csv_final, document_vector_lijst):
    results = [0]*4
    for row in csv_final[1:]:
        i = 0
        if row[0] in query:
            for column in row[1:]:
                results[i] += row[i + 1]
                i += 1
    return results

This solution is less flexible, since the number of columns is hard-coded to 4, and it could probably be refactored.

5 Comments

Do you have any idea how to make the length of results dynamic? results = [0]*len(csv_final[0])-1 gives a TypeError.
Try this, though it's still not perfect: results = [0]*(len(csv_final[0])-1)
@ArthurDeVries It's missing a pair of parentheses: it should be [0] * (len(csv_final[0]) - 1), which creates a list with as many entries as the first row minus one. The original code would create as many entries as there are in the first row and then try to subtract 1 from the list, which makes no sense. Sorry about that.
@Masklinn, thanks for that. Initially this made little sense to me but after getting to a similar solution it all made sense. Simple mistake :)
@Masklinn and Tom Burness Thanks both, it works now!
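
The TypeError discussed in these comments comes down to operator precedence: `*` binds more tightly than `-`, so without parentheses Python builds the list first and then tries to subtract an integer from it. A small sketch, where `header` stands in for `csv_final[0]`:

```python
header = ['', 'appeltaart', 'appelstruif', 'amandelbeschuit', 'brood']

try:
    # Parsed as ([0] * len(header)) - 1, i.e. list minus int.
    results = [0] * len(header) - 1
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# With parentheses the subtraction happens first: 5 - 1 = 4 entries.
results = [0] * (len(header) - 1)
print(results)  # [0, 0, 0, 0]
```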
