I'm new to Python, and I'm having a lot of trouble with this problem, it's something I have to do for work.
Some background about the excel file: There are 3 columns, and about 100 rows. The first column (col1) contain either A or B. The second column (col2) contain any number ranging from 1 to 10. The third column (col3) contains a value of any decimal number.
What I want the program to do is parse through the data. There will be many duplicates of col1 and col2 put together. For example, (A, 1) can be on rows 1, 5, 20, 98, etc. But col3 will be different numbers. So for those different numbers from the 3rd column, I want it to find the average of all those numbers.
The output should look something like this:
A, 1 = avg 4.32
A, 2 = avg 7.23
A, 3 = avg -9.12
etc etc (until number 10)
B, 1 = avg 3.76
B, 2 = avg -8.12
B, 3 = avg 1.56
etc etc (until number 10)
It doesn't have to be in complete alphabetical and numerical order, it can just print out the first combos it finds.. But I've done this so far in my code, and for some reason it doesn't print out ALL the combos, only 3.
import xlrd #import package
#opening workbook and reading first sheet
book = xlrd.open_workbook('trend.xls')
sheet = book.sheet_by_index(0)
#function to hold unique combos
unique_combinations = {}
#looping through data
for row_index in range(sheet.nrows):
#declaring what group equals to what row
col1 = sheet.cell(row_index, 0)
col2 = sheet.cell(row_index, 1)
col3 = sheet.cell(row_index, 2)
unique_combo = (col1.value, col2.value)
if unique_combinations.has_key(unique_combo):
unique_combinations[unique_combo].append(col3.value)
else:
unique_combinations[unique_combo] = [col3.value]
for k in unique_combinations.keys():
l = unique_combinations[k]
average = sum(l) / len(l)
print '%s: %s Mean = %s' % (k[0], k[1], average)
Essentially, it's basically 2 groups, and within the 2 groups are another 10 groups, and within those 10 groups are the average of the numbers that belongs there.
Please help! Thank you so much in advance.
SAMPLE OF EXCEL FILE:
col1 | col2 | col3
A | 1 | 3.12
B | 9 | 4.12
B | 2 | 2.43
A | 1 | 9.54
B | 8 | 2.43
A | 2 | 1.08
So what the program will do is see that the first combo it comes across is A, 1 and it'll store the 3.12 in a list, and look at the next ones and keep storing, until it comes across a duplicate with is the fourth row. And it'll store that value as well. And at the end of it, the output will show A, 1 = avg (3.12 + 9.54 / 2). This example is only showing for the A, 1 combo. But in reality, there are only 2 groups (like the example) but col2 can range from 1 to 10. There will be many duplicates.
forhits theunique_combinations[(u'col1', u'col2']entry, the value of which is[u'col3'], theaveragecalculation will suffer an exception.