Appending a csv with dictionary values using pandas python

Question

My python script produces a dictionary as follows:

================================================================

TL;DR

I overcomplicated the problem by using the from_dict method to create a DataFrame from the dictionary. Thanks to @Sword.

In other words, pd.DataFrame.from_dict is only needed if you want a DataFrame with all keys in one column and all values in another. In every other case, the approach in the accepted answer is enough.

==============================================================

{u'19:00': 2, u'12:00': 1, u'06:00': 2, u'00:00': 0, u'23:00': 2, u'05:00': 2, u'11:00': 4, u'14:00': 2, u'04:00': 0, u'09:00': 7, u'03:00': 1, u'18:00': 6, u'01:00': 0, u'21:00': 5, u'15:00': 8, u'22:00': 1, u'08:00': 5, u'16:00': 8, u'02:00': 0, u'13:00': 8, u'20:00': 5, u'07:00': 11, u'17:00': 12, u'10:00': 8}

It also produces a variable, say full_name (taken as an argument to the script), which has the value "John".

Every time I run the script, it gives me a dictionary and a name in the format above.

I want to write this into a csv file for later analysis in the following format:

FULLNAME | 00:00  |  01:00  |  02:00  | .....| 22:00  |  23:00  |
John     | 0      |  0      |  0      | .....| 1      |  2      |

My code to produce that is as follows:

import collections
import pandas as pd

# ........................
# Other part of code, which produces the dictionary by name "data_dict"
# ........................

#Sorting the dictionary (And adding it to a ordereddict) in order to skip matching dictionary keys with column headers
data_dict_sorted = collections.OrderedDict(sorted(data_dict.items()))

# For the first time to produce column headers, I used .items() and rest of the following lines follows it.
# df = pd.DataFrame.from_dict(data_dict_sorted.items())

#For the second time onwards, I just need to append the values, I am using .values()
df = pd.DataFrame.from_dict(data_dict_sorted.values())

df2 = df.T # transposing because from_dict creates all keys in one column, and corresponding values in the next column.
df2.columns = df2.iloc[0] 
df3 = df2[1:]
df3["FULLNAME"] = args.name #This is how we add a value, isn't it?
df3.to_csv('test.csv', mode = 'a', sep=str('\t'), encoding='utf-8', index=False)

My code is producing the following csv

00:00 | 01:00 | 02:00 | …….. | 22:00 | 23:00 | FULLNAME
0     | 0     | 0     | …….. | 1     | 2     | John
0     | 0     | 0     | …….. | 1     | 2     | FULLNAME
0     | 0     | 0     | …….. | 1     | 2     | FULLNAME

My question is twofold:

1. Why is it printing "FULLNAME" instead of "John" in the second iteration (that is, the second time the script is run)? What am I missing?
2. Is there a better way to do this?

Hypothetical Ninja · Accepted Answer · 2017-05-03 13:18:13Z


How about this?

df = pd.DataFrame(data_dict, index=[0])
df['FullName'] = 'John'

EDIT:
It is a bit difficult to follow how you are performing the operations, but the issue appears to be the line df2.columns = df2.iloc[0]. The code above needs neither the column-name assignment nor the transpose. If you are adding a dictionary at each iteration, try:

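data_dict['FullName'] = 'John'
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
# ignore_index=True already renumbers the rows, so no reset_index() is needed.
df = pd.concat([df, pd.DataFrame(data_dict, index=[0])], ignore_index=True)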

If each row might have a different name, then df['FullName'] = 'John' sets the entire column to John. A better approach is to add a 'FullName' key to your dict with the appropriate name as its value, which avoids assigning one value to the whole column, i.e.:

data_dict['FullName'] = 'John'
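
For the original goal of adding one row to the CSV per script run, the approach above might be combined with to_csv roughly as follows. This is a minimal sketch, not the answerer's code: the helper name append_row and its parameters are hypothetical, and it assumes the header should be written only when the file does not exist yet.

```python
import os

import pandas as pd


def append_row(data_dict, full_name, path):
    """Append one row to a tab-separated file (hypothetical helper)."""
    # Put FULLNAME first, then the time slots in sorted order.
    # Plain dicts keep insertion order in Python 3.7+.
    row = {"FULLNAME": full_name}
    row.update(sorted(data_dict.items()))

    # All values are scalars, so an index is required to get a single row.
    df = pd.DataFrame(row, index=[0])

    # Write the column names only on the first run, when the file is new.
    write_header = not os.path.exists(path)
    df.to_csv(path, mode="a", sep="\t", encoding="utf-8",
              index=False, header=write_header)
```

Calling append_row(data_dict, "John", "test.csv") on each run would then add a single data row, and every run uses the same code path, so nothing has to be swapped between the first run and later ones.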

edited May 3, 2017 at 13:18

answered May 3, 2017 at 10:03


7 Comments

kingmakerking Over a year ago

What does index=[0] do?

Hypothetical Ninja Over a year ago

Whenever you pass a dictionary to pd.DataFrame, the values for each key need to be in list format. In your case the values are integers, and scalars can only be passed if you also provide the index. index=[0] simply means the row's index is 0. For multiple rows, this should be a list of indices, which can be labels or numbers.
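
The point about scalars can be checked with a small sketch. The two-key dictionary is an illustrative stand-in for the real data, and the variable names are made up for this example.

```python
import pandas as pd

scalars = {"00:00": 0, "01:00": 0}  # illustrative stand-in for data_dict

# With only scalar values and no index, pandas cannot tell how many rows to build.
try:
    pd.DataFrame(scalars)
    error = None
except ValueError as exc:
    error = exc

# Supplying an index gives a single row labelled 0.
one_row = pd.DataFrame(scalars, index=[0])

# Two alternatives that also give one row: wrap each value in a list,
# or pass a list containing the dict.
from_lists = pd.DataFrame({key: [value] for key, value in scalars.items()})
from_records = pd.DataFrame([scalars])
```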

kingmakerking Over a year ago

But I don't think that solves the issue I am facing here.

Hypothetical Ninja Over a year ago

How did you get the 2nd and 3rd rows? I have edited the answer assuming you add a single dictionary to the existing df every time.

kingmakerking Over a year ago

As mentioned in the code comments, I first run df = pd.DataFrame.from_dict(data_dict_sorted.items()), which gives me the time slots (the keys of the dictionary) as column headers, followed by the values. The second time the script runs (that is what I mean by iterate), I replace this line with df = pd.DataFrame.from_dict(data_dict_sorted.values()), so that only the values get appended, and not the keys. The only problem is that in the column "FULLNAME", I get the value "FULLNAME" instead of "John" when the script is run for the second time.
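
A likely explanation for the stray "FULLNAME" can be reproduced in a few lines. This is a sketch under assumptions: a four-value list stands in for list(data_dict_sorted.values()), and a .copy() is added only to avoid a chained-assignment warning. With values alone, the transposed frame has a single row, so promoting that row to the header leaves no data rows at all, and to_csv then writes nothing but a header line.

```python
import pandas as pd

values = [0, 0, 1, 2]  # stand-in for list(data_dict_sorted.values())

df = pd.DataFrame(values)    # one column, four rows
df2 = df.T                   # one row, four columns
df2.columns = df2.iloc[0]    # the only data row becomes the header
df3 = df2[1:].copy()         # nothing is left below the header: zero rows
df3["FULLNAME"] = "John"     # adds a column, but there is no row to hold "John"

# With no path given, to_csv returns the CSV text as a string.
csv_text = df3.to_csv(sep="\t", index=False)
print(len(df3))
print(csv_text)
```

If this reading is right, the line appended on the second run is the header of an empty frame: the values sit where column names would be, followed by the literal column name FULLNAME.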
