
================================================================

TL;DR

I overcomplicated the problem by using the from_dict method to create a DataFrame from the dictionary. Thanks to @Sword.

In other words, building the DataFrame from the dictionary's .items() with pd.DataFrame.from_dict only makes sense if you want all keys in one column and all values in another. In all other cases, it is as simple as the approach in the accepted answer.

==============================================================

My Python script produces a dictionary as follows:

{u'19:00': 2, u'12:00': 1, u'06:00': 2, u'00:00': 0, u'23:00': 2, u'05:00': 2, u'11:00': 4, u'14:00': 2, u'04:00': 0, u'09:00': 7, u'03:00': 1, u'18:00': 6, u'01:00': 0, u'21:00': 5, u'15:00': 8, u'22:00': 1, u'08:00': 5, u'16:00': 8, u'02:00': 0, u'13:00': 8, u'20:00': 5, u'07:00': 11, u'17:00': 12, u'10:00': 8}

It also produces a variable, say full_name (taken as an argument to the script), whose value is "John".

Every time I run the script, it produces a dictionary and a name in this format.

I want to write this into a csv file for later analysis in the following format:

FULLNAME | 00:00  |  01:00  |  02:00  | .....| 22:00  |  23:00  |
John     | 0      |  0      |  0      | .....| 1      |  2      |

My code to produce that is as follows:

import collections
import pandas as pd

# ........................
# Other part of code, which produces the dictionary by name "data_dict"
# ........................

# Sort the dictionary (storing it in an OrderedDict) so the keys are already in the same order as the column headers
data_dict_sorted = collections.OrderedDict(sorted(data_dict.items()))

# On the first run, to produce the column headers, I used .items(); the rest of the lines below stay the same.
# df = pd.DataFrame.from_dict(data_dict_sorted.items())

# From the second run onwards I only need to append the values, so I use .values()
df = pd.DataFrame.from_dict(data_dict_sorted.values())

df2 = df.T # transposing because from_dict creates all keys in one column, and corresponding values in the next column.
df2.columns = df2.iloc[0] 
df3 = df2[1:]
df3["FULLNAME"] = args.name #This is how we add a value, isn't it?
df3.to_csv('test.csv', mode = 'a', sep=str('\t'), encoding='utf-8', index=False)

My code produces the following CSV:

00:00 | 01:00 | 02:00 | …….. | 22:00 | 23:00 | FULLNAME
0     | 0     | 0     | …….. | 1     | 2     | John
0     | 0     | 0     | …….. | 1     | 2     | FULLNAME
0     | 0     | 0     | …….. | 1     | 2     | FULLNAME

My question is twofold:

  1. Why does it write "FULLNAME" instead of "John" on the second iteration (i.e. the second time the script is run)? What am I missing?
  2. Is there a better way to do this?

1 Answer

How about this?

df = pd.DataFrame(data_dict, index=[0])
df['FullName'] = 'John'

EDIT:
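It is a bit difficult to follow the way you are conducting the operations, but it looks like the issue is with the line df2.columns = df2.iloc[0]. On the runs that use .values(), the transposed frame has only one row (the values), so that row becomes the header and df2[1:] leaves no rows at all. to_csv then appends just a header line, made of the values followed by the column name FULLNAME, and no data row. The code above needs neither the column-name assignment nor the transpose. If you are adding a dictionary at each iteration, try: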
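A minimal reproduction of the .values() path, assuming Python 3 and a current pandas, with a shortened four-slot stand-in for data_dict. It writes to an in-memory buffer instead of test.csv; the only line written is a header made of the values followed by FULLNAME, because the sliced frame has no rows.

```python
import collections
import io

import pandas as pd

# Shortened stand-in for the question's data_dict (four time slots only).
data_dict = {"23:00": 2, "00:00": 0, "22:00": 1, "01:00": 0}
data_dict_sorted = collections.OrderedDict(sorted(data_dict.items()))

# Second-run path from the question; list() is needed on Python 3,
# where .values() returns a view instead of a list.
df = pd.DataFrame.from_dict(list(data_dict_sorted.values()))  # one column, four rows
df2 = df.T                 # one row holding the four values
df2.columns = df2.iloc[0]  # that single row becomes the header...
df3 = df2[1:].copy()       # ...so slicing it off leaves zero rows (.copy() avoids SettingWithCopyWarning)
df3["FULLNAME"] = "John"   # adds the column, but there is no row to hold "John"

buf = io.StringIO()
df3.to_csv(buf, sep="\t", index=False)
print(buf.getvalue())
```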

data_dict['FullName'] = 'John'
df = pd.concat([df, pd.DataFrame(data_dict, index=[0])], ignore_index=True)

If rows can have different names, df['FullName'] = 'John' would set the entire column to John. A better step is to add a 'FullName' key to your dict, with the appropriate name as its value, so that each row carries its own name instead of one uniform value for the whole column, i.e.

data_dict['FullName'] = 'John'
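A sketch of that approach over several iterations, assuming Python 3 and a current pandas. The runs list and its two shortened dicts are made-up sample data. Instead of appending inside the loop, it collects the one-row frames and calls pd.concat once, which avoids repeatedly copying a growing frame.

```python
import pandas as pd

# Hypothetical sample data: (name, dict) pairs from two script runs.
runs = [
    ("John", {"00:00": 0, "01:00": 0}),
    ("Jane", {"00:00": 3, "01:00": 1}),
]

rows = []
for full_name, data_dict in runs:
    data_dict["FullName"] = full_name                # each row carries its own name
    rows.append(pd.DataFrame(data_dict, index=[0]))  # scalars need an explicit index

df = pd.concat(rows, ignore_index=True)  # one concat at the end, index renumbered 0..n-1
print(df)
```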

Comments

What does index=[0] do?
Whenever you pass a dictionary to pd.DataFrame, the value for each key normally needs to be list-like. In your case the values are integers, and scalars can only be passed if you also supply an index. index=[0] simply means the single row gets the index label 0. For multiple rows, pass a list of indices, which can be labels or numbers.
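A small illustration of the comment above, assuming a current pandas; data_dict here is a shortened stand-in. Passing only scalars without an index raises a ValueError, index=[0] gives a single row labelled 0, and wrapping each value in a list produces the same frame without index=.

```python
import pandas as pd

data_dict = {"00:00": 0, "01:00": 0, "23:00": 2}

# All scalar values and no index: pandas cannot tell how many rows to build.
raised = False
try:
    pd.DataFrame(data_dict)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# index=[0] means "one row, labelled 0", so each scalar fills that row.
one_row = pd.DataFrame(data_dict, index=[0])
print(one_row)

# Alternative without index=: wrap every value in a one-element list.
same_row = pd.DataFrame({key: [value] for key, value in data_dict.items()})
```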
But I don't think that solves the issue I am facing here.
How did you get the 2nd and 3rd rows? I have edited the answer assuming you add a single dictionary every time to the existing df.
As mentioned in the code comments, I first run df = pd.DataFrame.from_dict(data_dict_sorted.items()), which gives me the time slots (the keys of the dictionary) as column headers, followed by the values. The second time the script runs (that is what I mean by iterate), I replace that line with df = pd.DataFrame.from_dict(data_dict_sorted.values()), so that only the values get appended, not the keys. The only problem is the "FULLNAME" column: I get the value "FULLNAME" instead of "John" when the script runs for the second time.
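One way to avoid switching between .items() and .values() by hand is to let to_csv decide whether to write the header. This is a sketch, not the answerer's code: append_run is a hypothetical helper, it assumes Python 3 and a current pandas, and it assumes every run produces the same set of time-slot keys. It puts FULLNAME first, as in the desired layout, and writes the header only when the file does not exist yet.

```python
import os
import tempfile

import pandas as pd


def append_run(csv_path, full_name, data_dict):
    """Hypothetical helper: append one run as a row to a tab-separated file.

    The header is written only if the file does not exist yet, so the same
    code works on the first run and on every later run.
    """
    row = {"FULLNAME": full_name}          # name column first, as in the desired layout
    row.update(sorted(data_dict.items()))  # then the time slots in sorted key order
    df = pd.DataFrame(row, index=[0])
    write_header = not os.path.exists(csv_path)
    df.to_csv(csv_path, mode="a", sep="\t", index=False, header=write_header)


# Usage: simulate two script runs against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    append_run(path, "John", {"01:00": 0, "00:00": 0, "23:00": 2})
    append_run(path, "Jane", {"23:00": 1, "00:00": 3, "01:00": 4})
    result = pd.read_csv(path, sep="\t")

print(result)
```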