
I have a JSON file with Ansible inventory facts. I need to select a few columns from it into a DataFrame and then send an email notification.

This is the code I tried:

import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.head(1)

It prints the entire record (each JSON file has only one record), but I need to select and display only two columns from the DataFrame. Can someone suggest how to view a DataFrame with only selected columns?
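A minimal sketch of selecting just two columns, assuming a small hypothetical `d` dictionary in place of the real `d:/facts.json` (the keys and values are made up for illustration). Passing a list of column names inside the brackets returns a new DataFrame containing only those columns:

```python
import pandas as pd

# Hypothetical stand-in for json.load(f); real Ansible facts have many more keys
d = {
    "ansible_facts": {
        "ansible_architecture": "x86_64",
        "ansible_distribution": "SLES",
        "ansible_env": {"HOSTNAME": "example-host"},
    }
}

# pd.json_normalize exists in pandas >= 1.0; older releases expose it as
# pandas.io.json.json_normalize, as imported in the question
mydata = pd.json_normalize(d["ansible_facts"])

# A list of names inside [] selects a DataFrame with only those columns
df1 = mydata[["ansible_architecture", "ansible_distribution"]]
print(df1)
```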

Update 1: I can now select the columns I need, but only some column names work; for others I get a "not in index" error. Also, can I use my own custom header labels when printing? This version works:

import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['ansible_architecture','ansible_distribution']]

But when I select hostname and ansible_distribution, it says "not in index". This version does not work:

import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['hostname','ansible_distribution']]

Error: KeyError: "['hostname'] not in index"
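One likely cause: `json_normalize` flattens nested keys into dotted column names, so a hostname stored under a nested dict becomes something like `ansible_env.HOSTNAME` rather than `hostname` (Update 4 below also uses a top-level `ansible_hostname` key). A sketch for finding the real column name, using a hypothetical `facts` dict that mimics that nesting:

```python
import pandas as pd

# Hypothetical facts: the hostname lives under a nested "ansible_env" dict
facts = {
    "ansible_distribution": "SLES",
    "ansible_env": {"HOSTNAME": "example-host"},
}

mydata = pd.json_normalize(facts)

# Nested keys are flattened with "." as the separator
print(list(mydata.columns))  # ['ansible_distribution', 'ansible_env.HOSTNAME']

# Case-insensitive search for candidate column names
matches = [c for c in mydata.columns if "hostname" in c.lower()]
print(matches)  # ['ansible_env.HOSTNAME']

# Select using the flattened name that actually exists
df1 = mydata[["ansible_env.HOSTNAME", "ansible_distribution"]]
```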

Update 2:

I was able to fix that issue with the code below, but I need custom labels in the output. How can I do that?

import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['ansible_env.HOSTNAME','ansible_distribution']]

But I need custom column labels in the final output, such as Host and OSversion for the columns above. How can I do that?
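A sketch of one way to get custom labels: keep a single mapping from source column to display label, select the source columns first, then rename that subset. The frame below is a hypothetical one-row stand-in for the `json_normalize` output; the label names are the ones asked for above.

```python
import pandas as pd

# Hypothetical one-row frame shaped like the json_normalize output
mydata = pd.DataFrame(
    {"ansible_env.HOSTNAME": ["example-host"], "ansible_distribution": ["SLES"]}
)

# Map each source column to the label wanted in the output
labels = {"ansible_env.HOSTNAME": "Host", "ansible_distribution": "OSversion"}

# Select the source columns first, then rename only that subset
df1 = mydata[list(labels)].rename(columns=labels)
print(df1.columns.tolist())  # ['Host', 'OSversion']
```

Selecting before renaming means a misspelled source name fails loudly with a KeyError at the selection step instead of being skipped by `rename`.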

Update 3: I am now trying to rename the columns before printing them. I tried the following code, but it gives a KeyError saying the columns are not in the index:

import json
import pandas as pd
from tabulate import tabulate
from pandas.io.json import json_normalize
with open('/home/cloud-user/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())

mydata=mydata.rename(columns={"ansible_env.HOSTNAME": "HOSTNAME", "ansible_disrribution": "OSType"})
df1=mydata[['HOSTNAME','OSType']]
print(tabulate(df1, headers='keys', tablefmt='psql'))

Traceback (most recent call last):
  File "ab7.py", line 21, in <module>
    df1=mydata[['HOSTNAME','OSType']]
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/usr/lib64/python2.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['HOSTNAME' 'OSType'] not in index"
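Part of this error is likely the misspelled key `ansible_disrribution` in the rename mapping: `DataFrame.rename` skips keys that are not existing columns by default, so no `OSType` column is ever created. (The traceback also lists `HOSTNAME`, which the typo alone does not explain; printing `list(mydata.columns)` right after the rename shows which labels were actually applied.) A sketch on a hypothetical one-row frame; note that the `errors` argument only exists in newer pandas releases and may not be available on the Python 2.7 setup from the traceback:

```python
import pandas as pd

mydata = pd.DataFrame(
    {"ansible_env.HOSTNAME": ["example-host"], "ansible_distribution": ["SLES"]}
)

# Misspelled key, as in the question: rename() skips it without complaint
renamed = mydata.rename(
    columns={"ansible_env.HOSTNAME": "HOSTNAME", "ansible_disrribution": "OSType"}
)
print(list(renamed.columns))  # ['HOSTNAME', 'ansible_distribution']

# errors="raise" turns the silent skip into a KeyError
try:
    mydata.rename(columns={"ansible_disrribution": "OSType"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("rename failed:", exc)

# With the key spelled correctly, both new labels exist
fixed = mydata.rename(
    columns={"ansible_env.HOSTNAME": "HOSTNAME", "ansible_distribution": "OSType"}
)
df1 = fixed[["HOSTNAME", "OSType"]]
```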

If I don't rename, it works perfectly, but I need more readable column labels. Any suggestions? Without the rename step, the code works and prints the following to the console:

+----+------------------------+------------------------+
|    | ansible_env.HOSTNAME   | ansible_distribution   |
|----+------------------------+------------------------|
|  0 | ip-xx-xx-xx-xx         | SLES                   |
+----+------------------------+------------------------+

Instead of ansible_env.HOSTNAME I need the label HOSTNAME, and instead of ansible_distribution I need OSType. Any suggestions?

Update 4:

I fixed the issue with the following:

df.rename(columns={'ansible_hostname':'HOSTNAME','ansible_distribution':'OS Version','ansible_ip_addresses':'Private IP','ansible_windows_domain':'FQDN'},inplace=True)
  • mydata[[col1,col2]] Commented May 28, 2019 at 16:57
  • A very comprehensive collection of the basic tasks and operations in pandas can be found here Commented May 28, 2019 at 19:42
  • I tried mydata[[hostname,ansible_distribution]] but it says hostname is not in index Commented May 29, 2019 at 4:16
  • I have updated the original post with additional code I tried. Now some columns work fine, but others give a 'not in index' error Commented May 29, 2019 at 4:35

1 Answer

Select multiple columns as a DataFrame by passing a list to it:

df[['col_name1', 'col_name2']]
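A small sketch of the difference between the two bracket forms, on a hypothetical frame using the placeholder column names from this answer. A list inside the brackets returns a DataFrame, a single label returns a Series, and every name in the list must already be a column:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_name1": [1, 2], "col_name2": ["a", "b"], "col_name3": [True, False]}
)

sub = df[["col_name1", "col_name2"]]  # list of labels -> DataFrame
one = df["col_name1"]                 # single label -> Series

print(type(sub).__name__, type(one).__name__)  # DataFrame Series

# Any name in the list that is not an existing column raises KeyError
try:
    df[["col_name1", "missing"]]
    missing_raised = False
except KeyError:
    missing_raised = True
```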

For more information, see: https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c


2 Comments

I tried this but it's not working: it gives KeyError: "['HOSTNAME'] not in index", i.e. it says the column is not in the index.
I have updated the original post with additional code I tried. Now some columns work fine, but others give a 'not in index' error.
