I have a JSON file containing Ansible inventory facts. I need to select a few columns from it into a DataFrame and send them in an email notification.
The following is the code I tried:
import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.head(1)
It prints the entire record (each JSON file has only one record), but I need to display only two columns from the DataFrame. Can someone please suggest how to view a DataFrame with only selected columns?
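For reference, a minimal sketch of column selection, assuming the facts have a shape like the hypothetical dict below (the real facts.json is not shown, so the sample keys and values are stand-ins). Passing a list of column names inside the brackets returns a new DataFrame with only those columns. The try/except import is an assumption to cover both old and new pandas releases.

```python
import pandas as pd

try:  # newer pandas exposes json_normalize at the top level
    from pandas import json_normalize
except ImportError:  # older pandas, as used in the question
    from pandas.io.json import json_normalize

# Hypothetical stand-in for the contents of facts.json.
d = {
    "ansible_facts": {
        "ansible_architecture": "x86_64",
        "ansible_distribution": "SLES",
        "ansible_env": {"HOSTNAME": "ip-xx-xx-xx-xx"},
    }
}

mydata = json_normalize(d["ansible_facts"])

# A list of names inside [] selects just those columns.
selected = mydata[["ansible_architecture", "ansible_distribution"]]
print(selected.head(1))
```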
Update 1: I can now select the required columns, but only some column names work. For others I get "not in index". Also, can I use my own custom column header labels when printing? Working:
import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['ansible_architecture','ansible_distribution']]
But when I select the columns hostname and ansible_distribution, it says "not in index". Not working:
import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['hostname','ansible_distribution']]
Error: KeyError: "['hostname'] not in index"
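A KeyError like this usually means the name is not an exact match for any column. json_normalize flattens nested dictionaries into dotted column names, so a value nested under ansible_env would appear as ansible_env.<key> rather than as a bare name. A small sketch for finding the real column name, using a hypothetical sample in place of facts.json:

```python
import pandas as pd

try:  # newer pandas exposes json_normalize at the top level
    from pandas import json_normalize
except ImportError:  # older pandas, as used in the question
    from pandas.io.json import json_normalize

# Hypothetical stand-in for the contents of facts.json.
d = {
    "ansible_facts": {
        "ansible_architecture": "x86_64",
        "ansible_distribution": "SLES",
        "ansible_env": {"HOSTNAME": "ip-xx-xx-xx-xx"},
    }
}

mydata = json_normalize(d["ansible_facts"])

# The bare name is not a column, which is what triggers the KeyError.
print("hostname" in mydata.columns)

# Case-insensitive search over the flattened column names.
matches = [c for c in mydata.columns if "hostname" in c.lower()]
print(matches)
```

With this sample, the search should turn up the dotted name ansible_env.HOSTNAME, which can then be used in the selection.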
Update 2:
I was able to fix that issue with the code below, but I need custom labels in the output. How can I do that?
import json
import pandas as pd
from pandas.io.json import json_normalize
with open('d:/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
df1=mydata[['ansible_env.HOSTNAME','ansible_distribution']]
I need custom column labels in the final output, such as Host and OSversion for the columns above. How can I do that?
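One way this could be done, sketched with a hypothetical sample in place of facts.json: select the columns first, then pass an old-name to new-name mapping to rename. The labels Host and OSversion come from the question; everything else in the sample is a stand-in.

```python
import pandas as pd

try:  # newer pandas exposes json_normalize at the top level
    from pandas import json_normalize
except ImportError:  # older pandas, as used in the question
    from pandas.io.json import json_normalize

# Hypothetical stand-in for the contents of facts.json.
d = {
    "ansible_facts": {
        "ansible_distribution": "SLES",
        "ansible_env": {"HOSTNAME": "ip-xx-xx-xx-xx"},
    }
}

mydata = json_normalize(d["ansible_facts"])

# Select using the original (flattened) names, then relabel for display.
df1 = mydata[["ansible_env.HOSTNAME", "ansible_distribution"]].rename(
    columns={"ansible_env.HOSTNAME": "Host", "ansible_distribution": "OSversion"}
)

# index=False hides the 0, 1, 2... row labels in the printed table.
print(df1.to_string(index=False))
```

rename returns a new DataFrame here, so mydata keeps its original column names.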
Update 3: I am now trying to rename the columns before printing. I tried the following code, but it fails with a KeyError ("not in index"):
import json
import pandas as pd
from tabulate import tabulate
from pandas.io.json import json_normalize
with open('/home/cloud-user/facts.json') as f:
    d = json.load(f)
mydata = json_normalize(d['ansible_facts'])
mydata.columns = mydata.columns.to_series().apply(lambda x: x.strip())
mydata=mydata.rename(columns={"ansible_env.HOSTNAME": "HOSTNAME", "ansible_disrribution": "OSType"})
df1=mydata[['HOSTNAME','OSType']]
print(tabulate(df1, headers='keys', tablefmt='psql'))
Traceback (most recent call last):
  File "ab7.py", line 21, in <module>
    df1=mydata[['HOSTNAME','OSType']]
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/usr/lib64/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/usr/lib64/python2.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['HOSTNAME' 'OSType'] not in index"
If I don't rename, it works perfectly, but I need more readable column labels. Any suggestions, please? Without the rename, the code works and prints the following on the console:
+----+------------------------+------------------------+
|    | ansible_env.HOSTNAME   | ansible_distribution   |
|----+------------------------+------------------------|
|  0 | ip-xx-xx-xx-xx         | SLES                   |
+----+------------------------+------------------------+
Instead of ansible_env.HOSTNAME I need the label HOSTNAME, and instead of ansible_distribution I need OSType. Any suggestions, please?
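One thing worth checking: by default, rename does not complain about mapping keys that match no column. It skips them silently, so a misspelled key (the Update 3 mapping spells one key ansible_disrribution) leaves that column unrenamed and the later selection fails. This may account for at least part of the error. A sketch that checks the mapping against the actual columns before renaming, using a hypothetical sample in place of facts.json:

```python
import pandas as pd

try:  # newer pandas exposes json_normalize at the top level
    from pandas import json_normalize
except ImportError:  # older pandas, as used in the question
    from pandas.io.json import json_normalize

# Hypothetical stand-in for the contents of facts.json.
d = {
    "ansible_facts": {
        "ansible_distribution": "SLES",
        "ansible_env": {"HOSTNAME": "ip-xx-xx-xx-xx"},
    }
}

mydata = json_normalize(d["ansible_facts"])

# Mapping as written in Update 3, misspelled key kept on purpose.
labels = {"ansible_env.HOSTNAME": "HOSTNAME", "ansible_disrribution": "OSType"}

# Keys that match no existing column would be skipped by rename.
missing = [old for old in labels if old not in mydata.columns]
print(missing)

renamed = mydata.rename(columns=labels)
print(list(renamed.columns))
```

If missing is not empty, fixing those keys before calling rename should make the later selection by the new labels work.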
Update 4:
I fixed the issue with the following:
df.rename(columns={'ansible_hostname':'HOSTNAME','ansible_distribution':'OS Version','ansible_ip_addresses':'Private IP','ansible_windows_domain':'FQDN'},inplace=True)
I tried mydata[[col1, col2]], for example mydata[['hostname', 'ansible_distribution']], but it reports a "hostname not in index" error.