-3

I'm using the following code in python to flatten the json structure below, however it doesn't work for all levels. I'm interested in the tags.tags column data specifically shown on the picture below

pd.json_normalize(data['conversations'])
pd.set_option('display.max_columns', None)

Json structure:

[{'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 3,
    'starting_after': 'WzE3MTU3ODIxOTc=='},
   'page': 2,
   'per_page': 1,
   'total_pages': 46969},
  'total_count': 46969,
  'conversations': [{'type': 'conversation',
    'id': '1384780',
    'created_at': 1715780970,
    'updated_at': 1715782197,
    'waiting_since': None,
    'snoozed_until': None,
    'source': {'type': 'conversation',
     'id': '2197597651',
     'delivered_as': 'customer_initiated',
     'subject': '',
     'body': '<p>Outros</p>',
     'author': {'type': 'user',
      'id': '64ac5cacccd2b047',
      'name': 'Claudiney Pinho',
      'email': '[email protected]'},
     'attachments': [],
     'url': None,
     'redacted': False},
    'contacts': {'type': 'contact.list',
     'contacts': [{'type': 'contact',
       'id': '64accccd71982047',
       'external_id': 'b363-cc8f--fb270b5e72e8'}]},
    'first_contact_reply': {'created_at': 1715780970,
     'type': 'conversation',
     'url': None},
    'admin_assignee_id': 5614527,
    'team_assignee_id': 5045796,
    'open': False,
    'state': 'closed',
    'read': True,
    'tags': {'type': 'tag.list',
     'tags': [{'type': 'tag',
       'id': '5379642',
       'name': '[BOT] Other',
       'applied_at': 1715781024,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379660',
       'name': '[BOT] Connected Agent',
       'applied_at': 1715781025,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379654',
       'name': '[BOT] Not Resolved',
       'applied_at': 1715781027,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '7046337',
       'name': '[BOT] Portuguese',
       'applied_at': 1715781029,
       'applied_by': {'type': 'admin', 'id': '4685750'}}]},
    'priority': 'not_priority',
    'sla_applied': None,
    'statistics': {'type': 'conversation_statistics',
     'time_to_assignment': 0,
     'time_to_admin_reply': 189,
     'time_to_first_close': 1158,
     'time_to_last_close': 1228,
     'median_time_to_reply': 139,
     'first_contact_reply_at': 1715780970,
     'first_assignment_at': 1715780970,
     'first_admin_reply_at': 1715781159,
     'first_close_at': 1715782128,
     'last_assignment_at': 1715781159,
     'last_assignment_admin_reply_at': 1715781159,
     'last_contact_reply_at': 1715782179,
     'last_admin_reply_at': 1715782125,
     'last_close_at': 1715782198,
     'last_closed_by_id': 5614527,
     'count_reopens': 1,
     'count_assignments': 3,
     'count_conversation_parts': 28},
    'conversation_rating': None,
    'teammates': {'type': 'admin.list',
     'admins': [{'type': 'admin', 'id': '5614527'}]},
    'title': None,
    'custom_attributes': {'Language': 'Portuguese',
     'Conversation status': 'Open',
     'From': 'iOS / Android'},
    'topics': {'type': 'topic.list', 'topics': [], 'total_count': 0},
    'ticket': None,
    'linked_objects': {'type': 'list',
     'data': [],
     'total_count': 0,
     'has_more': False}}]}]

how the dataset is displayed

I wish to create a pandas data frame with flatten key:values in a table

5
  • 1
    Add a few sample rows of your expected data frame. Commented Sep 13, 2024 at 15:09
  • Your JSON data is invalid, has several unquoted strings like None, True, False. Check validity first, maybe it's just that. Commented Sep 13, 2024 at 15:27
  • 1
    @PepeNO This is JSON-compatible Python data, not JSON itself. Commented Sep 13, 2024 at 16:41
  • 1
    Welcome to Stack Overflow! Please take the tour. SO is a site for specific questions, but there's no question here and the problem is a little vague. Please read How to Ask and make a minimal reproducible example including desired output dataframe. See also How to make good reproducible pandas examples. If you could reduce the size of the input data, that'd be great. Commented Sep 13, 2024 at 16:47
  • 1
    Also, please don't post pictures of text. Instead, copy the text itself, edit it into your post, and use the formatting tools like code formatting (print(df)) or table formatting (print(df.to_markdown()). Commented Sep 13, 2024 at 16:48

2 Answers 2

0

You can try using pd.json_normalize function with the record_path and meta parameters. This should flatten the JSON to different multiple levels of your choice, you can then further extract each nested lists or dictionaries into columns, which is of interest to you.

I have provided a code snippet for you, I am not using jupyter so you might need to test it but in theory this should work nicely, since the concept is true.

import pandas as pd

df = pd.json_normalize(
    data[0]["conversations"],
    record_path=["tags", "tags"],
    meta=[
        "id",
        "created_at",
        "updated_at",
        ["source", "id"],
        ["source", "author", "name"],
    ],
    meta_prefix="meta_",  # trying to avoid conflicts with available ids 
    errors="ignore",
)
pd.set_option("display.max_columns", None)

Also note you have many NaN which caused errors, also I had to use meta_prefix for id.

  applied_by.id  meta_id meta_created_at meta_updated_at meta_source.id  \
0       4685750  1384780      1715780970      1715782197            NaN   
1       4685750  1384780      1715780970      1715782197            NaN   
2       4685750  1384780      1715780970      1715782197            NaN   
3       4685750  1384780      1715780970      1715782197            NaN   

  meta_source.author.name  
0                     NaN  
1                     NaN  
2                     NaN  
3                     NaN  
Sign up to request clarification or add additional context in comments.

Comments

0

D.lola, it works however I tried to loop over the indices and append the results to other variable. I came up with the error: "list indices must be integers or slices, not dict"

It works if I run 'all_data[0]["conversations"]' without 'i += 1'

    data_1 = []  # This will store the normalized data
    i=0
    
    for i in all_data:  # Iterating directly over items in all_data
        normalized_data = pd.json_normalize(
            all_data[i]["conversations"],
            record_path=["tags", "tags"],
            meta=[
                "id",
                "created_at",
                "updated_at",
                ["source", "id"],
                ["source", "author", "name"],
            ],
            meta_prefix="meta_",  # Avoiding conflicts with available ids 
            errors="ignore",
        )
        
        pd.set_option("display.max_columns", None)
        
        # Append the normalized data to date_1
        data_1.append(normalized_data)
        
        i += 1
    
    # If you want to combine all DataFrames into one:
    if data_1:  # Check if date_1 is not empty
        final_data = pd.concat(data_1, ignore_index=True)

2 Comments

Hi Lucas, i becomes each element in all_data so it's a dict, not an index. does i contain each item you want in all_data? I am guessing you want to normalise i as well. You can try replacing all_data[i] with just i let's see what happens and also no need for this i += 1 at the end. I hope it helps
It worked. Thank you

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.