
I'm getting an error message when I try to normalize the JSON structure that follows.

Below I've pasted the JSON structure and the Python code to normalize it that is producing the ERROR message.

GOAL: Normalize all_data into one table so that I can see the information for all tags.

Important variables to keep, in order to identify which user the tags belong to:

created_at:

updated_at:

id:

name:

tags:

[{'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 3,
    'starting_after': 'WzE3MTU4NDc3NywzXQ=='},
   'page': 2,
   'per_page': 5,
   'total_pages': 9525},
  'total_count': 47622,
  'conversations': [{'type': 'conversation',
    'id': '1384780',
    'created_at': 1715780970,
    'updated_at': 1715782197,
    'waiting_since': None,
    'snoozed_until': None,
    'source': {'type': 'conversation',
     'id': '2197597651',
     'delivered_as': 'customer_initiated',
     'subject': '',
     'body': '<p>Outros</p>',
     'author': {'type': 'user',
      'id': '64ac5cacccd1982047',
      'name': 'Claudinho',
      'email': '[email protected]'},
     'attachments': [],
     'url': None,
     'redacted': False},
    'contacts': {'type': 'contact.list',
     'contacts': [{'type': 'contact',
       'id': '64ac5cabc0271982047',
       'external_id': 'b363b00b5e72e8'}]},
    'first_contact_reply': {'created_at': 1715780970,
     'type': 'conversation',
     'url': None},
    'admin_assignee_id': 5614527,
    'team_assignee_id': 5045796,
    'open': False,
    'state': 'closed',
    'read': True,
    'tags': {'type': 'tag.list',
     'tags': [{'type': 'tag',
       'id': '5379642',
       'name': '[BOT] Other',
       'applied_at': 1715781024,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379660',
       'name': '[BOT] Connected Agent',
       'applied_at': 1715781025,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379654',
       'name': '[BOT] Not Resolved',
       'applied_at': 1715781027,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '7046337',
       'name': '[BOT] Portuguese',
       'applied_at': 1715781029,
       'applied_by': {'type': 'admin', 'id': '4685750'}}]},
    'priority': 'not_priority',
    'sla_applied': None,
    'statistics': {'type': 'conversation_statistics',
     'time_to_assignment': 0,
     'time_to_admin_reply': 189,
     'time_to_first_close': 1158,
     'time_to_last_close': 1228,
     'median_time_to_reply': 139,
     'first_contact_reply_at': 1715780970,
     'first_assignment_at': 1715780970,
     'first_admin_reply_at': 1715781159,
     'first_close_at': 1715782128,
     'last_assignment_at': 1715781159,
     'last_assignment_admin_reply_at': 1715781159,
     'last_contact_reply_at': 1715782179,
     'last_admin_reply_at': 1715782125,
     'last_close_at': 1715782198,
     'last_closed_by_id': 5614527,
     'count_reopens': 1,
     'count_assignments': 3,
     'count_conversation_parts': 28},
    'conversation_rating': None,
    'teammates': {'type': 'admin.list',
     'admins': [{'type': 'admin', 'id': '5614527'}]},
    'title': None,
    'custom_attributes': {'Language': 'Portuguese',
     'Conversation status': 'Open',
     'From': 'iOS / Android'},
    'topics': {'type': 'topic.list', 'topics': [], 'total_count': 0},
    'ticket': None,
    'linked_objects': {'type': 'list',
     'data': [],
     'total_count': 0,
     'has_more': False}},

NEXT PAGES

 {'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 4,
    'starting_after': 'WzE3MTU3IwMDAs=='},
   'page': 3,
   'per_page': 5,
   'total_pages': 9525},
  'total_count': 47622,
  'conversations': [{'type': 'conversation',
    'id': '1384768',
PYTHON CODE

data_1 = []  # This will store the normalized data
i=0

for i in all_data:  # Iterating directly over items in all_data
    normalized_data = pd.json_normalize(
        all_data[i]["conversations"],
        record_path=["tags", "tags"],
        meta=[
            "id",
            "created_at",
            "updated_at",
            ["source", "id"],
            ["source", "author", "name"],
        ],
        meta_prefix="meta_",  # Avoiding conflicts with available ids 
        errors="ignore",
    )
    
    pd.set_option("display.max_columns", None)
    
    # Append the normalized data to data_1
    data_1.append(normalized_data)
    
    i += 1

# If you want to combine all DataFrames into one:
if data_1:  # Check if data_1 is not empty
    final_data = pd.concat(data_1, ignore_index=True)

ERROR MESSAGE

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[149], line 6
      2 i=0
      4 for i in all_data:  # Iterating directly over items in all_data
      5     normalized_data = pd.json_normalize(
----> 6         all_data[i]["conversations"],
      7         record_path=["tags", "tags"],
      8         meta=[
      9             "id",
     10             "created_at",
     11             "updated_at",
     12             ["source", "id"],
     13             ["source", "author", "name"],
     14         ],
     15         meta_prefix="meta_",  # Avoiding conflicts with available ids 
     16         errors="ignore",
     17     )
     19     pd.set_option("display.max_columns", None)
     21     # Append the normalized data to date_1

TypeError: list indices must be integers or slices, not dict
  • I'm not sure from your data, but if all_data is a list of dicts, then for i in all_data iterates over those dicts, not over indices. You would use i["conversations"] because i is already the dict you want. Commented Sep 17, 2024 at 15:20
  • More generally, aim for a fully functional script we can run. And trim out unneeded data. You don't need 90% of the stuff in your data to demonstrate the problem. Maybe normalize with 2 fields, and drop most of the others as irrelevant. Commented Sep 17, 2024 at 15:22
  • You probably mean for i in range(len(all_data)), and you would not need the i = i + 1 line. If you are not using the numeric index for anything else, then use i["conversations"] instead of all_data[i]["conversations"], as suggested by @tdelaney. Commented Sep 17, 2024 at 15:55
  • The error message is telling you that i is a dict Commented Sep 17, 2024 at 17:05
  • I solved the issue with i["conversations"]. Thank you all! Commented Sep 17, 2024 at 19:23
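As the comments explain, a Python `for` loop over a list yields the elements themselves, not their indices. A minimal sketch with hypothetical data (not the question's real pages) showing why `all_data[i]` fails when `i` is already a dict:

```python
# Hypothetical stand-in for the question's list of API pages
all_data = [{"conversations": [1, 2]}, {"conversations": [3]}]

for item in all_data:
    # item IS the dict for one page, so index it by key, not by position
    print(item["conversations"])

# By contrast, all_data[item] would raise the question's TypeError,
# because list indices must be integers or slices, not dicts.
```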

1 Answer


You are almost there. Go over for loops again to get a better understanding of what's going on there. There is no need to store an index value when iterating over a list, and, as stated in the comments, i is not the value you think it is. Make these few changes and it should work fine.

data_1 = []  # This will store the normalized data

for each_data in all_data:  # <- each element in your all_data list is being stored as each_data
    normalized_data = pd.json_normalize(
        each_data["conversations"],    # <- then you want to normalize that here
        record_path=["tags", "tags"],
        meta=[
            "id",
            "created_at",
            "updated_at",
            ["source", "id"],
            ["source", "author", "name"],
        ],
        meta_prefix="meta_",  # Avoiding conflicts with available ids 
        errors="ignore",
    )
    
    pd.set_option("display.max_columns", None)
    
    # Append the normalized data to data_1
    data_1.append(normalized_data)
    
# If you want to combine all DataFrames into one:
if data_1:  # Check if data_1 is not empty
    final_data = pd.concat(data_1, ignore_index=True)