
I'm getting an error message when I try to normalize the JSON structure that follows.

Below I've pasted the JSON structure and the Python code to normalize it that is producing the ERROR message.

GOAL: Normalize all_data into one table so that I can see the information for all tags.

Important variables to keep, in order to identify which user the tags belong to:

created_at:

updated_at:

id:

name:

tags:

[{'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 3,
    'starting_after': 'WzE3MTU4NDc3NywzXQ=='},
   'page': 2,
   'per_page': 5,
   'total_pages': 9525},
  'total_count': 47622,
  'conversations': [{'type': 'conversation',
    'id': '1384780',
    'created_at': 1715780970,
    'updated_at': 1715782197,
    'waiting_since': None,
    'snoozed_until': None,
    'source': {'type': 'conversation',
     'id': '2197597651',
     'delivered_as': 'customer_initiated',
     'subject': '',
     'body': '<p>Outros</p>',
     'author': {'type': 'user',
      'id': '64ac5cacccd1982047',
      'name': 'Claudinho',
      'email': '[email protected]'},
     'attachments': [],
     'url': None,
     'redacted': False},
    'contacts': {'type': 'contact.list',
     'contacts': [{'type': 'contact',
       'id': '64ac5cabc0271982047',
       'external_id': 'b363b00b5e72e8'}]},
    'first_contact_reply': {'created_at': 1715780970,
     'type': 'conversation',
     'url': None},
    'admin_assignee_id': 5614527,
    'team_assignee_id': 5045796,
    'open': False,
    'state': 'closed',
    'read': True,
    'tags': {'type': 'tag.list',
     'tags': [{'type': 'tag',
       'id': '5379642',
       'name': '[BOT] Other',
       'applied_at': 1715781024,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379660',
       'name': '[BOT] Connected Agent',
       'applied_at': 1715781025,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '5379654',
       'name': '[BOT] Not Resolved',
       'applied_at': 1715781027,
       'applied_by': {'type': 'admin', 'id': '4685750'}},
      {'type': 'tag',
       'id': '7046337',
       'name': '[BOT] Portuguese',
       'applied_at': 1715781029,
       'applied_by': {'type': 'admin', 'id': '4685750'}}]},
    'priority': 'not_priority',
    'sla_applied': None,
    'statistics': {'type': 'conversation_statistics',
     'time_to_assignment': 0,
     'time_to_admin_reply': 189,
     'time_to_first_close': 1158,
     'time_to_last_close': 1228,
     'median_time_to_reply': 139,
     'first_contact_reply_at': 1715780970,
     'first_assignment_at': 1715780970,
     'first_admin_reply_at': 1715781159,
     'first_close_at': 1715782128,
     'last_assignment_at': 1715781159,
     'last_assignment_admin_reply_at': 1715781159,
     'last_contact_reply_at': 1715782179,
     'last_admin_reply_at': 1715782125,
     'last_close_at': 1715782198,
     'last_closed_by_id': 5614527,
     'count_reopens': 1,
     'count_assignments': 3,
     'count_conversation_parts': 28},
    'conversation_rating': None,
    'teammates': {'type': 'admin.list',
     'admins': [{'type': 'admin', 'id': '5614527'}]},
    'title': None,
    'custom_attributes': {'Language': 'Portuguese',
     'Conversation status': 'Open',
     'From': 'iOS / Android'},
    'topics': {'type': 'topic.list', 'topics': [], 'total_count': 0},
    'ticket': None,
    'linked_objects': {'type': 'list',
     'data': [],
     'total_count': 0,
     'has_more': False}},

NEXT PAGES

 {'type': 'conversation.list',
  'pages': {'type': 'pages',
   'next': {'page': 4,
    'starting_after': 'WzE3MTU3IwMDAs=='},
   'page': 3,
   'per_page': 5,
   'total_pages': 9525},
  'total_count': 47622,
  'conversations': [{'type': 'conversation',
    'id': '1384768',
PYTHON CODE

data_1 = []  # This will store the normalized data
i=0

for i in all_data:  # Iterating directly over items in all_data
    normalized_data = pd.json_normalize(
        all_data[i]["conversations"],
        record_path=["tags", "tags"],
        meta=[
            "id",
            "created_at",
            "updated_at",
            ["source", "id"],
            ["source", "author", "name"],
        ],
        meta_prefix="meta_",  # Avoiding conflicts with available ids 
        errors="ignore",
    )
    
    pd.set_option("display.max_columns", None)
    
    # Append the normalized data to data_1
    data_1.append(normalized_data)
    
    i += 1

# If you want to combine all DataFrames into one:
if data_1:  # Check if data_1 is not empty
    final_data = pd.concat(data_1, ignore_index=True)

ERROR MESSAGE

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[149], line 6
      2 i=0
      4 for i in all_data:  # Iterating directly over items in all_data
      5     normalized_data = pd.json_normalize(
----> 6         all_data[i]["conversations"],
      7         record_path=["tags", "tags"],
      8         meta=[
      9             "id",
     10             "created_at",
     11             "updated_at",
     12             ["source", "id"],
     13             ["source", "author", "name"],
     14         ],
     15         meta_prefix="meta_",  # Avoiding conflicts with available ids 
     16         errors="ignore",
     17     )
     19     pd.set_option("display.max_columns", None)
     21     # Append the normalized data to date_1

TypeError: list indices must be integers or slices, not dict
  • I'm not sure from your data, but if all_data is a list of dicts, then for i in all_data iterates over those dicts, not over indices. You would use i["conversations"] because i is already the dict you want. Commented Sep 17, 2024 at 15:20
  • More generally, aim for a fully functional script we can run. And trim out unneeded data. You don't need 90% of the stuff in your data to demonstrate the problem. Maybe normalize with 2 fields, and drop most of the others as irrelevant. Commented Sep 17, 2024 at 15:22
  • You probably mean for i in range(len(all_data)), and you would not need the i = i + 1 line. If you are not using the numeric index for anything else, then use i["conversations"] instead of all_data[i]["conversations"], as suggested by @tdelaney. Commented Sep 17, 2024 at 15:55
  • The error message is telling you that i is a dict Commented Sep 17, 2024 at 17:05
  • I solved the issue with i["conversations"]. Thank you all! Commented Sep 17, 2024 at 19:23
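As the comments explain, a Python `for` loop over a list yields the elements themselves, not their indices. A minimal sketch with hypothetical data (not the question's real pages) showing why `all_data[i]` fails when `i` is already a dict:

```python
# Hypothetical stand-in for the question's list of API pages
all_data = [{"conversations": [1, 2]}, {"conversations": [3]}]

for item in all_data:
    # item IS the dict for one page, so index it by key, not by position
    print(item["conversations"])

# By contrast, all_data[item] would raise the question's TypeError,
# because list indices must be integers or slices, not dicts.
```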

1 Answer


You are almost there. Go over for loops again to get a better understanding of what's going on there. There is no need to store an index value when iterating over a list, and, as stated in the comments, i is not the value you think it is. Make these few changes and it should work fine.

data_1 = []  # This will store the normalized data

for each_data in all_data:  # <- each element in your all_data list is being stored as each_data
    normalized_data = pd.json_normalize(
        each_data["conversations"],    # <- then you want to normalize that here
        record_path=["tags", "tags"],
        meta=[
            "id",
            "created_at",
            "updated_at",
            ["source", "id"],
            ["source", "author", "name"],
        ],
        meta_prefix="meta_",  # Avoiding conflicts with available ids 
        errors="ignore",
    )
    
    pd.set_option("display.max_columns", None)
    
    # Append the normalized data to data_1
    data_1.append(normalized_data)
    
# If you want to combine all DataFrames into one:
if data_1:  # Check if data_1 is not empty
    final_data = pd.concat(data_1, ignore_index=True)