Append data to CSV using a nested loop

Question

I am trying to append data from the list json_response, which contains Twitter data, to a CSV file using the function append_to_csv.

I understand the structure of json_response. It contains data on users who follow two politicians: 5 and 13 users, respectively. 1) author_id, created_at, tweet_id and text are in data. 2) description/bio is in ['includes']['users']. 3) url/image_url is in ['includes']['media']. However, my nested loop does not append any data to sample_data.csv, and it throws no error. Does it have something to do with my indentation?

print(json.dumps(json_response, indent=4, sort_keys=True))  # look at json_response object.
[
    {
        "data": [
            {
                "author_id": "2877379617",
                "created_at": "2021-03-25T12:11:14.000Z",
                "id": "1375057688355336195",
                "text": "@prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
            },
            {
                "author_id": "1265018154444562440",
                "created_at": "2021-03-22T19:48:59.000Z",
                "id": "1374085719472361474",
                "text": "@MehcatCat @AlasscanIsBack @PattyArquette @timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
            },
            {
                "author_id": "2378324935",
                "created_at": "2021-03-07T21:32:13.000Z",
                "id": "1368675879312887810",
                "text": "@DrWinarick @KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
            },
            {
                "author_id": "821870502943817729",
                "created_at": "2021-02-12T23:53:59.000Z",
                "id": "1360376637385244673",
                "text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
            },
            {
                "attachments": {
                    "media_keys": [
                        "16_1341045032732770306"
                    ]
                },
                "author_id": "17232340",
                "created_at": "2020-12-21T15:37:07.000Z",
                "id": "1341045038420275205",
                "text": "@DSingh4Biden @moomintroll8 @timkaine @GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "16_1341045032732770306",
                    "type": "animated_gif"
                }
            ],
            "users": [
                {
                    "created_at": "2014-11-15T02:23:57.000Z",
                    "description": "",
                    "id": "2877379617",
                    "name": "Laura Saylor",
                    "username": "lauraleesaylor"
                },
                {
                    "created_at": "2020-05-25T20:33:36.000Z",
                    "description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
                    "id": "1265018154444562440",
                    "name": "Zauberkind",
                    "username": "Zauberkind2"
                },
                {
                    "created_at": "2014-03-08T07:22:31.000Z",
                    "description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
                    "id": "2378324935",
                    "name": "Trevor \"Trev\" McKee Achilles",
                    "username": "MrTAchilles"
                },
                {
                    "created_at": "2017-01-19T00:02:52.000Z",
                    "description": "statist /  Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n",
                    "id": "821870502943817729",
                    "name": "Puppet Enthusiast",
                    "username": "nihilisticpillo"
                },
                {
                    "created_at": "2008-11-07T15:09:46.000Z",
                    "description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
                    "id": "17232340",
                    "name": "anti-Fascist Jim",
                    "username": "JimnBL"
                }
            ]
        },
        "meta": {
            "newest_id": "1375057688355336195",
            "next_token": "b26v89c19zqg8o3fos5vyedr54ngvtx3nuqvnx6pglrb1",
            "oldest_id": "1341045038420275205",
            "result_count": 5
        }
    },
    {
        "data": [
            {
                "author_id": "737885223858384896",
                "created_at": "2021-03-26T21:56:02.000Z",
                "id": "1375567243082338314",
                "text": "@hogan_1969 @LindseyGrahamSC LOL She Blocked me.. could not admit the truth could she now. okay so where is her source for the shirts? and that is what he said. I (quote) We immediately surge the border all those seeking asylum. What about his lie about the cages? no Answer lol."
            },
            {
                "author_id": "847612931487416323",
                "created_at": "2021-03-26T21:55:24.000Z",
                "id": "1375567083791073283",
                "text": "@hogan_1969 @TeichTerry @thehill @LindseyGrahamSC @hogan_1969 just blocked me for showing her the actual numbers \ud83e\udd23\n\n#LiberalsHateFacts"
            },
            {
                "author_id": "18634205",
                "created_at": "2021-03-08T12:29:00.000Z",
                "id": "1368901564363051010",
                "text": "Huh.  Made me think if @LeaderMcConnell @LindseyGrahamSC @marcorubio @SenTedCruz feel trapped under the thumb of Trumpy.  And who else? @IvankaTrump? @MELANIATRUMP ? @DonaldJTrumpJr ? I\u2019d say Eric, but he blocked me."
            },
            {
                "author_id": "27327319",
                "created_at": "2021-03-02T11:53:16.000Z",
                "id": "1366718245521211393",
                "text": "@fedupinNHtoo @LindseyGrahamSC Exactly. I asked that question of a Republican on Facebook last night and she blocked me"
            },
            {
                "author_id": "917634626247647232",
                "created_at": "2021-02-28T18:16:45.000Z",
                "id": "1366089974907432961",
                "text": "@gop this is for you! @tedcruz @LindseyGrahamSC @MittRomney @mikepompeo\n#BitchyMcC blocked me!\ud83d\udc4d\nWatch \"Jack Off Jill - Hypocrite + lyrics\" on YouTube"
            },
            {
                "author_id": "1231059979844456448",
                "created_at": "2021-02-26T04:25:49.000Z",
                "id": "1365156089554067459",
                "text": "@KelleyALynch1 @marwilliamson @therecount @LindseyGrahamSC She's fine with that just as she's fine with Biden's Nazis in Ukraine. She wants war with Russia, too. She blocked me for this tweet because she couldn't even condemn Biden's Nazis in Ukraine. She's a fauxgressive warmonger, a wolf in sheep's clothing. \n"
            },
            {
                "author_id": "1315477593303310336",
                "created_at": "2021-02-23T00:00:41.000Z",
                "id": "1364002202843451399",
                "text": "@MistyKitty3 @BlairMurray83 @FrankAmari2 @LindseyGrahamSC \ud83e\udd23 Someone didn\u2019t like what I said and blocked me."
            },
            {
                "author_id": "1069115263671562240",
                "created_at": "2021-02-22T04:36:06.000Z",
                "id": "1363709124891070467",
                "text": "@trinkity88 @LindseyGrahamSC Apparently, @Trinkitty88 blocked me because FACTS are TOO HARD to handle!\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83e\udd23"
            },
            {
                "author_id": "1303321972227690496",
                "created_at": "2021-02-20T19:38:49.000Z",
                "id": "1363211526316969985",
                "text": "@horsin64 @GovMurphy @LindseyGrahamSC You blocked me because you\u2019re a nifkin. It\u2019s not cyber tough you Nancy I\u2019d say it to your face. American lives matter before anyone else. America first and you don\u2019t like it because you have trump derangement. You\u2019re a psycho"
            },
            {
                "author_id": "27943005",
                "created_at": "2021-02-19T20:00:38.000Z",
                "id": "1362854626924650497",
                "text": "@TonyRom31334975 @staceyabrams @AnnaForFlorida @LindseyGrahamSC The guy blocked me on Twitter and had to unblock me after the Knight First Amendment Institute sued him and won&gt; I am certain It won't talk to me, but imagine..hehe?!"
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1361344652264280068"
                    ]
                },
                "author_id": "1126249378279297027",
                "created_at": "2021-02-15T16:00:32.000Z",
                "id": "1361344654395011079",
                "text": "@Jamie1074 @Breaking911 You know what\n\nIt's funny that they blocked me because I actually did agree with them on Lindsey Graham...\n\nCome on, man !"
            },
            {
                "author_id": "1207432044390699008",
                "created_at": "2021-02-14T07:58:21.000Z",
                "id": "1360860918687559681",
                "text": "@LindseyGrahamSC I really don't know why you haven't blocked me yet. Pile of human shit. I just read a letter that John McCain wrote me and for some reason it made me think about you and what he would think about your behavior. I guarantee you'd be in for an ass whippin'. Dick."
            },
            {
                "author_id": "926909484",
                "created_at": "2021-02-13T20:53:03.000Z",
                "id": "1360693490880032770",
                "text": "@LadyReverbs @themariefonseca @styvanswift @LindseyGrahamSC Lady, you might be able to see Marie\u2019s tweets. She blocked me. She may call this a victory for Trump. The reality is that seven members of the @GOP voted to convict. They are the true patriots of the Republican Party."
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1361344652264280068",
                    "type": "photo",
                    "url": ""
                }
            ],
            "users": [
                {
                    "created_at": "2016-06-01T05:55:21.000Z",
                    "description": "Biden Inflation the worst in 30 years. His Handlers trying to Rebrand Brandon is Hilarious.",
                    "id": "737885223858384896",
                    "name": "Biden is a complete mess and you know it.",
                    "username": "zelda3024"
                },
                {
                    "created_at": "2017-03-31T00:54:05.000Z",
                    "description": "Love God, Love Family, Love Country, Love Freedom - if we put those things first everything else will be great. MAGA",
                    "id": "847612931487416323",
                    "name": "Joey Bagadonuts",
                    "username": "AmericanGr8ness"
                },
                {
                    "created_at": "2009-01-05T15:25:55.000Z",
                    "description": "small & local garlic farmer; independent American; old surfer dude; working to find and speak truth to power; \ud83c\uddfa\ud83c\uddf8; mahalo and Maluhia",
                    "id": "18634205",
                    "name": "MacGregorGarlic",
                    "username": "MacGregorGarlic"
                },
                {
                    "created_at": "2009-03-28T22:53:28.000Z",
                    "description": "Let's Go Darwin!",
                    "id": "27327319",
                    "name": "Karen Kennedy",
                    "username": "KayKay68"
                },
                {
                    "created_at": "2017-10-10T06:15:18.000Z",
                    "description": "Mom\ud83d\udc95Cannactivist\ud83c\udf3fSecularHumanist\ud83c\udf10 BLM\u270a\ud83c\udfff\ud83c\udf08Ally\ud83e\udd8bCPTSD\u2695\ufe0f FTD\ud83e\udd14MeToo\ud83c\udf38ProChoice\ud83d\udc93CRPS\ud83d\ude23ClimateChange\ud83c\udf0e DACA\ud83c\uddfa\ud83c\uddf2AdoptDontShop\ud83d\udc3e#Steelers \ud83d\udda4\ud83d\udc9b #Vaxxed2TheMax\u270a\ud83d\udc9a",
                    "id": "917634626247647232",
                    "name": "Raven The Hemptress #LegalizeGlobally\ud83d\udc9a\ud83c\udf3f\u267f",
                    "username": "Kraven_Raven24"
                },
                {
                    "created_at": "2020-02-22T03:35:56.000Z",
                    "description": "Monetarism is the underlying cause of our disease; human progress and peace through development is the cure. Eurasian integration will benefit all of humanity!",
                    "id": "1231059979844456448",
                    "name": "\ud83c\udd70pocalypsis \ud83c\udd70pocalypseos \u2014 BRI Is The Future",
                    "username": "apocalypseos"
                },
                {
                    "created_at": "2020-10-12T02:21:21.000Z",
                    "description": "Father of two beautiful boys. Believer in the Constitution of the United States. Protector of my own rights. #Meatatarian",
                    "id": "1315477593303310336",
                    "name": "\ud83e\udd85 Steven Duggin \u2665\ufe0f \ud83c\uddfa\ud83c\uddf8\ud83d\uddfd",
                    "username": "itsStevenDuggin"
                },
                {
                    "created_at": "2018-12-02T06:25:16.000Z",
                    "description": "",
                    "id": "1069115263671562240",
                    "name": "Barhag",
                    "username": "TheBarhag"
                },
                {
                    "created_at": "2020-09-08T13:19:17.000Z",
                    "description": "Not the liberals cup of tea",
                    "id": "1303321972227690496",
                    "name": "Christy",
                    "username": "Christy54177764"
                },
                {
                    "created_at": "2009-03-31T19:34:24.000Z",
                    "description": "NY-grown, FL-tanned, scribe, word nerd, TV junkie, game show champ, yenta, wife, twin mama, hot sauce collector, Bloody Mary maven &, says @NYPost, savvy gadfly",
                    "id": "27943005",
                    "name": "Lesley Abravanel",
                    "username": "lesleyabravanel"
                },
                {
                    "created_at": "2019-05-08T22:15:51.000Z",
                    "description": "\u2600\ufe0f I post Yuuko Aioi pictures daily \u2600\ufe0f\n\nI also like being wholesome, making new friends, posting about games, my everyday life, cats, NASCAR, good vibes, fumos!",
                    "id": "1126249378279297027",
                    "name": "Vaxen #DailyYuuko \u2603\ufe0f",
                    "username": "YuukoEnjoyer"
                },
                {
                    "created_at": "2019-12-18T22:47:10.000Z",
                    "description": "The Republican party is bad for America. The Conservatives are Trump bootlickers who are afraid to stand up to him. This great nation is in serious trouble.",
                    "id": "1207432044390699008",
                    "name": "Angry Patriot",
                    "username": "AngryPatriot20"
                },
                {
                    "created_at": "2012-11-05T05:19:37.000Z",
                    "description": "Employment lawyer. Represent employers and employees. 30 years ago, my mentor told me to seek the truth as a lawyer. Still do that. Tweets are not legal advice.",
                    "id": "926909484",
                    "name": "Alfred Southerland",
                    "username": "TexasEEOLaw"
                }
            ]
        },
        "meta": {
            "newest_id": "1375567243082338314",
            "next_token": "b26v89c19zqg8o3fosnr8q7zstmzppg3jgd1cvynkb919",
            "oldest_id": "1360693490880032770",
            "result_count": 13
        }
    }
]

# Create file
csvFile = open("sample_data.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)

# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(
    ["author_id", "created_at", "tweet_id", "text", "bio", "image_url"])
csvFile.close()



def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url

    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # loop through each tweet
    for each_dict in json_response:
        
        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:

            # 1. Author ID
            author_id = tweet['author_id']

            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])

            # 3. Tweet ID
            tweet_id = tweet['id']

            # 4. Tweet text
            text = tweet['text']
            
            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:

                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "

                    # loop 3. image_url/url loop
                    for element in each_dict['includes']['media']:

                        # 6. image url
                        if 'url' in element:
                            image_url = element['url']
                        else:
                            image_url = " "

                    # assemble all data in a list
                    res = [author_id, created_at, tweet_id, text, bio, image_url]
                    csvWriter.writerow(res)

                    # close CSV file
                    csvFile.close()


append_to_csv(json_response, "sample_data.csv")

As can be seen, df only contains the predefined column names.

# import sample_data.csv as df
df = pd.read_csv(r'path...\sample_data.csv')

print(df)
Empty DataFrame
Columns: [author_id, created_at, tweet_id, text, bio, image_url]
Index: []

EDITED: Changed indentation in # 3 loop and csvFile.close().

def append_to_csv(json_response, csvFile):
    # counter variable
    global author_id, created_at, tweet_id, text, bio, image_url

    # open CSV file
    csvFile = open(csvFile, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # loop through each tweet
    for each_dict in json_response:

        # loop 1. author ID, time created, tweet ID tweet text
        for tweet in each_dict['data']:

            # 1. Author ID
            author_id = tweet['author_id']

            # 2. Time created
            created_at = dateutil.parser.parse(tweet['created_at'])

            # 3. Tweet ID
            tweet_id = tweet['id']

            # 4. Tweet text
            text = tweet['text']

            # loop 2. description/bio loop
            for dic in each_dict['includes']['users']:

                # 5. description
                if 'description' in dic:
                    bio = dic['description']
                else:
                    bio = " "

                # loop 3. image_url/url loop
                for element in each_dict['includes']['media']:

                    # 6. image url
                    if 'url' in element:
                        image_url = element['url']
                    else:
                        image_url = " "

                    # assemble all data in a list
                    res = [author_id, created_at, tweet_id, text, bio, image_url]
                    csvWriter.writerow(res)

    # close CSV file
    csvFile.close()

The issue now is that append_to_csv appends each tweet 5 times for the 5 users following the first politician and 13 times for the 13 users following the second politician, resulting in a df with 194 rows instead of 18 rows.

Have you checked that csvFile.close() is ever called? That line looks misplaced to me. Shouldn't it go after the loop? I might be wrong, still studying your code.
– cherrywoods, Commented Jan 10, 2022 at 21:16
Please check the indentation of the provided code, it looks somewhat odd to me.
– cherrywoods, Commented Jan 10, 2022 at 21:26

s.dallapalma · Accepted Answer · 2022-01-11 13:42:41Z

1

There are two each_dict objects in json_response. They have 5 and 13 tweets, respectively (each_dict['data']). In addition, there are 5 and 13 elements in each_dict['includes']['users'], respectively.

You got 194 rows because in the first iteration of for each_dict in json_response: you save data 5x5=25 times (loop 2 is executed 5 times for every tweet in loop 1), while in the second iteration you save data 13x13=169 times (loop 2 is executed 13 times for every tweet in loop 1).
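The arithmetic can be checked with a few lines of Python. This is only a sketch: rows_written is a hypothetical helper, and the counts 5 and 13 come from meta.result_count in the question. Each block has a single media entry, so loop 3 does not multiply the total any further.

```python
# Tweets (and users) per politician, taken from meta.result_count in the question.
tweets_per_block = [5, 13]


def rows_written(n_tweets, n_users, n_media=1):
    """Hypothetical helper: rows written when writerow sits inside all three loops."""
    return n_tweets * n_users * n_media


# writerow nested inside the users loop: every tweet is written once per user.
buggy_total = sum(rows_written(n, n) for n in tweets_per_block)

# writerow called once per tweet.
fixed_total = sum(tweets_per_block)

print(buggy_total, fixed_total)  # 194 18
```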

You should append data to your csv outside loop 2. That is,

for each_dict in json_response:

    for tweet in each_dict['data']:
        # ...
        
        for dic in each_dict['includes']['users']:
            # ...
        
        res = [author_id, created_at, tweet_id, text, bio, image_url]
        csvWriter.writerow(res)
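A runnable sketch of this idea with the csv module follows. It rests on several assumptions: sample_response is a made-up, trimmed-down stand-in for json_response, write_tweets is a hypothetical name, and the bio is looked up by author_id in a dictionary rather than taken from whatever user loop 2 visited last. It writes to an in-memory buffer so it can be run without touching sample_data.csv.

```python
import csv
import io

# Hypothetical stand-in for json_response: same keys as in the question, fake values.
sample_response = [
    {
        "data": [
            {"author_id": "1", "created_at": "2021-03-25T12:11:14.000Z", "id": "101", "text": "first"},
            {"author_id": "2", "created_at": "2021-03-22T19:48:59.000Z", "id": "102", "text": "second"},
        ],
        "includes": {
            "users": [
                {"id": "1", "description": "bio one"},
                {"id": "2", "description": ""},
            ],
        },
    }
]


def write_tweets(json_response, out):
    """Write one CSV row per tweet to the file-like object `out`."""
    writer = csv.writer(out)
    for each_dict in json_response:
        users = each_dict.get("includes", {}).get("users", [])
        # Build the author_id -> bio lookup once per block.
        bios = {user["id"]: user.get("description", "") for user in users}
        for tweet in each_dict["data"]:
            # writerow is called exactly once per tweet, outside any inner loop.
            writer.writerow([
                tweet["author_id"],
                tweet["created_at"],
                tweet["id"],
                tweet["text"],
                bios.get(tweet["author_id"], ""),
            ])


buffer = io.StringIO()
write_tweets(sample_response, buffer)

# Read the buffer back to confirm there is one row per tweet.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows))  # 2
```

Passing an open file object instead of the buffer would write the same rows to disk.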

In addition, I recommend using a pandas dataframe to store the info you need and save to csv. It makes the code more readable and you do not have to worry about opening a buffer. See my recommendation below, including renaming:

import dateutil.parser
import pandas as pd

rows = []

for each_dict in json_response:

    for tweet in each_dict['data']:
        row = {}
        row["author_id"] = tweet['author_id']
        row["created_at"] = dateutil.parser.parse(tweet['created_at'])
        row["tweet_id"] = tweet['id']
        row["text"] = tweet['text']

        # Pick the bio of the user who wrote this tweet
        for user in each_dict['includes']['users']:
            if user["id"] == row["author_id"]:
                row["bio"] = user['description']#.encode('utf-16','surrogatepass').decode('utf-16') # uncomment this if you get UnicodeError

        # Note: every tweet gets the url of the last media entry in its each_dict
        for media in each_dict['includes']['media']:
            row['image_url'] = media.get('url', ' ')

        rows.append(row)

# DataFrame.append was removed in pandas 2.0, so collect the rows in a list and build the dataframe once.
# The column headers are generated automatically from the dictionaries' keys.
df = pd.DataFrame(rows)

df.to_csv('path/to/file.csv')

Output

               tweet_id            author_id                created_at   ...
0   1375057688355336195           2877379617  2021-03-25T12:11:14.000Z   ...
1   1374085719472361474  1265018154444562440  2021-03-22T19:48:59.000Z   ...
...
17  1360693490880032770            926909484  2021-02-13T20:53:03.000Z   ...

edited Jan 11, 2022 at 13:42

answered Jan 11, 2022 at 0:44


5 Comments

Marco Liedecke Over a year ago

I see the utility in appending the data into a pandas data frame instead of a CSV file and will therefore use that method instead. However, your code suggestion only appends the last tweet from the two each_dict objects in json_response resulting in a df of two rows.

s.dallapalma Over a year ago

Wrong indentation in my example, my bad. I edited the answer and tested it. The final dataset consists of 18 rows and 6 columns. Can you double check?

Marco Liedecke Over a year ago

Yes, the new code produces the right number of rows (18) and columns (6). That is great. However, since Stack Overflow does not allow https links to be posted in the question, I had to delete them from json_response under ['includes']['media']['url'] before posting. When I run the code with the URLs present in json_response, the rows in column image_url are filled with the same url within the same each_dict.

s.dallapalma Over a year ago

Afraid I don't fully understand the last comment, but it seems to be a different problem than the one described in this question. To avoid having the same url in every row, you should add an if statement like I have done for the user's bio. However, that is only possible if the "media" data contains some information about the related user or tweet (like an author_id or tweet_id). Besides, please consider closing this one first (by accepting the answer if it solves the problem of properly appending data to the csv).
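In the json_response shown in the question, tweets with media carry attachments.media_keys, and each entry in ['includes']['media'] carries a media_key, so those keys could serve as the link between a tweet and its media. The following is a sketch under that assumption, not a tested solution for the real data: block is a made-up, trimmed-down each_dict and the URL is a placeholder.

```python
# Hypothetical block with the same keys as one each_dict in the question.
# The URL is a placeholder, not real data.
block = {
    "data": [
        {"id": "101", "author_id": "1", "text": "no media"},
        {
            "id": "102",
            "author_id": "2",
            "text": "with media",
            "attachments": {"media_keys": ["3_999"]},
        },
    ],
    "includes": {
        "media": [
            {"media_key": "3_999", "type": "photo", "url": "https://example.com/a.jpg"},
        ],
    },
}

# Map media_key -> url once per block; entries without a url map to "".
media = block.get("includes", {}).get("media", [])
url_by_key = {m["media_key"]: m.get("url", "") for m in media}

image_urls = {}
for tweet in block["data"]:
    # Tweets without media have no "attachments" key at all.
    keys = tweet.get("attachments", {}).get("media_keys", [])
    # Join with a space in case a tweet has several attachments.
    image_urls[tweet["id"]] = " ".join(url_by_key.get(k, "") for k in keys).strip()

print(image_urls)  # {'101': '', '102': 'https://example.com/a.jpg'}
```

With this lookup, a tweet without attachments gets an empty image_url instead of inheriting the url of another tweet in the same each_dict.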

Marco Liedecke Over a year ago

True. It is a different question. Your code solves my initial question regarding appending data to a CSV. Your help is highly appreciated @s.dallapalma.

cherrywoods · Answer · 2022-01-10 21:24:26Z

1

It looks like the else branch of if 'description' in dic: is never executed. If your code is indented as posted, the csvWriter.writerow part is never executed either, because it sits inside that else branch.

As a result, no contents are written to your file.

A comment on code style:

Use with open(file) as file_variable: instead of manually calling open and close. That can save you some trouble, e.g. the trouble you would get if the else branch were indeed executed and the file were closed multiple times :)
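A minimal sketch of the with pattern applied to the header-writing step from the question. The temporary path is an assumption made so the example does not touch the working directory; in the original code the target would be sample_data.csv.

```python
import csv
import os
import tempfile

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample_data.csv")

header = ["author_id", "created_at", "tweet_id", "text", "bio", "image_url"]

# The file is closed when the with block ends, even if an exception is raised inside it.
with open(path, "a", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(header)

print(csv_file.closed)  # True
```

Because the close happens at the end of the block, there is no csvFile.close() line that can end up at the wrong indentation level.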

answered Jan 10, 2022 at 21:24


4 Comments

Marco Liedecke Over a year ago

Thank you for your comment @cherrywoods. I will look at the indentation; my IDE (PyCharm) usually helps with it, but there could still be some errors. Regarding with open(file) as file_variable:, I will have to read up on that.

cherrywoods Over a year ago

What I am trying to say: I think all the lines below # loop 3. should be indented at least one step less.

Marco Liedecke Over a year ago

I have corrected the indentation so that the code does append data. However, it appends the same data either 5 times or 13 times, resulting in 194 rows where there should have been just 18 rows.

cherrywoods Over a year ago

That is because the writerow code is still part of the third loop. That's why it is adding way too many entries. If you are using PyCharm or some other IDE, I recommend stepping through the code using the debugger (set a breakpoint in front of the first loop and use F8 to step through it). If you are not using an IDE, put print statements into your code and observe how often they are printed.
