0

I have a pandas DataFrame that contains a column of integers named 'Count'. I'm trying to use a for loop to pull out only the Count values of the rows where the 'Artist' column matches 'The Beatles'.

tot=[]
for art in df:
    for df['Artist'] in art:
        if art['Artist'] == 'The Beatles':
            tot.append(artist['Count'])

The DataFrame format is as follows:

   Rank  Album                      Artist                                             Count
1  1     The Beatles [White Album]  The Beatles                                        1634
2  2     Rubber Soul                The Beatles                                        1497
3  3     Revolver                   The Beatles                                        1489
4  4     Abbey Road                 The Beatles                                        1468
5  5     Meet Me in St. Louis       Judy Garland with Georgie Stoll and His Orchestra  1399

Running the loop raises "TypeError: string indices must be integers".
5
  • I want to be able to show the total of 'The Beatles' counts in comparison to the overall total and then visualize it afterwards. Commented Sep 5, 2019 at 16:14
  • Can you provide the df.head()? Commented Sep 5, 2019 at 16:20
  • Sounds like a job for df['column'].value_counts(), df.query(), df.groupby(), df.filter(), or any of the other methods for selecting data from a dataframe. Looping is almost never the best option in pandas. Commented Sep 5, 2019 at 16:22
  • What is your expected output from this small dataset? Commented Sep 5, 2019 at 16:29
  • I wanted to create a list containing the count sum of 'The Beatles' albums from a table of 100 albums, and then compare that count to the total (using a Pie Chart eventually). This was the code I needed: 'df.loc[df['Artist'] == 'The Beatles', 'Count'].sum()' Commented Sep 5, 2019 at 16:30

3 Answers

1

If you want a list of all Count values where Artist is 'The Beatles', use:

df.loc[df['Artist'] == 'The Beatles', 'Count'].tolist()

If you need the sum of those Count values, use .sum() instead:

df.loc[df['Artist'] == 'The Beatles', 'Count'].sum()
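As a minimal sketch of how the boolean mask works, the snippet below rebuilds the five sample rows from the question (the Rank and Album columns are left out for brevity) and compares the Beatles total with the overall total, as the asker describes in the comments. The names mask, beatles_counts, beatles_total, overall_total, and share are illustrative and not part of the original post.

```python
import pandas as pd

# Sample rows copied from the question; Rank and Album are omitted for brevity.
df = pd.DataFrame({
    "Artist": ["The Beatles"] * 4
              + ["Judy Garland with Georgie Stoll and His Orchestra"],
    "Count": [1634, 1497, 1489, 1468, 1399],
})

# Boolean Series: True for every row whose Artist is 'The Beatles'.
mask = df["Artist"] == "The Beatles"

# .loc[rows, column] keeps only the masked rows of the Count column.
beatles_counts = df.loc[mask, "Count"].tolist()
beatles_total = df.loc[mask, "Count"].sum()

# Overall total and the Beatles' fraction of it, e.g. for a pie chart.
overall_total = df["Count"].sum()
share = beatles_total / overall_total

print(beatles_counts, beatles_total, overall_total, round(share, 3))
```

The two totals could then be passed to whatever plotting routine is in use.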

1

Method 1:

If you want to count the entry The Beatles in your Artist column from your DataFrame, you don't have to do a loop.

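Use pandas.DataFrame.groupby instead, with .transform('count'). It gives you, for every row, the number of times that row's Artist value appears in the column. Note that assigning the result to df['Count'] overwrites any existing Count column, such as the one in the question, so pick a different column name if you need to keep the original values.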

df['Count'] = df.groupby('Artist')['Artist'].transform('count')

Which gives:

>>> data = ['The Beatles', 'Some Artist', 'Some Artist', 'The Beatles','The Beatles','The Beatles']
>>> df = pd.DataFrame(data,columns = ['Artist'])
>>> df
        Artist
0  The Beatles
1  Some Artist
2  Some Artist
3  The Beatles
4  The Beatles
5  The Beatles
>>> df['Count'] = df.groupby('Artist')['Artist'].transform('count')
>>> df
        Artist  Count
0  The Beatles      4
1  Some Artist      2
2  Some Artist      2
3  The Beatles      4
4  The Beatles      4
5  The Beatles      4

This is helpful if you want to graph your result. Just create a dictionary with keys equal to Artist column value and values equal to Count column value.

The repetition won't be a problem, since Python dictionaries cannot contain duplicate keys: each repeated artist simply overwrites the earlier entry with the same count. Doing so:

>>> artist_count_dict = dict(zip(df['Artist'],df['Count']))
>>> artist_count_dict
{'The Beatles': 4, 'Some Artist': 2}

You may now access those values for your graphing purposes.
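If what you need is the summed Count per artist rather than the number of rows per artist, which is what the asker's comment describes, a groupby followed by sum produces a similar dictionary. This is a sketch that assumes a numeric Count column like the one in the question; the names totals and artist_total_dict are illustrative.

```python
import pandas as pd

# Sample rows copied from the question; Rank and Album are omitted for brevity.
df = pd.DataFrame({
    "Artist": ["The Beatles"] * 4
              + ["Judy Garland with Georgie Stoll and His Orchestra"],
    "Count": [1634, 1497, 1489, 1468, 1399],
})

# One row per artist, holding the sum of that artist's Count values.
totals = df.groupby("Artist")["Count"].sum()

# Plain dictionary mapping artist name to summed Count.
artist_total_dict = totals.to_dict()
print(artist_total_dict)
```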

Method 2:

You can also use df['Column Name'].value_counts() to give you the stats you need.

>>> df['Artist'].value_counts()
The Beatles    4
Some Artist    2
Name: Artist, dtype: int64

Create a new dataframe if you need to store it into one:

>>> df2 = df['Artist'].value_counts()
>>> df2 = pd.DataFrame(df2)
>>> df2.index.name = 'Artist'
>>> df2.columns = ['Count']
>>> df2
             Count
Artist
The Beatles      4
Some Artist      2
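The same table can also be built in one chained expression. The sketch below uses rename_axis and reset_index(name=...) so that the column names do not depend on how a given pandas version labels the value_counts result; the name counts is illustrative.

```python
import pandas as pd

data = ['The Beatles', 'Some Artist', 'Some Artist',
        'The Beatles', 'The Beatles', 'The Beatles']
df = pd.DataFrame(data, columns=['Artist'])

# value_counts() returns a Series indexed by artist; rename_axis names that
# index 'Artist' and reset_index turns it into a regular column next to 'Count'.
counts = (
    df['Artist']
    .value_counts()
    .rename_axis('Artist')
    .reset_index(name='Count')
)
print(counts)
```

Here Artist ends up as an ordinary column rather than the index, which is often more convenient when passing the frame to a plotting function.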


0

I presume you are looking for this:

tot = df.loc[df['Artist']=='The Beatles','Count'].tolist()
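For reference, the TypeError in the question arises because iterating over a DataFrame with "for art in df" yields the column labels, which are strings, so art['Artist'] tries to index a string with another string. If an explicit loop is really wanted, a sketch using itertuples to walk the rows could look like the following; the vectorized .loc form above is usually the better choice. The sample data is copied from the question and column_labels is an illustrative name.

```python
import pandas as pd

# Sample rows copied from the question; Rank and Album are omitted for brevity.
df = pd.DataFrame({
    "Artist": ["The Beatles"] * 4
              + ["Judy Garland with Georgie Stoll and His Orchestra"],
    "Count": [1634, 1497, 1489, 1468, 1399],
})

# Iterating a DataFrame directly yields its column labels, not its rows.
column_labels = list(df)

# itertuples() yields one namedtuple per row, with columns as attributes.
tot = []
for row in df.itertuples(index=False):
    if row.Artist == "The Beatles":
        tot.append(row.Count)

print(column_labels, tot)
```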
