
Given a data frame "df", I need to obtain the correlation index between mean price and total volume for Region = "California".

Given DataFrame: [screenshot of the DataFrame, not reproduced]

Correlation index between California mean price and total volume:

cali_mean = df.groupby('Region').get_group('California')['AveragePrice'].mean()
max_volume = (df.groupby('Region')['TotalVolume'].sum()).max() #Output: 1028981653.17

# Correlation index between California mean price and total volume
df[cali_mean].corr(df['max_volume'])

When I tried to determine the correlation index between California's mean price and total volume, I got the following error message. Is there a way to fix this?

Error message

KeyError                                  Traceback (most recent call last)
~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3620             try:
-> 3621                 return self._engine.get_loc(casted_key)
   3622             except KeyError as err:

~/opt/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

~/opt/miniconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 1.3939644970414187

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/var/folders/wv/42dn23fd1cb0czpvqdnb6zw00000gn/T/ipykernel_18660/3247367876.py in <module>
      1 # Correlation index between California mean price and total volume
----> 2 df[cali_mean].corr(df['max_volume'])

~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

~/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3621                 return self._engine.get_loc(casted_key)
   3622             except KeyError as err:
-> 3623                 raise KeyError(key) from err
   3624             except TypeError:
   3625                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 1.3939644970414187
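The error can be reproduced with a tiny frame. The following is a minimal sketch, assuming a hypothetical DataFrame whose column names mirror the question and whose values are made up. `df[...]` looks up a column label, so passing the float returned by `.mean()` asks pandas for a column named after that number, which does not exist.

```python
import pandas as pd

# Hypothetical frame; only the column names come from the question
df = pd.DataFrame({
    "Region": ["California", "California", "Albany"],
    "AveragePrice": [1.2, 1.5, 1.1],
})

# The groupby mean reduces the column to a single float scalar
cali_mean = df.groupby("Region").get_group("California")["AveragePrice"].mean()

try:
    # df[...] treats its argument as a column label; no column has this float
    # as its label, so pandas raises KeyError, as in the traceback above
    df[cali_mean]
except KeyError as err:
    print("KeyError:", err)
```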
  • cali_mean is a mean value, not a column, which is what the error implies. Commented Mar 24, 2022 at 6:18
  • Also note that there is no correlation between two single numbers. Commented Mar 24, 2022 at 6:18
  • @keramat How do I determine the correlation index, then, if the mean value is not a column? Commented Mar 24, 2022 at 6:22
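Following the comments, one way to keep the question's own `groupby(...).get_group(...)` approach is to stop before reducing the columns to scalars and correlate the two California columns row by row. This is a sketch on hypothetical, made-up values; the real avocado data will give a different coefficient.

```python
import pandas as pd

# Hypothetical rows standing in for the real data (values are invented)
df = pd.DataFrame({
    "Region": ["California"] * 4 + ["Albany"] * 2,
    "AveragePrice": [1.0, 1.2, 1.4, 1.6, 1.1, 1.3],
    "TotalVolume": [400.0, 300.0, 200.0, 100.0, 50.0, 60.0],
})

# Keep the California rows as a sub-DataFrame instead of taking .mean()
cali = df.groupby("Region").get_group("California")

# Series.corr pairs the two columns row by row (Pearson by default)
r = cali["AveragePrice"].corr(cali["TotalVolume"])
print(r)  # these invented values fall on a decreasing line, so r is about -1.0
```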

1 Answer


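Note that correlation is measured between two vectors (here, two columns), not between two single numbers. So filter the California rows first and correlate the two columns: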

import pandas as pd

df = pd.read_csv('avocado.csv')
temp = df[df['Region'] == 'California']  # keep only the California rows
temp['AveragePrice'].corr(temp['TotalVolume'])  # Pearson correlation of the two columns

Output:

-0.7913852550045145
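Because avocado.csv is not included here, the following self-contained variant uses a small hypothetical frame (column names from the question, values invented) so the approach can be run and copied. It also shows that `DataFrame.corr` on the two selected columns returns a 2×2 matrix whose off-diagonal entry is the same coefficient that `Series.corr` gives.

```python
import pandas as pd

# Invented values; only the column names come from the question
df = pd.DataFrame({
    "Region": ["California", "California", "California", "Albany", "Albany"],
    "AveragePrice": [1.10, 1.35, 0.95, 1.50, 1.40],
    "TotalVolume": [5.0e6, 3.2e6, 6.1e6, 9.0e4, 1.1e5],
})

# Boolean mask selects the California rows; .loc keeps just the two columns
cali = df.loc[df["Region"] == "California", ["AveragePrice", "TotalVolume"]]

# Pairwise (Pearson) correlation matrix of the two columns
matrix = cali.corr()

# The off-diagonal entry is the price/volume coefficient
r = matrix.loc["AveragePrice", "TotalVolume"]
print(r)
```

The matrix form is handy if more numeric columns are added to the selection later; for a single pair, the `Series.corr` call in the answer is enough.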


8 Comments

What do "tv" and "av" represent here? That is, how do I fill in the values for "tv" and "av" so that they reflect the mean price and total volume of the California region?
How do you know which values to fill the vectors "av" and "tv" with?
When I filled the vectors with my data, I got a "nan" output. Is this expected, as you mentioned?
Can you send a sample of your data in a format that I can copy?
My data is in the form of a CSV file. Is there a way to send you that?
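We can't see the data behind the "nan" result, so the following is only one possible cause, offered as an assumption. `Series.corr` first aligns the two Series on their index labels; if the two vectors were built separately and share no labels, no pairs remain and the result is NaN. The Series below are hypothetical. (A vector whose values are all identical also has an undefined correlation.)

```python
import pandas as pd

# Two hypothetical vectors built separately, each with its own index labels
prices = pd.Series([1.1, 1.3, 0.9], index=[0, 1, 2])
volumes = pd.Series([5.0e6, 3.0e6, 6.0e6], index=[10, 11, 12])

# corr aligns on the index first; with no shared labels there is nothing
# to pair up, so pandas returns NaN
print(prices.corr(volumes))  # nan

# Dropping the index labels pairs the values by position instead
aligned = prices.reset_index(drop=True).corr(volumes.reset_index(drop=True))
print(aligned)
```

Taking both columns from the same filtered frame, as in the answer, avoids this mismatch because the rows share one index.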
