I am trying to create a scatter plot using a dataset on movies. The goal is to look at the correlation between the different categories and the target variable: whether or not the movie won an award. I have checked my variables with type(), and neither of them appears to be a numpy.ndarray, as they are both pandas DataFrames, yet I still get the following error when I try to create the scatter plot:

TypeError: unhashable type: 'numpy.ndarray'

My code is as follows:

import pandas as pd
import matplotlib.pyplot as plt

file=pd.read_csv('academy_awards.csv',sep=',',error_bad_lines=False,encoding="ISO 8859-1")
print(file)
df=pd.DataFrame(file)

#df=df.dropna(axis=0,how='any')
target=df.Category
X=pd.DataFrame(df.Won)

y=target
#print(type(X))
#print(type(y))

plt.scatter(X,y)

The following are the first 5 lines of the dataset I am using:

Year,Category,Nominee,Additional Info,Won
2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO
2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO
2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO
2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES
2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO
2010 (83rd),Actor -- Supporting Role,Christian Bale,The Fighter {'Dicky Eklund'},YES

Any help or suggestions are greatly appreciated!

Edit: The following is the full traceback:

-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-211-efcb7c41bca1> in <module>
     14 print(y.shape)
     15 
---> 16 plt.scatter(X,y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/pyplot.py in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, data, **kwargs)
   2862         vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths,
   2863         verts=verts, edgecolors=edgecolors, **({"data": data} if data
-> 2864         is not None else {}), **kwargs)
   2865     sci(__ret)
   2866     return __ret

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811 
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_axes.py in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, edgecolors, **kwargs)
   4170             edgecolors = 'face'
   4171 
-> 4172         self._process_unit_info(xdata=x, ydata=y, kwargs=kwargs)
   4173         x = self.convert_xunits(x)
   4174         y = self.convert_yunits(y)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_unit_info(self, xdata, ydata, kwargs)
   2133             return kwargs
   2134 
-> 2135         kwargs = _process_single_axis(xdata, self.xaxis, 'xunits', kwargs)
   2136         kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
   2137         return kwargs

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axes/_base.py in _process_single_axis(data, axis, unit_name, kwargs)
   2116                 # We only need to update if there is nothing set yet.
   2117                 if not axis.have_units():
-> 2118                     axis.update_units(data)
   2119 
   2120             # Check for units in the kwargs, and if present update axis

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/axis.py in update_units(self, data)
   1471         neednew = self.converter != converter
   1472         self.converter = converter
-> 1473         default = self.converter.default_units(data, self)
   1474         if default is not None and self.units is None:
   1475             self.set_units(default)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in default_units(data, axis)
    101         # default_units->axis_info->convert
    102         if axis.units is None:
--> 103             axis.set_units(UnitData(data))
    104         else:
    105             axis.units.update(data)

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in __init__(self, data)
    167         self._counter = itertools.count()
    168         if data is not None:
--> 169             self.update(data)
    170 
    171     def update(self, data):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/matplotlib/category.py in update(self, data)
    184         data = np.atleast_1d(np.array(data, dtype=object))
    185 
--> 186         for val in OrderedDict.fromkeys(data):
    187             if not isinstance(val, (str, bytes)):
    188                 raise TypeError("{val!r} is not a string".format(val=val))

TypeError: unhashable type: 'numpy.ndarray'
  • It helps if you post the full traceback, which includes the line that's causing the error. It helps us help you faster. Commented May 1, 2019 at 22:11
  • Your error doesn't match your code. Please provide the actual code that's generating the error. (Your code shows scatter, your error references bar.) Commented May 1, 2019 at 22:30
  • Sorry, I tried it with plt.bar as well as scatter and forgot to switch it back. I will fix the error. Commented May 1, 2019 at 22:32
  • I guess it's problematic to pass a complete DataFrame as the x values. It looks like it's sufficient to replace X=pd.DataFrame(df.Won) with X=df.Won. Commented May 1, 2019 at 22:39

2 Answers

First, you don't need df=pd.DataFrame(file). pd.read_csv already returns a DataFrame, so the file variable is already one.

Then you can call the scatter plot directly on the DataFrame and choose the x-axis and y-axis columns with

df.plot(kind="scatter", x="Won", y="Category")

You don't need to preprocess the data any further, because pandas has already parsed it when reading the file.

5 Comments

When I try this, I get a new error that x needs to be numeric. Maybe I just need to try a different dataset. I have tried about 4 with the same general targets and predictors and had no luck.
OK, perfect. The problem is that the x-axis column has the object dtype, i.e. it holds strings. You need to convert the x-axis column to a numeric (float) type.
Use df.dtypes (an attribute, not a method) to see the types of the DataFrame's columns, and tell me the type of the Won column.
It is a series object
I knew it. Well, in that case, use df.astype(<type name>) to convert it to float, then try calling the plot again.
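
A note on the conversion suggested in these comments: astype(float) raises a ValueError on strings such as "YES" and "NO", so the values have to be mapped to numbers explicitly. The following is only a minimal sketch under assumptions: the six-row frame is retyped from the sample rows in the question, and the WonFlag column name and the win_rate variable are hypothetical names introduced for illustration.

```python
import pandas as pd

# Small stand-in for academy_awards.csv, retyped from the sample rows
# in the question (assumption: the real file has the same two columns).
df = pd.DataFrame({
    "Category": ["Actor -- Leading Role"] * 5 + ["Actor -- Supporting Role"],
    "Won": ["NO", "NO", "NO", "YES", "NO", "YES"],
})

# "YES"/"NO" are strings (object dtype), so map them to 1/0 explicitly.
# "WonFlag" is a hypothetical column name, not part of the dataset.
df["WonFlag"] = df["Won"].map({"YES": 1, "NO": 0})

# Fraction of nominees that won, per category.
win_rate = df.groupby("Category")["WonFlag"].mean()

print(df.dtypes)
print(win_rate)
```

With a numeric column like this, a per-category summary such as win_rate is often easier to read than a scatter of two categorical columns, though which view is appropriate depends on what the plot is meant to show.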

Arrays are unhashable because they're mutable. You can hash one by first converting it to an immutable tuple (by wrapping it with tuple()), but you usually shouldn't be trying to hash arrays anyway. Your data is probably of the wrong shape.
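
As a small illustration of the point about hashing, here is a sketch using only NumPy. The one-element array and the lookup dictionary are made up for the example and are not taken from the question's data.

```python
import numpy as np

# A one-element object array, similar to one row of an (n, 1) column.
arr = np.array(["NO"], dtype=object)

# Hashing the array itself fails, because ndarrays are mutable.
try:
    hash(arr)
    hashable = True
except TypeError as exc:
    hashable = False
    message = str(exc)

# Converting to a tuple gives an immutable, hashable key.
key = tuple(arr)
lookup = {key: "ok"}

print(hashable, message)
print(lookup)
```

This shows that the tuple workaround is possible, but in the question's case it would only hide the real issue, which is the shape of the data handed to the plot.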

1 Comment

The shape of X is (10117, 1), and the shape of y is (10117,). Are these the wrong shape? I've done this with other datasets before with similar shapes and never come across this issue.
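
On the shape question, the following sketch may help. It assumes matplotlib's category handling does what the two lines in the traceback show (np.atleast_1d(np.array(data, dtype=object)) followed by OrderedDict.fromkeys(data)). The helper unique_labels and the three-row sample are hypothetical and only imitate those lines; they are not matplotlib's API.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

won = pd.Series(["NO", "NO", "YES"], name="Won")
X_2d = pd.DataFrame(won)   # shape (3, 1), like X in the question
X_1d = won                 # shape (3,), like y in the question


def unique_labels(data):
    """Imitate the two lines from category.py shown in the traceback."""
    data = np.atleast_1d(np.array(data, dtype=object))
    # Iterating a 1-D array yields strings; iterating a 2-D array
    # yields 1-D ndarrays (one per row), which cannot be dict keys.
    return list(OrderedDict.fromkeys(data))


labels = unique_labels(X_1d)

try:
    unique_labels(X_2d)
    failed = False
except TypeError as exc:
    failed = True
    error = str(exc)

print(X_2d.shape, X_1d.shape)
print(labels)
print(failed, error)
```

Under that assumption, an (n, 1) DataFrame of strings triggers the error while an (n,) Series does not, which is consistent with the suggestion in the comments on the question to pass df.Won (or, equivalently, X.iloc[:, 0]) to plt.scatter instead of pd.DataFrame(df.Won).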
