
I'm trying to create a scatterplot of a dataset with point coloring based on different categorical columns. Seaborn works well here for one plot:

fg = sns.FacetGrid(data=plot_data, hue='col_1')
fg.map(plt.scatter, 'x_data', 'y_data', **kws).add_legend()
plt.show()

I then want to display the same data, but with hue='col_2' and hue='col_3'. It works fine if I just make 3 plots, but I'm really hoping to find a way to have them all appear as subplots in one figure. Unfortunately, I haven't found any way to change the hue from one plot to the next. I know there are plotting APIs that allow for an axis keyword, thereby letting you pop it into a matplotlib figure, but I haven't found one that simultaneously allows you to set 'ax=' and 'hue='. Any ideas? Thanks in advance!

Edit: Here's some sample code to illustrate the idea

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)

sns.set(style='ticks')
# `height` replaces the older `size` keyword (renamed in seaborn 0.9)
fg = sns.FacetGrid(data=df, hue='pet', height=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
fg = sns.FacetGrid(data=df, hue='hair', height=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
plt.show()

This plots what I want, but in two separate windows. The colors are set by grouping on 'pet' in the first plot and on 'hair' in the second. Is there any way to get both as subplots of a single figure?

1 Answer

To draw three scatter plots, each colored by a different column, you can create three axes in matplotlib and draw one scatter plot on each. Passing a column to `c` works directly when that column is numeric, as in this example.

import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(10,5), 
                  columns=["x", "y", "col1", "col2", "col3"])

fig, axes = plt.subplots(nrows=3)
for ax, col in zip(axes, df.columns[2:]):
    ax.scatter(df.x, df.y, c=df[col])

plt.show()

For categorical data it is often easier to draw several scatter plots on each axes, one per category, so that every category gets its own color and legend entry.

import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns


xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)


cols = ['pet', 'hair']
fig, axes = plt.subplots(nrows=len(cols))
for ax, col in zip(axes, cols):
    # one scatter call per category; matplotlib cycles the color automatically
    for n, group in df.groupby(col):
        ax.scatter(group.x, group.y, label=n)
    ax.legend()

plt.show()
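
Since the question asks specifically for a call that accepts both `ax=` and `hue=`, here is a minimal sketch of that route. It assumes seaborn 0.9 or later, where `sns.scatterplot` takes both keywords; the data is the sample from the question, with a seeded `RandomState` added only to make the sketch reproducible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data from the question; the seed is an assumption for reproducibility
rng = np.random.RandomState(42)
xx = rng.rand(10, 2)
cat1 = ['cat', 'dog', 'dog', 'dog', 'cat', 'hamster', 'cat', 'cat', 'hamster', 'dog']
cat2 = ['blond', 'brown', 'brown', 'black', 'black', 'blond', 'blond', 'blond', 'brown', 'blond']
df = pd.DataFrame({'x': xx[:, 0], 'y': xx[:, 1], 'pet': cat1, 'hair': cat2})

cols = ['pet', 'hair']
fig, axes = plt.subplots(nrows=len(cols))
for ax, col in zip(axes, cols):
    # hue picks the coloring column, ax picks the subplot to draw on
    sns.scatterplot(data=df, x='x', y='y', hue=col, ax=ax)
```

Call `plt.show()` afterwards to display the figure. Each axes gets its own legend, so the two color schemes stay independent.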

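You can also use a FacetGrid if you prefer, but that requires reshaping the DataFrame into long format first, so that a single column holds the category values and another column says which original column each value came from.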

import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns

xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)

df2 = pd.melt(df, id_vars=['x','y'], value_name='category', var_name="kind")

# `height` replaces the older `size` keyword (renamed in seaborn 0.9)
fg = sns.FacetGrid(data=df2, row="kind", hue='category', height=3)
fg.map(plt.scatter, 'x', 'y').add_legend()
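
To make the reshaping step less opaque, the following small sketch shows what `pd.melt` does to a frame of this layout. The three-row frame here is made up purely for illustration; only the column names match the example above.

```python
import pandas as pd

# Tiny illustrative frame (values are made up, not from the example above)
df = pd.DataFrame({
    'x': [0.1, 0.2, 0.3],
    'y': [0.9, 0.8, 0.7],
    'pet': ['cat', 'dog', 'hamster'],
    'hair': ['blond', 'brown', 'black'],
})

# Every (x, y) point is repeated once per categorical column:
# 'kind' records the source column, 'category' holds its value.
df2 = pd.melt(df, id_vars=['x', 'y'], value_name='category', var_name='kind')
print(df2)
```

The result has twice as many rows as the input, one block for `kind == 'pet'` and one for `kind == 'hair'`, which is what lets `row="kind"` split the grid while `hue='category'` colors the points.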


4 Comments

If this solution is not what you are after, you should probably provide a sample dataset in the question and explain clearly what each plot should show.
Your plot above is what I'm after, except the columns specifying the colors are categorical variables. So, for example, "col1" might be favorite_movies = ['batman', 'the lego movie', 'alien', ...]. That's where using Seaborn seemed like a good option, but as originally stated, I run into issues trying to change the data used to set the hue from plot to plot.
I suggest you show us the code that produces the dataframe you have in mind (e.g. it is not clear whether the values in all columns are the same). Then we can find a solution.
I edited the original post to include some example code. Hopefully this helps clear some things up. Thanks for the help!
