Scatter plot from multiple columns of a pandas dataframe

Question

I have a pandas dataframe that looks as below:

    Filename    GalCer(18:1/12:0)_IS    GalCer(d18:1/16:0)  GalCer(d18:1/18:0)  
0   A-1-1   15.0    1.299366    40.662458   0.242658    6.891069    0.180315
1   A-1-2   15.0    1.341638    50.237734   0.270351    8.367316    0.233468
2   A-1-3   15.0    1.583500    47.039423   0.241681    7.902761    0.201153
3   A-1-4   15.0    1.635365    53.139610   0.322680    9.578195    0.345681
4   B-1-10  15.0    2.370330    80.209846   0.463770    13.729810   0.395355

I am trying to draw scatter subplots with a shared x-axis, using the first column, "Filename", on the x-axis. I am able to generate bar plots, but the following code gives me a KeyError for a scatter plot:

import matplotlib.pyplot as plt
colnames = list (qqq.columns)

qqq.plot.scatter(x=qqq.Filename, y=colnames[1:], legend=False, subplots = True, sharex = True, figsize = (10,50))

KeyError: "['A-1-1' 'A-1-2' 'A-1-3' 'A-1-4' 'B-1-10' ] not in index"

The following code for barplots works fine. Do I need to specify something differently for the scatterplots?

import matplotlib.pyplot as plt
colnames = list (qqq.columns)
qqq.plot(x=qqq.Filename, y=colnames[1:], kind = 'bar', legend=False, subplots = True, sharex = True, figsize = (10,30))

y = colnames[1:] refers to the list of the column names, not to the data within.
– mauve, Commented Jul 17, 2017 at 15:25
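To illustrate the comment: the x and y arguments are looked up as column labels, so passing the values of a column (as in x=qqq.Filename) makes pandas search for columns named 'A-1-1', 'A-1-2', and so on. The sketch below reproduces that lookup directly, without any plotting. The two-row frame is made up for illustration, and the idea that the plotting wrapper performs an equivalent lookup internally is an assumption based on the error message.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's data.
df = pd.DataFrame({"Filename": ["A-1-1", "A-1-2"], "Z": [1.0, 2.0]})

# Indexing with the *values* of a column treats each value as a column label.
# None of them is a column name, so the lookup fails with a KeyError.
try:
    df[df["Filename"]]
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)
else:
    lookup_failed = False

# Indexing with the column *label* works and returns the data.
x_values = df["Filename"]
```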

ImportanceOfBeingErnest · Accepted Answer · 2017-07-17 16:03:21Z

A scatter plot will require numeric values for both axes. In this case you can use the index as x values,

df.reset_index().plot(x="index", y="other column")

The remaining problem is that the scatter plot wrapper in pandas cannot plot several columns at once. Depending on your reason for wanting a scatter plot, you may decide to use a line plot without lines instead: specify linestyle="none" and marker="o" in the call to plot so that only the points appear.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

fn = ["{}_{}".format(i,j) for i in list("ABCD") for j in range(4)]
df = pd.DataFrame(np.random.rand(len(fn), 4), columns=list("ZXYQ"))
df.insert(0,"Filename",pd.Series(fn))

colnames = list (df.columns)
df.reset_index().plot(x="index", y=colnames[1:], kind = 'line', legend=False, 
                 subplots = True, sharex = True, figsize = (5.5,4), ls="none", marker="o")

plt.show()

In case you absolutely need a scatter plot, you may create a subplots grid first and then iterate over the columns and axes to plot one scatter plot at a time to the respective axes.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

fn = ["{}_{}".format(i,j) for i in list("ABCD") for j in range(4)]
df = pd.DataFrame(np.random.rand(len(fn), 4), columns=list("ZXYQ"))
df.insert(0,"Filename",pd.Series(fn))

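colnames = list(df.columns)
# One row of axes per data column (everything except "Filename").
fig, axes = plt.subplots(nrows=len(colnames)-1, sharex=True, figsize=(5.5,4))

# Move the index into a numeric "index" column once, then reuse it as x.
data = df.reset_index()
for col, ax in zip(colnames[1:], axes):
    # Color each point by its own value in the column being plotted.
    data.plot(x="index", y=col, kind='scatter', legend=False,
              ax=ax, c=col, cmap="inferno")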

plt.show()
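If the file names should still appear on the shared x-axis, one option is to scatter against integer positions with matplotlib directly and then relabel the ticks. This is a minimal sketch and not part of the original answer. The helper name scatter_subplots and its parameters are hypothetical, and the data is random, as in the examples above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_subplots(df, label_col, figsize=(5.5, 4)):
    """Hypothetical helper: one scatter subplot per column except label_col."""
    value_cols = [c for c in df.columns if c != label_col]
    positions = np.arange(len(df))  # numeric x values, one per row

    # squeeze=False keeps axes two-dimensional even for a single column.
    fig, axes = plt.subplots(nrows=len(value_cols), sharex=True,
                             figsize=figsize, squeeze=False)
    axes = axes[:, 0]

    for ax, col in zip(axes, value_cols):
        ax.scatter(positions, df[col])
        ax.set_ylabel(col)

    # Label the shared x-axis with the strings from label_col.
    axes[-1].set_xticks(positions)
    axes[-1].set_xticklabels(df[label_col].tolist(), rotation=90)
    return fig, axes


fn = ["{}_{}".format(i, j) for i in "ABCD" for j in range(4)]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((len(fn), 4)), columns=list("ZXYQ"))
df.insert(0, "Filename", fn)

fig, axes = scatter_subplots(df, "Filename")
```

Call plt.show() afterwards to display the figure. Plotting against positions keeps both axes numeric, which is the requirement stated at the top of this answer, while the tick labels carry the file names.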
