
I'm working through a beginner tutorial that uses this dataset:

http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data

I've loaded it like so:

dataset = pd.read_csv("sonar.all-data.csv", header=None)

All the numbers and metrics seem to be correct.

If I try to do a histogram or density plot, it works fine. But if I try to do a box plot, I get an exception:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __setitem__(self, key, value)
    975             if is_integer(key) and not self.index.inferred_type == "integer":
    976                 # positional setter
--> 977                 values[key] = value
    978             else:
    979                 # GH#12862 adding a new key to the Series

IndexError: index 0 is out of bounds for axis 0 with size 0

The first box plot is drawn before the exception is raised. I looked in the CSV file and there doesn't seem to be any weird data in the second column.

The plotting code is just:

dataset.plot(kind='box', subplots=True, layout=(8,8), sharex=False, sharey=False, fontsize=1)
plt.show()

Versions:

scipy: 1.6.2
numpy: 1.20.1
matplotlib: 3.3.4
pandas: 1.2.4
sklearn: 0.24.1

Sample data (first 20 columns), in case the link dies:

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,0.1609,0.1582,0.2238,0.0645,0.066,0.2273,0.31,0.2999,0.5078,0.4797
0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,0.4918,0.6552,0.6919,0.7797,0.7464,0.9444,1.0,0.8874,0.8024,0.7818
0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,0.6333,0.706,0.5544,0.532,0.6479,0.6931,0.6759,0.7551,0.8929,0.8619
0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,0.0881,0.1992,0.0184,0.2261,0.1729,0.2131,0.0693,0.2281,0.406,0.3973
0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,0.4152,0.3952,0.4256,0.4135,0.4528,0.5326,0.7306,0.6193,0.2032,0.4636
0.0286,0.0453,0.0277,0.0174,0.0384,0.099,0.1201,0.1833,0.2105,0.3039,0.2988,0.425,0.6343,0.8198,1.0,0.9988,0.9508,0.9025,0.7234,0.5122
0.0317,0.0956,0.1321,0.1408,0.1674,0.171,0.0731,0.1401,0.2083,0.3513,0.1786,0.0658,0.0513,0.3752,0.5419,0.544,0.515,0.4262,0.2024,0.4233
0.0519,0.0548,0.0842,0.0319,0.1158,0.0922,0.1027,0.0613,0.1465,0.2838,0.2802,0.3086,0.2657,0.3801,0.5626,0.4376,0.2617,0.1199,0.6676,0.9402
0.0223,0.0375,0.0484,0.0475,0.0647,0.0591,0.0753,0.0098,0.0684,0.1487,0.1156,0.1654,0.3833,0.3598,0.1713,0.1136,0.0349,0.3796,0.7401,0.9925
0.0164,0.0173,0.0347,0.007,0.0187,0.0671,0.1056,0.0697,0.0962,0.0251,0.0801,0.1056,0.1266,0.089,0.0198,0.1133,0.2826,0.3234,0.3238,0.4333
0.0039,0.0063,0.0152,0.0336,0.031,0.0284,0.0396,0.0272,0.0323,0.0452,0.0492,0.0996,0.1424,0.1194,0.0628,0.0907,0.1177,0.1429,0.1223,0.1104
0.0123,0.0309,0.0169,0.0313,0.0358,0.0102,0.0182,0.0579,0.1122,0.0835,0.0548,0.0847,0.2026,0.2557,0.187,0.2032,0.1463,0.2849,0.5824,0.7728
0.0079,0.0086,0.0055,0.025,0.0344,0.0546,0.0528,0.0958,0.1009,0.124,0.1097,0.1215,0.1874,0.3383,0.3227,0.2723,0.3943,0.6432,0.7271,0.8673
0.009,0.0062,0.0253,0.0489,0.1197,0.1589,0.1392,0.0987,0.0955,0.1895,0.1896,0.2547,0.4073,0.2988,0.2901,0.5326,0.4022,0.1571,0.3024,0.3907
0.0124,0.0433,0.0604,0.0449,0.0597,0.0355,0.0531,0.0343,0.1052,0.212,0.164,0.1901,0.3026,0.2019,0.0592,0.239,0.3657,0.3809,0.5929,0.6299
0.0298,0.0615,0.065,0.0921,0.1615,0.2294,0.2176,0.2033,0.1459,0.0852,0.2476,0.3645,0.2777,0.2826,0.3237,0.4335,0.5638,0.4555,0.4348,0.6433
0.0352,0.0116,0.0191,0.0469,0.0737,0.1185,0.1683,0.1541,0.1466,0.2912,0.2328,0.2237,0.247,0.156,0.3491,0.3308,0.2299,0.2203,0.2493,0.4128
0.0192,0.0607,0.0378,0.0774,0.1388,0.0809,0.0568,0.0219,0.1037,0.1186,0.1237,0.1601,0.352,0.4479,0.3769,0.5761,0.6426,0.679,0.7157,0.5466
0.027,0.0092,0.0145,0.0278,0.0412,0.0757,0.1026,0.1138,0.0794,0.152,0.1675,0.137,0.1361,0.1345,0.2144,0.5354,0.683,0.56,0.3093,0.3226
0.0126,0.0149,0.0641,0.1732,0.2565,0.2559,0.2947,0.411,0.4983,0.592,0.5832,0.5419,0.5472,0.5314,0.4981,0.6985,0.8292,0.7839,0.8215,0.9363
1 Answer
  • I don't know why, but using subplots=True with integer column names seems to cause the issue.
  • The resolution is to convert the column names to strings:
import pandas as pd
import matplotlib.pyplot as plt

# load the data (same file as in the question; header=None gives integer column names)
df = pd.read_csv("sonar.all-data.csv", header=None)

# check the column name type
print(type(df.columns[0]))
# [out]: <class 'numpy.int64'>

# convert the column names to strings
df.columns = [f'{v}' for v in df.columns]

# check the column name type
print(type(df.columns[0]))
# [out]: <class 'str'>

# plot the dataframe, one box plot per column
df.plot(kind='box', layout=(10, 6), figsize=(20, 20), subplots=True)
plt.show()


  • With subplots=False the plot works with numeric column names
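For anyone who wants to try the workaround without downloading the file, here is a minimal sketch on synthetic data. The random 20x6 frame, the layout and the `Agg` backend are my own assumptions for illustration, not part of the original question. It uses `Index.astype(str)`, which should be equivalent to the list comprehension above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the sonar data (assumption): 20 rows, 6 numeric columns.
# Like read_csv(..., header=None), this yields integer column labels 0..5.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 6)))

# Convert the integer column labels to strings before plotting.
df.columns = df.columns.astype(str)

# One box plot per column, each on its own subplot.
df.plot(kind="box", subplots=True, layout=(2, 3),
        sharex=False, sharey=False, figsize=(9, 6))
plt.close("all")  # swap for plt.show() in an interactive session
```

If the error really is tied to integer labels combined with subplots=True, this version should draw all six panels on the versions listed in the question.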



1 Comment

Weird. The numeric names work with other types of graphs, but thanks.
