I have a *.csv file whose first column is a date in the format "YYYY-MM", whose second column is text, and whose last two columns are numeric data.

It will look something like this:

Date     inflation rate     CPI-Value      LIBOR-Rate

2003-09  inflation  rate    80.172         0.81
2003-10  inflation  rate    80.132         0.88
2003-11  inflation  rate    80.264         0.69
2003-12  inflation  rate    80.430         0.75
2004-01  inflation  rate    81.163         0.75
2004-02  inflation  rate    81.244         0.75
2004-03  inflation  rate    81.344         0.75
2004-04  inflation  rate    81.436         0.75
2004-05  inflation  rate    81.501         0.75
2004-06  inflation  rate    81.355         0.81
2004-07  inflation  rate    81.494         1.06
2004-08  inflation  rate    81.426         1.31
2004-09  inflation  rate    81.771         1.44
2004-10  inflation  rate    81.757         1.38
2004-11  inflation  rate    81.866         1.38
2004-12  inflation  rate    81.790         1.44
2005-01  inflation  rate    81.994         1.75
2005-02  inflation  rate    82.062         1.94
2005-03  inflation  rate    82.210         2.13
2005-04  inflation  rate    82.219         2.13
2005-05  inflation  rate    82.165         2.06

I would like to plot a line graph with the date on the x axis, and a single graph containing the values for both CPI and LIBOR.

I have tried using

x, y = np.genfromtxt(CPI_df, usecols=(0, 2), unpack=True, delimiter=',')

plt.plot(x, y, 'ro--')
plt.show()

but I get a ValueError saying that certain lines have one column instead of two. However, I have already checked the CSV file and there is no missing data.

Appreciate any help I can get, thank you!

  • You use delimiter=',', but there seems to be no comma in the file you quote. Commented Nov 5, 2018 at 16:17
  • It's a csv file though. So does this mean that I don't need to specify the delimiter? Commented Nov 5, 2018 at 16:24
  • If the three lines you show are really the first three lines of your file, it's obvious that there is no comma. However, I'm not sure what other delimiter to use in that case, since there also seem to be spaces within the cells. Commented Nov 5, 2018 at 16:28
  • If it helps, I'm using a Jupyter notebook to run the code. I don't know if that makes any difference. Commented Nov 5, 2018 at 16:37
  • No, that doesn't matter in this case. What would help is if you opened the file in an editor, and copied the first ten lines verbatim to your question. Commented Nov 5, 2018 at 17:11

1 Answer

The file format in use is really unfortunate. First, there is an empty line between the header and the data, so you will need to skip the first two lines and cannot use the header.
Next, runs of spaces act as the delimiter between columns, but they also appear inside strings that are meant to be a single column ("inflation  rate"). Splitting on whitespace therefore yields five fields per row, not four.
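
To make the delimiter problem concrete, here is a small sketch that splits one data row from the question in two ways. It uses only plain string methods, not numpy, so it illustrates the idea and does not reproduce what genfromtxt does internally.

```python
# One data row copied from the question.
line = "2003-09  inflation  rate    80.172         0.81"

# Splitting on commas finds no separator, so the whole row is one field.
by_comma = line.split(",")

# Splitting on runs of whitespace gives five fields, because the text
# "inflation  rate" is broken into two pieces.
by_whitespace = line.split()

print(len(by_comma))       # 1
print(len(by_whitespace))  # 5

# Field 0 is the date, field 3 the CPI value, field 4 the LIBOR rate.
print(by_whitespace[0], by_whitespace[3], by_whitespace[4])
```

This is why the date and CPI value end up in columns 0 and 3 when the file is split on whitespace.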

Now if you really need to use this file as is, and want to use numpy to read it in, you also have the problem that the first column is not numeric. So you will need to set the dtype accordingly.
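
One way to handle the dtype is to spell it out as a structured dtype, so that the date column is kept as a string and the two value columns are read as floats. The sketch below is only an illustration under assumptions: it reads a two-row excerpt of the question's data from an in-memory buffer instead of the real file, and the field names "date", "cpi" and "libor" are made up for the example.

```python
import io

import numpy as np

# Two-row excerpt of the question's file, including the header line and
# the blank line that follows it (stand-in for the real file).
sample = io.StringIO(
    "Date     inflation rate     CPI-Value      LIBOR-Rate\n"
    "\n"
    "2003-09  inflation  rate    80.172         0.81\n"
    "2003-10  inflation  rate    80.132         0.88\n"
)

data = np.genfromtxt(
    sample,
    skip_header=2,          # skip the header line and the blank line
    usecols=(0, 3, 4),      # date, CPI value, LIBOR rate after whitespace split
    dtype=[("date", "U7"), ("cpi", "f8"), ("libor", "f8")],  # hypothetical field names
    encoding=None,
)

print(data["date"])
print(data["cpi"])
print(data["libor"])
```

With named fields, the columns can be accessed as data["date"], data["cpi"] and data["libor"] instead of the automatically generated "f0", "f1" names.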

The following would read the file and plot the dates as strings.

import numpy as np
import matplotlib.pyplot as plt

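# Skip the header line and the blank line after it. Splitting on whitespace
# gives five fields per row, so column 0 is the date and column 3 the CPI value.
a = np.genfromtxt("data/inflation.txt", usecols=(0, 3), skip_header=2, dtype=None, encoding=None)
x = a["f0"]  # dates as strings
y = a["f1"]  # CPI values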

plt.plot(x, y, 'ro--')
plt.show()

Or if you want to plot dates instead,

import numpy as np
import datetime
import matplotlib.pyplot as plt

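# Convert column 0 to datetime objects while reading. unpack is left at its
# default so that the result stays a structured array with fields "f0" and "f1".
a = np.genfromtxt("data/inflation.txt", usecols=(0, 3), skip_header=2, dtype=None, encoding=None,
                  converters={0: lambda x: datetime.datetime.strptime(x, "%Y-%m")})
x = a["f0"]
y = a["f1"]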

plt.plot(x, y, 'ro--')
plt.show()

If using pandas instead of numpy, this becomes a bit easier. Plotting strings:

import pandas as pd
import matplotlib.pyplot as plt

# Split on runs of whitespace; the blank line after the header is skipped by default.
df = pd.read_csv("data/inflation.txt", sep=r"\s+")

plt.plot(df["Date"], df["CPI-Value"], 'ro--')
plt.show()

Or plotting dates:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/inflation.txt", sep=r"\s+")
# Parse the "YYYY-MM" strings into datetimes with an explicit format.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m")

plt.plot(df["Date"], df["CPI-Value"], 'ro--')
plt.show()
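
The question asks for CPI and LIBOR in one graph, while the snippets above plot only the CPI column. A possible sketch for both series follows. It assumes the column names come out as "CPI-Value" and "LIBOR-Rate" as in the header shown in the question, and it reads a short excerpt from an in-memory buffer as a stand-in for the real file. Because CPI is around 80 and LIBOR around 1, the sketch puts LIBOR on a second y axis with twinx(); that is a design choice, not something the question requires.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Short excerpt of the question's file (stand-in for "data/inflation.txt").
sample = io.StringIO(
    "Date     inflation rate     CPI-Value      LIBOR-Rate\n"
    "\n"
    "2003-09  inflation  rate    80.172         0.81\n"
    "2003-10  inflation  rate    80.132         0.88\n"
    "2003-11  inflation  rate    80.264         0.69\n"
    "2003-12  inflation  rate    80.430         0.75\n"
)

df = pd.read_csv(sample, sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m")

fig, ax_cpi = plt.subplots()
ax_libor = ax_cpi.twinx()  # second y axis sharing the same x axis

(line_cpi,) = ax_cpi.plot(df["Date"], df["CPI-Value"], "ro--", label="CPI")
(line_libor,) = ax_libor.plot(df["Date"], df["LIBOR-Rate"], "bs-", label="LIBOR")

ax_cpi.set_xlabel("Date")
ax_cpi.set_ylabel("CPI value")
ax_libor.set_ylabel("LIBOR rate")
ax_cpi.legend(handles=[line_cpi, line_libor], loc="upper left")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Calling plt.show() afterwards displays the figure. If both series should share a single y axis instead, plotting the LIBOR column on ax_cpi works too, but the LIBOR line will look almost flat next to the CPI values.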

1 Comment

I've tried running the pandas scripts that are adapted to my data, but have received a key error for the plotting of strings. The plotting of dates returned TypeError: 'NoneType' object is not subscriptable. Appreciate all your help though!
