Say I have a set of dates and a corresponding set of numbers for those dates, sampled at different intervals.

Just for illustration's sake, let's say the numbers are daily from today (September 29) until three months from now (December 29), monthly from three months out to two years, quarterly from two to 10 years, and yearly after that for another 50 years.

Now the requirement is that we still follow the same date-interval "pattern", but the time series should start at the end of a quarter (Mar 31, Jun 30, Sep 30 or Dec 31), with the numbers linearly interpolated in between. Thus, using the example above, my new series should have daily numbers from September 30 (the first quarter end) to Dec 31, monthly from Dec 31 2012 to Dec 31 2014, quarterly from Dec 31 2014 to Dec 31 2022, and yearly after that. All prices in the new time series that are not in the old time series are calculated using linear interpolation.

Is there an efficient way to do this, and is there a code example I can make use of?

I appreciate your help!

  • Have you checked what the pandas package offers? It should have pretty good coverage of time series manipulation (it uses code from scikits.timeseries for that). Commented Sep 29, 2012 at 14:06
  • No luck with pandas: it is a third-party package and is not available on our production servers (IT won't install those). Any other suggestions? Commented Oct 2, 2012 at 2:30

1 Answer

Here's a way to do it just with datetime and calendar. It's rather lengthy though, beware.

First, we need a method to make the desired time series. Months and quarters are a bit tricky: which date is one month after January 31, for example? For testing, I included the generation of random values that belong with the dates. A method could look like this:

from datetime import datetime, timedelta, date
import calendar
from random import random

def makeseries(startdate):
    datesA = [startdate] # collect the dates in this list
    valsA = [random()]   # and the randomly generated 'data' in this one
    date = startdate    

    # add days
    step = timedelta(1)
    while date - startdate <= timedelta(91):
        date += step
        datesA += [date]
        valsA += [random()]

    # add months
    step = timedelta(30)
    while date - startdate <= timedelta(2*365):
        if date.month in [1,3,5,7,8,10,12]:
            date += timedelta(1)
        elif date.month == 2:
            date -= timedelta(2)
        date += step
        datesA += [date]
        valsA += [random()]

    # add quarters
    step = timedelta(91)
    while date - startdate <= timedelta(int(365*10)):
        date += step
        if date.year % 4 == 0:
            date += timedelta(1)
        datesA += [date]
        valsA += [random()]

    # add years
    step = timedelta(365)
    while date - startdate <= timedelta(int(365*50)):
        date += step
        if date.year % 4 == 0:
            date += timedelta(1)
        datesA += [date]
        valsA += [random()]

    return datesA, valsA
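
The fixed 30-day and 91-day steps above only approximate calendar months and quarters, so dates can drift away from month ends over time. As a possible alternative, here is a minimal sketch of a calendar-aware month step using only the standard library. The name `add_months` is a hypothetical helper, not part of the code above, and the rule of clamping to the last day of a shorter month is an assumption about what "one month after January 31" should mean.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift d by a number of calendar months (hypothetical helper).

    Assumption: if the target month is shorter than d.day, the result
    is clamped to the last day of that month.
    """
    # Count months from year 0 so that divmod handles year rollover.
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

# Step from a fixed start date instead of chaining results, so that a
# clamped date (e.g. Feb 28) does not pull later dates off the month end.
start = date(2012, 12, 31)
monthly = [add_months(start, k) for k in range(1, 4)]
quarterly = [add_months(start, 3 * k) for k in range(1, 3)]
```

With a start of Dec 31 2012, `monthly` holds Jan 31, Feb 28 and Mar 31 2013. Yearly steps would be `add_months(start, 12 * k)` under the same assumption.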

Then, a simple method to find the first date in a series that comes after a given date:

def findIndexOfNearest(series, D):
    # returns the index of the date in series that is closest to, but greater than D
    for i, date in enumerate(series):
        if date > D:
            return i
    return None
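
The linear scan above is fine for a few hundred dates. If the series is long, or the lookup is repeated many times, a binary search from the standard library's `bisect` module might be worth considering. The sketch below assumes the dates are sorted in ascending order; `find_index_after` is a hypothetical name, chosen to mirror the behaviour of `findIndexOfNearest`.

```python
from bisect import bisect_right
from datetime import date

def find_index_after(sorted_dates, d):
    """Return the index of the first date strictly greater than d,
    or None if there is no such date. Assumes ascending order."""
    i = bisect_right(sorted_dates, d)
    return i if i < len(sorted_dates) else None

dates = [date(2012, 9, 29), date(2012, 9, 30), date(2012, 10, 1)]
idx = find_index_after(dates, date(2012, 9, 30))  # index of Oct 1
```

This does O(log n) comparisons per lookup instead of O(n), with the same return convention as the loop version.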

Generate the two time series, plus some mock data for the first series:

thisyear = datetime.today().year
quarterEndMonth = (datetime.today().month+2)//3*3
quarterEndDay = calendar.monthrange(thisyear, quarterEndMonth)[1]

d1,v1 = makeseries(date.today())
d2,_ = makeseries(date(thisyear,quarterEndMonth, quarterEndDay))
v2 = []
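
The quarter-end arithmetic can also be wrapped in a small function so it can be checked against known dates. This is only a sketch of the same calculation; `quarter_end` is a hypothetical name, and it returns the end of the quarter that contains the given date.

```python
import calendar
from datetime import date

def quarter_end(d):
    """Return the last day of the calendar quarter containing d."""
    # Months 1-3 map to 3, 4-6 to 6, 7-9 to 9, 10-12 to 12.
    end_month = (d.month + 2) // 3 * 3
    end_day = calendar.monthrange(d.year, end_month)[1]
    return date(d.year, end_month, end_day)

first_quarter_end = quarter_end(date(2012, 9, 29))  # Sep 30, 2012
```

A date that already is a quarter end maps to itself, so the second series can start on the same day as the first.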

Interpolate using timedeltas and print the interpolated values:

for d in d2:
    i = findIndexOfNearest(d1, d)
    if i is None:
        # date to be interpolated is past the last original date: reuse the last value
        v2 += [v1[-1]]
        print("%s = 1.00 * %s = %24.2f" % (d, d1[-1], v1[-1]))
        continue
    # i >= 1 here, because d2 starts on or after d1[0]
    prevDate = d1[i-1]
    nextDate = d1[i]  # not called 'next', to avoid shadowing the built-in
    span = (nextDate - prevDate).total_seconds()
    # each weight is 1 minus the relative distance to that original date
    prevRatio = 1 - (d - prevDate).total_seconds() / span
    nextRatio = 1 - (nextDate - d).total_seconds() / span
    interp = prevRatio*v1[i-1] + nextRatio*v1[i]
    v2 += [interp]
    print("%s = %.2f * %s + %.2f * %s" % (d, prevRatio, prevDate, nextRatio, nextDate))
    print("%17.2f * %10.2f + %.2f * %10.2f = %.2f" %
          (prevRatio, v1[i-1], nextRatio, v1[i], interp))

Some example output:

Here, the original series just switched to 3-month gaps, with one date in November, and another in February the next year. The date for which we are interpolating is in December.

                     original           original
                      date                date
                        v                   v
2014-12-02 = 0.69 * 2014-11-04 + 0.31 * 2015-02-03
     ^       0.69 *       0.95 + 0.31 *       0.10 = 0.69
     |         ^           ^       ^           ^       ^
     |         |        original   |       original   interpolated 
date from      |         value     |         value       value
2nd series   weight              weight
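
To check the numbers in this example independently, here is a compact, self-contained sketch of the same linear interpolation as a reusable function. The name `interpolate` and the choice to hold the first or last value for dates outside the original range are assumptions for illustration; the dates and values are the ones from the example output.

```python
from bisect import bisect_right
from datetime import date

def interpolate(dates, values, d):
    """Linearly interpolate values at date d.

    Assumes dates is sorted ascending and matches values in length.
    Dates outside the range get the first or last value (assumption).
    """
    if d <= dates[0]:
        return values[0]
    if d >= dates[-1]:
        return values[-1]
    i = bisect_right(dates, d)  # first index with dates[i] > d
    d0, d1 = dates[i - 1], dates[i]
    # Weight of the later point grows with the distance from the earlier one.
    w1 = (d - d0).days / float((d1 - d0).days)
    return (1 - w1) * values[i - 1] + w1 * values[i]

orig_dates = [date(2014, 11, 4), date(2015, 2, 3)]
orig_values = [0.95, 0.10]
result = interpolate(orig_dates, orig_values, date(2014, 12, 2))
print("%.2f" % result)  # prints 0.69
```

Dec 2 lies 28 days into a 91-day gap, so the weights are 63/91 (about 0.69) for the November value and 28/91 (about 0.31) for the February value, matching the output above.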