0

How to convert all the string dates to Timestamp while loading a JSON?

The solution should work for keys and list of values as well. Solutions mentioned over here How to convert to a Python datetime object with JSON.loads? only work for string values and don't convert all the cases for the example json string given below:

{
    "single_date": "10/02/2020",
    "list_of_dates": ["10/03/2020", "10/04/2020"],
    "10/05/2020": "date_as_key",
    "nested":{
        "single_date": "10/02/2020",
        "list_of_dates": ["11/03/2020", "11/04/2020"],
        "11/05/2020": "date_as_key"
    }
}

Expected Output:

{
    'single_date': Timestamp('2020-10-02 00:00:00'),
    'list_of_dates': [Timestamp('2020-10-03 00:00:00'), Timestamp('2020-10-04 00:00:00')],
    Timestamp('2020-10-05 00:00:00'): 'date_as_key',
    'nested': {
        'single_date': Timestamp('2020-10-02 00:00:00'),
        'list_of_dates': [Timestamp('2020-11-03 00:00:00'), Timestamp('2020-11-04 00:00:00')],
        Timestamp('2020-11-05 00:00:00'): 'date_as_key'
    }
}

Update:

In the output Timestamp is the pandas timestamp but other datetime objects would work too. So, alternate output can also be:

{
    'single_date': datetime.datetime(2020, 10, 2, 0, 0),
    'list_of_dates': [datetime.datetime(2020, 10, 3, 0, 0),datetime.datetime(2020, 10, 4, 0, 0)],
    datetime.datetime(2020, 10, 5, 0, 0): 'date as key',
    'nested': {
        'single_date': datetime.datetime(2020, 10, 2, 0, 0),
        'list_of_dates': [datetime.datetime(2020, 11, 3, 0, 0),datetime.datetime(2020, 11, 4, 0, 0)],
        datetime.datetime(2020, 11, 5, 0, 0): 'date as key'
    }
}
5
  • Is your data strongly typed? Perhaps you would prefer to work with classes instead of dict objects? Commented Mar 5, 2022 at 15:30
  • 1
    No, actually the schema of the data is also not fixed. So using a single class with attributes won't work. Commented Mar 5, 2022 at 16:03
  • 1
    What is Timestamp? This isn't a standard python class. Commented Mar 5, 2022 at 16:06
  • agreed, can you explain more what a Timestamp is? If you're looking to just get a string value that looks like that, you can just use some string manipulation techniques, should be pretty straightforward enough. Commented Mar 5, 2022 at 18:56
  • 1
    The idea was to convert the string to any usable datetime like object. I have updated the question with more details. Thanks! Commented Mar 5, 2022 at 19:34

1 Answer 1

2

The simplest (and most performant) option without using external libraries like pandas would be to define a custom hook to use along with json.loads:

from datetime import date


def date_hook(obj):
    t = type(obj)

    if t is dict:
        return {date_hook(k): date_hook(v) for k, v in obj.items()}
    if t is list:
        return [date_hook(elem) for elem in obj]
    if t is str:
        # best effort to check if it matches date format
        if '/' in obj and len(obj) == 10:
            # noinspection PyBroadException
            try:
                # format: %m/%d/%Y
                month, day, year = map(int, obj.split('/', 2))
                return date(year, month, day)
            except Exception:
                pass
        return obj
    return obj

Now do a quick test by loading sample data:

from json import loads
from pprint import pprint


string = """
{
    "single_date": "10/02/2020",
    "list_of_dates": ["10/03/2020", "10/04/2020"],
    "10/05/2020": "date_as_key",
    "nested":{
        "single_date": "10/02/2020",
        "list_of_dates": ["11/03/2020", "11/04/2020"],
        "11/05/2020": "date_as_key"
    }
}
"""

pprint(loads(string, object_hook=date_hook))

Result:

{datetime.date(2020, 10, 5): 'date_as_key',
 'list_of_dates': [datetime.date(2020, 10, 3), datetime.date(2020, 10, 4)],
 'nested': {datetime.date(2020, 11, 5): 'date_as_key',
            'list_of_dates': [datetime.date(2020, 11, 3),
                              datetime.date(2020, 11, 4)],
            'single_date': datetime.date(2020, 10, 2)},
 'single_date': datetime.date(2020, 10, 2)}

If curious, this actually comes out as 100x faster than an approach with the pandas library.

n = 1000
print('datetime.date:  ', timeit('loads(string, object_hook=date_hook)', globals=globals(), number=n))
print('pandas:         ', timeit('loads(string, object_hook=pd_date_hook)', globals=globals(), number=n))

Results on my PC (Windows):

datetime.date:   0.017586400000000002
pandas:          1.8995370999999999
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.