EDIT: Forgot to mention I am using Python 2.7

I have a large JSON file structured like this:

[{
"headline": "Algérie Télécom prolonge son dispositif spécial Covid-19",
"url_src": "https://www.algerie360.com/algerie-telecom-prolonge-son-dispositif-special-covid-19/",
"img_src": "https://www.algerie360.com/wp-content/uploads/2020/04/DIA-Iddom-Algérie-télécom-320x200.jpg",
"news_src": "Algérie 360",
"catPT": "Ciência e Tecnologia",
"catFR": "Science et Technologie",
"catEN": "Science and Technology",
"lang": "French",
"epoch": 1591293345.817
},
{
"headline": "Internet haut débit à Alger : Lancement de la généralisation du  » fibre to home »",
"url_src": "https://www.algerie360.com/20200510-internet-haut-debit-a-alger-lancement-de-la-generalisation-du-fibre-to-home/",
"img_src": "https://www.algerie360.com/wp-content/uploads/2020/05/unnamed-320x200.jpg",
"news_src": "Algérie 360",
"catPT": "Ciência e Tecnologia",
"catFR": "Science et Technologie",
"catEN": "Science and Technology",
"lang": "French",
"epoch": 1591283345.817
},
...

I've been trying to write a .py script that opens my JSON file, removes every object whose "epoch" value is less than 1591293345.817, and overwrites the original file.

Is this possible at all?

I've tried the following, but my Python knowledge is sketchy at best:

import time
import os
import json
import jsonlines

json_lines = []
with open('./json/news_done.json', 'r') as open_file:
    for line in open_file.readlines():
        j = json.loads(line)
        now = time.time()
        print(j['epoch'])
        lastWeek = now - 3600
        if not j['{epoch}'] > lastWeek:
            json_lines.append(line)

with open('./json/news_done.json', 'w') as open_file:
    open_file.writelines('\n'.join(json_lines))

  • Is the file in "json-lines" format (i.e. each line is a separate JSON object) or is it just one big structure like you show in the question? Commented Jun 12, 2020 at 11:43
  • I believe it is one big structure. Commented Jun 12, 2020 at 15:35

2 Answers

Have you tried the pandas library? It makes filtering rows on a column value straightforward.

I got this snippet to work with your example data:

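import pandas as pd

dataset = pd.read_json('example.json')
# Keep only the rows whose epoch is at or after the cutoff.
new_dataset = dataset[dataset['epoch'] >= 1591293345.817]
# to_json already returns a JSON string, so write it out directly;
# passing it through json.dump would encode it a second time.
final_data = new_dataset.to_json(orient='records')

with open('example.json', 'w') as f:
    f.write(final_data)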

2 Comments

I am getting the following error: ValueError: DataFrame constructor not properly called!
Could you give the whole error message? I wasn't able to trace back to that error
It looks like you're only removing the "epoch" tag, but if I've understood correctly, you want to drop the whole element.

You can load the entire file as one JSON document instead of parsing it line by line:

import json, time

with open('./json/news_done.json', 'r') as open_file:
    yourJson = json.load(open_file)  # parse the whole file as one JSON list

cutoff = time.time() - 3600  # entries older than one hour get dropped
filteredList = []
for j in yourJson:  # j is one element of the list, not one line
    if j['epoch'] >= cutoff:  # keep only the recent entries
        filteredList.append(j)

with open('./json/news_done.json', 'w') as open_file:
    json.dump(filteredList, open_file)
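
A minimal sketch of the same idea with a fixed cutoff, which is what the question describes, rather than one relative to the current time. The function name `prune_older_than` is made up for illustration, and the sketch assumes the file holds a single JSON array of objects that each carry a numeric "epoch" key:

```python
import json


def prune_older_than(path, cutoff):
    """Keep only objects whose "epoch" is >= cutoff, then overwrite the file.

    Returns the number of objects that were removed.
    """
    with open(path, 'r') as f:
        items = json.load(f)  # the whole file is one JSON array

    # Assumption: every object has a numeric "epoch" key.
    kept = [item for item in items if item['epoch'] >= cutoff]

    with open(path, 'w') as f:
        json.dump(kept, f, indent=2)

    return len(items) - len(kept)
```

Calling `prune_older_than('./json/news_done.json', 1591293345.817)` on the sample from the question should leave the first object and drop the second. Since the file is overwritten in place, it is probably worth keeping a backup copy while testing.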

2 Comments

I get the following error msg: Traceback (most recent call last): File "/_scrapyard OSX/x_punger.py", line 8, in <module> if time.time()-3600 > j['epoch']: TypeError: string indices must be integers [Finished in 0.635s]
Weird, it looks as if j is a string. Does the whole list look like your example?
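
One possible cause of that TypeError, offered as a guess rather than a diagnosis: if the file was at some point written by passing an already-serialized JSON string to json.dump (for example, the string returned by pandas' to_json), the file now contains a JSON string instead of a JSON array. json.loads then returns a string, iterating over it yields single characters, and indexing a character with 'epoch' fails. The sketch below reproduces that situation; the helper name `load_records` is hypothetical, and it unwraps one extra layer of encoding if it finds one:

```python
import json

try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str,)  # Python 3


def load_records(text):
    """Parse JSON text, unwrapping one extra layer of string encoding."""
    data = json.loads(text)
    if isinstance(data, string_types):
        # The text was a JSON string that itself contains JSON.
        data = json.loads(data)
    return data


records = [{"headline": "a", "epoch": 2.0}]
once = json.dumps(records)  # a JSON array, as intended
twice = json.dumps(once)    # a JSON string wrapping that array

# Parsing the double-encoded text once gives back a string, not a list.
print(type(json.loads(twice)).__name__)
# The helper recovers the original list either way.
print(load_records(twice) == records)
print(load_records(once) == records)
```

If the real file turns out to be in this state, restoring it from the original source and filtering again is likely safer than repeatedly unwrapping it.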
