2

I would like to import json data into postgres. The data I have is of the orders of a million rows, sizes are a minimum of 700 MB and stretches till 3 GB.

Here's a sample data I created based on the structure of data I have. I tried importing this into postgres but I get an error.

Sample(1) data

{"offers":{"offer":[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]}}

Command I used and the error I got

# copy contrial from '/home/ubuntu/sample-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: {"offers":{"offer":[
COPY contrial, line 1, column info: "{"offers":{"offer":["

I modified the file to remove the first two keys and have just a list of jsons like below, but I still get an error.

Sample(2) data

[
{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

Error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  The input string ended unexpectedly.
CONTEXT:  JSON data, line 1: [
COPY contrial, line 1, column info: "["

Sample(3) data I modified further

[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5\"  side"},
{"url": "https://some2-value.com", "nested": {"id":5,"value":"some2 text value"}, "quotes": "6\"  side"},
{"url": "https://some3-value.com", "nested": {"id":6,"value":"some3 text value"}, "quotes": "7\"  side"}]

Different Error

# copy contrial from '/home/ubuntu/sample2-data.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "side" is invalid.
CONTEXT:  JSON data, line 1: ...,"value":"some1 text value"}, "quotes": "5"  side...
COPY contrial, line 1, column info: "[{"url": "https://some1-value.com", "nested": {"id":4,"value":"some1 text value"}, "quotes": "5"  si..."

Create table statement

CREATE TABLE public.contrial (
    info json NOT NULL
);

The end goal is to create a table with keys as columns and values as records. Nested keys would need to be flattened.

+-------------------------+-----------+------------------+----------+
| url                     | nested_id | nested_value     | quotes   |
+-------------------------+-----------+------------------+----------+
| https://some1-value.com | 4         | some1 text value | 5\" side |
+-------------------------+-----------+------------------+----------+
| https://some2-value.com | 5         | some2 text value | 6\" side |
+-------------------------+-----------+------------------+----------+
| https://some3-value.com | 6         | some3 text value | 7\" side |
+-------------------------+-----------+------------------+----------+

1 Answer 1

3

I ended up using Andre Dunstan's blog and this SO answer which says to format the json in a specific way to use the copy command.

Since my structure is pretty defined for the files I'm parsing, I ended up with the following script.

def file_len(fname):
    # to find the number of lines in the file.
    # Has been pretty efficient even for millions of records
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

INPUTFILE = '/path/to/input.json'
OUTPUTFILE = '/path/to/output.json.csv'
LEN = file_len(INPUTFILE)
with open(OUTPUTFILE, 'w') as fo:
    with open(INPUTFILE, 'r') as fi:
        for i, l in enumerate(fi):
            # I skip the first line
            if i == 0: continue 
            
            # To remove the ']}}' from the end
            elif i+1 == LEN: _ = fo.write(l[:-3])
            
            # To remove the ',' from the end 
            # and add \n since write does not add newline on its own
            else: _ = fo.write(l[:-2]+'\n') 

# load statement

import sqlalchemy
POSTGRESQL = f'postgresql+psycopg2://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DB}'
engine = sqlalchemy.create_engine(POSTGRESQL, echo=True)
            
con = engine.connect()
trans = con.begin()
LOAD_SQL = f"COPY tablename from '{OUTPUTFILE}' with csv delimiter E'\x01' quote E'\x02' null as '';"
try:
    con.execute(LOAD_SQL)
    trans.commit()
except Exception as e:
    trans.rollback()
finally:
    con.close()
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.