Hello, I have this code that produces a text file containing a compressed string, which will then be loaded into a PostgreSQL database:

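import zlib

def test_insert():
    # Python 2: compress the UTF-8 bytes, then hex-encode them for a bytea column
    str_test = '4 1 2\n 2 4 5\n'.encode('utf8')
    cmpstr = zlib.compress(str_test)
    str_test_to_write = '\\x' + cmpstr.encode('hex_codec')

    # One "id|value" line per row, matching the '|' delimiter passed to \copy
    with open('outfile.txt', 'w') as output_file:
        output_file.write(str(1) + '|' + str_test_to_write + '\n')
        output_file.write(str(2) + '|' + str_test_to_write + '\n')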

Then I use the \copy command to load the data into my table:

time cat outfile.txt |psql teste3 -c "\copy zstr(id,zstr) from stdout with delimiter '|'"

This is my table:

drop table if exists zstr;
create table zstr(
    id int,
    zstr bytea,
    primary key(id));

Then I want to select my strings back, but I'm getting this error:

>>> import psycopg2
>>> import zlib
>>> con = psycopg2.connect(host = 'X', database = 'Y', user = 'Z')
>>> con.autocommit = True
>>> cur = con.cursor()
>>> cur.execute('select * from zstr where id = 1')
>>> row = cur.fetchone()
>>> row
(1, <read-only buffer for 0x7fe19b75f270, size 41, offset 0 at 0x7fe196976f30>)
>>> a = str(row[1])
>>> q = zlib.decompress(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check

So how can I get my strings back?

The output I want:

'4 1 2\n 2 4 5\n'
  • Why are you zlib compressing strings? Commented Apr 20, 2017 at 17:56

1 Answer

There is almost no reason to do this. PostgreSQL already compresses text with an LZ-family algorithm once a row is wider than TOAST_TUPLE_THRESHOLD. From the docs on TOAST:

The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains can be had. During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.

It does this transparently for the user. Just store the text itself.
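
Storing the text itself could look roughly like the sketch below. It assumes the zstr column is declared as text instead of bytea, and it is written for Python 3; copy_escape and build_copy_lines are hypothetical helper names, not part of any library. Since the sample value contains newlines, the sketch escapes them (along with backslashes and the '|' delimiter) the way the COPY text format expects, so each row stays on a single line of the file.

```python
def copy_escape(value, delimiter='|'):
    """Escape a text value for PostgreSQL's COPY text format (hypothetical helper)."""
    # Double backslashes first so the escapes added below are not re-escaped.
    value = value.replace('\\', '\\\\')
    # Control characters become backslash sequences that COPY decodes on load.
    value = value.replace('\n', '\\n')
    value = value.replace('\r', '\\r')
    value = value.replace('\t', '\\t')
    # A literal delimiter inside the data must be backslash-escaped as well.
    value = value.replace(delimiter, '\\' + delimiter)
    return value


def build_copy_lines(rows, delimiter='|'):
    """Return one COPY text-format line per (id, text) pair."""
    return ['%d%s%s\n' % (row_id, delimiter, copy_escape(text, delimiter))
            for row_id, text in rows]


# Same sample data as in the question, stored uncompressed.
lines = build_copy_lines([(1, '4 1 2\n 2 4 5\n'), (2, '4 1 2\n 2 4 5\n')])
for line in lines:
    print(repr(line))
```

The resulting lines can be written to outfile.txt and loaded with the same \copy command as in the question. Selecting the column then returns the original string directly, with no decompression step.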

6 Comments

Oh, I didn't know that. Can I set the threshold to less than 1 kB?
You can, however I would not. =) Unless you've got like millions (or billions) of rows and it really matters.
I have 100 million rows xD So what threshold do you recommend, and why?
I would leave it alone. The trick, as a DBA, is to not retrieve data you don't use. Short of that, it's just wasted hard drive space and a very hardcore premature optimization. How much space is wasted depends on the row size and the compression ratio. But if we assume 100% compression (i.e., compressed values take no space at all), the difference between 1 kB and 2 kB is 100 GB over 100 million rows. However, the real figure is certainly far less than that. Further, there is overhead in speed and space associated with TOAST.
Hmm, but what is the biggest problem if I change my threshold to a crazy number like 5 bytes?
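
The 100 GB figure from the comments above can be checked with a few lines of arithmetic. This is only a sketch of that upper bound, assuming decimal units (1 kB = 1000 bytes) and the idealized case where every row saves the full 1 kB difference between the two thresholds; upper_bound_gb is a hypothetical helper name.

```python
KB = 1000        # bytes, decimal units (assumption)
GB = 1000 ** 3   # bytes


def upper_bound_gb(rows, old_threshold_bytes, new_threshold_bytes):
    """Best-case space saved if every row shrinks by the threshold difference."""
    saved_per_row = old_threshold_bytes - new_threshold_bytes
    return rows * saved_per_row // GB


# 100 million rows, threshold lowered from 2 kB to 1 kB
print(upper_bound_gb(100 * 10**6, 2 * KB, 1 * KB))  # 100
```

Real savings would be smaller, since compression is never 100% and TOAST adds its own overhead.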