
Use Case: I'm using the Python clickhouse-client to connect to my ClickHouse cluster and insert data. I'm copying the data from Azure Blob Storage, and my query looks something like:

INSERT INTO DB1.TABLE1
SELECT * FROM azureBlobStorage('<blob storage path>')
SETTINGS
<some insertion settings>

The problem I'm facing is that the Python client waits for the insertion to complete, and for very large tables a network timeout occurs (the call goes through HAProxy and an Nginx ingress). For security reasons I cannot increase the timeouts of the gateways.

I tried using the async_insert=1 and wait_for_async_insert=0 settings in the query, but I noticed they don't work with the Python clickhouse-client. Is there a way to get an immediate response after sending an INSERT query from the Python client, with the insertion happening in the background on the cluster (as if I were running the command directly on the cluster using the CLI)?

  • Consider breaking the large INSERT SELECT into multiple smaller ones as well. This blog series on large data loads might be helpful too - clickhouse.com/blog/… Commented Apr 15 at 5:44
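One way to act on the comment above: a minimal sketch that splits the single large INSERT ... SELECT into smaller per-range statements, so each one finishes inside the gateway timeout. The `event_date` filter column and the date ranges here are hypothetical; adapt them to however your source data is actually partitioned.

```python
# Sketch: split one large INSERT ... SELECT into smaller per-range
# statements. The `event_date` column is a hypothetical example; use
# whatever partitioning column your data actually has.

def chunked_inserts(ranges):
    """Yield one INSERT statement per (start, end) date range."""
    for start, end in ranges:
        yield (
            "INSERT INTO DB1.TABLE1 "
            "SELECT * FROM azureBlobStorage('<blob storage path>') "
            f"WHERE event_date >= '{start}' AND event_date < '{end}'"
        )

for stmt in chunked_inserts([("2024-01-01", "2024-02-01"),
                             ("2024-02-01", "2024-03-01")]):
    print(stmt)  # run each statement in turn from the client
```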

2 Answers


async_insert=1 will not work for INSERT INTO ... SELECT ... FROM ... statements; this setting has a different purpose: it is meant for batching lots of small concurrent inserts.

Just increase the network send_timeout and receive_timeout in your client.
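For reference, send_timeout and receive_timeout are ClickHouse server settings and can be scoped to the one statement via a SETTINGS clause rather than changed globally. A minimal sketch that builds such a statement as a string (the query text is from the question; the helper name and the 3600-second value are my own illustrative choices):

```python
# Sketch: attach per-query network timeouts to the statement itself.
# send_timeout / receive_timeout are real ClickHouse settings (seconds);
# the helper name is hypothetical.

def add_timeouts(query, seconds):
    """Append per-query send/receive timeouts to a statement."""
    return f"{query} SETTINGS send_timeout = {seconds}, receive_timeout = {seconds}"

sql = add_timeouts(
    "INSERT INTO DB1.TABLE1 SELECT * FROM azureBlobStorage('<blob storage path>')",
    3600,
)
print(sql)
```

Note the client-side socket timeout usually has to be raised as well, since either side can close the connection first.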




Since it's a SQL query where no data needs to travel from your machine to the server, you could run it from the server itself, avoiding the network gateways entirely: just SSH to it with a keepalive option and run it there, if you have access.
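A sketch of that SSH approach, shown as a dry run that only prints the command instead of executing it (user@clickhouse-host is a placeholder; ServerAliveInterval is the standard OpenSSH keepalive option):

```python
# Sketch: run the INSERT on the ClickHouse host itself over SSH with a
# keepalive, so no gateway sits in the path. Host/user are placeholders;
# this only prints the command so it can be inspected before running.
import shlex

query = "INSERT INTO DB1.TABLE1 SELECT * FROM azureBlobStorage('<blob storage path>')"
cmd = [
    "ssh",
    "-o", "ServerAliveInterval=60",  # keepalive so the session survives a long query
    "user@clickhouse-host",
    f"clickhouse-client --query {shlex.quote(query)}",
]
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```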

If you can't, here's a query that may work:

INSERT INTO DB1.TABLE1
SELECT * FROM temp_table
SETTINGS 
    async_insert=1, 
    wait_for_async_insert=0,
    max_insert_threads=8,
    max_threads=8

Other options you may want to set before you even fire off the query:

SET max_insert_threads=8;
SET max_threads=8;
SET max_insert_block_size=1048576;
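Equivalently, those options can be attached to the INSERT itself rather than issued as separate SET statements, which keeps them scoped to the one query. A small sketch (the helper name is my own) that builds the combined statement:

```python
# Sketch: fold the SET options above into a per-query SETTINGS clause.
# The setting names and values come from the answer; the helper is
# hypothetical.

def with_settings(query, settings):
    """Append a SETTINGS clause built from a dict to a query string."""
    clause = ", ".join(f"{k} = {v}" for k, v in settings.items())
    return f"{query} SETTINGS {clause}"

sql = with_settings(
    "INSERT INTO DB1.TABLE1 SELECT * FROM temp_table",
    {"max_insert_threads": 8, "max_threads": 8, "max_insert_block_size": 1048576},
)
print(sql)
```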

You could also monitor the query from another connection while it's running to see how it's progressing (note that system.mutations only tracks ALTER mutations; the running INSERT itself shows up in system.processes):

SELECT * FROM system.mutations WHERE is_done=0;
SELECT * FROM system.processes WHERE query LIKE '%INSERT%';

1 Comment

async_insert does not work for INSERT SELECT
