Apache NiFi - Does "ExecuteSQL" run the query in parallel?

Question

Apache NiFi provides the "ExecuteSQL" processor to execute a query and return the results as flow files. But if we set the Execution option to "All Nodes", does NiFi divide the query into batches and execute them in parallel (similar to how Sqoop does)?

Bryan Bende · Accepted Answer · 2019-05-14 13:38:23Z

5

If you use ExecuteSQL and select "All Nodes", then the same query is run on every node; it is not split up.

If you want Sqoop-like behavior, use a processor like GenerateTableFetch running on the primary node only, then connect it to ExecuteSQL through a load-balanced connection so that the fetch queries get distributed across the cluster.
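To make the difference concrete, here is a rough Python sketch of the two behaviors. It is not NiFi code: `generate_fetch_queries`, `distribute_round_robin`, and `run_on_all_nodes` are hypothetical helpers, the table and node names are made up, and the LIMIT/OFFSET paging only approximates what a processor like GenerateTableFetch might emit (the actual SQL depends on the database type and processor configuration).

```python
from itertools import cycle


def generate_fetch_queries(table, order_column, row_count, partition_size):
    """Split one table scan into paged queries (hypothetical stand-in for
    what a fetch-generating processor does on the primary node)."""
    return [
        f"SELECT * FROM {table} ORDER BY {order_column} "
        f"LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, row_count, partition_size)
    ]


def distribute_round_robin(queries, nodes):
    """Hand each query to the next node in turn, roughly what a
    round-robin load-balanced connection would do with flow files."""
    assignment = {node: [] for node in nodes}
    for node, query in zip(cycle(nodes), queries):
        assignment[node].append(query)
    return assignment


def run_on_all_nodes(query, nodes):
    """ExecuteSQL alone on "All Nodes": every node gets the same full query."""
    return {node: [query] for node in nodes}


if __name__ == "__main__":
    nodes = ["node1", "node2"]  # assumed two-node cluster

    # Same query everywhere: duplicated work, no partitioning.
    print(run_on_all_nodes("SELECT * FROM orders", nodes))

    # Paged queries spread across the cluster: each page runs once.
    pages = generate_fetch_queries("orders", "id", row_count=25, partition_size=10)
    for node, assigned in distribute_round_robin(pages, nodes).items():
        print(node, assigned)
```

The point of the sketch is only that splitting happens upstream of ExecuteSQL: something has to produce the per-page queries once, and the connection then spreads them out.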


10 Comments

Akhil Over a year ago

Bryan, would you be able to answer this question also - stackoverflow.com/questions/56126682/…

Bryan Bende Over a year ago

Hard to answer since it depends on lots of factors: size of the DB table, size of the NiFi cluster, hardware specs of the cluster, etc. In general, Sqoop will probably win for large-scale performance.

Akhil Over a year ago

GenerateTableFetch will generate multiple queries based on the size of the table and pass them to subsequent processors as flow files (ExecuteSQL in this case). How can I load balance the ExecuteSQL processor? Can you give an example?

Bryan Bende Over a year ago

I mentioned this in the answer: you run GenerateTableFetch on the primary node, connect it to ExecuteSQL, and configure load balancing on the connection - blogs.apache.org/nifi/entry/load-balancing-across-the-cluster

Bryan Bende Over a year ago

Yes, the only processors that should ever be set to 'primary node only' are source processors, so in this case that would be GenerateTableFetch. Also, you can easily run a two-node cluster locally to test it out.
