Apache NiFi provides the "ExecuteSQL" processor to execute a query and return the results as flow files. But if we set the Execution option to "All Nodes", does NiFi divide the query into batches and execute them in parallel (similar to how Sqoop does)?

1 Answer

If you use ExecuteSQL and select all nodes, then the same query is run on all nodes.

If you want Sqoop-like behavior, use a processor like GenerateTableFetch set to run on the primary node only, then connect it to ExecuteSQL through a load-balanced connection so that the fetch queries get distributed across the cluster.
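To illustrate the idea, here is a minimal Python sketch of the pattern: one step splits a full-table read into paged queries, and another hands those queries out to nodes in turn. This is not NiFi code. The table name `orders`, the `id` ordering column, the row count, and the node names are hypothetical, and the SQL that GenerateTableFetch actually emits depends on the database type and processor settings.

```python
from itertools import cycle


def generate_page_queries(table, row_count, partition_size, order_by):
    """Split one full-table read into paged SELECT statements."""
    queries = []
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries


def round_robin(queries, nodes):
    """Assign each query to a node in turn (illustrative only)."""
    assignment = {node: [] for node in nodes}
    for node, query in zip(cycle(nodes), queries):
        assignment[node].append(query)
    return assignment


# Hypothetical table and two-node cluster.
queries = generate_page_queries(
    "orders", row_count=25_000, partition_size=10_000, order_by="id"
)
plan = round_robin(queries, ["node-1", "node-2"])
for node, assigned in plan.items():
    print(node, len(assigned))
```

In the NiFi flow, the first function corresponds roughly to GenerateTableFetch on the primary node, and the second to a load-balanced connection feeding ExecuteSQL on every node.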

10 Comments

Bryan, would you be able to answer this question also - stackoverflow.com/questions/56126682/…
Hard to answer since it depends on lots of factors: size of the DB table, size of the NiFi cluster, hardware specs of the cluster, etc. In general, Sqoop will probably win for large-scale performance.
GenerateTableFetch will generate multiple queries based on the size of the table and pass them to subsequent processors as flow files, ExecuteSQL in this case. How can I load balance the ExecuteSQL processor? Can you give an example?
I mentioned this in the answer: you run GenerateTableFetch on the primary node, connect it to ExecuteSQL, and configure load balancing on the connection - blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
Yes, the only processors that should ever be set to 'primary node only' are source processors, so in this case that would be GenerateTableFetch. Also, you can easily run a two-node cluster locally to test it out.