
I have a requirement to process incoming CSV files in GCS.

I am not able to trigger execution via shell commands in a Cloud Function, like:

import subprocess

subprocess.run([
    "python", "-W", "ignore", "dataflow_ingestion_engine.py",
    "gs://logs-check/csv_input/input2.csv",
    "--runner", "DataflowRunner",
])

or

command = """
python tmp/dataflow_ingestion_engine.py gs://logs-check/csv_input/input2.csv --runner DataflowRunner
"""
os.system(command)

The Cloud Function is not executing the shell command at all (nothing shows up in the logs). How can I trigger a Dataflow job (Python SDK) from a Cloud Function?

1 Answer


You cannot invoke Dataflow jobs from a Cloud Function with a subprocess command. You will have to make REST or gRPC calls from your Cloud Function code instead.

A sample that runs template jobs can be found here [1].

You can refer to this and submit your own job.

1 - https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/dataflow/run_template
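
For example, assuming you have already staged your pipeline as a classic template, a GCS-triggered Cloud Function could launch it roughly like this. This is a minimal sketch modeled on the sample above; PROJECT, TEMPLATE_PATH, the job name, and the "input" parameter name are placeholders you would adapt to your own pipeline:

from googleapiclient.discovery import build

# Placeholders -- substitute your own project and template path.
PROJECT = "my-project"
TEMPLATE_PATH = "gs://my-bucket/templates/dataflow_ingestion_engine"


def trigger_dataflow(event, context):
    """Background Cloud Function fired when a CSV lands in the bucket."""
    input_file = "gs://{}/{}".format(event["bucket"], event["name"])

    # Build a Dataflow API client using the function's default credentials.
    dataflow = build("dataflow", "v1b3")

    # projects().templates().launch() is the REST call that starts the job.
    request = dataflow.projects().templates().launch(
        projectId=PROJECT,
        gcsPath=TEMPLATE_PATH,
        body={
            "jobName": "csv-ingestion",
            # "input" is assumed to be a ValueProvider parameter of the template.
            "parameters": {"input": input_file},
        },
    )
    print(request.execute())

Note that the function's service account needs permission to launch Dataflow jobs (for example, the Dataflow Admin role), and google-api-python-client must be listed in the function's requirements.txt.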


4 Comments

Can you share a tutorial on creating a template?
You can take a look at cloud.google.com/dataflow/docs/guides/templates/…. Also, on the Dataflow jobs page there are multiple templates already available; all of them can be found here: github.com/GoogleCloudPlatform/DataflowTemplates (a staging sketch is shown after these comments). If this answer helped, please do accept it.
Thanks for the answer, but the Google tutorial on creating templates is not that great.
Thanks for the feedback. At the end of the tutorial there is a link for submitting feedback; it would be great if you could leave your suggestions there so the tutorial can be improved.
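
For reference, the template-creation step discussed in these comments amounts to running the pipeline once with --template_location set, exposing any runtime inputs (such as the CSV path) as ValueProvider options. A rough sketch, with all project and bucket names as placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class IngestionOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # A ValueProvider argument can be filled in at template launch time.
        parser.add_value_provider_argument("--input", type=str)


# All names below are placeholders -- substitute your own project and buckets.
options = IngestionOptions(
    runner="DataflowRunner",
    project="my-project",
    temp_location="gs://my-bucket/temp",
    staging_location="gs://my-bucket/staging",
    # Writing the serialized pipeline to this path is what creates the template.
    template_location="gs://my-bucket/templates/dataflow_ingestion_engine",
)

with beam.Pipeline(options=options) as pipeline:
    lines = pipeline | beam.io.ReadFromText(options.input)
    # ... the rest of your ingestion transforms would go here ...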
