When I try to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs; the root cause is that something is missing from the path:

Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions) ... Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?

gs://df-staging-bucket-57763/ already exists in my project, and I have access to it. What do I need to add to make this work?
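
For reference, a minimal sketch of the setup that triggers this error (assuming the Beam Java SDK with the Dataflow runner on the classpath; the class name FailingSetup is hypothetical, and project/region are assumed to come from the environment):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class FailingSetup {
      public static void main(String[] args) {
        DataflowPipelineOptions options = PipelineOptionsFactory
            // Bucket root only -- no object path after the bucket name.
            .fromArgs("--runner=DataflowRunner",
                      "--stagingLocation=gs://df-staging-bucket-57763/")
            .withValidation()
            .as(DataflowPipelineOptions.class);

        // Throws IllegalArgumentException here:
        // "Missing object or bucket in path: 'gs://df-staging-bucket-57763/'"
        DataflowRunner.fromOptions(options);
      }
    }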

2 Answers


The DataflowRunner requires that the staging and temp locations be paths within a bucket rather than the top level of a bucket. Adding a directory to each of the stagingLocation and gcpTempLocation arguments (for example, --stagingLocation=gs://df-staging-bucket-57763/staging and --tempLocation=gs://df-staging-bucket-57763/temp) is sufficient to run the pipeline.
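
As a sketch of the corrected setup (assuming the Beam Java SDK with the Dataflow runner dependency; the class name StagingPathFix is hypothetical):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class StagingPathFix {
      public static void main(String[] args) {
        // Parses flags such as --stagingLocation and --tempLocation.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
            .withValidation()
            .as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);

        // Both locations include an object path under the bucket,
        // not just the bucket root.
        options.setStagingLocation("gs://df-staging-bucket-57763/staging");
        options.setTempLocation("gs://df-staging-bucket-57763/temp");

        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms, then run:
        pipeline.run();
      }
    }

Note that gcpTempLocation defaults to tempLocation when the latter is a valid Cloud Storage path, so setting --tempLocation alone is typically enough.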



Update the run configuration as follows:

  1. Under the Pipeline Arguments tab, uncheck "Use Default Dataflow options" and select the pipeline arguments manually.
  2. Leave the "Cloud Storage staging location" field blank.
