
We need to set up a file copy within HDFS, between HDFS folders. We currently use the curl command shown below, inside a shell script loop.

/usr/bin/curl -v --negotiate -u : -X PUT "<hnode>:<port>/webhdfs/v1/busy/rg/stg/"$1"/"$table"/"$table"_"$3".dsv?op=RENAME&destination=/busy/rg/data/"$1"/"$table"/"$table"_$date1.dsv"

However, this performs a file move. We need a file copy, so that the file is retained at the original staging location.

Is there a corresponding curl operation for a copy? Instead of op=RENAME&destination, what else could work?

  • Is there some reason you can't use hdfs dfs -cp? Commented Jul 5, 2017 at 4:01
  • Thanks for replying. Yes, this works when you are on the HDFS box, but my requirement is to connect from an external Unix box, perform an automated Kerberos login into HDFS, and then move the files within HDFS, hence the curl. Commented Jul 5, 2017 at 13:04
  • Just a tip, elinks (in my experience) will use your kerberos ticket. Commented Jul 5, 2017 at 13:43

2 Answers


WebHDFS alone does not offer a copy operation in its interface. The WebHDFS interface provides lower-level file system primitives. A copy operation is a higher-level application that uses those primitive operations to accomplish its work.

The implementation of hdfs dfs -cp against a webhdfs: URL essentially combines op=OPEN and op=CREATE calls to complete the copy. You could potentially re-implement a subset of that logic in your script. If you want to pursue that direction, the CopyCommands class is a good starting point in the Apache Hadoop codebase for seeing how that works.

Here is a starting point for how this could work. There is an existing file at /hello1 that we want to copy to /hello2. This script calls curl to open /hello1 and pipes the output to another curl command, which creates /hello2, using stdin as the input source.

> hdfs dfs -ls /hello*
-rw-r--r--   3 cnauroth supergroup          6 2017-07-06 09:15 /hello1

> curl -sS -L 'http://localhost:9870/webhdfs/v1/hello1?op=OPEN' |
>     curl -sS -L -X PUT -d @- 'http://localhost:9870/webhdfs/v1/hello2?op=CREATE&user.name=cnauroth'

> hdfs dfs -ls /hello*
-rw-r--r--   3 cnauroth supergroup          6 2017-07-06 09:15 /hello1
-rw-r--r--   3 cnauroth supergroup          5 2017-07-06 09:20 /hello2
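
In the Kerberos-secured setup described in the question, the same pipe should in principle only need the SPNEGO options already used there (--negotiate -u :), and the user.name parameter should not be needed because the Kerberos principal supplies the identity. A minimal sketch, with <hnode>:<port> and the file paths as placeholders; it swaps -d @- for --data-binary @- so that curl does not strip newlines from the .dsv content:

> curl -sS -L --negotiate -u : 'http://<hnode>:<port>/webhdfs/v1/busy/rg/stg/file1.dsv?op=OPEN' |
>     curl -sS -L --negotiate -u : -X PUT --data-binary @- 'http://<hnode>:<port>/webhdfs/v1/busy/rg/data/file1.dsv?op=CREATE'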

But my requirement is to connect from an external Unix box, perform an automated Kerberos login into HDFS, and then move the files within HDFS, hence the curl.

Another option could be a client-only Hadoop installation on your external host. You would have an installation of the Hadoop software and the same configuration files from the Hadoop cluster, and then you could issue the hdfs dfs -cp commands instead of running curl commands against HDFS.
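
As a rough sketch of that option, assuming the cluster's core-site.xml and hdfs-site.xml are in place on the external host and that a keytab is available for the automated login (the keytab path and principal below are made up):

> kinit -kt /etrade/home/suser/suser.keytab suser@EXAMPLE.COM
> hdfs dfs -cp "/busy/rg/stg/$1/$table/${table}_$3.dsv" "/busy/rg/data/$1/$table/${table}_$date1.dsv"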


5 Comments

Yes, that's how it turned out, but I am getting a "URL not specified" error. Please bear with the formatting; the code is in the next comment:
/usr/bin/curl -L --negotiate -u : -X GET "hdnode:hdport/webhdfs/v1/bus/rg/file1.dsv?op=OPEN" -o /etrade/home/suser/file11.dsv
/usr/bin/curl -i -s --negotiate -u : –X PUT "hdnode:hdport/webhdfs/v1/bus/rg/data/file11.dsv?op=CREATE" | grep Location:
/usr/bin/curl -i -s --negotiate -u : -X PUT -T /etrade/home/suser/file11.dsv $Location
@SwathiR , I have edited my answer to show a starting point for how to pipe the OPEN call into the CREATE call. This is working correctly. My example is using simple auth (no Kerberos), but making it work in a Kerberos secured cluster should involve just adding the SPNEGO options to the curl call, and I see you're already familiar with those. I hope this helps.
Hello. Thanks for your inputs. Yes, this approach worked.
@SwathiR , I'm glad to hear this worked! If this was helpful, then please consider accepting the answer. stackoverflow.com/help/someone-answers

I don't know which distribution you use, but if it is Cloudera, try BDR (Backup and Disaster Recovery) via its REST APIs.

I have used it to copy files/folders within a Hadoop cluster and across Hadoop clusters, and it works against encrypted zones (TDE) as well.
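
BDR replications are driven through the Cloudera Manager REST API. As a very rough sketch only, with placeholders you would need to verify against the API documentation for your Cloudera Manager version (host, port, API version, cluster and service names are assumptions here), listing the existing HDFS replication schedules might look like:

curl -u admin:admin "http://<cm-host>:7180/api/v19/clusters/<cluster-name>/services/hdfs/replications"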

