
I want to echo two variables on the same line.
For each line of sample.txt, I want to store the filename (e.g. 2015-03-04.01.Abhi_Ram.txt) in a variable FILENAME and the count (e.g. 10) in a variable COUNT, and echo both together.

sample.txt

2015-03-04.01.Abhi_Ram.txt 10
2015-03-04.02.Abhi_Ram.txt 70

Below is the code I came up with:

for line in `hadoop fs -cat sample.txt`
do
    VAR="${line}"
    FILENAME=`echo ${VAR}|awk '{print $1}'`
    COUNT=`echo ${VAR}|awk '{print $2}'`
    COUNT_DT=`date "+%Y-%m-%d %H:%M:%S"`
    echo db"|"Abhi_Ram"|"record_count"|"${FILENAME}"||"${COUNT}"||"${COUNT_DT} >> output.txt
done

I want this output:

db|Abhi_Ram|record_count|2015-03-04.01.Abhi_Ram.txt||10||timestamp
db|Abhi_Ram|record_count|2015-03-04.02.Abhi_Ram.txt||70||timestamp

Instead, I'm getting:

db|Abhi_Ram|record_count|2015-03-04.01.Abhi_Ram.txt||||timestamp
db|Abhi_Ram|record_count|10||||timestamp
db|Abhi_Ram|record_count|2015-03-04.02.Abhi_Ram.txt||||timestamp
db|Abhi_Ram|record_count|70||||timestamp

Could someone point out what I am missing?

  • Why are you calculating the date inside the loop when it doesn't use any variable set in the loop? That makes the loop much slower and more expensive than it needs to be; put the date call before the loop so it runs only once. Commented Aug 10, 2015 at 23:09
  • Also, if you're using a new enough bash (4.2 or later), the printf builtin can format dates, making the external date command unnecessary. Commented Aug 10, 2015 at 23:13

2 Answers


Consider:

while read -r filename count
do
    count_dt=$(date "+%Y-%m-%d %H:%M:%S")
    echo "db|Abhi_Ram|record_count|${filename}||${count}||${count_dt}"
done <sample.txt >>output.txt

This produces the file:

$ cat output.txt 
db|Abhi_Ram|record_count|2015-03-04.01.Abhi_Ram.txt||10||2015-08-10 14:42:39
db|Abhi_Ram|record_count|2015-03-04.02.Abhi_Ram.txt||70||2015-08-10 14:42:39
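The loop above reads sample.txt from the local filesystem, while the question reads it with hadoop fs -cat. Below is a minimal sketch of keeping the while read loop but feeding it from HDFS, assuming bash (for process substitution) and that hadoop fs -cat sample.txt prints the file exactly as shown in the question. The function name format_records is hypothetical, introduced only so the loop can be exercised on any input. A plain pipeline (hadoop fs -cat sample.txt | while read ...) also produces the output, but in bash the loop then runs in a subshell, so any variables it sets are gone once it finishes; process substitution avoids that.

```shell
#!/usr/bin/env bash
# format_records is a hypothetical helper: it reads "filename count" lines
# on stdin and prints one pipe-delimited record per input line.
format_records() {
    local filename count count_dt
    while read -r filename count
    do
        count_dt=$(date "+%Y-%m-%d %H:%M:%S")
        echo "db|Abhi_Ram|record_count|${filename}||${count}||${count_dt}"
    done
}

# With sample.txt in HDFS, as in the question, feed the loop through
# process substitution (bash-only) instead of a plain <sample.txt:
#   format_records < <(hadoop fs -cat sample.txt) >>output.txt
```

The commented usage line is the HDFS case; any other command that prints the two-column lines can be substituted inside < <(...).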

Notes:

  1. It is best practice to use lower or mixed case for your shell variables. The shell and the system use upper-case names for environment and special variables (PATH, HOME, IFS and so on), and you don't want to accidentally overwrite one.

  2. The many double-quotes in the echo statement were unnecessary. The whole of the output string can be inside one double-quoted string.

  3. If you want to read a file one line at a time, it is safer to use the while read ... done <inputfile construct. The read statement also allows us to easily define the filename and count variables.

  4. For command substitution, many prefer the form $(...) over the backtick form. This is because (a) the $(...) makes the beginning and end of the command substitution visually distinct, (b) the $(...) form nests well, and (c) not all fonts clearly show backticks as different from regular ticks. (Thanks Chepner.)

  5. For efficiency, the redirection to output.txt has been moved to the end of the loop. In this way, the file is only opened and closed once. (Thanks Charles Duffy.)

  6. Unless you need count_dt updated with each individual entry, it could be placed before the loop and set just once every time sample.txt is processed. If you have bash 4.2 or later (the /bin/bash shipped with macOS is version 3.2, which is too old), then the count_dt assignment can be replaced (Thanks Charles Duffy) with a native bash statement (no shelling out required):

    printf -v count_dt '%(%Y-%m-%d %H:%M:%S)T' -1
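Putting notes 1 to 6 together, a sketch of the whole loop might look like this, assuming bash 4.2 or later for printf's %(...)T format. The helper name write_records and its two arguments (input file, output file) are hypothetical; the timestamp is computed once per run, so every row written in one run shares it.

```shell
#!/usr/bin/env bash
# write_records is a hypothetical helper: $1 = input file, $2 = output file.
# Requires bash >= 4.2 for the %(...)T format of the printf builtin.
write_records() {
    local input=$1 output=$2
    local filename count count_dt

    # One timestamp per run, produced by the printf builtin (no external date).
    # -1 means "current time"; passing it explicitly avoids relying on how a
    # given bash version treats a missing argument.
    printf -v count_dt '%(%Y-%m-%d %H:%M:%S)T' -1

    # read -r keeps backslashes literal; lower-case names avoid clobbering
    # system variables; the output file is opened once for the whole loop.
    while read -r filename count
    do
        echo "db|Abhi_Ram|record_count|${filename}||${count}||${count_dt}"
    done <"$input" >>"$output"
}
```

For example, write_records sample.txt output.txt appends one record per line of sample.txt. If each row really needs its own timestamp, move the printf -v line inside the loop; it is still a builtin, so that costs no extra process per row.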
    

6 Comments

I'd suggest making it read -r, to avoid dropping any backslash literals. (Granted, they're not likely in filenames, but they are possible).
I'd also suggest putting a >output.txt on the outside of the loop, so you only open output.txt once, rather than re-opening the file every time you run echo, as done with the >>output.txt on the echo command.
@CharlesDuffy Good idea. Answer updated to move the redirection outside the loop.
(Would also greatly improve efficiency to calculate count_dt outside the loop... or use the shiny new recent-4.x feature printf -v count_dt '%(%Y-%m-%d %H:%M:%S)T' -1 to do the calculation in native bash rather than shelling out to an external date command for every row processed).
@CharlesDuffy Nice! That printf is new to me.

John1024 has explained how to do this correctly; I'd like to take a look at why the original version didn't work. The basic problem is that a for loop iterates over words, not lines: the output of the backtick command substitution is split on whitespace, and newlines count as whitespace. The file has two words on each line (a filename and a count), so the loop body runs twice per line. To see this, try:

for line in `hadoop fs -cat sample.txt`
do
    echo "$line"
done

...and it'll print something like:

2015-03-04.01.Abhi_Ram.txt
10
2015-03-04.02.Abhi_Ram.txt
70

...which isn't what you want at all. It also has other unpleasant quirks: for example, if the input file contained the word "*", the shell would replace it with the list of filenames in the current directory.
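A small demonstration of both quirks, word splitting and filename expansion, assuming bash and a scratch directory created with mktemp so that "*" has known files to match. The input file contents and the function names for_loop and read_loop are made up for illustration.

```shell
#!/usr/bin/env bash
# Work in an empty scratch directory so "*" matches a known set of files.
cd "$(mktemp -d)" || exit 1
touch a.txt b.txt
printf '%s\n' 'x.txt 10' '* 20' >input.txt

# Unquoted $(...) is split on whitespace, then each word is glob-expanded.
for_loop() {
    local word
    for word in $(cat "$1"); do
        printf '[%s]' "$word"
    done
    echo
}

# read -r splits each line into fields but performs no filename expansion.
read_loop() {
    local filename count
    while read -r filename count; do
        printf '[%s|%s]' "$filename" "$count"
    done <"$1"
    echo
}

for_loop input.txt    # prints [x.txt][10][a.txt][b.txt][input.txt][20]
read_loop input.txt   # prints [x.txt|10][*|20]
```

The for loop sees six words, and the "*" has silently become three filenames; the while read loop sees two lines with two fields each and leaves "*" alone.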

The while read ... done <file approach is the right way to iterate over lines in a shell script. It just happens to also be able to split each line into fields without having to mess with awk (in this case, read filename count does it).
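To illustrate how read filename count splits each line into fields, here is a sketch using a bash here-string. The helper parse_line and the third variable name rest are additions for illustration only, not part of either answer. When a line has more fields than names, the leftover fields all go into the last name; when it has fewer, the remaining names are left empty.

```shell
#!/usr/bin/env bash
# parse_line is a hypothetical helper that shows how read assigns fields.
parse_line() {
    local filename count rest
    read -r filename count rest <<<"$1"
    printf 'filename=[%s] count=[%s] rest=[%s]\n' "$filename" "$count" "$rest"
}

parse_line '2015-03-04.01.Abhi_Ram.txt 10'
# prints filename=[2015-03-04.01.Abhi_Ram.txt] count=[10] rest=[]
parse_line '2015-03-04.02.Abhi_Ram.txt 70 extra column'
# prints filename=[2015-03-04.02.Abhi_Ram.txt] count=[70] rest=[extra column]
parse_line 'lonely.txt'
# prints filename=[lonely.txt] count=[] rest=[]
```

This also means that with only filename and count, as in the answer, a stray third column would end up inside count rather than being dropped.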
