2

I'm stuck on this problem: I wrote a shell script that reads a large file with many lines from stdin. This is how it is executed:

./script < filename

I want to use the file as input to another operation in the script, but I don't know how to store the file's name in a variable.
The script takes a file on stdin and then runs an awk operation on that file itself. Say I write this in the script:

script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...

It only reads the first line of the input file. So I found another way to write it:

Min=-1
while read -r line; do
    n=$(echo "$line" | awk -F"$delim" '{print NF}')
    if [ "$Min" -eq -1 ] || [ "$n" -lt "$Min" ]; then
        Min=$n
    fi
done

But this takes a very long time to run, and awk seems to be the slow part. How can I improve this?

5 Answers

2

/dev/stdin can be quite useful here. On Linux it is just a chain of symlinks that ends at your input (/dev/stdin → /proc/self/fd/0 → the file).

So cat /dev/stdin gives you all the input from your file, and you can avoid needing the input filename at all.

Now, to answer the question :) Recursively read links, starting at /dev/stdin, and you will get the filename. Bash code:

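# Follow symlinks one level at a time until readlink fails,
# i.e. until we reach something that is not a symlink.
r(){
    if l=$(readlink "$1"); then
        r "$l"
    else
        echo "$1"
    fi
}
filename=$(r /dev/stdin)
echo "$filename"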

UPD: on Ubuntu I found that readlink has a -f option, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
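A minimal sketch of the readlink -f variant as a reusable function, assuming GNU readlink and a Linux-style /dev/stdin; the name stdin_path is hypothetical. It also rejects anything that is not a regular file, because a pipe (and, depending on the shell, a here-document) does not resolve to a path you can reopen:

```sh
#!/bin/sh
# stdin_path (hypothetical name): print the path of the regular file that
# stdin is redirected from, or return non-zero if there is none.
# Assumes GNU readlink (-f) and a Linux-style /dev/stdin -> /proc/self/fd/0.
stdin_path() {
    name=$(readlink -f /dev/stdin) || return 1
    # Pipes, sockets and some here-documents do not resolve to a usable file.
    [ -f "$name" ] || return 1
    printf '%s\n' "$name"
}
```

Inside the script you might then write file=$(stdin_path) || exit 1 and pass "$file" to awk; when the script is fed from a pipe, this fails instead of handing awk a bogus name such as fd/0.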

UPD2: tests (test.sh is the code above):

$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh 
pipe:[91219]
$ readlink -f /dev/stdin < input 
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
7 Comments

Hi, because the file is read from stdin, as in ./script < filename, there is no $1 I can access in the script; that's what annoys me...
$1 here is the argument to the r function, not a script argument. /dev/stdin is the argument passed to r, which by recursive calls to r eventually leads you to the name of the file which was redirected to standard input.
I tried, but echo $filename just prints fd/0, and when I run awk '...' < "$filename", it throws "fd/0: No such file or directory".
Try running it like this: ./test.sh < input
I updated the answer with run examples. What exact command are you running? What OS?
2

You're overdoing this. The way you invoke your script:

  • the file contents are the script's standard input
  • the script receives no argument

But awk already takes input from stdin by default, so all you need to do to make this work is:

  • not give awk any file name argument; it will read the wrapping script's stdin automatically
  • not consume any of that input before the wrapping script reaches the awk part. Specifically: no read

If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
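A minimal sketch of that approach for the question's minimum-field-count loop. The function name min_fields is hypothetical, and the delimiter is passed as $1 in place of the question's $delim. A single awk process reads all of stdin, instead of one awk process per line:

```sh
#!/bin/sh
# min_fields (hypothetical name): print the smallest number of fields
# on any line of stdin. $1 is the field delimiter.
min_fields() {
    awk -F"$1" '
        NR == 1 || NF < min { min = NF }   # first line initialises min
        END { if (NR > 0) print min }      # print nothing for empty input
    '
}
```

In the script you would call min_fields "$delim" with no read before it, and still run the script as ./script < filename.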

Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and spawning a process is orders of magnitude slower than awk processing a single line. The reason the generate tmpfile/single awk variant (the one in your answer) is still a bit slow is that it generates the tmpfile line by line, reopening it to append every time.

4 Comments

Thank you! But it seems you misunderstood my question: I need to write a script that takes a file on stdin and then runs an awk operation on that file itself. The only approach I know is read line.
@Alex Oh, OK, I (think I) get it now. But then, as I read it, it's your question's preconceptions that are flawed. If you call your script as ./script < filename, it does not pass the file name to the script, only the file contents. That makes it surprising that the awk invocation inside the script gets to see the first line of the file at all: at best, all it can see is the (full) contents of a file whose name is the first line of the input file, not the first line of the input file's contents. You ought to clear that bit up.
Oh I see, it's my fault. But do you know how to get at the file read from stdin? With ./Script filename, I just need file=$1 to pass it to awk as an argument. I need to use awk inside the script.
If all your process is that awk program, it already takes stdin by default. All you need to do is 1) not specify any input and 2) not consume it elsewhere in the wrapping script (e.g. with read). I'll edit my answer.
0

Modify your script so that it takes the input file name as an argument, then read from the file in your script:

$ ./script filename

In script:

filename=$1
awk '...' < "$filename"

If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
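Since the asker needs both invocation styles, one common idiom is to pass the script's arguments straight through to awk: awk reads the named files when any are given and standard input otherwise, so ./script filename and ./script < filename both work. A sketch, where count_lines and its END { print NR } program are hypothetical stand-ins for the real awk program:

```sh
#!/bin/sh
# count_lines (hypothetical stand-in for the real awk program).
# With file arguments awk reads those files; with none it reads stdin.
count_lines() {
    awk 'END { print NR }' "$@"
}
```

In the script body you would then call count_lines "$@".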

1 Comment

Actually I need to support both ways of getting the input. Your version is fine and I have already implemented it, so I'm just wondering how to use stdin.
0

How about invoking the script differently: pipe the contents of your file into the script, so that the standard output of cat filename becomes the script's standard input (in this case, actually, the standard input of the awk command). For example, with a file named Names.data and a script showNames.sh, run:

cat Names.data | ./showNames.sh

Contents of file Names.data:

Huckleberry Finn
Jack Spratt
Humpty Dumpty

Contents of script showNames.sh:

#!/bin/bash
#whatever awk commands you need
awk  "{ print }"

-2

Well, I finally found this way to solve my problem, although it takes several seconds.

grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)

This appends every line of stdin to a temporary file, so once stdin has been consumed, the temp file holds a copy of the input file.
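If a copy of the input on disk really is needed (for example, to read it more than once), a somewhat safer sketch of this approach uses mktemp for a private temporary file and deletes it afterwards, rather than appending to a fixed /tmp/tmpfile that may still contain data from an earlier run. The function name is hypothetical, and the delimiter is passed as $1 in place of $delim:

```sh
#!/bin/sh
# min_fields_via_tmp (hypothetical name): copy stdin to a private temp file,
# then run the answer's awk program on it. $1 is the field delimiter.
min_fields_via_tmp() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # copy all of stdin in one pass
    awk -F"$1" 'NF < min || min == "" { min = NF } END { print min }' "$tmp"
    status=$?
    rm -f "$tmp"                      # remove the copy whether or not awk succeeded
    return "$status"
}
```

When the input only needs to be read once, letting awk read stdin directly avoids the copy altogether.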
