5

I have a text file with one sentence per line. I would like to lemmatize the worlds in each line using hunspell (-s option). Since I want to have the lemmas of each line separately, it wouldn't make sense to submit the whole text file to hunspell. I do need to send one line after another and have the hunspell output for each line.

Following the answers from How to process input and output streams in Steel Bank Common Lisp?, I was able to send the whole text file for hunspell one line after another but I was not able to capture the output of hunspell for each line. How interact with the process sending the line and reading the output before send another line?

My current code to read the whole text file is

(defun parse-spell-sb (file-in)
  (with-open-file (in file-in)
    (let ((p (sb-ext:run-program "/opt/local/bin/hunspell" (list "-i" "UTF-8" "-s" "-d" "pt_BR") 
                 :input in :output :stream :wait nil)))
      (when p
        (unwind-protect 
          (with-open-stream (o (process-output p)) 
            (loop 
         :for line := (read-line o nil nil) 
         :while line 
         :collect line)) 
          (process-close p))))))

Once more, this code give me the output of hunspell for the whole text file. I would like to have the output of hunspell for each input line separately.

Any idea?

1
  • @wvxvw sure! But hunspell can be used interactively in the prompt. If I start it with "hunspell -s". That is why I supposed that I can make it working interactively with CL. Sure the best way should be common-lisp.net/project/cffi but I still have to learn how to work with it. Commented Apr 13, 2013 at 16:54

1 Answer 1

7

I suppose you have a buffering problem with the program you want to run. For example:

(defun program-stream (program &optional args)
  (let ((process (sb-ext:run-program program args
                                     :input :stream
                                     :output :stream
                                     :wait nil
                                     :search t)))
    (when process
      (make-two-way-stream (sb-ext:process-output process)
                           (sb-ext:process-input process)))))

Now, on my system, this will work with cat:

CL-USER> (defparameter *stream* (program-stream "cat"))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)       ; will hang without this
NIL
CL-USER> (read-line *stream*)
"foo bar baz"
NIL
CL-USER> (close *stream*)
T

Notice the finish-output – without this, the read will hang. (There's also force-output.)

Python in interactive mode will work, too:

CL-USER> (defparameter *stream* (program-stream "python" '("-i")))
*STREAM*
CL-USER> (loop while (read-char-no-hang *stream*)) ; skip startup message
NIL
CL-USER> (format *stream* "1+2~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"3"
NIL
CL-USER> (close *stream*)
T

But if you try this without the -i option (or similar options like -u), you'll probably be out of luck, because of the buffering going on. For example, on my system, reading from tr will hang:

CL-USER> (defparameter *stream* (program-stream "tr" '("a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)          ; hangs
; Evaluation aborted on NIL.
CL-USER> (read-char-no-hang *stream*)
NIL
CL-USER> (close *stream*)
T

Since tr doesn't provide a switch to turn off buffering, we'll wrap the call with a pty wrapper (in this case unbuffer from expect):

CL-USER> (defparameter *stream* (program-stream "unbuffer"
                                                '("-p" "tr" "a-z" "A-Z")))
*STREAM*
CL-USER> (format *stream* "foo bar baz~%")
NIL
CL-USER> (finish-output *stream*)
NIL
CL-USER> (read-line *stream*)
"FOO BAR BAZ
"
NIL
CL-USER> (close *stream*)
T

So, long story short: Try using finish-output on the stream before reading. If that doesn't work, check for command line options preventing buffering. If it still doesn't work, you could try wrapping the programm in some kind of pty-wrapper.

Sign up to request clarification or add additional context in comments.

3 Comments

Re-reading my answer, maybe I should clarify that finish-output is used to make sure your input actually gets sent despite buffering – it has nothing to do with the output of the program per se. Also, the processes won't get closed when closing the two-way-streams (but that's not a problem for demonstration purposes).
Thanks, danlei. Your answer is very valuable to me.
@xiepan Glad it helped!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.