1

This might seem like a very complex problem to some of you. I want to use Apache Flink to apply some algorithms on data from a SocketStream. However, these algorithms are external executables that I am running using Scala's sys.process package. Here is what I want Flink to do:

  1. Get individual lines from SocketStream:

    val text = env.socketTextStream(hostName, port) val lines = text.flatMap { _.toLowerCase.split("\\n") filter { _.nonEmpty } }

  2. Call my executable algorithm with these lines as command line parameters. Somewhat like this:

    var op = "./Somefile.py "+lines!

  3. Print the output I get from the executable.

    op.print()

Obviously this is not the correct way to do what I am trying to do as op unlike lines is not a data sink and thus nothing is getting printed. Is there some way I can achieve this?

1 Answer 1

2

If you put all arguments into a single String value, you can call the external executable from a MapFunction.

This would look like:

val args: DataStream[String] = env.socketTextStream(hostName, port) 
// assume each text line has all elements
val out: DataStream[String] = args.map(new ExternalCaller())
// print result
out.print()

with

class ExternalCaller extends MapFunction[String, String] {

  override def map(args: String): String = {
    // call external executable with args here and return output
  }
}
Sign up to request clarification or add additional context in comments.

1 Comment

I kind of figured that it could be done with map, but this solved my problem. Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.