I need to extract data from a binary file.

I used binaryRecords and got an RDD[Array[Byte]].

From here I want to parse every record into a case class (Field1: Int, Field2: Short, Field3: Long).

How can I do this?

  • Do you have a delimiter in the binary file? (Commented Nov 11, 2015 at 15:41)
  • No, no delimiter. It is a regular binary file written by a C program with the structure int, short, long. (Commented Nov 12, 2015 at 7:00)
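A file written by a C program may not use the packed 14-byte layout you might expect. If the whole struct was written with a single fwrite on x86-64 Linux, the compiler typically inserts 2 bytes of padding after the short so that the long is 8-byte aligned, giving 16-byte records, and x86 stores the values little-endian, while java.nio.ByteBuffer reads big-endian by default. Also note that a C long is 8 bytes on 64-bit Linux/macOS but only 4 bytes on 64-bit Windows. The sketch below assumes the padded, little-endian, 8-byte-long layout; CRecord, CStructLayout and parseCStruct are hypothetical names, and you should check the real layout (for example with sizeof and offsetof in the C program) before relying on these offsets.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical case class mirroring the C struct { int; short; long; }.
case class CRecord(field1: Int, field2: Short, field3: Long)

object CStructLayout {
  // Assumed layout (x86-64 Linux, struct written with one fwrite):
  //   offset 0: int   (4 bytes)
  //   offset 4: short (2 bytes)
  //   offset 6: 2 bytes of compiler padding
  //   offset 8: long  (8 bytes)
  val RecordLength: Int = 16

  def parseCStruct(bytes: Array[Byte]): CRecord = {
    require(bytes.length == RecordLength, s"expected $RecordLength bytes, got ${bytes.length}")
    // x86 writes little-endian; ByteBuffer defaults to big-endian, so set the order.
    val bb = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    // Absolute gets read at fixed offsets, so the padding bytes are simply skipped.
    CRecord(bb.getInt(0), bb.getShort(4), bb.getLong(8))
  }
}
```

With this layout you would pass 16 (not 14) as the record length to binaryRecords and map each record through parseCStruct.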

2 Answers

Assuming there is no delimiter: an Int in Scala is 4 bytes, a Short is 2 bytes and a Long is 8 bytes. If your binary data is structured (for each record) as Int, Short, Long, each record is 14 bytes, and you can take the bytes and convert them to the types you want.

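import java.nio.ByteBuffer

// Each record is 14 bytes: Int (bytes 0-3), Short (bytes 4-5), Long (bytes 6-13).
// ByteBuffer reads big-endian by default; each relative get advances the position.
val result = YourRDD.map { x =>
  val bb = ByteBuffer.wrap(x)
  (bb.getInt, bb.getShort, bb.getLong)
}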

This uses java.nio.ByteBuffer from the Java standard library to convert the bytes to Int, Short and Long; you can use other libraries if you prefer.
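To get the case class the question asks for instead of a tuple, a minimal end-to-end sketch could look like the following. It assumes the records are packed (14 bytes, no padding) and big-endian; Record, ParseBinaryRecords and parse are hypothetical names, the field names only mirror the question's Field1/Field2/Field3, and the local master and app name are placeholders. SparkContext.binaryRecords(path, recordLength) needs the fixed record length up front and returns an RDD[Array[Byte]] with one element per record.

```scala
import java.nio.ByteBuffer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical case class; field names mirror the question's Field1/Field2/Field3.
case class Record(field1: Int, field2: Short, field3: Long)

object ParseBinaryRecords {
  // Assumed packed layout: Int (4) + Short (2) + Long (8) = 14 bytes, no padding.
  val RecordLength: Int = 4 + 2 + 8

  // Pure function, so it can be tested without Spark.
  def parse(bytes: Array[Byte]): Record = {
    val bb = ByteBuffer.wrap(bytes) // big-endian by default
    Record(bb.getInt, bb.getShort, bb.getLong)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parse-binary").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // binaryRecords splits the input into fixed-length byte arrays.
    val raw: RDD[Array[Byte]] = sc.binaryRecords(args(0), RecordLength)
    val records: RDD[Record] = raw.map(parse)
    records.take(5).foreach(println)
    sc.stop()
  }
}
```

Keeping parse as a plain function separates the byte decoding from Spark, so the layout assumptions can be checked on a hand-built byte array before running a job.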

Since Spark 3.0, Spark has a “binaryFile” data source for reading binary files.

I found this in the article "How to read Binary file into DataFrame", which has more explanation.

val df = spark.read.format("binaryFile").load("/tmp/binary/spark.png")
df.printSchema()
df.show()

This prints the following schema and DataFrame:

root
 |-- path: string (nullable = true)
 |-- modificationTime: timestamp (nullable = true)
 |-- length: long (nullable = true)
 |-- content: binary (nullable = true)

+--------------------+--------------------+------+--------------------+
|                path|    modificationTime|length|             content|
+--------------------+--------------------+------+--------------------+
|file:/C:/tmp/bina...|2020-07-25 10:11:...| 74675|[89 50 4E 47 0D 0...|
+--------------------+--------------------+------+--------------------+
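Note that the binaryFile source puts each whole file into a single row, so the content column still has to be split into fixed-size records. A minimal sketch, assuming the same packed, big-endian 14-byte layout as the first answer; Record, ParseBinaryFile and splitRecords are hypothetical names, and a trailing partial record is silently dropped here (you may prefer to fail instead).

```scala
import java.nio.ByteBuffer
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class for one Int + Short + Long record.
case class Record(field1: Int, field2: Short, field3: Long)

object ParseBinaryFile {
  val RecordLength: Int = 14 // assumed packed layout: 4 + 2 + 8 bytes

  // Splits one file's bytes into records; an incomplete final chunk is dropped.
  def splitRecords(content: Array[Byte]): Seq[Record] =
    content.grouped(RecordLength)
      .filter(_.length == RecordLength)
      .map { chunk =>
        val bb = ByteBuffer.wrap(chunk) // big-endian by default
        Record(bb.getInt, bb.getShort, bb.getLong)
      }
      .toVector

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("binary-file").master("local[*]").getOrCreate()
    import spark.implicits._
    // One row per file; the file's bytes are in the "content" column.
    val records: Dataset[Record] = spark.read.format("binaryFile").load(args(0))
      .select("content")
      .as[Array[Byte]]
      .flatMap((bytes: Array[Byte]) => splitRecords(bytes))
    records.show()
    spark.stop()
  }
}
```

Because each file is loaded into memory as one value, this suits moderately sized files; for very large files, binaryRecords avoids holding a whole file in a single row.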

Thanks
