1

I have a byte array (or more precisely a ByteString) of UTF8 strings, which are prefixed by their length as 2-bytes (msb, lsb). For example:

val z = akka.util.ByteString(0, 3, 'A', 'B', 'C', 0, 5, 
        'D', 'E', 'F', 'G', 'H',0,1,'I')

I would like to convert this to a list of strings, so it should similar to List("ABC", "DEFGH", "I").

Is there an elegant way to do this?

(EDIT) These strings are NOT null terminated, the 0 you are seeing in the array is just the MSB. If the strings were long enough, the MSB would be greater than zero.

3 Answers 3

1

Edit: Updated based on clarification in comments that first 2 bytes define an int. So I converted it manually.

def convert(bs: List[Byte]) : List[String] = {
  bs match {
    case count_b1 :: count_b2 :: t =>
      val count =  ((count_b1 & 0xff) << 8) | (count_b2 & 0xff)
      val (chars, leftover) = t.splitAt(count)
      new String(chars.toArray, "UTF-8") :: convert(leftover)
    case _ => List()
  }
}

Call convert(z.toList)

Sign up to request clarification or add additional context in comments.

6 Comments

Thank you. These strings are not null terminated, sorry for the confusion.
Would it be faster to calculate length via bitwise operations? In my code I do something like count = (buff(1) & 0xff) | ((buff(0) & 0xff) << 8)
Yes. I kind of preferred the ByteBuffer as to me its more readable what the intention is but I guess if you wrap it in a function like bytes2int it would have the same effect.
Cool. I also built my own solution using a recursive tail function (as hinted by Enzyme's link below) and now I need to figure out a good way to benchmark and see which is best. Since i'm processing network packets, I am performance sensitive.
Just for reference, 10000 iterations of the allocate version took 180ms, and the bite-manipulation version took 61ms. Now I am going to try and fix some of the other versions to try them out
|
0

Consider multiSpan method as defined here which is a repeated application of span over a given list,

z.multiSpan(_ == 0).map( _.drop(2).map(_.toChar).mkString )

Here the spanning condition is whether an item equals 0, then we drop the first two prefixing bytes, and convert the remaining to a String.

Note On using multiSpan, recall to import annotation.tailrec .

1 Comment

Thank you. As discussed below with Suztomo in another answer, it doesn't account for the MSB/LSB prefix. These strings are not null terminated. I just happen to have the MSB as 0 so I don't have to post an array with a huge string.
0

Here is my answer with foldLeft.

def convert(z : ByteString) = z.foldLeft((List() : List[String], ByteString(), 0, 0))((p, b : Byte) => {
  p._3 match {
    case 0 if p._2.nonEmpty => (p._2.utf8String :: p._1, ByteString(), -1, b.toInt)
    case 0 => (p._1, p._2, -1, b.toInt)
    case -1 => (p._1, p._2, (p._4 << 8) + b.toInt, 0)
    case _ => (p._1, p._2 :+ b, p._3 - 1, 0)
  }
})

It works like this:

scala> val bs = ByteString(0, 3, 'A', 'B', 'C', 0, 5,  'D', 'E', 'F', 'G', 'H',0,1,'I')
scala>   val k = convert(bs); (k._2.utf8String :: k._1).reverse
k: (List[String], akka.util.ByteString, Int, Int) = (List(DEFGH, ABC),ByteString(73),0,0)
res20: List[String] = List(ABC, DEFGH, I)

4 Comments

Thank you. I will give this a try. Why do you think the strings are null terminated? The 0 in the array is just the MSB of the length. Unless I'm missing something.
Now I got it. The length part consists of 2 bytes! Then my solution won't work. I see others also assumed the zeros in array are separator.
Sorry about that, I added a small clarification in the question for future reference.
Thanks. It seems like this version scales poorly. In my benchmarks, it's fairly equivalent to the version by Lionel below at small iterations up to about 10,000 (66ms vs 68ms). When I push it to 100,000 it becomes more than 2x slower (180ms vs 340ms). It kinda tapers off (at 1m, I get 1191ms vs 2943ms).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.