24

So I'm trying to convert a binary to a string. This code:

t = [{<<71,0,69,0,84,0>>}]
String.from_char_list(t)

But I'm getting this when I try this conversion:

** (ArgumentError) argument error
    (stdlib) :unicode.characters_to_binary([{<<70, 0, 73, 0, 78, 0>>}])
    (elixir) lib/string.ex:1161: String.from_char_list/1

I'm assuming the <<70, 0, etc. is likely a list of graphemes (it's the return from an API call and the API is not quite documented) but do I need to specify the encoding somehow?

I know I'm likely missing something obvious (maybe that's not the right function to use?) but I can't seem to figure out what to do here.


EDIT:

For what it's worth, the binary above is the return value of an Erlang ODBC call. After a little more digging I found that the binary in question is actually a "Unicode binary encoded as UTF16 little endian" (see here: http://www.erlang.org/doc/apps/odbc/odbc.pdf pg. 9 re: SQL_WVARCHAR). It doesn't really change the issue, but it does add some context.

1
  • 2
    Elixir assumes strings are UTF8 encoded binaries, not UTF16. Commented Mar 21, 2014 at 14:14

7 Answers

28

There's a couple of things here:

1.) You have a list with a tuple containing one element, a binary. You can probably just extract the binary and have your string. Passing the current data structure to to_string is not going to work.

2.) The binary you used in your example contains 0, an unprintable character. In the shell, this will not be printed properly as a string, due to the fact that Elixir can't tell the difference between just a binary, and a binary representing a string, when the binary representing a string contains unprintable characters.

3.) You can use pattern matching to convert a binary to a particular type. For instance:

iex> raw = <<71,32,69,32,84,32>>
"G E T "
iex> Enum.join(for <<c::utf8 <- raw>>, do: <<c::utf8>>)
"G E T "
iex> <<c::utf8, _::binary>> = raw
"G E T "
iex> <<c::utf8>>
"G"

Also, if you are getting binary data from a network connection, you probably want to use :erlang.iolist_to_binary, since the data will be an iolist, not a charlist. The difference is that an iolist can contain binaries and nested lists as well as integers, whereas a charlist is always a flat list of integers. If you call to_string on an iolist, it may fail, for example when the iolist contains bytes that are not valid UTF-8.
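The points above can be sketched roughly as follows. This is only an illustration: it assumes the API returns exactly the one-element list of one-element tuples shown in the question, the iolist is made-up sample data, and IO.iodata_to_binary/1 is used as the Elixir-side counterpart of :erlang.iolist_to_binary.

```elixir
# Shape taken from the question: a list holding a one-element tuple holding a binary.
result = [{<<71, 0, 69, 0, 84, 0>>}]

# Point 1: pattern match to pull the binary out of the list and the tuple.
[{bin}] = result

# Point 2: the binary is valid UTF-8, but the 0 bytes are not printable,
# so inspect/1 renders it as raw bytes instead of a quoted string.
IO.inspect(String.valid?(bin))
IO.inspect(String.printable?(bin))
IO.inspect(bin)

# An iolist (hypothetical sample) may mix bytes, binaries and nested lists.
iolist = [71, [<<69>>, [84]], " ", <<111, 107>>]

# IO.iodata_to_binary/1 flattens it into a single binary.
binary = IO.iodata_to_binary(iolist)
IO.inspect(binary)

# A charlist, by contrast, is a flat list of code points.
charlist = [71, 69, 84]
IO.inspect(List.to_string(charlist))
```

Note that none of this decodes UTF-16; it only gets the raw bytes into a single binary.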


6 Comments

I thought that the list containing tuple was an issue but I wanted to give the code exactly as it comes back from the API. I'm guessing the result is in a DBCS. Need to dig into it a bit further.
Partially I was hoping there might be something already built into the library that I was missing. Good to know I didn't miss anything.
Yes it does seem like each character is being stored in two bytes, good call! The distinction between binaries and strings and charlists and iolists is confusing at times, but I think there are some changes coming down the pipe that should make it more obvious when to use each one.
Yep, it's actually a little endian UTF16 binary. See my edit to my question.
When I tried <<t::utf8>> = <<71,32,69,32,84,32>>, in elixir console, I got ** (MatchError) no match of right hand side value: "G E T " .. Can anyone explain why?
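On that last comment: a likely explanation is that the pattern <<t::utf8>> describes a binary holding exactly one code point and nothing else, so it cannot match a six-byte binary. The pattern in the answer works because _::binary absorbs the remainder. A small sketch, with hypothetical variable names:

```elixir
raw = <<71, 32, 69, 32, 84, 32>>

# Matches: one UTF-8 code point, then the rest of the binary.
<<first::utf8, rest::binary>> = raw
IO.inspect({first, rest})

# Does not match: the pattern leaves no room for the remaining five bytes.
outcome =
  try do
    <<_only::utf8>> = raw
    :matched
  rescue
    MatchError -> :no_match
  end

IO.inspect(outcome)
```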
7

I made a function to convert binary to string

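def raw_binary_to_string(raw) do
  raw
  |> String.codepoints()
  # Start from "" so that the first codepoint is checked like all the others
  |> Enum.reduce("", fn w, result ->
    if String.valid?(w) do
      result <> w
    else
      # Not valid UTF-8: read the single byte and re-encode it as a UTF-8 codepoint
      <<parsed::8>> = w
      result <> <<parsed::utf8>>
    end
  end)
end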

Executed on iex console

iex(6)> raw = <<65, 241, 111, 32, 100, 101, 32, 70, 97, 99, 116, 117, 114, 97, 99, 105, 111, 110, 32, 65, 99, 116, 117, 97, 108>>
iex(7)> raw_binary_to_string(raw)
"Año de Facturacion Actual"
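The function above in effect treats every byte that is not valid UTF-8 as a Latin-1 code point. If the input is known to be Latin-1 throughout (an assumption about the data, not something this answer states), Erlang's :unicode.characters_to_binary/3 can do the conversion in one call. A minimal sketch:

```elixir
# 241 is "ñ" in Latin-1 (ISO 8859-1); on its own it is not valid UTF-8.
raw = <<65, 241, 111, 32, 100, 101>>

# Decode the bytes as Latin-1 and re-encode them as UTF-8.
string = :unicode.characters_to_binary(raw, :latin1, :utf8)
IO.inspect(string)
```

Unlike the function above, this would mis-decode input that already contains multi-byte UTF-8 sequences, so it only fits data that is Latin-1 from start to finish.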

Comments

5

Not sure if OP has since solved his problem, but in relation to his remark about his binary being utf16-le: for specifically that encoding, I found that the quickest (and to those more experienced with Elixir, probably-hacky) way was to use Enum.reduce:

<<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>
# splitting it into codepoints gives us ["D", <<0>>, "e", <<0>>, "v", <<0>>, "a", <<0>>, "s", <<0>>, "t", <<0>>, "a", <<0>>, "t", <<0>>, "o", <<0>>, "r", <<0>>]
|> String.codepoints()
|> Enum.reduce("", fn codepoint, result ->
  <<parsed::8>> = codepoint
  # drop the zero bytes, keep everything else
  if parsed == 0, do: result, else: result <> <<parsed>>
end)
# prints "Devastator"
|> IO.puts()

Assumptions:

  • utf16-le encoding

  • the codepoints are backwards-compatible with utf8 i.e. they use only 1 byte

Since I'm still learning Elixir, it took me a while to get to this solution. I looked into other libraries people made, even using something like iconv at a bash level.
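For comparison, here is a sketch that does not depend on the second assumption. It leans on Erlang's :unicode.characters_to_binary/3, which accepts {:utf16, :little} as the input encoding; the module and function names are hypothetical.

```elixir
defmodule Utf16Sketch do
  # Hypothetical helper: decode a UTF-16 little-endian binary into a UTF-8 string.
  def to_utf8(binary) do
    case :unicode.characters_to_binary(binary, {:utf16, :little}, :utf8) do
      string when is_binary(string) -> {:ok, string}
      {:error, _converted, _rest} -> {:error, :invalid}
      {:incomplete, _converted, _rest} -> {:error, :incomplete}
    end
  end
end

devastator = <<68, 0, 101, 0, 118, 0, 97, 0, 115, 0, 116, 0, 97, 0, 116, 0, 111, 0, 114, 0>>
IO.inspect(Utf16Sketch.to_utf8(devastator))

# A code point outside ASCII: "é" is <<233, 0>> in UTF-16LE.
IO.inspect(Utf16Sketch.to_utf8(<<233, 0>>))
```

Wrapping the result in a tagged tuple is just one design choice; the point is that the call can return an error or incomplete tuple instead of a binary, so the result should not be assumed to be a string.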

1 Comment

"backwards-compatible with utf8" is misleading. "Representable in ASCII" might be more accurate. UTF-8 is backwards-compatible with ASCII (UTF-16 is not), but it also has 2-, 3-, and 4-byte characters. UTF-16 uses 2 bytes for most characters and 4 bytes for the rest, which is often wasteful for ASCII-heavy text. Anyway, I'm an Elixir newb and yeah this is a hack.
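To make the size difference concrete, this sketch compares the encoded size of a few sample characters (chosen here purely for illustration) in UTF-8 and UTF-16LE:

```elixir
# For each character: {character, bytes in UTF-8, bytes in UTF-16LE}.
sizes =
  for char <- ["G", "é", "€", "😀"] do
    utf16 = :unicode.characters_to_binary(char, :utf8, {:utf16, :little})
    {char, byte_size(char), byte_size(utf16)}
  end

IO.inspect(sizes)
# => [{"G", 1, 2}, {"é", 2, 2}, {"€", 3, 2}, {"😀", 4, 4}]
```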
4

Ecto.UUID.load/1 will convert a binary to string and return a tuple:

binary = Ecto.UUID.bingenerate()
<<99, 148, 189, 126, 144, 154, 71, 236, 160, 110, 149, 143, 67, 162, 177, 192>>

Ecto.UUID.load(binary)
{:ok, "6394bd7e-909a-47ec-a06e-958f43a2b1c0"}

credit: https://stackoverflow.com/a/43530427/2091331

1 Comment

"convert a binary to string and return a tuple": no, it won't. It will convert a raw 16-byte UUID binary into a hex-encoded UUID string, but it will not convert any given binary to a string. Even if it did, it also inserts hyphens. It's only to be used for UUIDs.
3

The last point definitely does change the issue, and explains it. Elixir uses binaries as strings but assumes and demands that they are UTF8 encoded, not UTF16.

Comments

2

In reference to http://erlang.org/pipermail/erlang-questions/2010-December/054885.html

You can use :unicode.characters_to_list(binary_string, {:utf16, :little}) to decode the binary, both to verify the result and to store it.

IEX eval

iex(1)> y                                                
<<115, 0, 121, 0, 115, 0>>
iex(2)> :unicode.characters_to_list(y, {:utf16, :little})
'sys'

Note: the result is a charlist, which iex prints as 'sys' for <<115, 0, 121, 0, 115, 0>>.
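Since the result is a charlist rather than an Elixir string, one more step is needed to get a UTF-8 binary. A minimal sketch, using List.to_string/1 for that step:

```elixir
y = <<115, 0, 121, 0, 115, 0>>

# Decoding yields a charlist, i.e. a list of code points.
# Older iex versions print it as 'sys', newer ones as ~c"sys".
charlist = :unicode.characters_to_list(y, {:utf16, :little})

# List.to_string/1 turns the charlist into a UTF-8 binary (an Elixir string).
string = List.to_string(charlist)
IO.inspect(string)
```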

Comments

1

You can use a comprehension:

    defmodule TestModule do
      def convert(binary) do
        for c <- binary, into: "", do: <<c>>
      end
    end
    TestModule.convert([71,32,69,32,84,32]) |> IO.puts

1 Comment

If I pass the original argument <<71,0,69,0,84,0>> in this case, this doesn't work. I get an argument error. If I have to take the additional step of converting the binary to a list, you probably want to specify that in your answer as well.
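A comprehension can also iterate over the binary itself if the generator is a bitstring generator. The sketch below assumes the input is UTF-16 little endian, as in the question; the module name is hypothetical.

```elixir
defmodule ComprehensionSketch do
  # Hypothetical variant of convert/1 that accepts a UTF-16LE binary directly.
  def convert(binary) do
    # Read one UTF-16LE code point at a time and emit it as UTF-8.
    for <<c::utf16-little <- binary>>, into: "", do: <<c::utf8>>
  end
end

ComprehensionSketch.convert(<<71, 0, 69, 0, 84, 0>>) |> IO.puts()
```

One caveat: a bitstring generator stops silently at the first segment that does not match, so malformed input yields a truncated string rather than an error.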
