97 events
when what by license comment
Mar 14, 2024 at 15:48 history unprotected casperOne
Sep 26, 2022 at 23:26 answer added Michel Diemer timeline score: 3
Jun 1, 2022 at 18:16 comment added Andrew Morton Where did the string come from? It might be possible to read bytes from the original source instead of going via a string.
Apr 12, 2022 at 18:16 comment added Karl Stephen Also, why should encoding even be taken into consideration? Because the bytes you get through your program are bytes produced by a default encoding, likely UTF-16 little-endian on a .NET Windows platform. The day the system environment changes, your data will likely become USELESS GARBAGE! If you just want to write binary files for your own use, through your program, on a computer that will stop getting updates at some point, it's okay. But don't come to others on a different architecture and/or with a different endianness without specifying the encoding you used to produce the bytes.
Oct 3, 2020 at 10:27 review Close votes
Oct 7, 2020 at 0:01
Sep 7, 2020 at 1:26 review Close votes
Sep 11, 2020 at 0:03
Aug 3, 2020 at 20:44 answer added Chris Hutchinson timeline score: 2
Feb 26, 2020 at 22:22 history edited John Smith CC BY-SA 4.0
added 5 characters in body
Sep 11, 2019 at 4:21 answer added jpmc26 timeline score: 3
S Oct 1, 2018 at 12:36 history suggested Dragonthoughts
This relates strongly to character encoding
Oct 1, 2018 at 11:23 review Suggested edits
S Oct 1, 2018 at 12:36
Jul 2, 2018 at 20:51 answer added Jason Goemaat timeline score: 8
Jun 27, 2018 at 11:21 comment added Thanasis Ioannidis You should always worry about what encoding your string is represented in the byte array. The assumption that the string is represented in memory as a byte array is arbitrary. It happens to be like that in the present implementation of .NET. No one can guarantee that it won't change to a linked-list implementation in the future (or any other exotic data structure). Even if you use the same system and the same program to read back the encrypted data, there is always a chance that a future patch of .NET will break everything because you didn't explicitly specify which Encoding you work in.
Jun 27, 2018 at 11:16 comment added Thanasis Ioannidis Not worrying about encoding is one thing. Not wanting to specify an encoding is another thing entirely. If what frustrates you is which encoding you should use, just pick one and use it every time for conversions from string to byte array and from byte array to string. For instance, always use Unicode, or UTF-8. Your choice. After you have chosen an Encoding, you need not worry any more and your problem is solved. But if your frustration comes from the need to specify an encoding, then you had better get used to it, because whether you like it or not, an encoding is taking place.
Jan 10, 2018 at 20:21 answer added John Rasch timeline score: 17
S Dec 18, 2017 at 19:05 history edited Servy CC BY-SA 3.0
deleted 38 characters in body
Dec 18, 2017 at 17:41 review Suggested edits
S Dec 18, 2017 at 19:05
Dec 5, 2017 at 16:23 comment added mg30rg Encoding is necessary because the size, in bytes, of the represented characters depends on it, and not only because sizeof(char) differs between, e.g., ASCII (1 byte) and WideString (2 bytes), but because it can even vary: in UTF-8, a character is represented as 1 to 4 bytes.
Nov 8, 2017 at 18:21 answer added NH. timeline score: 2
Oct 2, 2017 at 16:32 review Close votes
Oct 6, 2017 at 0:05
Jul 24, 2017 at 9:36 comment added Jeppe Stig Nielsen Your first comment (quote): Every string is stored as an array of bytes right? Why can't I simply have those bytes? No, every string is (more or less) stored as an array of 16-bit code units which correspond to UTF-16. There will be surrogate pairs in there if your string contains Unicode characters outside plane 0. You can get that representation easily: var array1 = yourString.ToCharArray(); If for some reason you want the code units as UInt16 values, do var array2 = Array.ConvertAll<char, ushort>(array1, x => x);. That is a ushort[] there.
Apr 28, 2017 at 13:59 comment added Kris Vandermotten Are you assuming that System.Text.Encoding.Unicode.GetBytes(); is doing some kind of expensive conversion that you want to avoid? If so, your assumption is wrong.
Apr 20, 2017 at 8:36 comment added Ark-kun @AgnelKurian "He wants me to take care of writing and reading those numbers. I am not interpreting them." - If you weren't interpreting them, you'd have bytes and not "numbers". Then, your question disappears. If you have "numbers", that means you've already interpreted/decoded them and thrown away the original byte data. And now you want to try to reconstruct the data (encode), which might not even be possible. What if the numbers were actually base-10 and, by cramming them into base-2 floats, you've destroyed them forever? Don't want to encode? Don't decode then. Want bytes? Then use bytes.
Jan 9, 2017 at 1:15 history edited Peter Mortensen CC BY-SA 3.0
Copy edited.
Aug 30, 2016 at 10:21 review Suggested edits
Aug 30, 2016 at 11:34
Mar 5, 2016 at 15:00 history edited justhalf CC BY-SA 3.0
Reword (with slight change in meaning) to make it more accurate in describing OP's use case, which is very specific (not string-to-byte conversion in general use case). Include comments from OP into the question to make the use case, which is very specific, clearer.
Feb 11, 2016 at 19:32 answer added Mojtaba Rezaeian timeline score: 0
Jan 21, 2016 at 17:19 answer added IgnusFast timeline score: -5
Aug 18, 2015 at 17:04 answer added Gerard ONeill timeline score: 8
Jun 30, 2015 at 14:39 answer added alireza amini timeline score: 1
Apr 24, 2015 at 9:47 history edited Peter Mortensen CC BY-SA 3.0
Copy edited. Removed historical information (e.g. ref. <http://meta.stackexchange.com/a/230693> and <http://meta.stackoverflow.com/questions/266164>).
Jan 21, 2015 at 14:05 answer added Piero Alberto timeline score: -1
Dec 17, 2014 at 21:23 comment added Greg D @AgnelKurian: Are you trolling me? That question doesn't make sense. I could infer that you meant something like, "...store information about the encoding that was used 1000 times for 1000 different strings." Nobody ever said anything about doing that, though, and it was explicitly denied earlier when I stated "The encoding of that string is an implicit part of the serialized contract..." so you couldn't have meant that.
Dec 17, 2014 at 2:42 comment added Agnel Kurian @GregD so you want to store the same encoding 1000 times for 1000 different strings?
Dec 15, 2014 at 18:28 comment added Greg D @Agnel Kurian: If you're writing arbitrary binary data, write binary data. That has nothing to do with the original question (which is fundamentally about serializing a string).
Dec 13, 2014 at 3:36 comment added Agnel Kurian @Greg D, Let's say my client has some floating point numbers in some exotic format used to store astronomical distances. He uses just that one format. He wants me to take care of writing and reading those numbers. I am not interpreting them. My client interprets the numbers and all he needs to give me are the bytes I need to write. When reading, all he needs from me are the bytes I have written. Storing a format flag each time in addition to the bytes is a waste of space when he is using just one format for all numbers.
Dec 12, 2014 at 22:44 comment added Greg D Four years later, I stand by my original comment on this question. It's fundamentally flawed because the fact that we're talking about a string implies interpretation. The encoding of that string is an implicit part of the serialized contract, otherwise it's just a bunch of meaningless bits. If you want meaningless bits, why generate them from a string at all? Just write a bunch of 0's and be done with it.
Nov 25, 2014 at 10:29 answer added Jodrell timeline score: 4
Nov 3, 2014 at 21:50 comment added usr @Mehrdad the existing answers were already invalid (not what was asked). Yours is pretty much the only answer that actually answers just what was asked. (I recommend, though, that you edit your answer to include a few warnings that this approach is really almost never the best one.)
Nov 3, 2014 at 21:37 comment added user541686 @usr: you just invalidated almost all the answers with your edit, and also made it harder for people to find this question with their natural search query (but you probably did that intentionally).
Nov 3, 2014 at 20:18 history edited usr CC BY-SA 3.0
Edited the title to make it more obvious what approach is being asked here (the wrong one!)
Sep 9, 2014 at 11:30 answer added Jarvis Stark timeline score: 17
Aug 28, 2014 at 16:14 answer added George timeline score: 0
Aug 28, 2014 at 15:43 comment added George A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions, but I digress.) Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one mapping: ASCII. However, to accommodate all of human symbology, the 256 permutations of a byte were insufficient, and encodings were devised to selectively use more bytes.
Jun 11, 2014 at 11:29 answer added Vijay Singh Rana timeline score: 2
Apr 9, 2014 at 12:39 answer added WonderWorker timeline score: -1
S Mar 18, 2014 at 9:43 history suggested Newbee CC BY-SA 3.0
removing tag from title
Mar 18, 2014 at 9:42 review Suggested edits
S Mar 18, 2014 at 9:43
Dec 2, 2013 at 4:43 answer added Tom Blodget timeline score: 105
Oct 22, 2013 at 12:55 answer added mashet timeline score: 10
Sep 27, 2013 at 23:26 answer added Thomas Eding timeline score: -12
Sep 2, 2013 at 11:21 answer added Shyam sundar shah timeline score: 6
Aug 5, 2013 at 22:04 comment added Travis Watson @AgnelKurian, A char is a struct that just happens to currently store values as a 16-bit number (UTF-16). What you're really asking (get the character bytes) isn't theoretically possible because it doesn't theoretically exist. A char or string has no Encoding by definition. What if the memory representation changed to UTF-32? Your "get the bytes, shove them back" would fail due to Encoding because you avoided Encoding. So "Why this dependency on encoding?!!!" Depend on Encoding so your code is dependable.
Jul 6, 2013 at 12:06 review Close votes
Jul 6, 2013 at 17:14
Jul 6, 2013 at 11:47 comment added adamjcooper possible duplicate of How do you convert a string to a byte array in .Net
Jun 27, 2013 at 19:25 history protected Paŭlo Ebermann
Jun 12, 2013 at 3:34 review Suggested edits
Jun 12, 2013 at 3:37
Jun 5, 2013 at 10:52 answer added Shyam sundar shah timeline score: 23
Jan 23, 2013 at 6:21 answer added sagardhavale timeline score: -4
Jan 15, 2013 at 11:43 answer added Tommaso Belluzzo timeline score: 3
Oct 12, 2012 at 6:43 history rollback Agnel Kurian
Rollback to Revision 4
Oct 11, 2012 at 17:47 history edited artbristol CC BY-SA 3.0
Question is highly misleading in its current form. Added detail from OP's comments to clarify.
Oct 11, 2012 at 9:45 answer added Avlin timeline score: 1
Apr 30, 2012 at 12:50 answer added Michael Buen timeline score: 46
Apr 30, 2012 at 8:45 vote accept Agnel Kurian
Apr 30, 2012 at 7:44 answer added user541686 timeline score: 1948
Apr 30, 2012 at 7:26 answer added Erik A. Brandstadmoen timeline score: 304
Jan 2, 2012 at 11:07 answer added user1120193 timeline score: 1
Jul 25, 2011 at 22:52 answer added Nathan timeline score: 42
Mar 10, 2011 at 8:57 answer added Gman timeline score: 26
Mar 22, 2010 at 8:40 answer added Alessandro Annini timeline score: 9
Dec 1, 2009 at 19:47 comment added Greg To play devil's advocate: If you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. CRC32), and NEVER EVER wanted to decode it back into the original string... it isn't straightforward why you'd care about encodings or how you choose which one to use.
Jul 22, 2009 at 11:30 comment added Alexey Romanov In case of .NET, the easy route is using UTF-16 on both sides, since that's what .NET uses internally.
Jul 16, 2009 at 11:45 answer added Konamiman timeline score: 25
Apr 13, 2009 at 14:14 comment added Lucas Jones You can take the easy route and just use UTF-8 on both sides.
Apr 13, 2009 at 14:13 comment added Lucas Jones The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case).
Mar 4, 2009 at 5:51 comment added Agnel Kurian "A string is an array of chars, where a char is not a byte in the .Net world" Alright, but regardless of the encoding, each character maps to one or more bytes. Can I have those bytes please without having to specify an encoding?
Feb 19, 2009 at 21:03 answer added harmonik timeline score: 1
Jan 30, 2009 at 11:02 vote accept Agnel Kurian
Apr 30, 2012 at 8:45
Jan 23, 2009 at 16:38 comment added Greg D I think Anthony is trying to address the fundamental disconnect in <300 chars. You're assuming some consistent internal representation of a string, when in fact that representation could be anything. To create, and eventually decode, the bytestream, you must choose an encoding to use.
Jan 23, 2009 at 16:36 answer added Michael Buen timeline score: 120
Jan 23, 2009 at 15:54 answer added Joel Coehoorn timeline score: 53
Jan 23, 2009 at 14:34 answer added Ed Marty timeline score: 14
Jan 23, 2009 at 14:19 history edited Dale Ragan
Added c# tag.
Jan 23, 2009 at 14:15 answer added Hans Passant timeline score: 11
Jan 23, 2009 at 14:15 comment added Igal Tabachnik Have a look at Jon Skeet's answer in a post with the exact question. It will explain why you depend on encoding.
Jan 23, 2009 at 14:05 comment added Agnel Kurian Every string is stored as an array of bytes right? Why can't I simply have those bytes?
Jan 23, 2009 at 14:03 answer added Zhaph - Ben Duguid timeline score: 100
Jan 23, 2009 at 14:00 comment added Greg D If you're encrypting it, then you'll still have to know what the encoding is after you decrypt it so that you know how to reinterpret those bytes back into a string.
Jan 23, 2009 at 13:57 comment added Agnel Kurian I'm going to encrypt it. I can encrypt it without converting but I'd still like to know why encoding comes into play here. Just give me the bytes is what I say.
Jan 23, 2009 at 13:56 comment added Greg D Your confusion over the role of encoding makes me wonder if this is the right question. Why are you trying to convert a string to a byte array? What are you going to do with the byte array?
Jan 23, 2009 at 13:51 history edited kemiller2002 CC BY-SA 2.5
edited title
Jan 23, 2009 at 13:49 history edited Agnel Kurian CC BY-SA 2.5
why encoding
Jan 23, 2009 at 13:43 answer added cyberbobcat timeline score: -3
Jan 23, 2009 at 13:43 answer added bmotmans timeline score: 1143
Jan 23, 2009 at 13:43 answer added gkrogers timeline score: 20
Jan 23, 2009 at 13:39 history asked Agnel Kurian CC BY-SA 2.5
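Several comments in this timeline argue that the byte count depends on the encoding chosen, that the same encoding must be used on both sides, and that the in-memory form is a sequence of UTF-16 code units. The following is a minimal C# sketch of those three points, not code from any of the answers. The class and method names (`StringBytesSketch`, `ToBytes`, `FromBytes`, `ToCodeUnits`) are hypothetical; the framework calls (`Encoding.GetBytes`, `Encoding.GetString`, `String.ToCharArray`, `Array.ConvertAll`) are the ones mentioned in the comments or their standard counterparts.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StringBytesSketch
{
    // Encode with an explicitly chosen encoding ("just pick one and use it every time").
    public static byte[] ToBytes(string text, Encoding encoding) =>
        encoding.GetBytes(text);

    // Decode with the same encoding that produced the bytes.
    public static string FromBytes(byte[] bytes, Encoding encoding) =>
        encoding.GetString(bytes);

    // The UTF-16 code units of the string, obtained without naming an Encoding,
    // using the ToCharArray / Array.ConvertAll approach quoted in the comments.
    public static ushort[] ToCodeUnits(string text) =>
        Array.ConvertAll<char, ushort>(text.ToCharArray(), c => c);
}
```

As a usage example, take the string `"A\uD83D\uDE00"` (the letter A followed by one character outside plane 0, written as a surrogate pair). `ToBytes(s, Encoding.UTF8)` yields 5 bytes (1 + 4), `ToBytes(s, Encoding.Unicode)` yields 6 bytes (2 + 4), and `ToCodeUnits(s)` yields 3 code units. Passing either byte array to `FromBytes` with the matching encoding gives back the original string; decoding with a different encoding is not expected to.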