Timeline for How do I get a consistent byte representation of strings in C# without manually specifying an encoding?
Current License: CC BY-SA 4.0
97 events
| when | what | action | by | license / score | comment |
|---|---|---|---|---|---|
| Mar 14, 2024 at 15:48 | history | unprotected | casperOne | ||
| Sep 26, 2022 at 23:26 | answer | added | Michel Diemer | timeline score: 3 | |
| Jun 1, 2022 at 18:16 | comment | added | Andrew Morton | Where did the string come from? It might be possible to read bytes from the original source instead of going via a string. | |
| Apr 12, 2022 at 18:16 | comment | added | Karl Stephen | Also, why should encoding even be taken into consideration? Because the bytes you get through your program are bytes produced by a default encoding, likely UTF-16 little-endian on a .NET Windows platform. The day the system environment changes, your data will likely become useless garbage! If you just want to write binary files for your own use, through your program, on a computer that will stop getting updates at some point, that's okay. But don't hand them to others on a different architecture and/or endianness without specifying the encoding you used to produce the bytes. | |
| Oct 3, 2020 at 10:27 | review | Close votes | | | completed Oct 7, 2020 at 0:01 |
| Sep 7, 2020 at 1:26 | review | Close votes | | | completed Sep 11, 2020 at 0:03 |
| Aug 3, 2020 at 20:44 | answer | added | Chris Hutchinson | timeline score: 2 | |
| Feb 26, 2020 at 22:22 | history | edited | John Smith | CC BY-SA 4.0 | added 5 characters in body |
| Sep 11, 2019 at 4:21 | answer | added | jpmc26 | timeline score: 3 | |
| S Oct 1, 2018 at 12:36 | history | suggested | Dragonthoughts | | This relates strongly to character encoding |
| Oct 1, 2018 at 11:23 | review | Suggested edits | | | completed S Oct 1, 2018 at 12:36 |
| Jul 2, 2018 at 20:51 | answer | added | Jason Goemaat | timeline score: 8 | |
| Jun 27, 2018 at 11:21 | comment | added | Thanasis Ioannidis | You should always worry about what encoding your string is represented in within the byte array. The assumption that the string is represented in memory as a byte array is arbitrary. It happens to be like that in the present implementation of .NET. No one can guarantee it won't change to a linked-list implementation in the future (or any other exotic data structure). Even if you use the same system and the same program to read back the encrypted data, there is always a chance a future patch of .NET will break everything, because you didn't explicitly specify what encoding you were working in. | |
| Jun 27, 2018 at 11:16 | comment | added | Thanasis Ioannidis | Not worrying about encoding is one thing. Not wanting to specify an encoding is another thing entirely. If what frustrates you is which encoding you should use, just pick one and use it every time for conversions between string and byte array. For instance, always use Unicode, or UTF-8. Your choice. Once you have chosen an encoding, you need not worry any more and your problem is solved. But if your frustration comes from the need to specify an encoding at all, then you had better get used to it, because whether you like it or not, an encoding is taking place. | |
| Jan 10, 2018 at 20:21 | answer | added | John Rasch | timeline score: 17 | |
| S Dec 18, 2017 at 19:05 | history | edited | Servy | CC BY-SA 3.0 | deleted 38 characters in body |
| Dec 18, 2017 at 17:41 | review | Suggested edits | | | completed S Dec 18, 2017 at 19:05 |
| Dec 5, 2017 at 16:23 | comment | added | mg30rg | Encoding is necessary because the size, in bytes, of the represented characters depends on it. Not only is sizeof(char) different between, e.g., ASCII (1 byte) and wide strings (2 bytes), but the size can even vary within one encoding: in UTF-8 a character is represented as 1 to 4 bytes. | |
| Nov 8, 2017 at 18:21 | answer | added | NH. | timeline score: 2 | |
| Oct 2, 2017 at 16:32 | review | Close votes | | | completed Oct 6, 2017 at 0:05 |
| Jul 24, 2017 at 9:36 | comment | added | Jeppe Stig Nielsen | Your first comment (quote): *Every string is stored as an array of bytes right? Why can't I simply have those bytes?* No, every string is (more or less) stored as an array of 16-bit code units which correspond to UTF-16. There will be surrogate pairs in there if your string contains Unicode characters outside plane 0. You can get that representation easily: `var array1 = yourString.ToCharArray();`. If for some reason you want the code units as UInt16 values, do `var array2 = Array.ConvertAll<char, ushort>(array1, x => x);`. That is a `ushort[]` there. | |
| Apr 28, 2017 at 13:59 | comment | added | Kris Vandermotten | Are you assuming that `System.Text.Encoding.Unicode.GetBytes()` is doing some kind of expensive conversion that you want to avoid? If so, your assumption is wrong. | |
| Apr 20, 2017 at 8:36 | comment | added | Ark-kun | @AgnelKurian "He wants me to take care of writing and reading those numbers. I am not interpreting them." - If you weren't interpreting them, you'd have bytes and not "numbers". Then your question disappears. If you have "numbers", that means you've already interpreted/decoded them and thrown away the original byte data. And now you want to try to reconstruct the data (encode), which might not even be possible. What if the numbers were actually base-10, and by cramming them into base-2 floats you've destroyed them forever? Don't want to encode? Don't decode then. Want bytes? Then use bytes. | |
| Jan 9, 2017 at 1:15 | history | edited | Peter Mortensen | CC BY-SA 3.0 | Copy edited. |
| Aug 30, 2016 at 10:21 | review | Suggested edits | | | completed Aug 30, 2016 at 11:34 |
| Mar 5, 2016 at 15:00 | history | edited | justhalf | CC BY-SA 3.0 | Reword (with slight change in meaning) to make it more accurate in describing OP's use case, which is very specific (not string-to-byte conversion in the general case). Include comments from OP in the question to make that use case clearer. |
| Feb 11, 2016 at 19:32 | answer | added | Mojtaba Rezaeian | timeline score: 0 | |
| Jan 21, 2016 at 17:19 | answer | added | IgnusFast | timeline score: -5 | |
| Aug 18, 2015 at 17:04 | answer | added | Gerard ONeill | timeline score: 8 | |
| Jun 30, 2015 at 14:39 | answer | added | alireza amini | timeline score: 1 | |
| Apr 24, 2015 at 9:47 | history | edited | Peter Mortensen | CC BY-SA 3.0 | Copy edited. Removed historical information (e.g. ref. <http://meta.stackexchange.com/a/230693> and <http://meta.stackoverflow.com/questions/266164>). |
| Jan 21, 2015 at 14:05 | answer | added | Piero Alberto | timeline score: -1 | |
| Dec 17, 2014 at 21:23 | comment | added | Greg D | @AgnelKurian: Are you trolling me? That question doesn't make sense. I could infer that you meant something like "...store information about the encoding that was used 1000 times for 1000 different strings." Nobody ever said anything about doing that, though, and it was explicitly denied earlier when I stated "The encoding of that string is an implicit part of the serialized contract...", so you couldn't have meant that. | |
| Dec 17, 2014 at 2:42 | comment | added | Agnel Kurian | @GregD so you want to store the same encoding 1000 times for 1000 different strings? | |
| Dec 15, 2014 at 18:28 | comment | added | Greg D | @Agnel Kurian: If you're writing arbitrary binary data, write binary data. That has nothing to do with the original question (which is fundamentally about serializing a string). | |
| Dec 13, 2014 at 3:36 | comment | added | Agnel Kurian | @Greg D, Let's say my client has some floating point numbers in some exotic format used to store astronomical distances. He uses just that one format. He wants me to take care of writing and reading those numbers. I am not interpreting them. My client interprets the numbers and all he needs to give me are the bytes I need to write. When reading, all he needs from me are the bytes I have written. Storing a format flag each time in addition to the bytes is a waste of space when he is using just one format for all numbers. | |
| Dec 12, 2014 at 22:44 | comment | added | Greg D | Four years later, I stand by my original comment on this question. It's fundamentally flawed because the fact that we're talking about a string implies interpretation. The encoding of that string is an implicit part of the serialized contract, otherwise it's just a bunch of meaningless bits. If you want meaningless bits, why generate them from a string at all? Just write a bunch of 0's and be done with it. | |
| Nov 25, 2014 at 10:29 | answer | added | Jodrell | timeline score: 4 | |
| Nov 3, 2014 at 21:50 | comment | added | usr | @Mehrdad the existing answers were already invalid (not what was asked). Yours is pretty much the only answer that actually answers just what was asked. (I recommend, though, that you edit your answer to include a few warnings that this approach is really almost never the best one.) | |
| Nov 3, 2014 at 21:37 | comment | added | user541686 | @usr: you just invalidated almost all the answers with your edit, and also made it harder for people to find this question with their natural search query (but you probably did that intentionally). | |
| Nov 3, 2014 at 20:18 | history | edited | usr | CC BY-SA 3.0 | Edited the title to make it more obvious what approach is being asked about here (the wrong one!) |
| Sep 9, 2014 at 11:30 | answer | added | Jarvis Stark | timeline score: 17 | |
| Aug 28, 2014 at 16:14 | answer | added | George | timeline score: 0 | |
| Aug 28, 2014 at 15:43 | comment | added | George | A char is not a byte and a byte is not a char. A char is both a key into a font table and a lexical tradition. A string is a sequence of chars. (Words, paragraphs, sentences, and titles also have their own lexical traditions that justify their own type definitions -- but I digress.) Like integers, floating point numbers, and everything else, chars are encoded into bytes. There was a time when the encoding was a simple one-to-one mapping: ASCII. However, to accommodate all of human symbology, the 256 permutations of a byte were insufficient, and encodings were devised to selectively use more bytes. | |
| Jun 11, 2014 at 11:29 | answer | added | Vijay Singh Rana | timeline score: 2 | |
| Apr 9, 2014 at 12:39 | answer | added | WonderWorker | timeline score: -1 | |
| S Mar 18, 2014 at 9:43 | history | suggested | Newbee | CC BY-SA 3.0 | removing tag from title |
| Mar 18, 2014 at 9:42 | review | Suggested edits | | | completed S Mar 18, 2014 at 9:43 |
| Dec 2, 2013 at 4:43 | answer | added | Tom Blodget | timeline score: 105 | |
| Oct 22, 2013 at 12:55 | answer | added | mashet | timeline score: 10 | |
| Sep 27, 2013 at 23:26 | answer | added | Thomas Eding | timeline score: -12 | |
| Sep 2, 2013 at 11:21 | answer | added | Shyam sundar shah | timeline score: 6 | |
| Aug 5, 2013 at 22:04 | comment | added | Travis Watson | @AgnelKurian, a `char` is a struct that just happens to currently store values as a 16-bit number (UTF-16). What you're really asking for (the character bytes) isn't theoretically possible, because it doesn't theoretically exist. A `char` or `string` has no encoding by definition. What if the memory representation changed to UTF-32? Your "get the bytes, shove them back" would fail because of encoding, precisely because you avoided encoding. So "Why this dependency on encoding?!" Depend on encoding so your code is dependable. | |
| Jul 6, 2013 at 12:06 | review | Close votes | | | completed Jul 6, 2013 at 17:14 |
| Jul 6, 2013 at 11:47 | comment | added | adamjcooper | possible duplicate of How do you convert a string to a byte array in .Net | |
| Jun 27, 2013 at 19:25 | history | protected | Paŭlo Ebermann | ||
| Jun 12, 2013 at 3:34 | review | Suggested edits | | | completed Jun 12, 2013 at 3:37 |
| Jun 5, 2013 at 10:52 | answer | added | Shyam sundar shah | timeline score: 23 | |
| Jan 23, 2013 at 6:21 | answer | added | sagardhavale | timeline score: -4 | |
| Jan 15, 2013 at 11:43 | answer | added | Tommaso Belluzzo | timeline score: 3 | |
| Oct 12, 2012 at 6:43 | history | rollback | Agnel Kurian | | Rollback to Revision 4 |
| Oct 11, 2012 at 17:47 | history | edited | artbristol | CC BY-SA 3.0 | Question is highly misleading in its current form. Added detail from OP's comments to clarify. |
| Oct 11, 2012 at 9:45 | answer | added | Avlin | timeline score: 1 | |
| Apr 30, 2012 at 12:50 | answer | added | Michael Buen | timeline score: 46 | |
| Apr 30, 2012 at 8:45 | vote | accept | Agnel Kurian | ||
| Apr 30, 2012 at 7:44 | answer | added | user541686 | timeline score: 1948 | |
| Apr 30, 2012 at 7:26 | answer | added | Erik A. Brandstadmoen | timeline score: 304 | |
| Jan 2, 2012 at 11:07 | answer | added | user1120193 | timeline score: 1 | |
| Jul 25, 2011 at 22:52 | answer | added | Nathan | timeline score: 42 | |
| Mar 10, 2011 at 8:57 | answer | added | Gman | timeline score: 26 | |
| Mar 22, 2010 at 8:40 | answer | added | Alessandro Annini | timeline score: 9 | |
| Dec 1, 2009 at 19:47 | comment | added | Greg | To play devil's advocate: if you wanted to get the bytes of an in-memory string (as .NET uses them) and manipulate them somehow (e.g. CRC32), and NEVER EVER wanted to decode them back into the original string, it isn't straightforward why you'd care about encodings or how you'd choose which one to use. | |
| Jul 22, 2009 at 11:30 | comment | added | Alexey Romanov | In case of .NET, the easy route is using UTF-16 on both sides, since that's what .NET uses internally. | |
| Jul 16, 2009 at 11:45 | answer | added | Konamiman | timeline score: 25 | |
| Apr 13, 2009 at 14:14 | comment | added | Lucas Jones | You can take the easy route and just use UTF-8 on both sides. | |
| Apr 13, 2009 at 14:13 | comment | added | Lucas Jones | The encoding is what maps the characters to the bytes. For example, in ASCII, the letter 'A' maps to the number 65. In a different encoding, it might not be the same. The high-level approach to strings taken in the .NET framework makes this largely irrelevant, though (except in this case). | |
| Mar 4, 2009 at 5:51 | comment | added | Agnel Kurian | "A string is an array of chars, where a char is not a byte in the .Net world" Alright, but regardless of the encoding, each character maps to one or more bytes. Can I have those bytes please without having to specify an encoding? | |
| Feb 19, 2009 at 21:03 | answer | added | harmonik | timeline score: 1 | |
| Jan 30, 2009 at 11:02 | vote | accept | Agnel Kurian | | unaccepted Apr 30, 2012 at 8:45 |
| Jan 23, 2009 at 16:38 | comment | added | Greg D | I think Anthony is trying to address the fundamental disconnect in <300 chars. You're assuming some consistent internal representation of a string, when in fact that representation could be anything. To create, and eventually decode, the bytestream, you must choose an encoding to use. | |
| Jan 23, 2009 at 16:36 | answer | added | Michael Buen | timeline score: 120 | |
| Jan 23, 2009 at 15:54 | answer | added | Joel Coehoorn | timeline score: 53 | |
| Jan 23, 2009 at 14:34 | answer | added | Ed Marty | timeline score: 14 | |
| Jan 23, 2009 at 14:19 | history | edited | Dale Ragan | | Added c# tag. |
| Jan 23, 2009 at 14:15 | answer | added | Hans Passant | timeline score: 11 | |
| Jan 23, 2009 at 14:15 | comment | added | Igal Tabachnik | Have a look at Jon Skeet's answer in a post with the exact question. It will explain why you depend on encoding. | |
| Jan 23, 2009 at 14:05 | comment | added | Agnel Kurian | Every string is stored as an array of bytes right? Why can't I simply have those bytes? | |
| Jan 23, 2009 at 14:03 | answer | added | Zhaph - Ben Duguid | timeline score: 100 | |
| Jan 23, 2009 at 14:00 | comment | added | Greg D | If you're encrypting it, then you'll still have to know what the encoding is after you decrypt it so that you know how to reinterpret those bytes back into a string. | |
| Jan 23, 2009 at 13:57 | comment | added | Agnel Kurian | I'm going to encrypt it. I can encrypt it without converting but I'd still like to know why encoding comes to play here. Just give me the bytes is what I say. | |
| Jan 23, 2009 at 13:56 | comment | added | Greg D | Your confusion over the role of encoding makes me wonder if this is the right question. Why are you trying to convert a string to a byte array? What are you going to do with the byte array? | |
| Jan 23, 2009 at 13:51 | history | edited | kemiller2002 | CC BY-SA 2.5 | edited title |
| Jan 23, 2009 at 13:49 | history | edited | Agnel Kurian | CC BY-SA 2.5 | why encoding |
| Jan 23, 2009 at 13:43 | answer | added | cyberbobcat | timeline score: -3 | |
| Jan 23, 2009 at 13:43 | answer | added | bmotmans | timeline score: 1143 | |
| Jan 23, 2009 at 13:43 | answer | added | gkrogers | timeline score: 20 | |
| Jan 23, 2009 at 13:39 | history | asked | Agnel Kurian | CC BY-SA 2.5 |
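
Several comments in this timeline (Jeppe Stig Nielsen's, Travis Watson's, Kris Vandermotten's) make the same underlying point: a .NET string exposes UTF-16 code units, not bytes, and a byte representation only exists relative to a named encoding. A minimal C# sketch of that distinction (the string literal and class name are illustrative, not from the discussion):

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // "\uD83D\uDE00" (😀) lies outside plane 0, so it occupies a surrogate pair in UTF-16.
        string s = "Héllo \uD83D\uDE00";

        // What .NET hands out without naming an encoding: UTF-16 code units, not bytes.
        char[] codeUnits = s.ToCharArray();
        Console.WriteLine(codeUnits.Length);    // 8 code units

        // A byte representation only exists once an encoding is chosen.
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        byte[] utf16 = Encoding.Unicode.GetBytes(s); // little-endian UTF-16

        Console.WriteLine(utf8.Length);         // 11 bytes
        Console.WriteLine(utf16.Length);        // 16 bytes

        // Round-tripping works as long as the same encoding is used on both sides.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == s); // True
    }
}
```

The same string yields different byte counts under different encodings, which is why the accepted answers insist that "the bytes of a string" is only well defined after an encoding is specified.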