
.NET String to byte Array C#


How do I convert a string to a byte array in .NET (C#)?



Update: Also please explain why encoding should be taken into consideration. Can't I simply get what bytes the string has been stored in? Why this dependency on encoding?!!!


Source: Tips4all

Comments

  1. Contrary to the answers here, you DON'T need to worry about encoding!

    Like you mentioned, your goal is, simply, to "get what bytes the string has been stored in".
    (And, of course, to be able to re-construct the string from the bytes.)

    For those goals, I honestly do not understand why people keep telling you that you need the encodings. You certainly do NOT need to worry about encodings for this.

    Just do this instead:

    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }


    As long as your program (or other programs) doesn't try to interpret the bytes, and you never said you intend to, there is nothing wrong with this approach! Worrying about encodings just makes your life more complicated for no real reason.

    Additional benefit to this approach:

    It doesn't matter if the string contains invalid characters, because you can still get the data and reconstruct the original string anyway!

    It will be encoded and decoded just the same, because you are just looking at the bytes.

    If you used a specific encoding, though, it would've given you trouble with encoding/decoding invalid characters.

  2. It depends on the encoding of your string (ASCII, UTF8, ...).

    e.g.:

    byte[] b1 = System.Text.Encoding.UTF8.GetBytes(myString);
    byte[] b2 = System.Text.Encoding.ASCII.GetBytes(myString);


    Update:
    A small sample why encoding matters:

    string pi = "\u03a0";
    byte[] ascii = System.Text.Encoding.ASCII.GetBytes(pi);
    byte[] utf8 = System.Text.Encoding.UTF8.GetBytes(pi);

    Console.WriteLine(ascii.Length); // will print 1
    Console.WriteLine(utf8.Length);  // will print 2
    Console.WriteLine(System.Text.Encoding.ASCII.GetString(ascii)); // will print '?'


    ASCII simply isn't equipped to deal with special characters.

    Internally, the .NET Framework uses UTF-16 to represent strings, so if you simply want to get the exact bytes that .NET uses, use System.Text.Encoding.Unicode.GetBytes(...).

    See MSDN for more information.

BinaryFormatter bf = new BinaryFormatter();
    byte[] bytes;
    MemoryStream ms = new MemoryStream();

    string orig = "喂 Hello 谢谢 Thank You";
    bf.Serialize(ms, orig);
    ms.Seek(0, SeekOrigin.Begin);
    bytes = ms.ToArray();

    MessageBox.Show("Original bytes Length: " + bytes.Length.ToString());
    MessageBox.Show("Original string Length: " + orig.Length.ToString());

    for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo encrypt
    for (int i = 0; i < bytes.Length; ++i) bytes[i] ^= 168; // pseudo decrypt

    BinaryFormatter bfx = new BinaryFormatter();
    MemoryStream msx = new MemoryStream();
    msx.Write(bytes, 0, bytes.Length);
    msx.Seek(0, SeekOrigin.Begin);
    string sx = (string)bfx.Deserialize(msx);

    MessageBox.Show("Still intact: " + sx);
    MessageBox.Show("Deserialize string Length (still intact): " + sx.Length.ToString());

    BinaryFormatter bfy = new BinaryFormatter();
    MemoryStream msy = new MemoryStream();
    bfy.Serialize(msy, sx);
    msy.Seek(0, SeekOrigin.Begin);
    byte[] bytesy = msy.ToArray();

    MessageBox.Show("Deserialize bytes Length (still intact): " + bytesy.Length.ToString());

You need to take the encoding into account, because one character may be represented by one or more bytes (up to 4 in UTF-8), and different encodings treat those bytes differently.

    Joel has a posting on this:


    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
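
    A minimal sketch of that point (the sample character is just illustrative; the byte counts in the comments are what these standard encodings produce):

```csharp
using System;
using System.Text;

class EncodingWidths
{
    static void Main()
    {
        // One char, a different number of bytes depending on the chosen encoding.
        string s = "é"; // U+00E9
        Console.WriteLine(Encoding.UTF8.GetBytes(s).Length);    // 2
        Console.WriteLine(Encoding.Unicode.GetBytes(s).Length); // 2 (UTF-16LE)
        Console.WriteLine(Encoding.UTF32.GetBytes(s).Length);   // 4
    }
}
```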

  5. Have a look at Jon Skeet's answer in a post with the exact question. It will explain why you depend on encoding.

  6. The first part of your question (how to get the bytes) was already answered by others: look in the System.Text.Encoding namespace.

    I will address your follow-up question: why do you need to pick an encoding? Why can't you get that from the string class itself?

    The answer is that the bytes used internally by the string class don't matter.

    If your program stays entirely within the .NET world then you don't need to worry about getting byte arrays for strings at all, even if you're sending data across a network. Instead, use .NET serialization to transmit the data; you don't worry about the actual bytes any more, because the serialization formatter does it for you.

    On the other hand, what if you are sending these bytes somewhere that you can't guarantee will pull in data from a .Net serialized stream? In this case you definitely do need to worry about encoding, because obviously this external system cares. So again, the internal bytes used by the string don't matter: you need to pick an encoding so you can be explicit about this encoding on the receiving end.

    I understand that in this case you might prefer to use the actual bytes stored by the string variable in memory where possible, with the idea that it might save some work creating your byte stream. But that's just not important compared to making sure that your output is understood at the other end, and to guarantee that you must be explicit with your encoding. If you really want to match your internal bytes, just use the Unicode encoding.
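
    As a small illustration of that last point, a round trip through Encoding.Unicode (a sketch; the sample string is arbitrary):

```csharp
using System;
using System.Text;

class UnicodeRoundTrip
{
    static void Main()
    {
        string s = "hello Ω";
        // Encoding.Unicode is UTF-16LE, the same code units .NET strings use internally.
        byte[] wire = Encoding.Unicode.GetBytes(s);
        string back = Encoding.Unicode.GetString(wire);
        Console.WriteLine(back == s);   // True
        Console.WriteLine(wire.Length); // 14: 7 chars * 2 bytes each
    }
}
```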

byte[] strToByteArray(string str)
    {
        // Caution: ASCII is lossy; characters above U+007F become '?'.
        System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
        return enc.GetBytes(str);
    }

Try this; it's a lot less code:

    byte[] bytes = Encoding.UTF8.GetBytes("TEST String");

A quick way:

    public static byte[] GetBytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

The key issue is that a character in a string takes 16 bits (a UTF-16 code unit; characters outside the basic plane take two code units), but a byte has only 8 bits to spare. A one-to-one mapping doesn't exist unless you restrict yourself to strings that only contain ASCII characters. System.Text.Encoding has lots of ways to map a string to byte[]; you need to pick one that avoids loss of information and that is easy for your client to use when they need to map the byte[] back to a string.

    UTF-8 is a popular encoding; it is compact and not lossy.
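
    A small sketch of that round-trip property (the sample string is arbitrary):

```csharp
using System;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        // UTF-8 round-trips any well-formed string without loss.
        string original = "π ≈ 3.14159";
        byte[] data = Encoding.UTF8.GetBytes(original);
        string decoded = Encoding.UTF8.GetString(data);
        Console.WriteLine(decoded == original); // True
    }
}
```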

I'm not sure, but I think the string stores its data as an array of Chars, which maps inefficiently onto bytes. Specifically, the definition of a Char is "Represents a Unicode character".

    Take this sample:

    String str = "asdf éß";
    String str2 = "asdf gh";
    EncodingInfo[] info = Encoding.GetEncodings();
    foreach (EncodingInfo enc in info)
    {
        System.Console.WriteLine(enc.Name + " - "
            + enc.GetEncoding().GetByteCount(str) + " - "
            + enc.GetEncoding().GetByteCount(str2));
    }


    Take note that the Unicode answer is 14 bytes in both instances, whereas the UTF-8 answer is only 9 bytes for the first, and only 7 for the second.

    So if you just want the bytes used by the string, simply use Encoding.Unicode, but it will be inefficient with storage space.

Well, I've read all the answers, and they are either about using an encoding or about serialization, which drops unpaired surrogates.

    That's bad when the string comes, for example, from SQL Server, where it was built from a byte array storing, say, a password hash. If we drop anything from it, it will store an invalid hash, and if we want to store it in XML, we want to leave it intact (because the XML writer throws an exception on any unpaired surrogate it finds).

    So I use Base64 encoding of byte arrays in such cases, but the only solution to this on the internet in C# had a bug in it and went only one way, so I've fixed the bug and written the reverse procedure. Here you are, future Googlers:

public static byte[] StringToBytes(string str)
    {
        byte[] data = new byte[str.Length * 2];
        for (int i = 0; i < str.Length; ++i)
        {
            char ch = str[i];
            data[i * 2] = (byte)(ch & 0xFF);
            data[i * 2 + 1] = (byte)((ch & 0xFF00) >> 8);
        }

        return data;
    }

    public static string StringFromBytes(byte[] arr)
    {
        char[] ch = new char[arr.Length / 2];
        for (int i = 0; i < ch.Length; ++i)
        {
            ch[i] = (char)((int)arr[i * 2] + (((int)arr[i * 2 + 1]) << 8));
        }
        return new String(ch);
    }

// C# to convert a string to a byte array (ASCII only; non-ASCII characters become '?').
    public static byte[] StrToByteArray(string str)
    {
        System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
        return encoding.GetBytes(str);
    }


    // C# to convert a byte array to a string.
    byte[] dBytes = ...
    string str;
    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();
    str = enc.GetString(dBytes);

  14. Two ways:

public static byte[] StrToByteArray(this string s)
    {
        List<byte> value = new List<byte>();
        foreach (char c in s.ToCharArray())
            value.Add(Convert.ToByte(c)); // throws OverflowException for chars above 255
        return value.ToArray();
    }


    And,

// Note: this one parses a hex string (e.g. "0A FF"), rather than converting characters.
    public static byte[] StrToByteArray(this string s)
    {
        s = s.Replace(" ", string.Empty);
        byte[] buffer = new byte[s.Length / 2];
        for (int i = 0; i < s.Length; i += 2)
            buffer[i / 2] = Convert.ToByte(s.Substring(i, 2), 16);
        return buffer;
    }


    I tend to use the bottom one more often than the top; note that they do different jobs (the second parses a hex string), and I haven't benchmarked them for speed.

  15. Also please explain why encoding should be taken into consideration.
    Can't I simply get what bytes the string has been stored in?
    Why this dependency on encoding?!!!


    Because there is no such thing as "the bytes of the string".

    A string (or more generically, a text) is composed of characters: letters, digits, and other symbols. That's all. Computers, however, do not know anything about characters; they can only handle bytes. Therefore, if you want to store or transmit text by using a computer, you need to transform the characters to bytes. How do you do that? Here's where encodings come to the scene.

    An encoding is nothing but a convention for translating logical characters into physical bytes. The simplest and best-known encoding is ASCII, and it is all you need if you write in English. For other languages you will need more complete encodings, with any of the Unicode flavours being the safest choice nowadays.

    So, in short, trying to "get the bytes of a string without using encodings" is as impossible as "writing a text without using any language".
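
    To make that concrete, here is a sketch showing that the "bytes of" one and the same string differ by convention (the hex dumps in the comments assume the little-endian forms that Encoding.Unicode and Encoding.UTF32 produce):

```csharp
using System;
using System.Text;

class BytesDependOnEncoding
{
    static void Main()
    {
        // The "bytes of a string" only exist once you pick a convention (an encoding):
        string text = "Ω"; // U+03A9
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)));    // CE-A9
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text))); // A9-03
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(text)));   // A9-03-00-00
    }
}
```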

byte[] buffer = Encoding.UTF8.GetBytes(something);   // convert to UTF-8, then get its bytes

    byte[] buffer2 = Encoding.ASCII.GetBytes(something); // convert to ASCII, then get its bytes (lossy for non-ASCII)

  17. The accepted answer is very, very complicated. Use the included .NET classes for this:

    const string data = "A string with international characters: Norwegian: ÆØÅæøå, Chinese: 喂 谢谢";
    var bytes = System.Text.Encoding.UTF8.GetBytes(data);
    var decoded = System.Text.Encoding.UTF8.GetString(bytes);


    Don't reinvent the wheel if you don't have to...

Just to demonstrate that Mehrdad's sound answer works: his approach can even persist unpaired surrogate characters (something many have held against my answer, though everyone's is equally guilty of it, e.g. System.Text.Encoding.UTF8.GetBytes and System.Text.Encoding.Unicode.GetBytes; those encoding methods can't persist a high surrogate character such as d800, and merely replace high surrogates with the value fffd):

using System;

    class Program
    {
        static void Main(string[] args)
        {
            string t = "爱虫";
            string s = "Test\ud800Test";

            byte[] dumpToBytes = GetBytes(s);
            string getItBack = GetString(dumpToBytes);

            foreach (char item in getItBack)
            {
                Console.WriteLine("{0} {1}", item, ((ushort)item).ToString("x"));
            }
        }

        static byte[] GetBytes(string str)
        {
            byte[] bytes = new byte[str.Length * sizeof(char)];
            System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
            return bytes;
        }

        static string GetString(byte[] bytes)
        {
            char[] chars = new char[bytes.Length / sizeof(char)];
            System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
            return new string(chars);
        }
    }


    Output:

    T 54
    e 65
    s 73
    t 74
    ? d800
    T 54
    e 65
    s 73
    t 74


    Try that with System.Text.Encoding.UTF8.GetBytes or System.Text.Encoding.Unicode.GetBytes; they will merely replace high surrogate characters with the value fffd.

    Every time there's movement on this question, I'm still hoping for a serializer (whether from Microsoft or from a third-party component) that can persist strings even when they contain unpaired surrogate characters; I Google this every now and then: serialization unpaired surrogate character .NET. This doesn't make me lose any sleep, but it's kind of annoying when somebody comments on my answer that it's flawed, yet their answer is equally flawed when it comes to unpaired surrogate characters.

    Darn, Microsoft should have just used System.Buffer.BlockCopy in its BinaryFormatter ツ

    谢谢! (Thank you!)

