Skip to main content

If strings are immutable in .NET, then why does Substring take O(n) time?


Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring() takes O( substring.Length ) time, instead of O(1)?



i.e. what were the tradeoffs, if any?


Source: Tips4allCCNA FINAL EXAM

Comments

  1. Precisely because Strings are immutable, .Substring must make a copy of at least a portion of the original string. Making a copy of n bytes should take O(n) time.

    How do you think you would copy a bunch of bytes in constant time?



    EDIT: Mehrdad suggests not copying the string at all, but keeping a reference to a piece of it.

    Consider in .Net, a multi-megabyte string, on which someone calls .SubString(n, n+3) (for any n in the middle of the string).

    Now, the ENTIRE string cannot be Garbage Collected just because one reference is holding on to 4 characters?
    That seems like a ridiculous waste of space.

    Further, tracking references to substrings (which may even be inside substrings), and trying to copy at optimal times to avoid defeating the GC (as described above), makes the concept a nightmare. It is far simpler, and more reliable, to copy on .SubString, and maintain the straightforward immutable model.

    ReplyDelete
  2. Java (as opposed to .NET) provides two ways of doing Substring(), you can consider whether you want to keep just a reference or copy a whole substring to a new memory location.

    The simple .substring(...) shares the internally used char array with the original String object, which you then with new String(...) can copy to a new array, if needed (to avoid hindering garbage collection of the original one).

    I think this kind of flexibility is a best option for a developer.

    ReplyDelete
  3. Keeping sub-strings as references into the original would have some obvious benefit, but you asked for the drawbacks:


    A reference to a (small) substring would block the entire string for GC
    The GC would have to be able to find the mother-string from the substring, potentially expensive
    The start of the reference/pointer to the chars would not always be word-aligned
    The current 'hack' of strings having a Length and a '\0' would have to be abandoned
    The savings would be rare.

    ReplyDelete
  4. Despite what Eric may claim O(n) is always O(n). Big O is a statement about asymptotic complexity, that is the relative growth of two functions. The very definition of O is that:


    f(x) = O(g(x)) iff there exists a values of k and c such that f(x) <
    c|g(x)| for all values of x greater than k.


    Many programmers conflate big O with algorithm performance, but they are very different things.

    The most obvious solution to creating an O(1) algorithm for substring is to have a separate concrete class for the type returned by substring, and to use a string interface (say IString) instead of concrete types. This is a lot of extra complexity for minimal trade-off. See Eric's answer. People generally expect strings to be simple concrete types.

    ReplyDelete

Post a Comment

Popular posts from this blog

[韓日関係] 首相含む大幅な内閣改造の可能性…早ければ来月10日ごろ=韓国

div not scrolling properly with slimScroll plugin

I am using the slimScroll plugin for jQuery by Piotr Rochala Which is a great plugin for nice scrollbars on most browsers but I am stuck because I am using it for a chat box and whenever the user appends new text to the boxit does scroll using the .scrollTop() method however the plugin's scrollbar doesnt scroll with it and when the user wants to look though the chat history it will start scrolling from near the top. I have made a quick demo of my situation http://jsfiddle.net/DY9CT/2/ Does anyone know how to solve this problem?

Why does this javascript based printing cause Safari to refresh the page?

The page I am working on has a javascript function executed to print parts of the page. For some reason, printing in Safari, causes the window to somehow update. I say somehow, because it does not really refresh as in reload the page, but rather it starts the "rendering" of the page from start, i.e. scroll to top, flash animations start from 0, and so forth. The effect is reproduced by this fiddle: http://jsfiddle.net/fYmnB/ Clicking the print button and finishing or cancelling a print in Safari causes the screen to "go white" for a sec, which in my real website manifests itself as something "like" a reload. While running print button with, let's say, Firefox, just opens and closes the print dialogue without affecting the fiddle page in any way. Is there something with my way of calling the browsers print method that causes this, or how can it be explained - and preferably, avoided? P.S.: On my real site the same occurs with Chrome. In the ex