Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring()
takes O( substring.Length
) time, instead of O(1)?
i.e. what were the tradeoffs, if any?
Source: Tips4all, CCNA FINAL EXAM
Cisco Certified Network Associate Exam,640-802 CCNA All Answers ~100/100. Daily update
Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring()
takes O( substring.Length
) time, instead of O(1)?
i.e. what were the tradeoffs, if any?
Precisely because Strings are immutable, .Substring must make a copy of at least a portion of the original string. Making a copy of n bytes should take O(n) time.
ReplyDeleteHow do you think you would copy a bunch of bytes in constant time?
EDIT: Mehrdad suggests not copying the string at all, but keeping a reference to a piece of it.
Consider in .Net, a multi-megabyte string, on which someone calls .SubString(n, n+3) (for any n in the middle of the string).
Now, the ENTIRE string cannot be Garbage Collected just because one reference is holding on to 4 characters?
That seems like a ridiculous waste of space.
Further, tracking references to substrings (which may even be inside substrings), and trying to copy at optimal times to avoid defeating the GC (as described above), makes the concept a nightmare. It is far simpler, and more reliable, to copy on .SubString, and maintain the straightforward immutable model.
Java (as opposed to .NET) provides two ways of doing Substring(), you can consider whether you want to keep just a reference or copy a whole substring to a new memory location.
ReplyDeleteThe simple .substring(...) shares the internally used char array with the original String object, which you then with new String(...) can copy to a new array, if needed (to avoid hindering garbage collection of the original one).
I think this kind of flexibility is a best option for a developer.
Keeping sub-strings as references into the original would have some obvious benefit, but you asked for the drawbacks:
ReplyDeleteA reference to a (small) substring would block the entire string for GC
The GC would have to be able to find the mother-string from the substring, potentially expensive
The start of the reference/pointer to the chars would not always be word-aligned
The current 'hack' of strings having a Length and a '\0' would have to be abandoned
The savings would be rare.
Despite what Eric may claim O(n) is always O(n). Big O is a statement about asymptotic complexity, that is the relative growth of two functions. The very definition of O is that:
ReplyDeletef(x) = O(g(x)) iff there exists a values of k and c such that f(x) <
c|g(x)| for all values of x greater than k.
Many programmers conflate big O with algorithm performance, but they are very different things.
The most obvious solution to creating an O(1) algorithm for substring is to have a separate concrete class for the type returned by substring, and to use a string interface (say IString) instead of concrete types. This is a lot of extra complexity for minimal trade-off. See Eric's answer. People generally expect strings to be simple concrete types.