If strings are immutable in .NET, then why does Substring take O(n) time?

Given that strings are immutable in .NET, I'm wondering why they have been designed such that string.Substring() takes O( substring.Length ) time, instead of O(1)?

i.e. what were the tradeoffs, if any?

Source: Tips4all, CCNA FINAL EXAM

Comments

UserMay 19, 2012 at 2:56 AM
Precisely because Strings are immutable, .Substring must make a copy of at least a portion of the original string. Making a copy of n bytes should take O(n) time.

How do you think you would copy a bunch of bytes in constant time?

EDIT: Mehrdad suggests not copying the string at all, but keeping a reference to a piece of it.

Consider in .Net, a multi-megabyte string, on which someone calls .SubString(n, n+3) (for any n in the middle of the string).

Now, the ENTIRE string cannot be Garbage Collected just because one reference is holding on to 4 characters?
That seems like a ridiculous waste of space.

Further, tracking references to substrings (which may even be inside substrings), and trying to copy at optimal times to avoid defeating the GC (as described above), makes the concept a nightmare. It is far simpler, and more reliable, to copy on .SubString, and maintain the straightforward immutable model.
ReplyDelete
Replies
UserMay 19, 2012 at 2:56 AM
Java (as opposed to .NET) provides two ways of doing Substring(), you can consider whether you want to keep just a reference or copy a whole substring to a new memory location.

The simple .substring(...) shares the internally used char array with the original String object, which you then with new String(...) can copy to a new array, if needed (to avoid hindering garbage collection of the original one).

I think this kind of flexibility is a best option for a developer.
ReplyDelete
Replies
UserMay 19, 2012 at 2:56 AM
Keeping sub-strings as references into the original would have some obvious benefit, but you asked for the drawbacks:

A reference to a (small) substring would block the entire string for GC
The GC would have to be able to find the mother-string from the substring, potentially expensive
The start of the reference/pointer to the chars would not always be word-aligned
The current 'hack' of strings having a Length and a '\0' would have to be abandoned
The savings would be rare.
ReplyDelete
Replies
UserMay 19, 2012 at 2:56 AM
Despite what Eric may claim O(n) is always O(n). Big O is a statement about asymptotic complexity, that is the relative growth of two functions. The very definition of O is that:

f(x) = O(g(x)) iff there exists a values of k and c such that f(x) <
c|g(x)| for all values of x greater than k.

Many programmers conflate big O with algorithm performance, but they are very different things.

The most obvious solution to creating an O(1) algorithm for substring is to have a separate concrete class for the type returned by substring, and to use a string interface (say IString) instead of concrete types. This is a lot of extra complexity for minimal trade-off. See Eric's answer. People generally expect strings to be simple concrete types.
ReplyDelete
Replies

Add comment

CCNA, CCNP, MCSA, CCNA Final Exam, All Answer Test Module With 100/100

Search This Blog

If strings are immutable in .NET, then why does Substring take O(n) time?

Labels

Comments

Post a Comment

Popular posts from this blog

CCNA 1 Final Exam 2011 latest (hot hot hot)

Slow Android emulator

What is the worst gotcha in C# or .NET?