I was just thinking about the implementation of std::string::substr
. It returns a new std::string
object, which seems a bit wasteful to me. Why not return an object that refers to the contents of the original string and can be implicitly assigned to a std::string
? A kind of lazy evaluation of the actual copying. Such a class could look something like this:
template <class Ch, class Tr, class A>
class string_ref {
public:
// not important yet, but *looks* like basic_string's for the most part
private:
const basic_string<Ch, Tr, A> &s_;
const size_type pos_;
const size_type len_;
};
The public interface of this class would mimic all of the read-only operations of a real std::string
, so the usage would be seamless. std::string
could then have a new constructor which takes a string_ref
so the user would never be the wiser. The moment you try to "store" the result, you end up creating a copy, so no real issues with the reference pointing to data and then having it modified behind its back.
The idea being that code like this:
std::string s1 = "hello world";
std::string s2 = "world";
if(s1.substr(6) == s2) {
std::cout << "match!" << std::endl;
}
would have no more than 2 std::string
objects constructed in total. This seems like a useful optimization for code which that performs a lot of string manipulations. Of course, this doesn't just apply to std::string
, but to any type which can return a subset of its contents.
As far as I know, no implementations do this.
I suppose the core of the question is:
Given a class that can be implicitly converted to a std::string
as needed, would it be conforming to the standard for a library writer to change the prototype of a member's to return type? Or more generally, do the library writers have the leeway to return "proxy objects" instead of regular objects in these types of cases as an optimization?
My gut is that this is not allowed and that the prototypes must match exactly. Given that you cannot overload on return type alone, that would leave no room for library writers to take advantage of these types of situations. Like I said, I think the answer is no, but I figured I'd ask :-).
This idea is copy-on-write, but instead of COW'ing the entire buffer, you keep track of which subset of the buffer is the "real" string. (COW, in its normal form, was (is?) used in some library implementations.)
So you don't need a proxy object or change of interface at all because these details can be made completely internal. Conceptually, you need to keep track of four things: a source buffer, a reference count for the buffer, and the start and end of the string within this buffer.
Anytime an operation modifies the buffer at all, it creates its own copy (from the start and end delimiters), decreases the old buffer's reference count by one, and sets the new buffer's reference count to one. The rest of the reference counting rules are the same: copy and increase count by one, destruct a string and decrease count by one, reach zero and delete, etc.
substr
just makes a new string instance, except with the start and end delimiters explicitly specified.