Standard library class doesn't serve all needs

Choosing C++ Character String Classes
Third of a three-part discussion

Conrad Weisert, November 1, 2003
©2003 Information Disciplines, Inc.

This article may be freely circulated, as long as the copyright credit is included.


Background and recent interest

Andrew Koenig & Barbara Moo offered helpful advice about character-string manipulation in the August C/C++ Users Journal. They were aiming especially at experienced programmers who had gotten into the habit of using C's crude array of char rather than more modern techniques.

In two earlier articles, I offered some comments and clarifications:

  1. Coping with C's Strings -- a disciplined approach to the use of C's crude array of char way of representing character strings. (August Issue of the Month)
  2. Objects before Strings -- a plea for exploiting object-oriented design instead of unstructured character string data. (September Issue of the Month)

We come now to confronting the string class itself. Everyone surely agrees that applications programming, both business and scientific, demands the capabilities provided by one or more character-string classes. We also know that the C++ standard library now contains a string class, std::string. Does that library class take care of all the needs of new and existing programs?

Compatibility issues

Unfortunately, recognition of the need for a string class and the availability of std::string didn't occur at the same time. From the first day C++ was unveiled, programmers recognized that the language's class definition capability held the key to solving the long-standing C character-string problem.

Many of those programmers went to work designing and implementing character-string classes. Those classes spanned a huge range in quality and usability. Some of them were distributed by vendors of compilers or class libraries. Others were established as standard within developer organizations. By the mid-1990s the best of them supported character-string handling comparable to that of PL/I or extended BASIC.

Of course we all agree with Koenig & Moo that robust, maintainable programs must avoid C's array of char string representations. But we may or may not be able to embrace std::string easily, if we already have an investment in other character string classes. For some, that's just an irritating and possibly costly conversion issue. Others, however, are finding that std::string doesn't support everything they need.

Limitations of the Standard Library String Class

The std::string class supports varying length strings with no length limit. (There is presumably some implementation-defined maximum size, but given today's huge memory sizes it's likely to exceed reasonable applications' requirements.) You declare a string and then assign data to it ranging from the null (0 length) string to an entire book.

That's equivalent to what extended BASIC supports.
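As a minimal sketch of that behavior (the function name growFromEmpty is only illustrative), a single std::string object can start out empty and grow through assignment and concatenation, with no size ever declared by the programmer:

```cpp
#include <cassert>
#include <string>

// Illustrative helper: one std::string object takes on several lengths.
inline std::string growFromEmpty() {
    std::string s;            // starts as the null (zero-length) string
    assert(s.empty());
    s = "Chicago";            // assignment sets a new length
    s += ", Illinois";        // concatenation grows the string again
    return s;
}
```

The caller never states a maximum; the library reallocates storage as needed.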

Furthermore, the internal representation is not contiguous with the object. The string object contains only a pointer, which may point either to the actual character data or, more likely, to a second pointer (the so-called "reference counting" technique). That's not an implementation choice, but is dictated by limitations of the underlying C language.

But applications, especially business applications, also need:

Note that many older programming languages, including PL/I and COBOL, supported just those capabilities. When I teach C++ or Java to an audience of former COBOL programmers, they're appalled by the trouble you have to go to in order to handle what they consider the simplest and most straightforward kind of everyday data manipulation.

One explanation is that many applications view character strings as elementary data fields, while C/C++/Java programmers have come to view character strings as containers.

Now, if you don't need those capabilities and you determine that std::string meets all your needs, then that's the only string class you should use, and you can stop reading here.

IDI's Library String Classes

I'm going to describe (but not recommend for you) the character-string capabilities we've been using for internal and client applications since the early 1990s. We've gotten used to them over more than a decade, we like them, and we continue to use them, even in the face of std::string.

Four string classes

  Dstring -- Dynamic (like std::string): Assignment that changes the size will cause memory reallocation.
  Fstring -- Fixed-length: Once a string is constructed it stays the same size. Assignment can truncate or pad with blanks.
  Vstring -- Varying string: Like Dstring except that the maximum size is specified (and allocated) upon construction, like a PL/I varying string.
  Cstring -- Constant-length: Data embedded within the object. Size must be known at compile time. (See separate article for more information.)
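To make the fixed-length behavior concrete, here is a rough sketch of truncate-or-pad assignment. The class name FixedString and its members are hypothetical and built on std::string for brevity; this is not IDI's Fstring, only one way such semantics might look:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a fixed-length string; not IDI's Fstring.
class FixedString {
public:
    // The length is chosen once, at construction; the value starts as blanks.
    explicit FixedString(std::size_t length) : data_(length, ' ') {}

    // Assigning a std::string keeps the length: long values are truncated,
    // short values are padded on the right with blanks.
    FixedString& operator=(const std::string& value) {
        const std::size_t length = data_.size();
        data_ = value.substr(0, length);
        data_.resize(length, ' ');
        assert(data_.size() == length);   // the length is unchanged
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};
```

With a five-character FixedString, assigning "Chicago" would leave "Chica", and assigning "LA" would leave "LA" followed by three blanks.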

Objects of those classes interact with each other in the expected ways. Mixed expressions may cause implicit conversions to the most general class, Dstring, and may slow performance.

Vstring is provided for efficiency in situations where a program is building up a long string by successive concatenations.
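The closest std::string analogy is probably reserve(), which allocates capacity up front so that later concatenations need not reallocate. The sketch below illustrates that analogy only; it is not Vstring, and the function name joinWithCommas is made up for the example. Unlike a varying string with a fixed maximum, a std::string will still grow past its reserved capacity if asked to.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins the fields with commas, reserving the result's storage once.
inline std::string joinWithCommas(const std::vector<std::string>& fields) {
    // Upper bound on the result length: every field plus one separator each.
    std::size_t total = 0;
    for (const std::string& f : fields) total += f.size() + 1;

    std::string line;
    line.reserve(total);                 // allocate before concatenating
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0) line += ',';
        line += fields[i];
    }
    assert(line.size() <= total);        // stayed within the reserved bound
    return line;
}
```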

Cstring uses class templates, e.g.

      Cstring<18>  cityName;

We advise users to keep the number of such classes reasonable, and to limit Cstring data to fields within records and to internal tables.
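A class template along the following lines shows how the size can become part of the type and how the characters can live inside the object itself. The names EmbeddedString and CityRecord are hypothetical, and the blank padding and the absence of a null terminator are assumptions made for this sketch rather than a description of IDI's Cstring:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical sketch of a constant-length string; not IDI's Cstring.
template <std::size_t N>
class EmbeddedString {
    static_assert(N > 0, "length must be positive");

public:
    EmbeddedString() { std::memset(data_, ' ', N); }   // starts as blanks

    // Copies at most N characters and blank-pads any remainder.
    EmbeddedString& operator=(const std::string& value) {
        const std::size_t n = value.size() < N ? value.size() : N;
        assert(n <= N);                       // never exceeds the field width
        std::memcpy(data_, value.data(), n);
        std::memset(data_ + n, ' ', N - n);
        return *this;
    }

    std::string str() const { return std::string(data_, N); }

private:
    char data_[N];   // the characters themselves: no pointer, no terminator
};

// A record whose text field is stored inside the record itself.
struct CityRecord {
    EmbeddedString<18> cityName;
    int population;
};
```

Each distinct length instantiates a separate class, which is one reason to keep the number of lengths in a program small.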

Behavior

We were guided by long years of experience with string handling in PL/I and later in extended BASIC. The result was somewhat simpler than std::string, which offers too many functions with overlapping functionality.

Customization options

We made heavy use of the macro preprocessor to allow an organization or a project to choose between alternative standards. For example:

Internal representation

As an expedient we began Dstring with the traditional C null-terminated array. That allowed us to use the C library routines internally. For Cstring, however, we omit the null terminator, which would be an unwelcome intrusion in an embedded data field.

From time to time, we consider switching to a reference counting implementation in order to gain some efficiency. We put it off, however, until such time as we encounter serious performance degradation caused by the copy constructors.

Availability

The above is not intended as a sales pitch for IDI's library string classes, but just to show some of the issues in string class design and usage. If you already have one or more string classes that you like, you should continue to use them.

Because of the complexity of the customization options, we're not posting these classes as freeware on this web site. If you think you want them, let's discuss your needs.



Last modified February 10, 2010