Data Item Taxonomy and Object-Oriented Class Definition

Conrad Weisert, Information Disciplines, Inc., 29 March 1996
(This article originally appeared in the Chicago SIGOOT Newsletter,
revised January 28, 1999)

Discussions about patterns, templates, C++, Java, and other OOT topics often lead to arguments over what a class should always (or never) do. Those arguments usually stem from confusing three fundamentally different categories[1] of data item:

  1. elementary data items that can't (meaningfully) be decomposed into other component data items. Elementary items are further divided into:
  2. composite data items made up of a fixed sequence of subordinate (elementary or composite) data items. Most application-domain entities and records fall in this category.
  3. container data items used to store, often temporarily, other data items. The structures of the Standard Template Library (STL) fall in this category, as do files.
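
As a rough illustration of the three categories, here is a minimal C++ sketch (C++11 or later). The names AccountNumber, Customer, and CustomerList are hypothetical and do not come from the article, and the rule that account numbers are non-negative is assumed purely for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Elementary item: a single value with no meaningful components.
class AccountNumber {
public:
    explicit AccountNumber(long value) : value_(value) {
        assert(value >= 0);  // assumed rule, for illustration only
    }
    long value() const { return value_; }
private:
    long value_;
};

// Composite item: a fixed sequence of named components,
// each of which is itself elementary or composite.
class Customer {
public:
    Customer(AccountNumber account, std::string name)
        : account_(account), name_(std::move(name)) {}
    AccountNumber account() const { return account_; }
    const std::string& name() const { return name_; }
private:
    AccountNumber account_;  // elementary component
    std::string name_;       // treated here as elementary
};

// Container item: stores a varying number of other items.
using CustomerList = std::vector<Customer>;
```

All three definitions use the same language facilities. Nothing in the syntax records which category a class belongs to, so that knowledge lives only in the design.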

Older textbooks sometimes lumped composite and container data together as data "aggregates". Today object-oriented design methodology is helping to clarify the essential distinction, by:

In object-oriented design we routinely need to define classes in all of the above categories. Although each category calls for an entirely different approach from the others, none of the mainstream object-oriented languages (C++, Java, and Smalltalk) supports such a distinction. Each language provides just one set of facilities for all class definitions. It's up to us to apply those facilities appropriately.

Based on experience in several projects, the following table shows some typical differences:

If your experience indicates another distribution, that's all right. The point is that the three categories are very different from one another. Techniques and patterns we take for granted for a class in one category are irrelevant and inappropriate for a class in another category. Courses[4] and textbooks ought to draw those distinctions more clearly. We must do so whenever we design and develop a new type or class.
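
To make the contrast concrete, the following C++ sketch (C++11 or later) shows the kind of operations that suit an elementary class next to those that suit a container. It is only an illustration under assumed names (Money, Payments, largest, sum), none of which comes from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Elementary item (hypothetical): arithmetic and comparison
// operators are the natural interface.
class Money {
public:
    explicit Money(long cents = 0) : cents_(cents) {}
    long cents() const { return cents_; }
    Money operator+(Money rhs) const { return Money(cents_ + rhs.cents_); }
    bool operator<(Money rhs) const { return cents_ < rhs.cents_; }
    bool operator==(Money rhs) const { return cents_ == rhs.cents_; }
private:
    long cents_;
};

// Container item: insertion, traversal, and search are the natural
// interface. Adding two Payments objects together would have no
// obvious application meaning.
using Payments = std::vector<Money>;

// Search the container, relying on the element's operator<.
Money largest(const Payments& payments) {
    assert(!payments.empty());  // precondition assumed by this sketch
    return *std::max_element(payments.begin(), payments.end());
}

// Traverse the container, relying on the element's operator+.
Money sum(const Payments& payments) {
    Money total;
    for (Money payment : payments) total = total + payment;
    return total;
}
```

The operators belong to the elementary class, while the container contributes only storage and traversal. Moving either set of techniques to the other class would serve no purpose.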

[1] This taxonomy doesn't map exactly onto any programming language, but it holds up well in modeling reality.

[2] Treating character strings as containers is a peculiarity of C and C-like languages. In the majority of uses, it's much more natural to view them as elementary items.

[3] We include both primitive (C) pointers and pointer-like objects, such as so-called "iterators".

[4] For examples of specialized advanced C++ courses that focus on each category, see IDI's Advanced single-session seminars for experienced C++ programmers.