Advanced Run Time Type Identification in C++

Part I

Requirements

Peter Barczikay (bpeter@rcs.hu)

Andras Tantos (tantos@rcs.hu)

Copyright 2003 by Robot Control Software Ltd. (http://www.rcs.hu). All rights reserved.

May 3, 2003.

Abstract

Run Time Type Identification (RTTI) provides information about objects at run time, such as the name of their type. The C++ language has RTTI support that fulfills the minimal requirements, but it is not enough for many applications of RTTI, such as object persistency. Other languages (Java and C#) have richer RTTI systems that make it possible to declare properties for accessing objects, and the language itself implements persistency, but these languages have other disadvantages. C++ programs may also need persistency and the advanced features of RTTI systems. The C++ language is powerful enough to implement properties and an advanced RTTI system for persistency. The first part of this article summarizes the applications and requirements of an advanced RTTI system, the second part will show how to implement it, and the third part will describe how to use such an RTTI system for persistency.

RTTI Supported by C++

Standard C++ provides the typeid() operator for getting type information. Its argument is an expression (typically a reference to an object or a dereferenced pointer) or a type name. It returns a constant reference to a type_info object containing some information about the object's type.
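As a minimal sketch of the operator in use (the Shape and Circle classes and the helper functions are hypothetical, introduced only for illustration): for a reference to a polymorphic object typeid() reports the dynamic type, while for a pointer expression it reports the static type of the pointer itself.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical polymorphic hierarchy used only for this illustration.
struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};

// True when the dynamic type of s is exactly Circle.
bool IsCircle(const Shape& s) { return typeid(s) == typeid(Circle); }

// Illustrates the difference between a reference and a pointer argument.
inline void TypeidDemo() {
    Circle c;
    const Shape& s = c;
    assert(typeid(s) == typeid(Circle));     // dynamic type through a reference
    assert(typeid(&s) != typeid(Circle*));   // a pointer expression yields "const Shape*"
    assert(typeid(*&s) == typeid(Circle));   // dereferencing gives the dynamic type again
}
```

Calling IsCircle() with a Circle returns true, and with a plain Shape it returns false.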

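The type_info class has only a few member functions: name() returns an implementation-defined name of the type, operator== and operator!= compare two type_info objects, and before() provides an implementation-defined ordering of types.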

This information does not help to find the relations between objects and does not solve most of the problems that arise in applications. It is not designed for that. Applications have different requirements, and a single standard type description cannot fulfill all of them.

However, the type_info class can be used as a key in a map storing more detailed type information [Stroustrup 15.4.4]. This way every application can define and use its own RTTI system, which is a very flexible solution. The problem is that someone has to define the structure of the RTTI record and fill the map with records for every type. This job is not trivial, and some code has to be written for it in every application.
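A minimal sketch of such a map follows. It assumes a C++11 compiler and uses std::type_index, a standard wrapper that makes type_info usable as a map key (older code would write a comparator around type_info::before() instead). The TypeRecord and TypeRegistry names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical application-defined RTTI record.
struct TypeRecord {
    std::string displayName;
    std::size_t size;
};

class TypeRegistry {
public:
    // Someone has to call this once for every type of the application.
    template <class T>
    void Register(const std::string& name) {
        assert(!name.empty());
        TypeRecord rec;
        rec.displayName = name;
        rec.size = sizeof(T);
        records_[std::type_index(typeid(T))] = rec;
    }

    // Returns a null pointer when the type was never registered.
    const TypeRecord* Find(const std::type_info& ti) const {
        std::map<std::type_index, TypeRecord>::const_iterator it =
            records_.find(std::type_index(ti));
        return it == records_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::type_index, TypeRecord> records_;
};
```

The sketch also shows the drawback mentioned above: nothing fills the registry automatically, so every type needs an explicit Register() call.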

Are the applications of RTTI really so different? What are the requirements, and which problems are common to most applications? How can we implement a useful RTTI system? How and for what can we use it? These questions will be discussed in three parts.

The first part of the article discusses two typical applications of RTTI. It collects and classifies the requirements of a general purpose RTTI system.

The second part describes how to implement an RTTI system that fulfills these requirements. The C++ language is probably the most powerful programming language; implementing such an RTTI system is not possible in most other programming languages, which demonstrates the power of C++ very well. The implementation uses many advanced programming tricks and design patterns, so it may be interesting even if you are not particularly interested in RTTI systems.

The third part of the article adds some further ideas on persistency. It describes a Stream Library that uses the presented RTTI system. The modularity and flexibility of this solution lead to many advantages, which are discussed at the end of the article.

Applications of RTTI

There are many possible applications of RTTI, but two of them, persistency and application generators, are widely known. They are probably the most demanding applications, and between them they should exercise most of the services of an RTTI system.

Persistency

The basic task of persistency is simple. The application, or some important data of the application, is represented as a set of objects with references to each other. Persistency means that these objects can be saved to permanent storage, e.g. a file, and that the program can later restore the original state from that file. More precisely, an application or its data is persistent if its lifetime is longer than the running time of the application.

When the application reads the file, the type of each object has to be read from the file, and a new object has to be created and initialized with the data read from the file. This is where we need RTTI. The basic process of saving and loading objects seems fairly simple, but once details and robustness are considered, it becomes quite complicated.

The values of every data type have to be transformed between their internal representation (binary) and their file representation (text or binary). Who is responsible for this conversion? Generally the stream object has functions or operators for writing every type to the stream and reading it back. These functions or operators have an overloaded version for every type the application may want to save and load. This is a kind of RTTI implemented with overloaded functions, but it does not solve the problem of creating new objects of the required type. It assumes that the object already exists and that the stream object only has to fill in the variables.

When all objects are created and all data are loaded, only half of the job is done. Some objects may have variables that are not worth saving, because they can be computed from other variables. These variables have to be updated somehow. The references of objects to each other also have to be updated, because the objects are loaded to different addresses.
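The object creation step can be sketched as a factory keyed by the type name read from the file. This is only one possible approach, not necessarily the one used later in this article, and all names in it (Persistent, Point, MakeObject, ObjectFactory) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class of everything that can be stored in a file.
struct Persistent {
    virtual ~Persistent() {}
    virtual std::string TypeName() const = 0;
};

struct Point : Persistent {
    int x = 0, y = 0;
    std::string TypeName() const override { return "Point"; }
};

// Creator function; one instantiation per registered type.
template <class T>
std::unique_ptr<Persistent> MakeObject() {
    return std::unique_ptr<Persistent>(new T());
}

class ObjectFactory {
public:
    typedef std::unique_ptr<Persistent> (*Creator)();

    void Register(const std::string& typeName, Creator creator) {
        assert(creator != nullptr);
        creators_[typeName] = creator;
    }

    // Returns an empty pointer when the type name is unknown.
    std::unique_ptr<Persistent> Create(const std::string& typeName) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(typeName);
        if (it == creators_.end()) return std::unique_ptr<Persistent>();
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};
```

A loader would read the type name from the stream, call Create(), and then fill the new object with the values that follow.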

Finally, let's consider error handling. What happens if the stream is wrong, for example one value is missing or the order of the values is different? This is a common problem, because different versions of an application may have different data structures. It may also be required that the file formats of different versions be compatible, i.e. a newer version should understand files saved by previous versions. Moreover, it would be nice if an older version could load a data file created by a newer version, make some changes, and save it again without losing information. The older version cannot process or understand the new types and features, but it should be able to store them somewhere and save them again without knowing anything about their meaning.

There are many solutions and libraries available on the market for persistency, but all of them meet only the basic requirements. They are able to save and load objects, but they do not tolerate any mistakes in the stream. There are two common solutions:

1.      Objects have virtual methods for saving and loading the object. The argument is a reference to a stream. The stream has overloaded operators or functions for writing and reading the values of base types. The programmer is responsible for how the variables are saved and loaded. For example, the Microsoft Foundation Class library provides persistency this way.

2.      Some libraries save and load the object's memory to a binary file. These solutions require very little programming effort, but special tricks are needed for validating memory addresses and virtual method tables. It is not possible to read or edit the data files. Any damage to the data files leads to serious problems. Another drawback is that all variables are saved. There is no way to distinguish between persistent and temporary data.
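The first solution can be sketched as follows, using standard iostreams rather than any particular library. The Serializable interface and the Color class are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical interface: every persistent class saves and loads itself.
struct Serializable {
    virtual ~Serializable() {}
    virtual void Save(std::ostream& os) const = 0;
    virtual void Load(std::istream& is) = 0;
};

struct Color : Serializable {
    int r = 0, g = 0, b = 0;

    // The file layout is fixed by the order of these statements.
    void Save(std::ostream& os) const override {
        os << r << ' ' << g << ' ' << b << '\n';
    }

    // Load must mirror Save exactly; a missing or reordered value
    // silently ends up in the wrong member or leaves members unchanged.
    void Load(std::istream& is) override {
        is >> r >> g >> b;
    }
};
```

The sketch makes the weakness visible: the stream carries no names or types, so Save() and Load() must stay in step by hand across every version of the class.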

The authors have not seen any solution for persistency in C++ that provides readable data streams, robustness, and tolerance for variations in the structure of the stream. There are some implementations of properties and persistency in other systems and languages that are worth mentioning here:

A detailed comparison and analysis of these systems is beyond the scope of this article. All of them have advantages and disadvantages, and as far as we know, none of them provides the required flexibility.

Applications of Persistency

Saving and Loading the Application's Data

Saving documents in a text editor is a typical example.

Distributed applications

When different parts of an application run on different computers or nodes, data objects have to be sent through the network. Before an object is sent, it has to be packed into a message, and when the message has been delivered, the object has to be constructed again on the target node.

Saving and Loading the Current State of the Application

End users want to continue their work where they left off. Therefore the program has to save all relevant information when the user exits, and it has to start in the same state the next time the user starts it.

Nowadays this problem is generally solved manually by writing the important variables to ini files or to the registry. Using persistent objects is a more elegant and easier solution.

Configuring Applications

The real functionality of the program can be defined in configuration files. When the program is started, persistent objects are loaded from the configuration file. These objects determine what the program looks like and what it does. Different configuration files may lead to different applications. This topic leads to the next chapter, where a special program, the Application Generator, is used to create such configuration files.

Application Generators

What is an Application Generator? There are so many application generator programs that it is not easy to define exactly what the term means. Let's highlight only the features that are important for us.

Application Generators are program development tools making the program development process quick and easy. They provide a set of components and a nice graphical user interface, where someone can build an application by adding and configuring components.

The main advantage of such systems is that the developer does not need deep programming knowledge and the application can be built quickly. On the other hand, they have two disadvantages: lower performance and limited capabilities.

Lower performance means that the same application developed in C++ could run much faster, because the internal communication between components is not efficient enough, or because the developer cannot use a more efficient structure due to the limitations of the system. Nowadays this is not a big handicap. The speed of computers is growing, and the most expensive resource is the time of the software developers, but in some applications (embedded systems, data acquisition and control systems) speed is still important.

The limited capabilities are the most important disadvantage. Every Application Generator is designed for a specific field such as database applications, data acquisition, or graphical user interfaces. If the application needs a component that is not available, the developer is in big trouble. Sometimes the problem can be worked around, or the system may provide some support for writing application-specific components, but these solutions destroy the original advantages. Worst of all, the development of a large application is not predictable. At the beginning it seems that an Application Generator will fulfill all requirements, but when the application is almost ready the developers may realize that one important requirement cannot be implemented, just because a component is missing or does not behave as expected.

Application Generator systems are very popular despite the above-mentioned disadvantages. One of the reasons is that they provide the most reliable way of program development, the component-based development process. This is probably the most important advantage. The application developer can work on a higher abstraction level and does not have to go into the details of component implementation. The components are developed by experts and are tested in several applications.

The best Application Generator would combine the advantages of both ways of software development. Let's imagine a system where the components are objects written in C++ and the Application Generator is a program that represents the objects graphically, adds new objects, and sets their properties. Some properties provide connections between objects, while others describe and determine how the object looks and behaves. The users of the Application Generator work only at the abstraction level represented by the components, while a small team of C++ developers can make new components as required.

A C++ RTTI system would make it possible to build such an Application Generator. The run time system has a set of objects with detailed RTTI information, and the application is built using these components. Another program, the Application Generator, has access to the same set of components. It can investigate the object hierarchy of the run time system, add new objects, and change their properties. A nice graphical user interface makes it easy to use, while the full control over the components makes it very efficient.

There are two possible ways of communication between the Application Generator and the Run Time System:

1.      The Application Generator may have the same set of components and it can build an object hierarchy alone. When the application is ready, it is saved to a file and the Run Time System will load it. This solution requires recompiling the Application Generator, when new components are added.

2.      The Run Time System may provide an interface through which Application Generator programs access the internal object hierarchy. This makes it possible to develop commercial Application Generator programs while the components stay proprietary. This interface is quite complicated, and it must have a mechanism for freezing the application while the Application Generator changes the structure of the program.

Both solutions have almost the same requirements for the RTTI system. Application Generators use persistency for saving and loading the components; therefore they share all the above-mentioned requirements for the RTTI system. On top of that, Application Generators need further features:

The requirements of these applications are similar. It is probably clear what is expected, but the next chapter will collect and discuss the requirements in detail.

Requirements

The previous chapter gave a short overview of Persistency and Application Generators from the RTTI system's point of view. Some relations and requirements for RTTI systems were also discussed, as both Persistency and Application Generator programs need run-time type identification. Now these requirements will be described in detail.

In short, the requirements are the following:

The compiler could implement such an RTTI system, and this would probably be the simplest solution for the users, but it would make the C++ standard more complicated and would probably lead to arguments about which features are necessary. Different applications may need slightly different RTTI systems, and these differences cannot be covered by a standard RTTI system. The C++ language makes it possible to write an RTTI system as a library. This solution requires some additional work and knowledge, but it is more flexible.

Types

The RTTI system describes both user-defined types and the built-in types of the language, including the types defined in libraries and used in the applications, and types defined by the application itself. These types fall into three groups: Base Types, Compound Types, and Container Types.

Base Types

Base Types are not the same as the built-in types of the language. All built-in types are Base Types, but many other types are handled as Base Types as well. Any type, structure, or class may be described and handled as a Base Type. For example, a string can be handled as an array of characters, but the essence of the string is better represented if it is described as a Base Type.

The main point is that Base Types are the atomic components of the RTTI system. Base Types cannot be described as a composition of other types, in contrast to compound and container types, which integrate several members or elements to make new types and new objects.

Compound Types

C++ structures and classes are Compound Types. The most important difference between Base Types and Compound Types is that the RTTI system has to describe the members of the Compound Types. Instead of writing new data conversion functions for classes, a description of the members is given.

A simple structure for describing colors is a good example. The Color structure has three integer members for the RGB components. The RTTI description of the Color structure states that Color is a Compound Type and that it has three integer properties called R, G, and B. This way it is much simpler to make the type descriptor of a Compound Type than that of a Base Type.
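A minimal sketch of such a description, assuming every property is an int so that a plain pointer to member is enough. The IntProperty record, the kColorProperties table, and FindColorProperty() are hypothetical names, not the Property Library presented in the second part.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Color { int R; int G; int B; };

// One entry of a hypothetical property table: a name plus a pointer to the member.
struct IntProperty {
    const char* name;
    int Color::*member;
};

// The whole "type descriptor" of Color is a table; no conversion code is written.
const IntProperty kColorProperties[] = {
    {"R", &Color::R},
    {"G", &Color::G},
    {"B", &Color::B},
};
const std::size_t kColorPropertyCount =
    sizeof(kColorProperties) / sizeof(kColorProperties[0]);

// Looks a property up by name; returns a null pointer if there is no such property.
const IntProperty* FindColorProperty(const std::string& name) {
    for (std::size_t i = 0; i < kColorPropertyCount; ++i)
        if (name == kColorProperties[i].name) return &kColorProperties[i];
    return nullptr;
}
```

With the table in place, a member can be read or written through its name, for example c.*(FindColorProperty("G")->member) for a Color object c.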

Container Types

Container Types require special attention. Containers are special types storing several objects. STL containers (vector, list, set, map), and arrays are good examples of Container Types.

In addition the RTTI system has to be able to

The definition of type descriptor for a given container should be as simple as the type descriptor of Compound Types.

Interface

All the above-mentioned categories are different, but they must be described in a consistent framework. Therefore the RTTI system has to provide a consistent interface for accessing the type information and the members of a given object hierarchy. The interface must be independent of the actual type and structure of the objects and must be able to describe all features and possibilities of the C++ language. On top of that, it should be easy to use.

The RTTI system has to have an interface for getting the description of types, something similar to the typeid() operator. For example, a function called GetTypeInfo() returns the address of the type descriptor of the given object. Every type, including base, compound, and container types, has one and only one type descriptor record, which has an interface for accessing all information stored by the RTTI system.
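One way to get exactly one descriptor per type is a function template with a function-local static object. The sketch below assumes this technique; the TypeInfo layout and the TypeName trait are hypothetical, and the function resolves the static type only (a polymorphic class would additionally need a virtual function returning its descriptor).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical type descriptor record.
struct TypeInfo {
    const char* name;
    std::size_t size;
};

// Trait supplying the printable name; unregistered types fail to compile.
template <class T> struct TypeName;
template <> struct TypeName<int>    { static const char* Get() { return "int"; } };
template <> struct TypeName<double> { static const char* Get() { return "double"; } };

// One static record per instantiation, hence one descriptor per type.
template <class T>
const TypeInfo* GetTypeInfo() {
    static const TypeInfo info = { TypeName<T>::Get(), sizeof(T) };
    return &info;
}

// Convenience overload taking an object, similar in spirit to typeid(expr).
template <class T>
const TypeInfo* GetTypeInfo(const T&) {
    return GetTypeInfo<T>();
}
```

Because the address is unique per type, two descriptors can be compared with a simple pointer comparison.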

Although this type description record exists for all types, the compound and container types have to store additional information about their members, and they have to provide another interface for iterating through the tree of their members. The well-known iterator concept can be used here as well. Property Iterators can be created for traversing the object hierarchy, while the actual implementations of the iterators are hidden.

The interface of the RTTI system ensures that the Application Generator or the Persistent Streaming Library does not need to know anything about the C++ classes. They can get all necessary information through the RTTI interface and navigate through the object hierarchy by using Property Iterators.

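The RTTI system consists of three parts: the Type Info records that describe every type, the Property Descriptor Tables that describe compound and container types, and the Property Iterators.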

The following chapters describe these parts.

RTTI description of base types

All types including base, compound, and container types must have a basic type description. This is a static object called Type Info record.

Every type of the application has a Type Info record. The Type Info records are instances of the Type Info classes. Every type has a Type Info class, and that class has one and only one instance. The Type Info classes are written manually for Base Types and created automatically for Compound Types. The Type Info record provides the following:

The Type Info record makes it possible to use and investigate any given type. It is the base of the RTTI system, and it must exist for every type the application wants to access at run time.
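A hand-written Type Info class for a Base Type might look like the sketch below. The article names GetVal() and SetVal() functions; their signatures here (text conversion through untyped pointers) are an assumption made for illustration, as are the class names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical common interface of all Type Info classes.
class TypeInfo {
public:
    virtual ~TypeInfo() {}
    virtual const char* Name() const = 0;
    // Converts the value stored at obj to text.
    virtual std::string GetVal(const void* obj) const = 0;
    // Parses text into the value at obj; returns false and leaves obj unchanged on failure.
    virtual bool SetVal(void* obj, const std::string& text) const = 0;
};

// Manually written Type Info class for the Base Type int; it has a single instance.
class IntTypeInfo : public TypeInfo {
public:
    static const IntTypeInfo& Instance() {
        static const IntTypeInfo instance;
        return instance;
    }
    const char* Name() const override { return "int"; }
    std::string GetVal(const void* obj) const override {
        assert(obj != nullptr);
        return std::to_string(*static_cast<const int*>(obj));
    }
    bool SetVal(void* obj, const std::string& text) const override {
        assert(obj != nullptr);
        std::istringstream is(text);
        int value = 0;
        if (!(is >> value)) return false;
        *static_cast<int*>(obj) = value;
        return true;
    }
private:
    IntTypeInfo() {}  // only Instance() may create the record
};
```

The private constructor enforces the "one and only one instance" rule stated above.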

Description of compound and container types

Compound and container types require additional information on top of the Type Info record. They are not simple types, where the GetVal() and SetVal() functions can handle a single value. They contain a list of other objects, and the RTTI system has to provide a description of these objects. Compound Types contain members, while Container Types contain elements. The types of the members may differ, and so may the types of the elements when the elements of the container are pointers to polymorphic objects. These members and elements are called properties. Note that not every member variable of a class is a property, and that not only member variables but also member functions can be defined as properties.

The number of properties is fixed for Compound Types, but for most Container Types it depends on the number of elements actually stored in the container. Therefore the RTTI system cannot rely on a fixed number of properties; to handle both cases consistently, it has to iterate through them.

The description of the members can be implemented with an array of Property Descriptor records called the Property Descriptor Table. Property Descriptors are simple data records, each describing a single member of a Compound Type. Members can be anything: a base type, a compound type, or a container.

The Property Descriptor has several sub-types depending on the type of the property it belongs to, but it provides the following common services:

The Property Descriptor is quite tricky. It has to hide all the differences of members and provide a common interface to any possible property type.
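How one interface can hide a data member and a getter/setter pair alike is sketched below. It assumes text as the common value representation and uses hypothetical names throughout (PropertyDescriptor, MemberProperty, AccessorProperty, Window); the real descriptors are the subject of the second part.

```cpp
#include <cassert>
#include <sstream>
#include <string>

template <class T> std::string ToText(const T& v) {
    std::ostringstream os; os << v; return os.str();
}
template <class T> T FromText(const std::string& s) {
    std::istringstream is(s); T v = T(); is >> v; return v;
}

// Common interface (hypothetical) seen by the users of the RTTI system.
template <class Owner>
class PropertyDescriptor {
public:
    virtual ~PropertyDescriptor() {}
    virtual const char* Name() const = 0;
    virtual std::string GetVal(const Owner& obj) const = 0;
    virtual void SetVal(Owner& obj, const std::string& text) const = 0;
};

// Property backed directly by a member variable.
template <class Owner, class T>
class MemberProperty : public PropertyDescriptor<Owner> {
public:
    MemberProperty(const char* name, T Owner::*member) : name_(name), member_(member) {}
    const char* Name() const override { return name_; }
    std::string GetVal(const Owner& obj) const override { return ToText(obj.*member_); }
    void SetVal(Owner& obj, const std::string& text) const override {
        obj.*member_ = FromText<T>(text);
    }
private:
    const char* name_;
    T Owner::*member_;
};

// Property backed by a pair of member functions.
template <class Owner, class T>
class AccessorProperty : public PropertyDescriptor<Owner> {
public:
    typedef T (Owner::*Getter)() const;
    typedef void (Owner::*Setter)(T);
    AccessorProperty(const char* name, Getter get, Setter set)
        : name_(name), get_(get), set_(set) {}
    const char* Name() const override { return name_; }
    std::string GetVal(const Owner& obj) const override { return ToText((obj.*get_)()); }
    void SetVal(Owner& obj, const std::string& text) const override {
        (obj.*set_)(FromText<T>(text));
    }
private:
    const char* name_;
    Getter get_;
    Setter set_;
};

// Example class: width is a plain member, height goes through accessors.
class Window {
public:
    int width = 0;
    int Height() const { return height_; }
    void SetHeight(int h) { height_ = h < 0 ? 0 : h; }  // setter clamps negatives
private:
    int height_ = 0;
};
```

A table of PropertyDescriptor<Window> pointers can then mix both kinds, and the setter's validation still runs when the value arrives through the descriptor.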

Property Iterators

The Type Info records and the Property Descriptor Tables store all information of the Run Time Type Identification system. The Property Iterators do not add new information; they just provide a well-known and easy-to-use interface to the RTTI system.

A Property Iterator can be created to point to any part of the object hierarchy, and then the iterator can be used to traverse the properties. Compound or container properties create new iterators for accessing their own properties, so the new iterator opens a branch of the property tree. The member functions of the Property Iterator make it possible to access all information and services of the Type Info and Property Descriptor records.

In addition to the Property Iterators, the Property Interface contains other functions. Some member functions are added to the classes that have a property description, and some global, static functions are used to access the list of Type Info records. These functions are required for getting the first Property Iterator and for accessing the list of available types.
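The traversal can be sketched with a simplified stand-in for the property tree. The PropertyNode structure replaces the real Type Info and Property Descriptor records, and the PropertyIterator members shown here (Done, Next, Name, HasChildren, Children) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for one property of an object hierarchy.
struct PropertyNode {
    std::string name;
    std::string value;                    // used by base-type properties
    std::vector<PropertyNode> children;   // used by compound/container properties
};

class PropertyIterator {
public:
    explicit PropertyIterator(const std::vector<PropertyNode>& props)
        : cur_(props.begin()), end_(props.end()) {}

    bool Done() const { return cur_ == end_; }
    void Next() { assert(!Done()); ++cur_; }
    const std::string& Name() const { assert(!Done()); return cur_->name; }
    const std::string& Value() const { assert(!Done()); return cur_->value; }
    bool HasChildren() const { assert(!Done()); return !cur_->children.empty(); }

    // Opens a branch of the property tree with a new iterator.
    PropertyIterator Children() const {
        assert(!Done());
        return PropertyIterator(cur_->children);
    }

private:
    std::vector<PropertyNode>::const_iterator cur_, end_;
};

// Collects the dotted path of every leaf property, depth first.
void CollectLeafPaths(PropertyIterator it, const std::string& prefix,
                      std::vector<std::string>& out) {
    for (; !it.Done(); it.Next()) {
        std::string path = prefix.empty() ? it.Name() : prefix + "." + it.Name();
        if (it.HasChildren())
            CollectLeafPaths(it.Children(), path, out);
        else
            out.push_back(path);
    }
}
```

The caller of CollectLeafPaths() never sees the node structure itself, only the iterator, which is the point of the interface.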

How to use the RTTI System?

By now we should have some idea of the services and structure of an RTTI system. Before going into the details of the implementation, let's see how it can be used for Persistency and Application Generators.

Persistency

The library implementing the RTTI system is called the Property Library, because it implements properties for C++ objects. A new component of the system, the Stream Library, has to be introduced for Persistency. The Stream Library is responsible for handling the data stream or file. It uses the services of the Property Library for accessing the application's data and uses an internal representation of the data stream. Different implementations of the Stream Library can support different stream formats. The third part of the article describes the details of the Stream Library.

One of the main advantages of this solution is that the Property Library (the RTTI system), the Stream Library (the data stream), and the application are independent. Using the interface of the Property Library, the Stream Library can save and load C++ objects without knowing their type. The actual format of the stream depends only on the Stream Library. It may support text, XML, or binary formats, so the application selects the format of the stream when it is opened. Changing the file format does not require changing the source code of the application's classes, their RTTI description, or the Property Library.

Saving Objects

First, a Property Iterator pointing to the beginning (root) of the object hierarchy is created. It is passed to the Save() function of the Stream Library. The Stream Library does not know the type of the object, but it can use the Property Iterator for getting all necessary information. The Save() function iterates through the properties and checks whether each property is of a base type or not. If it is a base type, the name, the type, and the value of the property are written to the stream. If it is a container or a compound type, the name and type of the property are written to the stream, and a new block of values is opened for the list of sub-properties. A text stream may look similar to a C program:

Obj1 = {
    int A = 23;
    int B = 46;
    RBG_c Color = {
        unsigned R = 255;
        unsigned G = 255;
        unsigned B = 255;
    }
}

This file is human readable and can be created or changed with a simple text editor. This feature is very important during development, when the resource or data files would otherwise be editable only with a special editor that is itself still under development. Any bug can be fixed directly in the stream files, and a missing feature of the editor does not delay the development of the application. Later, when the application and the editor are ready and tested, the stream format can be quickly and easily changed to a more efficient binary format.
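The recursive structure of such a Save() function can be sketched as follows. The Property structure is a hypothetical, already-flattened view of what the Property Iterator would deliver, and the signature of Save() is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened view of one property.
struct Property {
    std::string type;                 // may be empty, as for the root object
    std::string name;
    std::string value;                // used when there are no children
    std::vector<Property> children;   // sub-properties of compound/container types
};

// Writes one property; compound and container properties open a nested block.
void Save(std::ostream& os, const Property& p, int depth = 0) {
    assert(depth >= 0);
    const std::string indent(static_cast<std::size_t>(depth) * 4, ' ');
    os << indent;
    if (!p.type.empty()) os << p.type << ' ';
    os << p.name << " = ";
    if (p.children.empty()) {
        os << p.value << ";\n";
    } else {
        os << "{\n";
        for (std::size_t i = 0; i < p.children.size(); ++i)
            Save(os, p.children[i], depth + 1);
        os << indent << "}\n";
    }
}
```

A binary or XML variant would keep the same recursion and change only the statements that write to the stream.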

Loading Objects

The Load() function of the Stream Library reads the type, name, and value of the objects. It creates the object if it has not been created before and reads its properties. For each value it searches for the property by name and sets its value. Note that the stream drives the reading sequence, not the program or the structure of the class. Therefore the loading process tolerates missing values and values in a different order. The Stream Library may also be able to handle unknown properties, which makes it possible to load streams created by a newer version of the program.
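The name-driven loading step can be sketched as below, with the stream reduced to a list of (name, text value) pairs. The Settings class, the Entry typedef, and this Load() signature are hypothetical; std::stoi throws on malformed numbers, which a real library would have to handle.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Members keep their constructor defaults when the stream does not mention them.
struct Settings {
    int width = 800;
    int height = 600;
};

typedef std::pair<std::string, std::string> Entry;  // (property name, text value)

// Applies the entries in stream order; returns the entries it did not recognize
// so that the caller can keep them and write them back later.
std::vector<Entry> Load(Settings& target, const std::vector<Entry>& stream) {
    static const std::map<std::string, int Settings::*> properties = {
        {"Width", &Settings::width},
        {"Height", &Settings::height},
    };
    std::vector<Entry> unknown;
    for (const Entry& e : stream) {
        auto it = properties.find(e.first);
        if (it != properties.end())
            target.*(it->second) = std::stoi(e.second);
        else
            unknown.push_back(e);
    }
    return unknown;
}
```

Because lookup is by name, a stream that omits Width or lists Height first loads without error, and an entry from a newer version survives in the returned list.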

References or Pointers

References or pointers require special attention. When the objects are loaded, they are placed at different addresses, while the references (pointers) in the stream contain the addresses the objects had when they were saved. The Stream Library has to build an address translation table and replace all addresses with the correct values.

The most critical part of the Stream Library is how it handles pointers. When the objects are saved, every object has to be saved exactly once, even if several pointers reference it. The objects are loaded first, and then the references are resolved by using the address translation table. This algorithm can handle circular references between objects. A circular reference occurs when several objects have pointers to each other, for example when object A points to object B, object B points to object C, and object C has a pointer back to object A.
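The two-pass idea can be sketched with a small linked structure. The sketch stores numeric identifiers instead of raw addresses, which plays the same role as the address translation table; Node, SavedNode, and both functions are hypothetical, and the input list is assumed to contain each object once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::string name;
    Node* next = nullptr;
};

// Stream representation: the pointer is replaced by the id of its target (0 = null).
struct SavedNode {
    int id;
    std::string name;
    int nextId;
};

std::vector<SavedNode> Save(const std::vector<Node*>& nodes) {
    std::map<const Node*, int> ids;            // address -> id
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        assert(nodes[i] != nullptr && ids.count(nodes[i]) == 0);
        ids[nodes[i]] = static_cast<int>(i) + 1;
    }
    std::vector<SavedNode> out;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        std::map<const Node*, int>::const_iterator target = ids.find(nodes[i]->next);
        SavedNode s;
        s.id = static_cast<int>(i) + 1;
        s.name = nodes[i]->name;
        s.nextId = target == ids.end() ? 0 : target->second;
        out.push_back(s);
    }
    return out;
}

std::vector<std::unique_ptr<Node>> Load(const std::vector<SavedNode>& saved) {
    std::vector<std::unique_ptr<Node>> objects;
    std::map<int, Node*> table;                // id -> new address
    // Pass 1: create every object and record where it now lives.
    for (std::size_t i = 0; i < saved.size(); ++i) {
        objects.push_back(std::unique_ptr<Node>(new Node()));
        objects.back()->name = saved[i].name;
        table[saved[i].id] = objects.back().get();
    }
    // Pass 2: resolve the references; cycles are harmless because all objects exist.
    for (std::size_t i = 0; i < saved.size(); ++i) {
        std::map<int, Node*>::const_iterator target = table.find(saved[i].nextId);
        objects[i]->next = target == table.end() ? nullptr : target->second;
    }
    return objects;
}
```

For the cycle A, B, C described above, the loaded copies point to each other in the same pattern while living at new addresses.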

Default Values, and Validation

When a variable is not a property, or the value of a property is missing from the stream, the variable will not be initialized when the object is loaded from the stream. Therefore the developer of the class must take special care to initialize every variable in the constructors.

The Stream Library may support another feature for handling these variables. Every class may have a virtual function for validating the object. The Stream Library can call these Validate() functions at the end of the loading process to check the validity of the objects and to give each object a chance to set any uninitialized variables.
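A minimal sketch of such a validation hook follows. The article names only the Validate() function; the Persistent base class, the Rect example, the bool return type, and ValidateAll() are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class of persistent objects.
struct Persistent {
    virtual ~Persistent() {}
    // Called after loading; returns false if the object is in an invalid state.
    virtual bool Validate() { return true; }
};

struct Rect : Persistent {
    int width = 0;    // persistent
    int height = 0;   // persistent
    int area = 0;     // derived value, not worth saving

    bool Validate() override {
        if (width < 0 || height < 0) return false;
        area = width * height;   // recompute what the stream did not contain
        return true;
    }
};

// What a Stream Library might do at the end of the loading process.
bool ValidateAll(const std::vector<Persistent*>& objects) {
    bool ok = true;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        assert(objects[i] != nullptr);
        ok = objects[i]->Validate() && ok;   // validate every object, even after a failure
    }
    return ok;
}
```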

Application Generator

The Application Generator program probably uses the Stream Library for saving and loading the created and edited application. It also uses the Property Interface for iterating through the object hierarchy, displays the properties to the user, and makes it possible to view and edit them. The Application Generator is also able to create new objects and insert them into any extendable container.

To provide the list of available object types, the Application Generator has to access the Type Info records of the application and display the types in a list. A container usually cannot store every kind of object, so the list must be filtered to the element type and its descendants. The list of types can be displayed as a tree that follows the class hierarchy.
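The filtering step can be sketched as below, assuming each Type Info record stores a pointer to the record of its (single) base class. This layout and the function names are hypothetical; multiple inheritance would need a list of bases instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Type Info record with a link to the base class record.
struct TypeInfo {
    const char* name;
    const TypeInfo* base;   // null for root types
};

// True if type is ancestor itself or is derived from it.
bool IsSameOrDerived(const TypeInfo* type, const TypeInfo* ancestor) {
    for (; type != nullptr; type = type->base)
        if (type == ancestor) return true;
    return false;
}

// Keeps only the types that may be stored in a container of elementType.
std::vector<const TypeInfo*> AllowedTypes(const std::vector<const TypeInfo*>& all,
                                          const TypeInfo* elementType) {
    assert(elementType != nullptr);
    std::vector<const TypeInfo*> result;
    for (std::size_t i = 0; i < all.size(); ++i)
        if (IsSameOrDerived(all[i], elementType)) result.push_back(all[i]);
    return result;
}
```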

The user interface displays a tree representation of the object hierarchy. This is a common representation, but if the Application Generator knows more about the objects and the meaning of their properties, much more sophisticated representations can be provided. For example, graphical objects (windows, buttons, input lines) can be represented in a dialog editor, or relations between objects that are represented by pointers can be displayed graphically. These are just a few simple examples of the many possibilities.

The Application Generator saves the application to a file, and the file can be loaded by the run-time system. This way the developed application becomes independent of the Application Generator.

Conclusions

The first part introduced the Run Time Type Identification system implemented by the Property Library. The requirements were collected and discussed in detail. It was shown how application data can be saved and loaded by the Property and Stream Libraries without knowing anything about the application's classes themselves.

The second part of the article will describe how the Property Library is implemented, while the third part will describe the Stream Library. These parts go into the details and describe many programming tricks used for getting a clean system. The reader will need C++ programming knowledge to understand them.

 

Part II.

http://www.rcs.hu/Articles/RTTI_Part2.htm

 

Part III.

http://www.rcs.hu/Articles/RTTI_Part3.htm

 

References

1.      Bjarne Stroustrup: The C++ Programming Language Special Edition, AT&T, 2000.

2.      Paul Jakubik: Callback Implementations in C++, http://www.primenet.com/~jakubik/callback.html

3.      Vladimir Batov: Persistency Made Easy, C++ Report, August 12, 2002. http://www.adtmag.com/joop/crarticle.asp?ID=849 originally appeared in the August 2000 issue of Journal of C++ Report.