| <?xml version="1.0" encoding="ISO-8859-1"?> |
| <!DOCTYPE html |
| PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" |
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> |
| <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" /> |
| <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" /> |
| <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 23." /> |
| <meta name="GENERATOR" content="vi and eight fingers" /> |
| <title>libstdc++-v3 HOWTO: Chapter 23: Containers</title> |
| <link rel="StyleSheet" href="../lib3styles.css" type="text/css" /> |
| <link rel="Start" href="../documentation.html" type="text/html" |
| title="GNU C++ Standard Library" /> |
| <link rel="Prev" href="../22_locale/howto.html" type="text/html" |
| title="Localization" /> |
| <link rel="Next" href="../24_iterators/howto.html" type="text/html" |
| title="Iterators" /> |
| <link rel="Copyright" href="../17_intro/license.html" type="text/html" /> |
| <link rel="Help" href="../faq/index.html" type="text/html" title="F.A.Q." /> |
| </head> |
| <body> |
| |
| <h1 class="centered"><a name="top">Chapter 23: Containers</a></h1> |
| |
| <p>Chapter 23 deals with container classes and what they offer. |
| </p> |
| |
| |
| <!-- ####################################################### --> |
| <hr /> |
| <h1>Contents</h1> |
| <ul> |
| <li><a href="#1">Making code unaware of the container/array difference</a></li> |
| <li><a href="#2">Variable-sized bitmasks</a></li> |
| <li><a href="#3">Containers and multithreading</a></li> |
| <li><a href="#4">"Hinting" during insertion</a></li> |
| <li><a href="#5">Bitmasks and string arguments</a></li> |
| <li><a href="#6"><code>std::list::size()</code> is O(n)!</a></li> |
| <li><a href="#7">Space overhead management for vectors</a></li> |
| </ul> |
| |
| <hr /> |
| |
| <!-- ####################################################### --> |
| |
| <h2><a name="1">Making code unaware of the container/array difference</a></h2> |
| <p>You're writing some code and can't decide whether to use builtin |
| arrays or some kind of container. There are compelling reasons |
| to use one of the container classes, but you're afraid that you'll |
| eventually run into difficulties, change everything back to arrays, |
| and then have to change all the code that uses those data types to |
| keep up with the change. |
| </p> |
| <p>If your code makes use of the standard algorithms, this isn't as |
| scary as it sounds. The algorithms don't know, nor care, about |
| the kind of "container" on which they work, since the |
| algorithms are only given endpoints to work with. For the container |
| classes, these are iterators (usually <code>begin()</code> and |
| <code>end()</code>, but not always). For builtin arrays, these are |
| the address of the first element and the |
| <a href="../24_iterators/howto.html#2">past-the-end</a> element. |
| </p> |
| <p>Some very simple wrapper functions can hide all of that from the |
| rest of the code. For example, a pair of functions called |
| <code>beginof</code> can be written, one that takes an array, another |
| that takes a vector. The first returns a pointer to the first |
| element, and the second returns the vector's <code>begin()</code> |
| iterator. |
| </p> |
| <p>The functions should be made template functions, and should also |
| be declared inline. As pointed out in the comments in the code |
| below, this can lead to <code>beginof</code> being optimized out of |
| existence, so you pay absolutely nothing in terms of increased |
| code size or execution time. |
| </p> |
| <p>The result is that if all your algorithm calls look like |
| </p> |
| <pre> |
| std::transform(beginof(foo), endof(foo), beginof(foo), SomeFunction);</pre> |
| <p>then the type of foo can change from an array of ints to a vector |
| of ints to a deque of ints and back again, without ever changing any |
| client code. |
| </p> |
| <p>This author has a collection of such functions, called "*of" |
| because they all extend the builtin "sizeof". It started |
| with some Usenet discussions on a transparent way to find the length |
| of an array. A simplified and much-reduced version for easier |
| reading is <a href="wrappers_h.txt">given here</a>. |
| </p> |
| <p>Astute readers will notice two things at once: first, that the |
| container class is still a <code>vector<T></code> instead of a |
| more general <code>Container<T></code>. This would mean that |
| three functions for <code>deque</code> would have to be added, another |
| three for <code>list</code>, and so on. This is due to problems with |
| getting template resolution correct; I find it easier just to |
| give the extra three lines and avoid confusion. |
| </p> |
| <p>Second, the line |
| </p> |
| <pre> |
| inline unsigned int lengthof (T (&)[sz]) { return sz; } </pre> |
| <p>looks just weird! Hint: unused parameters can be left nameless. |
| </p> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="2">Variable-sized bitmasks</a></h2> |
| <p>No, you cannot write code of the form |
| </p> |
| <!-- Careful, the leading spaces in PRE show up directly. --> |
| <pre> |
| #include <bitset> |
| |
| void foo (size_t n) |
| { |
| std::bitset<n> bits; |
| .... |
| } </pre> |
| <p>because <code>n</code> must be known at compile time. Your compiler is |
| correct; it is not a bug. That's the way templates work. (Yes, it |
| <em>is</em> a feature.) |
| </p> |
| <p>There are a couple of ways to handle this kind of thing. Please |
| consider all of them before passing judgement. They include, in |
| no particular order: |
| </p> |
| <ul> |
| <li>A very large N in <code>bitset<N></code>.</li> |
| <li>A container<bool>.</li> |
| <li>Extremely weird solutions.</li> |
| </ul> |
| <p><strong>A very large N in |
| <code>bitset<N></code>. </strong> It has |
| been pointed out a few times in newsgroups that N bits only takes up |
| (N/8) bytes on most systems, and division by a factor of eight is pretty |
| impressive when speaking of memory. Half a megabyte given over to a |
| bitset (recall that there is zero space overhead for housekeeping info; |
| it is known at compile time exactly how large the set is) will hold over |
| four million bits. If you're using those bits as status flags (e.g., |
| "changed"/"unchanged" flags), that's a <em>lot</em> |
| of state. |
| </p> |
| <p>You can then keep track of the "maximum bit used" during some |
| testing runs on representative data, make note of how many of those bits |
| really need to be there, and then reduce N to a smaller number. Leave |
| some extra space, of course. (If you plan to write code like the |
| incorrect example above, where the bitset is a local variable, then you |
| may have to talk your compiler into allowing that much stack space; |
| there may be zero space overhead, but it's all allocated inside the |
| object.) |
| </p> |
| <p><strong>A container<bool>. </strong> The Committee |
| made provision |
| for the space savings possible with that (N/8) usage previously mentioned, |
| so that you don't have to do wasteful things like |
| <code>Container<char></code> or |
| <code>Container<short int></code>. |
| Specifically, <code>vector<bool></code> is required to be |
| specialized for that space savings. |
| </p> |
| <p>The problem is that <code>vector<bool></code> doesn't behave like a |
| normal vector anymore. There have been recent journal articles which |
| discuss the problems (the ones by Herb Sutter in the May and |
| July/August 1999 issues of |
| <u>C++ Report</u> cover it well). Future revisions of the ISO C++ |
| Standard will change the requirement for <code>vector<bool></code> |
| specialization. In the meantime, <code>deque<bool></code> is |
| recommended (although its behavior is sane, you probably will not get |
| the space savings, but the allocation scheme is different than that |
| of vector). |
| </p> |
| <p><strong>Extremely weird solutions. </strong> If you have |
| access to |
| the compiler and linker at runtime, you can do something insane, like |
| figuring out just how many bits you need, then writing a temporary |
| source code file. That file contains an instantiation of |
| <code>bitset</code> |
| for the required number of bits, inside some wrapper functions with |
| unchanging signatures. Have your program then call the |
| compiler on that file using Position Independent Code, then open the |
| newly-created object file and load those wrapper functions. You'll have |
| an instantiation of <code>bitset<N></code> for the exact |
| <code>N</code> |
| that you need at the time. Don't forget to delete the temporary files. |
| (Yes, this <em>can</em> be, and <em>has been</em>, done.) |
| </p> |
| <!-- I wonder if this next paragraph will get me in trouble... --> |
| <p>This would be the approach of either a visionary genius or a raving |
| lunatic, depending on your programming and management style. Probably |
| the latter. |
| </p> |
| <p>Which of the above techniques you use, if any, are up to you and your |
| intended application. Some time/space profiling is indicated if it |
| really matters (don't just guess). And, if you manage to do anything |
| along the lines of the third category, the author would love to hear |
| from you... |
| </p> |
| <p>Also note that the implementation of bitset used in libstdc++-v3 has |
| <a href="../ext/sgiexts.html#ch23">some extensions</a>. |
| </p> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="3">Containers and multithreading</a></h2> |
| <p>This section discusses issues surrounding the design of |
| multithreaded applications which use Standard C++ containers. |
| All information in this section is current as of the gcc 3.0 |
| release and all later point releases. Although earlier gcc |
| releases had a different approach to threading configuration and |
| proper compilation, the basic code design rules presented here |
| were similar. For information on all other aspects of |
| multithreading as it relates to libstdc++, including details on |
| the proper compilation of threaded code (and compatibility between |
| threaded and non-threaded code), see Chapter 17. |
| </p> |
| <p>Two excellent pages to read when working with the Standard C++ |
| containers and threads are |
| <a href="http://www.sgi.com/tech/stl/thread_safety.html">SGI's |
| http://www.sgi.com/tech/stl/thread_safety.html</a> and |
| <a href="http://www.sgi.com/tech/stl/Allocators.html">SGI's |
| http://www.sgi.com/tech/stl/Allocators.html</a>. |
| </p> |
| <p><em>However, please ignore all discussions about the user-level |
| configuration of the lock implementation inside the STL |
| container-memory allocator on those pages. For the sake of this |
| discussion, libstdc++-v3 configures the SGI STL implementation, |
| not you. This is quite different from how gcc pre-3.0 worked. |
| In particular, past advice was for people using g++ to |
| explicitly define _PTHREADS or other macros or port-specific |
| compilation options on the command line to get a thread-safe |
| STL. This is no longer required for any port and should no |
| longer be done unless you really know what you are doing and |
| assume all responsibility.</em> |
| </p> |
| <p>Since the container implementation of libstdc++-v3 uses the SGI |
| code, we use the same definition of thread safety as SGI when |
| discussing design. A key point that beginners may miss is the |
| fourth major paragraph of the first page mentioned above |
| ("For most clients,"...), which points out that |
| locking must nearly always be done outside the container, by |
| client code (that'd be you, not us). There is a notable |
| exceptions to this rule. Allocators called while a container or |
| element is constructed uses an internal lock obtained and |
| released solely within libstdc++-v3 code (in fact, this is the |
| reason STL requires any knowledge of the thread configuration). |
| </p> |
| <p>For implementing a container which does its own locking, it is |
| trivial to provide a wrapper class which obtains the lock (as |
| SGI suggests), performs the container operation, and then |
| releases the lock. This could be templatized <em>to a certain |
| extent</em>, on the underlying container and/or a locking |
| mechanism. Trying to provide a catch-all general template |
| solution would probably be more trouble than it's worth. |
| </p> |
| <p>The STL implementation is currently configured to use the |
| high-speed caching memory allocator. Some people like to |
| test and/or normally run threaded programs with a different |
| default. For all details about how to globally override this |
| at application run-time see <a href="../ext/howto.html#3">here</a>. |
| </p> |
| <p>There is a better way (not standardized yet): It is possible to |
| force the malloc-based allocator on a per-case-basis for some |
| application code. The library team generally believes that this |
| is a better way to tune an application for high-speed using this |
| implementation of the STL. There is |
| <a href="../ext/howto.html#3">more information on allocators here</a>. |
| </p> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="4">"Hinting" during insertion</a></h2> |
| <p>Section [23.1.2], Table 69, of the C++ standard lists this function |
| for all of the associative containers (map, set, etc): |
| </p> |
| <pre> |
| a.insert(p,t);</pre> |
| <p>where 'p' is an iterator into the container 'a', and 't' is the item |
| to insert. The standard says that "iterator p is a hint |
| pointing to where the insert should start to search," but |
| specifies nothing more. (LWG Issue #233, currently in review, |
| addresses this topic, but I will ignore it here because it is not yet |
| finalized.) |
| </p> |
| <p>Here we'll describe how the hinting works in the libstdc++-v3 |
| implementation, and what you need to do in order to take advantage of |
| it. (Insertions can change from logarithmic complexity to amortized |
| constant time, if the hint is properly used.) Also, since the current |
| implementation is based on the SGI STL one, these points may hold true |
| for other library implementations also, since the HP/SGI code is used |
| in a lot of places. |
| </p> |
| <p>In the following text, the phrases <em>greater than</em> and <em>less |
| than</em> refer to the results of the strict weak ordering imposed on |
| the container by its comparison object, which defaults to (basically) |
| "<". Using those phrases is semantically sloppy, but I |
| didn't want to get bogged down in syntax. I assume that if you are |
| intelligent enough to use your own comparison objects, you are also |
| intelligent enough to assign "greater" and "lesser" |
| their new meanings in the next paragraph. *grin* |
| </p> |
| <p>If the <code>hint</code> parameter ('p' above) is equivalent to: |
| </p> |
| <ul> |
| <li><code>begin()</code>, then the item being inserted should have a key |
| less than all the other keys in the container. The item will |
| be inserted at the beginning of the container, becoming the new |
| entry at <code>begin()</code>. |
| </li> |
| <li><code>end()</code>, then the item being inserted should have a key |
| greater than all the other keys in the container. The item will |
| be inserted at the end of the container, becoming the new entry |
| at <code>end()</code>. |
| </li> |
| <li>neither <code>begin()</code> nor <code>end()</code>, then: Let <code>h</code> |
| be the entry in the container pointed to by <code>hint</code>, that |
| is, <code>h = *hint</code>. Then the item being inserted should have |
| a key less than that of <code>h</code>, and greater than that of the |
| item preceding <code>h</code>. The new item will be inserted |
| between <code>h</code> and <code>h</code>'s predecessor. |
| </li> |
| </ul> |
| <p>For <code>multimap</code> and <code>multiset</code>, the restrictions are |
| slightly looser: "greater than" should be replaced by |
| "not less than" and "less than" should be replaced |
| by "not greater than." (Why not replace greater with |
| greater-than-or-equal-to? You probably could in your head, but the |
| mathematicians will tell you that it isn't the same thing.) |
| </p> |
| <p>If the conditions are not met, then the hint is not used, and the |
| insertion proceeds as if you had called <code> a.insert(t) </code> |
| instead. (<strong>Note </strong> that GCC releases prior to 3.0.2 |
| had a bug in the case with <code>hint == begin()</code> for the |
| <code>map</code> and <code>set</code> classes. You should not use a hint |
| argument in those releases.) |
| </p> |
| <p>This behavior goes well with other container's <code>insert()</code> |
| functions which take an iterator: if used, the new item will be |
| inserted before the iterator passed as an argument, same as the other |
| containers. The exception |
| (in a sense) is with a hint of <code>end()</code>: the new item will |
| actually be inserted after <code>end()</code>, but it also becomes the |
| new <code>end()</code>. |
| </p> |
| <p><strong>Note </strong> also that the hint in this implementation is a |
| one-shot. The insertion-with-hint routines check the immediately |
| surrounding entries to ensure that the new item would in fact belong |
| there. If the hint does not point to the correct place, then no |
| further local searching is done; the search begins from scratch in |
| logarithmic time. (Further local searching would only increase the |
| time required when the hint is too far off.) |
| </p> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="5">Bitmasks and string arguments</a></h2> |
| <p>Bitmasks do not take char* nor const char* arguments in their |
| constructors. This is something of an accident, but you can read |
| about the problem: follow the library's "Links" from the |
| homepage, and from the C++ information "defect reflector" |
| link, select the library issues list. Issue number 116 describes the |
| problem. |
| </p> |
| <p>For now you can simply make a temporary string object using the |
| constructor expression: |
| </p> |
| <pre> |
| std::bitset<5> b ( std::string("10110") ); |
| </pre> |
| instead of |
| <pre> |
| std::bitset<5> b ( "10110" ); // invalid |
| </pre> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="6"><code>std::list::size()</code> is O(n)!</a></h2> |
| <p>Yes it is, and that's okay. This is a decision that we preserved when |
| we imported SGI's STL implementation. The following is quoted from |
| <a href="http://www.sgi.com/tech/stl/FAQ.html">their FAQ</a>: |
| </p> |
| <blockquote> |
| <p>The size() member function, for list and slist, takes time |
| proportional to the number of elements in the list. This was a |
| deliberate tradeoff. The only way to get a constant-time size() for |
| linked lists would be to maintain an extra member variable containing |
| the list's size. This would require taking extra time to update that |
| variable (it would make splice() a linear time operation, for example), |
| and it would also make the list larger. Many list algorithms don't |
| require that extra word (algorithms that do require it might do better |
| with vectors than with lists), and, when it is necessary to maintain |
| an explicit size count, it's something that users can do themselves. |
| </p> |
| <p>This choice is permitted by the C++ standard. The standard says that |
| size() "should" be constant time, and "should" |
| does not mean the same thing as "shall". This is the |
| officially recommended ISO wording for saying that an implementation |
| is supposed to do something unless there is a good reason not to. |
| </p> |
| <p>One implication of linear time size(): you should never write |
| </p> |
| <pre> |
| if (L.size() == 0) |
| ...</pre> |
| Instead, you should write |
| <pre> |
| if (L.empty()) |
| ...</pre> |
| </blockquote> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| <hr /> |
| <h2><a name="7">Space overhead management for vectors</a></h2> |
| <p>In |
| <a href="http://gcc.gnu.org/ml/libstdc++/2002-04/msg00105.html">this |
| message to the list</a>, Daniel Kostecky announced work on an |
| alternate form of <code>std::vector</code> that would support hints |
| on the number of elements to be over-allocated. The design was also |
| described, along with possible implementation choices. |
| </p> |
| <p>The first two alpha releases were announced |
| <a href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00048.html">here</a> |
| and |
| <a href="http://gcc.gnu.org/ml/libstdc++/2002-07/msg00111.html">here</a>. |
| The releases themselves are available at |
| <a href="http://www.kotelna.sk/dk/sw/caphint/"> |
| http://www.kotelna.sk/dk/sw/caphint/</a>. |
| </p> |
| <p>Return <a href="#top">to top of page</a> or |
| <a href="../faq/index.html">to the FAQ</a>. |
| </p> |
| |
| |
| <!-- ####################################################### --> |
| |
| <hr /> |
| <p class="fineprint"><em> |
| See <a href="../17_intro/license.html">license.html</a> for copying conditions. |
| Comments and suggestions are welcome, and may be sent to |
| <a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>. |
| </em></p> |
| |
| |
| </body> |
| </html> |