| From: herbs@cntc.com (Herb Sutter) |
| Subject: Guru of the Week #29: Solution |
| Date: 22 Jan 1998 00:00:00 GMT |
| Message-ID: <6a8q26$9qa@netlab.cs.rpi.edu> |
| Newsgroups: comp.lang.c++.moderated |
| |
| |
| .--------------------------------------------------------------------. |
| | Guru of the Week problems and solutions are posted regularly on | |
| | news:comp.lang.c++.moderated. For past problems and solutions | |
| | see the GotW archive at http://www.cntc.com. | |
| | Is there a topic you'd like to see covered? mailto:herbs@cntc.com | |
| `--------------------------------------------------------------------' |
| _______________________________________________________ |
| |
| GotW #29: Strings |
| |
| Difficulty: 7 / 10 |
| _______________________________________________________ |
| |
| |
| >Write a ci_string class which is identical to the |
| >standard 'string' class, but is case-insensitive in the |
| >same way as the C function stricmp(): |
| |
| The "how can I make a case-insensitive string?" |
| question is so common that it probably deserves its own |
| FAQ -- hence this issue of GotW. |
| |
| Note 1: The stricmp() case-insensitive string |
| comparison function is not part of the C standard, but |
| it is a common extension on many C compilers. |
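If your compiler lacks stricmp(), a portable replacement is easy to write. The sketch below is one possible version, not a standard function. The name ci_strcmp is hypothetical, and "case" here means whatever std::tolower() reports in the current C locale (in the default "C" locale, just the ASCII letters).

```cpp
#include <cctype>

// Hypothetical portable stand-in for the non-standard stricmp():
// compares two NUL-terminated strings, ignoring case as defined by
// std::tolower() in the current C locale. Returns <0, 0 or >0.
int ci_strcmp(const char* s1, const char* s2) {
    for (;; ++s1, ++s2) {
        // Cast to unsigned char first: passing a negative char value
        // (other than EOF) to std::tolower() is undefined behaviour.
        int c1 = std::tolower(static_cast<unsigned char>(*s1));
        int c2 = std::tolower(static_cast<unsigned char>(*s2));
        // Stop at the first difference, or at the end of both strings.
        if (c1 != c2 || c1 == '\0') {
            return c1 - c2;
        }
    }
}
```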
| |
| Note 2: What "case insensitive" actually means depends |
| entirely on your application and language. For |
| example, many languages do not have "cases" at all, and |
| for languages that do, you have to decide whether you |
| want accented characters to compare equal to unaccented |
| characters, and so on. This GotW provides guidance on |
| how to implement case-insensitivity for standard |
| strings in whatever sense applies to your particular |
| situation. |
| |
| |
| Here's what we want to achieve: |
| |
| > ci_string s( "AbCdE" ); |
| > |
| > // case insensitive |
| > assert( s == "abcde" ); |
| > assert( s == "ABCDE" ); |
| > |
| > // still case-preserving, of course |
| > assert( strcmp( s.c_str(), "AbCdE" ) == 0 ); |
| > assert( strcmp( s.c_str(), "abcde" ) != 0 ); |
| |
| The key here is to understand what a "string" actually |
| is in standard C++. If you look in your trusty string |
| header, you'll see something like this: |
| |
| typedef basic_string<char> string; |
| |
| So string isn't really a class; it's a typedef for one |
| specialization of a class template. In turn, the |
| basic_string<> template is declared as follows, in all |
| its glory: |
| |
| template<class charT, |
| class traits = char_traits<charT>, |
| class Allocator = allocator<charT> > |
| class basic_string; |
| |
| So "string" really means "basic_string<char, |
| char_traits<char>, allocator<char> >". We don't need |
| to worry about the allocator part, but the key here is |
| the char_traits part because char_traits defines how |
| characters interact and compare(!). |
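Assuming a C++11 or later compiler (static_assert and <type_traits> postdate this article), you can have the compiler itself confirm the expansion described above:

```cpp
#include <memory>
#include <string>
#include <type_traits>

// std::string is not a distinct class: it names one specialization
// of the std::basic_string class template, with both defaulted
// template arguments spelled out here.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char,
                                   std::char_traits<char>,
                                   std::allocator<char> > >::value,
    "std::string is basic_string<char, char_traits<char>, allocator<char>>");

// The traits argument is also reachable from the string type itself.
static_assert(
    std::is_same<std::string::traits_type, std::char_traits<char> >::value,
    "std::string compares characters via std::char_traits<char>");
```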
| |
| basic_string supplies useful comparison functions that |
| let you compare whether a string is equal to another, |
| less than another, and so on. These string comparison |
| functions are built on top of character comparison |
| functions supplied by the traits class, which for |
| string is char_traits<char>. In particular, char_traits |
| supplies the character comparison functions eq() and |
| lt() for equality and less-than comparisons, and |
| compare() and find() functions to compare and search |
| sequences of characters. |
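To make that layering concrete, here is a simplified sketch of how basic_string's whole-string compare() is specified in terms of its traits class, plus a search through traits::find(). The helper names compare_like_basic_string and contains are hypothetical, the sketch ignores the position/length overloads the real member functions also offer, and it assumes C++11 for nullptr.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Simplified sketch of basic_string<charT, traits>::compare():
// compare the common prefix with traits::compare(), then break
// ties on length. Traits must be given explicitly at the call site.
template <class Traits>
int compare_like_basic_string(const typename Traits::char_type* s1,
                              std::size_t n1,
                              const typename Traits::char_type* s2,
                              std::size_t n2) {
    const std::size_t rlen = std::min(n1, n2);
    const int r = Traits::compare(s1, s2, rlen);
    if (r != 0) return r;   // prefixes differ: traits decide
    if (n1 < n2) return -1; // equal prefix: shorter string first
    if (n1 > n2) return 1;
    return 0;
}

// traits::find() returns a null pointer when the character does
// not occur in the first n characters.
inline bool contains(const std::string& s, char c) {
    typedef std::char_traits<char> T;
    return T::find(s.data(), s.size(), c) != nullptr;
}
```

Because both helpers reach individual characters only through the traits class, substituting a different traits type changes their behaviour without touching the helpers themselves.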
| |
| If we want these to behave differently, all we have to |
| do is provide a different traits class! Here's the |
| easiest way: |
| |
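| #include <cctype>   // std::tolower |
| #include <cstddef>  // std::size_t |
| #include <string>   // std::basic_string, std::char_traits |
| |
| struct ci_char_traits : public std::char_traits<char> |
| // just inherit all the other functions |
| // that we don't need to override |
| { |
|     // helper, not part of char_traits: tolower() needs an |
|     // unsigned char value, since passing a negative char |
|     // (other than EOF) is undefined behaviour |
|     static char fold( char c ) { |
|         return static_cast<char>( |
|             std::tolower( static_cast<unsigned char>(c) ) ); |
|     } |
| |
|     static bool eq( char c1, char c2 ) { |
|         return fold(c1) == fold(c2); |
|     } |
| |
|     static bool lt( char c1, char c2 ) { |
|         return static_cast<unsigned char>(fold(c1)) |
|              < static_cast<unsigned char>(fold(c2)); |
|     } |
| |
|     // must compare exactly n characters, embedded nulls |
|     // included, so strnicmp() (which stops at '\0') won't do |
|     static int compare( const char* s1, |
|                         const char* s2, |
|                         std::size_t n ) { |
|         for( std::size_t i = 0; i < n; ++i ) { |
|             if( lt( s1[i], s2[i] ) ) return -1; |
|             if( lt( s2[i], s1[i] ) ) return 1; |
|         } |
|         return 0; |
|     } |
| |
|     // must return a null pointer if a is not found |
|     static const char* |
|     find( const char* s, std::size_t n, char a ) { |
|         while( n-- > 0 ) { |
|             if( eq( *s, a ) ) return s; |
|             ++s; |
|         } |
|         return 0; |
|     } |
| }; |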
| |
| And finally, the key that brings it all together: |
| |
| typedef std::basic_string<char, ci_char_traits> ci_string; |
| |
| All we've done is create a typedef named "ci_string" |
| which operates exactly like the standard "string", |
| except that it uses ci_char_traits instead of |
| char_traits<char> to get its character comparison |
| rules. Since we've handily made the ci_char_traits |
| rules case-insensitive, we've made ci_string itself |
| case-insensitive without any further surgery: we have |
| a case-insensitive string without having touched |
| basic_string at all! |
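One consequence worth knowing: the traits class is part of the type, so ci_string and string are unrelated types, and you cannot assign one to the other directly. The sketch below shows this with a hypothetical traits class, same_traits, that changes nothing at all; the conversion helpers to_std and from_std are likewise hypothetical, and the static_assert assumes C++11.

```cpp
#include <string>
#include <type_traits>

// Hypothetical traits class that overrides nothing.
struct same_traits : std::char_traits<char> {};

typedef std::basic_string<char, same_traits> other_string;

// Even so, a different traits argument means a different type...
static_assert(!std::is_same<other_string, std::string>::value,
              "different traits argument, different type");

// ...so converting between the two has to go through the characters.
inline std::string to_std(const other_string& s) {
    return std::string(s.data(), s.size());
}

inline other_string from_std(const std::string& s) {
    return other_string(s.data(), s.size());
}
```

For the same reason, the standard operator<< and operator>> for strings require the string's traits type to match the stream's, so writing a ci_string to std::cout needs a conversion like the one above, a call to c_str(), or a small overload of your own.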
| |
| This GotW should give you a flavour for how the |
| basic_string template works and how flexible it is in |
| practice. If you want different comparisons than the |
| ones stricmp() and tolower() give you, just replace the |
| comparison functions shown above with your own code |
| that performs character comparisons the way that's |
| appropriate in your particular application. |
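As an example of swapping in different rules, the sketch below folds to upper case instead of lower case. It is a hypothetical ci_upper_traits, assuming an ASCII character set, the default "C" locale, and C++11 for nullptr. Equality is unaffected, but ordering changes for the punctuation characters ASCII places between 'Z' and 'a': with upper-case folding "a" sorts before "_", whereas a tolower()-based traits class puts "_" first.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical variant traits class that folds to UPPER case.
struct ci_upper_traits : std::char_traits<char> {
    // helper, not part of char_traits
    static char fold(char c) {
        return static_cast<char>(
            std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char c1, char c2) { return fold(c1) == fold(c2); }
    static bool lt(char c1, char c2) {
        return static_cast<unsigned char>(fold(c1)) <
               static_cast<unsigned char>(fold(c2));
    }
    // Compares exactly n characters, embedded nulls included.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    // Returns a null pointer if a does not occur in s[0..n).
    static const char* find(const char* s, std::size_t n, char a) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], a)) return s + i;
        }
        return nullptr;
    }
};

typedef std::basic_string<char, ci_upper_traits> ci_upper_string;
```

Which folding is "right" depends on the ordering your application expects; the point is that only the traits class changes.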
| |
| |
| |
| Exercise for the reader: |
| |
| Is it safe to inherit ci_char_traits from |
| char_traits<char> this way? Why or why not? |
| |
| |