//...
if( strcmp( str, "January" ) == 0 )
    month = 1;
else if( strcmp( str, "February" ) == 0 )
    month = 2;
//...
Q: Is there any more efficient way of determining that, for instance, "April" is the fourth month of the year? Repeated calls to strcmp() must be terribly inefficient, and if/else ladders tedious to code. Sometimes it's "March" and sometimes it's abbreviated as "MAR"... There must be a better way...
Putting the known strings in a sorted array of structs would allow, at least, binary searching, but still involves a lot of guesswork from the code.
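For reference, the sorted-array idea could be sketched roughly as follows. This is only an illustration: the names month_entry, months, cmp_entry and month_lookup are made up here, and the choice to compare only the first three letters (lowercased) is an assumption so that "March" and "MAR" both match.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: lowercase 3-letter abbreviation plus ordinal. */
struct month_entry { const char *abbr; int ordinal; };

/* Must stay sorted by abbr, because bsearch() relies on the ordering. */
static const struct month_entry months[] = {
    { "apr", 4 }, { "aug", 8 }, { "dec", 12 }, { "feb", 2 },
    { "jan", 1 }, { "jul", 7 }, { "jun", 6 },  { "mar", 3 },
    { "may", 5 }, { "nov", 11 }, { "oct", 10 }, { "sep", 9 },
};

/* bsearch() comparator: key is a C string, elem is a table entry. */
static int cmp_entry( const void *key, const void *elem ) {
    return strcmp( (const char *)key, ((const struct month_entry *)elem)->abbr );
}

/* Returns 1-12, or 0 when the first three letters match no month. */
int month_lookup( const char *str ) {
    char key[4] = { 0 };
    for( int i = 0; i < 3 && str[i]; i++ )
        key[i] = (char)tolower( (unsigned char)str[i] );

    const struct month_entry *e = bsearch( key, months,
        sizeof months / sizeof months[0], sizeof months[0], cmp_entry );
    return e ? e->ordinal : 0;
}
```

With this sketch, month_lookup( "January" ) and month_lookup( "MAR" ) give 1 and 3, at the cost of up to four strcmp() calls per lookup. Anything that merely starts with "jan" (such as "Janitor") is also accepted, so a stricter application would still need a final check.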
This is a "Can I answer my own question?" answer. Other answers are welcome.
There are several ways of translating an arbitrary string from a finite set of strings into a concise, useable form. Most of these involve an iterative (or a sub-optimal linear) search involving repeated comparisons (that may need to account for case sensitivity.)
A response to my response to a recent question suggested "sharing" an (admittedly arcane) hashing function that, with awareness of false positives, returns the month ordinal (1-12) when passed a string containing the English name of a month in 7-bit ASCII. The function performs primitive operations on the 2nd and 3rd characters and returns the resulting hash value for the string. Note that "January", "jan" and "JAN" all return the value 1. Likewise, "feb", "FEBRUARY" and "Feb" return the value 2.
The operations shown were uncovered through a "brute force" permutation of a number of primitive operations seeking a combination that would return 12 different values between 0x0 and 0xF (4 bits). The reader is encouraged to take apart each step of the mangling of the bits of the two ASCII characters. This result was not "invented", but was "discovered".
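To give a flavour of that kind of search, here is a minimal sketch. It is not the search that was actually run: the candidate family (one shift-and-mask of the 2nd character XORed with a shifted 3rd character) and the names candidate, is_perfect and count_perfect are assumptions chosen only to keep the example short.

```c
#include <stddef.h>

static const char *const names[12] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
};

/* Hypothetical candidate family: mangle 2nd and 3rd chars, keep 4 bits. */
static unsigned candidate( unsigned c1, unsigned c2,
                           unsigned s1, unsigned m1, unsigned s2 ) {
    return ( ( ( c1 >> s1 ) & m1 ) ^ ( c2 << s2 ) ) & 0xF;
}

/* Returns 1 when all 12 names get distinct 4-bit hashes and flipping the
 * ASCII case bit (0x20) of either character does not change the hash. */
int is_perfect( unsigned s1, unsigned m1, unsigned s2 ) {
    unsigned seen = 0;
    for( size_t i = 0; i < 12; i++ ) {
        unsigned c1 = (unsigned char)names[i][1];
        unsigned c2 = (unsigned char)names[i][2];
        unsigned h = candidate( c1, c2, s1, m1, s2 );
        if( h != candidate( c1 ^ 0x20, c2, s1, m1, s2 ) ) return 0;
        if( h != candidate( c1, c2 ^ 0x20, s1, m1, s2 ) ) return 0;
        if( seen & ( 1u << h ) ) return 0;   /* collision between months */
        seen |= 1u << h;
    }
    return 1;
}

/* Counts how many parameter triples in this small family are perfect. */
int count_perfect( void ) {
    int n = 0;
    for( unsigned s1 = 0; s1 < 5; s1++ )
        for( unsigned m1 = 1; m1 < 16; m1++ )
            for( unsigned s2 = 0; s2 < 4; s2++ )
                n += is_perfect( s1, m1, s2 );
    return n;
}
```

Within this family, is_perfect( 2, 7, 1 ) succeeds, i.e. ((c1 >> 2) & 7) ^ (c2 << 1) spreads the twelve names over twelve different 4-bit values, while something naive such as is_perfect( 0, 1, 0 ) fails because "feb" and "mar" collide.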
After the bits of two characters have been mangled, the value is used as an index into a string (aka "a cheap LUT") whose 12 letters A-L are positioned so that "?an" (January) will mangle to an index for the letter 'A'. Masking the low 4 bits of that letter yields the value 1 as the ordinal for the string "JANUARY"... 1 will be the return value when the function is passed variations of the string "Jan".
NB: Using this function allows the caller to check that the string is indeed "JAN", "jan", "January" as suits the application. The caller need not try to match any of the names of the other 11 months. This function WILL return the false positive value 1 for the string "Random", so the caller need only validate against a single month's name (length and case appropriate to the application.)
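As an illustration of the scheme just described (hash two characters, index a 16-letter LUT, mask the low 4 bits, then validate against one name), here is a minimal sketch. The mangling expression and the LUT string were worked out for this example and are not claimed to be the constants from the original function; month_ord_sketch, month_checked and full_names are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* Sketch only: hash of 2nd and 3rd chars indexes a 16-letter LUT.
 * 'A'-'L' mask to 1-12; '@' (0x40) masks to 0 for the 4 unused slots. */
int month_ord_sketch( const char *s ) {
    static const char lut[] = "DIE@CB@LJF@HAG@K";
    if( !s || !s[0] || !s[1] || !s[2] ) return 0;   /* too short to hash */
    unsigned c1 = (unsigned char)s[1], c2 = (unsigned char)s[2];
    unsigned h = ( ( ( c1 >> 2 ) & 7 ) ^ ( c2 << 1 ) ) & 0xF;
    return lut[h] & 0xF;
}

static const char *const full_names[13] = {
    "", "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
};

/* Validates against the ONE name the hash points at. Accepts either the
 * 3-letter abbreviation or the full name, case-insensitively (an
 * assumption; adapt to the application). Returns 1-12, or 0 if rejected. */
int month_checked( const char *s ) {
    int m = month_ord_sketch( s );
    if( m == 0 ) return 0;
    const char *name = full_names[m];
    size_t i = 0;
    while( s[i] && name[i] && tolower( (unsigned char)s[i] ) == name[i] )
        i++;
    return ( s[i] == '\0' && ( i == 3 || name[i] == '\0' ) ) ? m : 0;
}
```

Here month_ord_sketch( "Random" ) returns the false positive 1, and month_checked( "Random" ) rejects it after comparing against "january" alone.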
Bonus round:
An equivalent function that converts "Sun(day)" (case insensitive) to 1, "MON" to 2, "tue" to 3, etc...
Again, the caller must confirm the string against only ONE day's name to avoid "false positives".
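A weekday version along the same lines might look like the sketch below. Again, the arithmetic (twice the 2nd character plus the 3rd, low 4 bits) and the LUT were derived for this illustration only, and day_ord_sketch is a hypothetical name.

```c
/* Sketch only: "Sun(day)" -> 1 ... "Sat(urday)" -> 7, 0 for unused hashes.
 * Only the low 4 bits of the sum are kept, so the ASCII case bit (0x20)
 * of either character never reaches the index. */
int day_ord_sketch( const char *s ) {
    static const char lut[] = "@@@@@EG@A@@@BFDC";
    if( !s || !s[0] || !s[1] || !s[2] ) return 0;   /* too short to hash */
    unsigned c1 = (unsigned char)s[1], c2 = (unsigned char)s[2];
    return lut[ ( 2u * c1 + c2 ) & 0xF ] & 0xF;
}
```

Any other string whose 2nd and 3rd letters are "un" (for example "Bundle") also yields 1, so the single-name confirmation still applies.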
While we're here, the following is an equivalent function for the "number names" from "zero" to "ten", again, case insensitive. (Number names are not abbreviated like month names or weekday names.)
EDIT
To counter the nay-sayers, here's another one:
Pass it the name (case ambivalent) of one of the twelve Zodiac signs, and it will return the ordinal of that star sign ("Aries" = 1, ...) Again, like any hashing function, there will be collisions with other strings. The caller need only subsequently check against a single known string; not twelve.
Update:
Drawn back to this for a recent task, an improvement to the month ordinal conversion revealed itself. The code above uses 16(+1) bytes for its LUT. The code below halves that amount. For posterity, the scheme is explained, with examples provided.
NB: Above returns 1-12 for a valid month name (or false positive) and 0 for other hash values in the range 0-15. Below returns 0-11 for valid month names (conforming to struct tm convention) and 12 for each of the four other hashes in the range 0-15. False positives don't magically go away.
Output:
UPDATED Update:
Inspired by the clarity of code in the recent answer by @chrqlie, there's this... It compiles to the same amount of assembler, but doesn't have the baggage of additional data. (Because it requires a 64-bit machine, it's probably not suitable for MCUs.) Again, the size of the LUT is only 8 bytes...
Further update: The first version of monthOrd() calculated a 4-bit hash and returned 0 for the 4/16 cases where the passed string bore no resemblance to the name of a month. (It returned 1-12 for the 12/16 cases that hashed to match the name of a particular month.) This was 'improved' in a later version, returning 0-11 to follow the convention of struct tm. Unfortunately, this meant the caller would have to use strcmp() to discern "Jan(uary)" (0) from the 4 wannabes (also 0.) Not good!
The most recent version (just edited) still returns the desired 0-11 for the 12 hashed month names (including false positives, if those are a possibility). The edited version now returns -1 for the 25% of cases where the string could not possibly be the name of a month (i.e. the string's hash value shows it to be a wannabe.)
The caller need not invoke strcmp() when monthOrd( str ) returns -1. Shaving cycles...
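The behaviour described in these updates (an 8-byte LUT, 0-11 for hashed month names, -1 otherwise) can be sketched as below. This is one possible arrangement and not the code referred to above: the hash expression, the 64-bit constant and the name month_index_sketch are assumptions made for the illustration. Sixteen 4-bit entries are packed into one 64-bit constant, each holding the ordinal 1-12 (or 0 for an unused hash), so subtracting 1 gives 0-11 or -1.

```c
#include <stdint.h>

/* Sketch only: nibble h of the constant holds (month ordinal 1-12) for
 * hash h, or 0 for the four hashes no month produces. */
int month_index_sketch( const char *s ) {
    static const uint64_t lut = UINT64_C( 0xB071806AC0230594 );
    if( !s || !s[0] || !s[1] || !s[2] ) return -1;  /* too short to hash */
    unsigned c1 = (unsigned char)s[1], c2 = (unsigned char)s[2];
    unsigned h = ( ( ( c1 >> 2 ) & 7 ) ^ ( c2 << 1 ) ) & 0xF;
    return (int)( ( lut >> ( h * 4 ) ) & 0xF ) - 1;  /* 0-11, or -1 */
}
```

For instance, month_index_sketch( "Jan" ) is 0 and month_index_sketch( "DECEMBER" ) is 11, while a string such as "bac" lands on an unused hash and gives -1 without any string comparison. "Random" still returns 0, so the one-name check remains the caller's job. On targets without a cheap 64-bit shift, the same nibbles could live in an 8-byte array instead.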