For clarity, I'm only talking about null-terminated strings.
I'm familiar with the standard way of comparing strings in C using strcmp, but I feel like it's slow and inefficient.
I'm not necessarily looking for the easiest method but the most efficient.
Can the current comparison method (strcmp) be optimized further while the underlying code remains cross platform?
If strcmp can't be optimized further, what is the fastest way which I could perform the string comparison without strcmp?
Current use case:
- Determine if two arbitrary strings match
- Strings will not exceed 4096 bytes, nor be less than 1 byte in size
- Strings are allocated/deallocated and compared within the same code/library
- Once the comparison is complete, I pass the string to another C library that requires a standard null-terminated string
- System memory limits are not a huge concern, but I will have tens of thousands of such strings queued up for comparison
- Strings may contain high-ASCII or UTF-8 characters, but for my purposes I only need to know whether they match; the content is not a concern
- Application runs on x86 but should also run on x64
Reference to current strcmp() implementation:
Edit: Clarified the solution does not need to be a modification of strcmp.
Edit 2: Added specific examples for this use case.
I'm afraid your reference implementation for `strcmp()` is both inaccurate and irrelevant:
- It is inaccurate because it compares characters using the `char` type instead of the `unsigned char` type as specified in the C11 Standard.
- It is irrelevant because the actual implementation used by modern compilers is much more sophisticated, expanded inline using hand-coded assembly language.
Any generic implementation is likely to be less optimal, especially if coded to remain portable across platforms.
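To make the `unsigned char` point concrete, here is a minimal portable sketch of a comparison that follows the Standard's rule. The name `my_strcmp` is hypothetical, and this is only an illustration of the semantics: it is not how any particular C library implements `strcmp()`, and it will most likely be slower than the library version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a portable illustration, not the library's strcmp(). */
int my_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    /* Interpret the bytes as unsigned char, as the Standard requires. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the bytes match and the terminator is not reached. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Return -1, 0 or 1 from the first differing pair of bytes. */
    return (*p1 > *p2) - (*p1 < *p2);
}
```

With plain `char`, which is signed on many x86 compilers, a byte such as `0xE9` would compare as negative and sort before `'a'`; comparing as `unsigned char` avoids that.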
Here are a few directions to explore if your program's bottleneck is comparing strings.
- If you keep track of the string lengths, use `memcmp()` instead of `strcmp()`. `memcmp()` is simpler than `strcmp()` and can be implemented even more efficiently in places where the strings are known to be properly aligned.

EDIT: with the extra information provided, you could use a structure like this for your strings:
You allocate this structure this way:
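A minimal sketch of such a structure and its allocation, under some assumptions: the names `struct string`, `hash_bytes`, `str_new` and `str_equal` are hypothetical, the FNV-1a hash is just one possible choice, and the string bytes are stored in a C99 flexible array member so that a single `malloc()` covers both the header and the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the field names are illustrative. */
struct string {
    size_t len;      /* length in bytes, excluding the terminator */
    uint32_t hash;   /* hash of the bytes, computed once at creation */
    char str[];      /* flexible array member, always null terminated */
};

/* 32-bit FNV-1a: one possible hash, any reasonable hash would do. */
static uint32_t hash_bytes(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Allocate header and text in one block; returns NULL on failure. */
struct string *str_new(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    struct string *p = malloc(sizeof *p + len + 1);
    if (p != NULL) {
        p->len = len;
        p->hash = hash_bytes(s, len);
        memcpy(p->str, s, len + 1);   /* copies the terminator too */
    }
    return p;
}

/* Cheap checks first: most mismatches never reach memcmp(). */
int str_equal(const struct string *a, const struct string *b)
{
    return a->len == b->len
        && a->hash == b->hash
        && memcmp(a->str, b->str, a->len) == 0;
}
```

Each object is released with a single `free()`. The hash costs one pass over the string at creation time, so it only pays off if each string takes part in several comparisons.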
If you can use these str structures for all your strings, you can greatly improve the efficiency of the matching by first comparing the lengths or the hashes. You can still pass the `str` member to your library function, since it is properly null terminated.