How to work around lack of NUL terminator in strings returned from mmap()?

I would suggest undergoing a paradigm shift here.

You're looking at the entire universe as consisting of '\0'-terminated strings that define your text. Instead of looking at the world this way, why don't you try looking at a world where text is a sequence defined by a beginning and an ending iterator?

You mmap your file, then initially set the beginning iterator, call it beg_iter, to the start of the mmap-ed segment, and the ending iterator, call it end_iter, to the first byte following the last byte in the mmap-ed segment, or beg_iter+number_of_pages*pagesize. Then, until either

A) end_iter equals beg_iter, or

B) end_iter[-1] is not a null character,

C) decrement end_iter, and go back to step A.

When you're done, you have a pair of iterators, the beginning iterator value, and the ending iterator value that define your text string.
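As a rough sketch of that loop: it assumes the mapping is a readable region of mapped_len bytes starting at base, and it tests the byte just before end_iter on each pass. The helper name trim_trailing_nuls and both parameter names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: given a mapped region [base, base + mapped_len),
// return the (beg_iter, end_iter) pair with trailing NUL padding dropped.
std::pair<const char*, const char*>
trim_trailing_nuls(const char* base, std::size_t mapped_len)
{
    assert(base != nullptr || mapped_len == 0);

    const char* beg_iter = base;
    const char* end_iter = base + mapped_len;

    // Stop when the range is empty (A) or the last byte is not NUL (B);
    // otherwise step end_iter back by one (C).
    while (end_iter != beg_iter && end_iter[-1] == '\0')
        --end_iter;

    return {beg_iter, end_iter};
}
```

Note that only trailing NULs are dropped; a NUL byte in the middle of the file stays part of the sequence.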

Of course, in this case, your iterators are plain char *, but that's really not very important. What is important is that you now have a rich set of algorithms and templates from the C++ standard library at your disposal, which let you implement many complicated operations, both mutating (like std::transform) and non-mutating (like std::find).
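To give a flavour of what that buys you, here is a small sketch of both kinds of operation on a plain pointer range. The function names count_lines and upper_in_place are invented for this example, and the mutating one assumes the memory is writable (for a mapping, that would mean PROT_WRITE).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>

// Non-mutating: count the lines in [beg, end) by hopping from one '\n'
// to the next with std::find. A last line without '\n' still counts.
std::size_t count_lines(const char* beg, const char* end)
{
    assert(beg <= end);
    std::size_t lines = 0;
    while (beg != end) {
        const char* nl = std::find(beg, end, '\n');
        ++lines;
        beg = (nl == end) ? end : nl + 1;
    }
    return lines;
}

// Mutating: upper-case [beg, end) in place with std::transform.
// Requires writable memory.
void upper_in_place(char* beg, char* end)
{
    std::transform(beg, end, beg, [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}
```

Neither function cares whether a NUL terminator exists anywhere; the end pointer is the only thing that bounds the text.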

Null-terminated strings are really a holdover from the days of plain C. In C++, null-terminated strings are somewhat archaic and mundane. Modern C++ code should use std::string objects and sequences defined by beginning and ending iterators.

One small footnote: instead of figuring out how much NUL padding you ended up mmap-ing, you might find it easier to fstat() the file and get the file's exact length, in bytes, before mmap-ing it. Then you'll know exactly how much got mmap-ed, and you don't have to reverse-engineer it by looking at the padding.
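A minimal sketch of that fstat()-first approach, assuming a POSIX system and a regular, non-empty file. MappedText, map_file and unmap_file are names I made up for illustration; error handling is reduced to a bool.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder for the iterator pair that bounds the file's text.
struct MappedText {
    const char* beg_iter = nullptr;
    const char* end_iter = nullptr;
};

// Map `path` read-only and set the iterators from the exact file size.
// Returns false on any failure, or if the file is empty (mmap rejects length 0).
bool map_file(const char* path, MappedText& out)
{
    assert(path != nullptr);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return false;
    }

    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED)
        return false;

    out.beg_iter = static_cast<const char*>(p);
    out.end_iter = out.beg_iter + len;  // exact end of text, no padding scan
    return true;
}

// Release a mapping created by map_file.
void unmap_file(const MappedText& m)
{
    munmap(const_cast<char*>(m.beg_iter),
           static_cast<std::size_t>(m.end_iter - m.beg_iter));
}
```

With the length taken from st_size, end_iter already points one past the last real byte of the file, so the trailing-NUL loop becomes unnecessary.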

