C++ - UTF-8 data to std::string or std::wstring
I receive the body bytes of an HTTP server response, and I don't know how to convert them to a UTF-8 string so that I can work with them.
I have an idea, but I'm not sure whether it works. I need to get the bytes of the response, search in them, and modify them, so I need to transform the std::vector<byte> into a std::wstring or a std::string.
The bytes of the response are UTF-8 encoded and stored in a std::vector<byte>. How can I transform them into a std::string? Or should I transform them into a std::wstring?
I found this code:
    std::string encoding::stringtoutf8(const std::string& str)
    {
        // ANSI code page (CP_ACP) -> UTF-16
        int size = MultiByteToWideChar(CP_ACP, MB_COMPOSITE, str.c_str(), str.length(), NULL, 0);
        std::wstring utf16_str(size, '\0');
        MultiByteToWideChar(CP_ACP, MB_COMPOSITE, str.c_str(), str.length(), &utf16_str[0], size);

        // UTF-16 -> UTF-8
        int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), utf16_str.length(), NULL, 0, NULL, NULL);
        std::string utf8_str(utf8_size, '\0');
        WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), utf16_str.length(), &utf8_str[0], utf8_size, NULL, NULL);
        return utf8_str;
    }
But if I want to search for the character "Ñ" in the string, will it work? Or do I have to transform the bytes into a std::wstring, search for "Ñ", modify the std::wstring, and then convert it to a std::string?
Which of the two is correct?
I need to put the UTF-8 response into a std::string or a std::wstring in order to search and modify the data (with special characters) and resend the response to the client in UTF-8.
Storing UTF-8 in a std::string is no more than storing a sequence of bytes in a "vector". std::string is not aware of any encoding whatsoever, so a member function like find or an <algorithm> function like std::find will not work on whole characters once you need to go beyond standard ASCII, because such characters span several bytes. So it is up to you how you handle this situation: you can try to convert your input (L"Ñ") to a UTF-8 sequence and try to find that in the std::string, or you can convert your string to a wstring and work directly on it. IMHO, in your case, when you have to manipulate the input (search, extract words, split by letters or replace, all of it beyond the ASCII range), you had better stick with wstring and convert it to a UTF-8 std::string before posting it to the client.
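To illustrate the first option (searching for the UTF-8 byte sequence directly), here is a minimal sketch. The helper names bytes_to_string and replace_all are made up for this example, and std::uint8_t stands in for the question's byte type. It assumes that both the text and the search term use the precomposed form of "Ñ" (U+00D1, bytes 0xC3 0x91). Searching for a whole UTF-8 sequence with a substring find is safe because a valid sequence can never match in the middle of another character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Copy the raw response bytes into a std::string without changing them.
// (Hypothetical helper; std::uint8_t stands in for the question's byte type.)
std::string bytes_to_string(const std::vector<std::uint8_t>& bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Replace every occurrence of the byte sequence `from` with `to`.
// Returns the number of replacements made.
std::size_t replace_all(std::string& text, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return 0;  // avoid an endless loop on an empty pattern
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
        ++count;
    }
    return count;
}
```

Calling replace_all(body, "\xC3\x91", "N") on the converted response would replace each "Ñ" with "N", and the result is still valid UTF-8 that can be sent back as is. This does not cover text where "Ñ" is written as "N" followed by a combining tilde; that case would need normalization first.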
EDIT001: Regarding std::codecvt_utf8, mentioned above in a comment, and my comment about performance concerns, here is the test:
    #include <windows.h>

    #include <chrono>
    #include <codecvt>
    #include <iostream>
    #include <locale>
    #include <string>

    // Conversion through the standard library facet.
    std::wstring foo(const std::string& input)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.from_bytes(input.c_str());
    }

    // Conversion through the Windows API.
    std::wstring baz(const std::string& input)
    {
        std::wstring retval;
        // First call: ask for the required size in wide characters.
        auto targetsize = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()), NULL, 0);
        retval.resize(targetsize);
        // Second call: perform the conversion into the resized buffer.
        auto res = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                                       const_cast<LPWSTR>(retval.data()), targetsize);
        if(res == 0)
        {
            // handle error, throw, something...
        }
        return retval;
    }

    int main()
    {
        std::string input = "lorem ipsum dolor sit amet, consectetur adipiscing elit, sed eiusmod tempor incididunt ut "
                            "labore et dolore magna aliqua. ut enim ad minim veniam, quis nostrud exercitation ullamco "
                            "laboris nisi ut aliquip ex ea commodo consequat. duis aute irure dolor in reprehenderit in "
                            "voluptate velit esse cillum dolore eu fugiat nulla pariatur. excepteur sint occaecat "
                            "cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

        {
            auto start = std::chrono::high_resolution_clock::now();
            for(int i = 0; i < 100'000; ++i)
            {
                auto result = foo(input);
            }
            auto end = std::chrono::high_resolution_clock::now();
            auto res = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
            std::cout << "elapsed time: " << res << std::endl;
        }

        {
            auto start = std::chrono::high_resolution_clock::now();
            for(int i = 0; i < 100'000; ++i)
            {
                auto result = baz(input);
            }
            auto end = std::chrono::high_resolution_clock::now();
            auto res = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
            std::cout << "elapsed time: " << res << std::endl;
        }

        return 0;
    }
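Results in milliseconds when compiled and run as Release x64 (the first line is foo with std::wstring_convert, the second is baz with MultiByteToWideChar):
    elapsed time: 3065
    elapsed time: 29
Two orders of magnitude...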
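For completeness, the wstring route recommended in the answer (decode, modify, encode again) can be sketched portably as below. The function names are made up for this example. It uses std::wstring_convert, which is deprecated since C++17 and, judging by the numbers above, is probably not the right choice for hot paths on Windows; treat it as an illustration of the round trip rather than a recommendation. Note also that with a 16-bit wchar_t, std::codecvt_utf8<wchar_t> only covers the Basic Multilingual Plane, and that from_bytes throws std::range_error on invalid input.

```cpp
#include <algorithm>
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 bytes -> wide string (throws std::range_error on malformed input).
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Wide string -> UTF-8 bytes, ready to be sent back to the client.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// Hypothetical helper: replace one character with another by going
// through a wide string, where each character in the Basic Multilingual
// Plane occupies exactly one element.
std::string replace_char_via_wide(const std::string& utf8, wchar_t from, wchar_t to) {
    std::wstring wide = utf8_to_wide(utf8);
    std::replace(wide.begin(), wide.end(), from, to);
    return wide_to_utf8(wide);
}
```

For example, replace_char_via_wide(body, L'\u00D1', L'N') decodes the response, replaces every "Ñ" with "N", and returns UTF-8 again.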