C++ - UTF-8 data to std::string or std::wstring

I receive the body bytes of an HTTP server response, and I don't know how to convert them into a UTF-8 string so I can work with them.

I have an idea, but I'm not sure whether it works. I need to take the bytes of the response, search them, and modify them, so I need to transform the std::vector<byte> into a std::wstring or a std::string.

The bytes of the response are UTF-8 encoded and stored in a std::vector<byte>. How can I transform them into a std::string? Or should I transform them into a std::wstring?
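Because the body is already UTF-8, turning it into a std::string does not require any transcoding; the bytes can be copied as they are. A minimal sketch, assuming `byte` is an unsigned 8-bit type (the Windows `BYTE` typedef is `unsigned char`); `BodyToString` is a hypothetical helper name, not part of any library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: wraps raw UTF-8 response bytes in a std::string.
// Assumption: the body arrives as std::vector<unsigned char> (Windows BYTE).
std::string BodyToString(const std::vector<unsigned char>& body)
{
    // UTF-8 is already a byte sequence, so the bytes are copied verbatim;
    // std::string simply stores them and knows nothing about the encoding.
    return std::string(body.begin(), body.end());
}
```

For a body holding "Año" (bytes 0x41 0xC3 0xB1 0x6F), `BodyToString` returns a std::string whose `size()` is 4: it counts bytes, not characters.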

I found this code:

    #include <windows.h>
    #include <string>

    // Converts text in the system ANSI code page (CP_ACP) to UTF-8 via UTF-16.
    // Each Win32 call is made twice: once to get the size, once to convert.
    std::string Encoding::StringToUtf8(const std::string& str)
    {
        const int len = static_cast<int>(str.length());
        int size = MultiByteToWideChar(CP_ACP, MB_COMPOSITE, str.c_str(), len, NULL, 0);
        std::wstring utf16_str(size, L'\0');
        MultiByteToWideChar(CP_ACP, MB_COMPOSITE, str.c_str(), len, &utf16_str[0], size);
        int utf8_size = WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), size, NULL, 0, NULL, NULL);
        std::string utf8_str(utf8_size, '\0');
        WideCharToMultiByte(CP_UTF8, 0, utf16_str.c_str(), size, &utf8_str[0], utf8_size, NULL, NULL);
        return utf8_str;
    }

But if I want to search for the character "Ñ" in the string, will that work? Or do I have to transform the bytes into a std::wstring, search for "Ñ", modify the std::wstring, and then convert it back to a std::string?
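A minimal sketch of searching directly in the UTF-8 std::string, assuming the string holds valid UTF-8: look for the two-byte UTF-8 encoding of "Ñ" (U+00D1, bytes 0xC3 0x91). `FindEnye` and `kEnyeUtf8` are hypothetical names; the hex escapes are used so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <string>

// UTF-8 encoding of U+00D1 LATIN CAPITAL LETTER N WITH TILDE ("Ñ").
const std::string kEnyeUtf8 = "\xC3\x91";

// Hypothetical helper: returns the BYTE offset of the first "Ñ" in a UTF-8
// string, or std::string::npos. std::string::find compares bytes; since UTF-8
// is self-synchronizing, a complete encoded sequence found in valid UTF-8
// input always starts on a character boundary.
std::string::size_type FindEnye(const std::string& utf8)
{
    return utf8.find(kEnyeUtf8);
}
```

For "año Ñ" (`"a\xC3\xB1o \xC3\x91"`), `FindEnye` returns 5, a byte offset, even though "Ñ" is the character at index 4; lowercase "ñ" (0xC3 0xB1) does not match. Searching works this way, but anything that needs character positions or counts does not.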

Which of the two is correct?

I need to put the UTF-8 response in a std::string or std::wstring in order to search and modify the data (which contains special characters), and then resend the response to the client in UTF-8.

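Storing UTF-8 in a std::string is no more than storing a sequence of bytes in a "vector". std::string is not aware of encodings whatsoever, so once you need to work beyond standard ASCII, the member function find and the <algorithm> function std::find stop doing what you might expect: std::find compares one char (byte) at a time, so it cannot look for a single non-ASCII character, and find only matches if you give it the exact UTF-8 byte sequence. How you handle this is up to you: you can convert your input (L"Ñ") to its UTF-8 sequence and search for that in the std::string, or you can convert the string to a std::wstring and work on it directly. IMHO, in a case like yours, where you have to manipulate the input (search, extract words, split it into letters, or replace, all beyond the ASCII range), it is better to stick with std::wstring and convert back to a UTF-8 std::string just before posting to the client.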
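A sketch of the decode, manipulate, re-encode approach. It swaps `std::u32string` in for `std::wstring`, because on Windows `wchar_t` is 16 bits (UTF-16), so characters outside the Basic Multilingual Plane still take two elements there, whereas `char32_t` holds one code point per element. The helpers `Utf8ToUtf32`, `Utf32ToUtf8` and `ReplaceChar` are hypothetical and hand-written for illustration; they assume valid UTF-8 input and do no error checking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-32 (one char32_t per code point).
// Assumption: input is valid UTF-8; malformed sequences are not detected.
std::u32string Utf8ToUtf32(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        for (std::size_t k = 1; k <= extra && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Encode UTF-32 code points back into UTF-8 bytes.
std::string Utf32ToUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Character-level manipulation: decode, replace every `from` with `to`,
// then re-encode so the result can be sent back to the client as UTF-8.
std::string ReplaceChar(const std::string& utf8, char32_t from, char32_t to)
{
    std::u32string text = Utf8ToUtf32(utf8);
    std::replace(text.begin(), text.end(), from, to);
    return Utf32ToUtf8(text);
}
```

For example, `ReplaceChar("ESPA\xC3\x91" "A", U'\u00D1', U'N')` yields "ESPANA". In the decoded form, `size()` and indexing work per code point, which is what splitting into letters needs. For untrusted input, a checked converter (for instance `MultiByteToWideChar` with `MB_ERR_INVALID_CHARS`) would likely be a better fit than this unchecked sketch.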
EDIT001: Since std::codecvt_utf8 was mentioned in the comments above, a word on performance concerns. Here is a test:

    #include <windows.h>
    #include <chrono>
    #include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
    #include <iostream>
    #include <locale>   // std::wstring_convert (deprecated since C++17)
    #include <string>

    // Conversion through the standard library facet.
    std::wstring foo(const std::string& input)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.from_bytes(input);
    }

    // Conversion through the Win32 API.
    std::wstring baz(const std::string& input)
    {
        std::wstring retval;
        auto targetsize = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()), NULL, 0);
        retval.resize(targetsize);
        auto res = MultiByteToWideChar(CP_UTF8, 0, input.c_str(), static_cast<int>(input.size()),
                                       &retval[0], targetsize);
        if (res == 0)
        {
            // handle error, throw, something...
        }
        return retval;
    }

    int main()
    {
        std::string input = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut "
                            "labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco "
                            "laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in "
                            "voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat "
                            "cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

        {
            auto start = std::chrono::high_resolution_clock::now();
            for (int i = 0; i < 100'000; ++i)
            {
                auto result = foo(input);
            }
            auto end = std::chrono::high_resolution_clock::now();
            auto res = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
            std::cout << "Elapsed time: " << res << std::endl;
        }

        {
            auto start = std::chrono::high_resolution_clock::now();
            for (int i = 0; i < 100'000; ++i)
            {
                auto result = baz(input);
            }
            auto end = std::chrono::high_resolution_clock::now();
            auto res = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
            std::cout << "Elapsed time: " << res << std::endl;
        }
        return 0;
    }

Results when compiled and run as Release x64 (times in milliseconds):

    Elapsed time: 3065
    Elapsed time: 29

That is two orders of magnitude in favor of MultiByteToWideChar...
