URL: http://github.com/ww898/utf-cpp

UTF-8/16/32 C++11 header only library for Windows / Linux / macOS

UTF-8/16/32 C++ library

This is a C++11 template-based, header-only library for Windows, Linux, and macOS that converts UTF-8/16/32 symbols and strings. The library transparently supports wchar_t as UTF-16 on Windows and as UTF-32 on Linux and macOS.

UTF-8 and UTF-32 (UCS-32) both support 31-bit code points [0‥0x7FFFFFFF] with no restrictions. UTF-16 supports only Unicode code points [0‥0x10FFFF]; values in the high [0xD800‥0xDBFF] and low [0xDC00‥0xDFFF] surrogate regions are prohibited as code points.
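
As a rough illustration, the ranges described above can be written as simple predicates. This is only a sketch: the function names are hypothetical and are not part of the library's API; they restate the ranges given in the text and nothing more.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of the utf-cpp API.

// True if cp lies in the high or low surrogate region [0xD800, 0xDFFF].
constexpr bool is_surrogate_value(std::uint32_t cp) {
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

// UTF-8 and UTF-32 as described above: any 31-bit value is accepted.
constexpr bool fits_utf8_or_utf32(std::uint32_t cp) {
    return cp <= 0x7FFFFFFFu;
}

// UTF-16: only Unicode code points outside the surrogate regions.
constexpr bool fits_utf16(std::uint32_t cp) {
    return cp <= 0x10FFFFu && !is_surrogate_value(cp);
}
```

For example, 0x110000 passes `fits_utf8_or_utf32` but not `fits_utf16`, so under this description such a value could be stored as UTF-8 or UTF-32 but not as UTF-16.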

The maximum UTF-16 symbol size is 2 words (4 bytes; both words must be in the surrogate regions). A UTF-32 (UCS-32) symbol is always 1 word (4 bytes). The maximum UTF-8 symbol size depends on the code point range (see the conversion table for details):

  • 4 bytes for Unicode code points
  • 6 bytes for 31-bit code points
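
The byte counts above follow from the code point range. The sketch below assumes the classic 31-bit UTF-8 layout (1 to 6 bytes, with boundaries at 0x80, 0x800, 0x10000, 0x200000, and 0x4000000). `utf8_length` is a hypothetical helper written for illustration, not a function provided by the library.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the utf-cpp API.
// Returns the number of bytes needed to encode cp in the classic
// 31-bit UTF-8 layout, or 0 if cp does not fit in 31 bits.
constexpr unsigned utf8_length(std::uint32_t cp) {
    return cp < 0x80u       ? 1u
         : cp < 0x800u      ? 2u
         : cp < 0x10000u    ? 3u
         : cp < 0x200000u   ? 4u   // covers all Unicode code points
         : cp < 0x4000000u  ? 5u
         : cp < 0x80000000u ? 6u   // covers all 31-bit code points
         : 0u;
}
```

Under this layout, U+092F (the first character of the string in the usage example) takes 3 bytes, which matches its byte sequence `\xE0\xA4\xAF`.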

UTF-16 surrogate decoder:

High\Low    DC00      DC01      DFFF
D800        010000    010001    0103FF
D801        010400    010401    0107FF
DBFF        10FC00    10FC01    10FFFF
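
The entries in the surrogate decoder table are consistent with the standard UTF-16 surrogate arithmetic: code point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not the library's API.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of the utf-cpp API.

constexpr bool is_high_surrogate(std::uint16_t w) {
    return w >= 0xD800u && w <= 0xDBFFu;
}

constexpr bool is_low_surrogate(std::uint16_t w) {
    return w >= 0xDC00u && w <= 0xDFFFu;
}

// Combines a surrogate pair into a code point in [0x10000, 0x10FFFF].
// The caller is expected to check both words with the predicates above.
constexpr std::uint32_t decode_surrogates(std::uint16_t high, std::uint16_t low) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// Inverse direction: split a code point in [0x10000, 0x10FFFF] into a pair.
constexpr std::uint16_t high_surrogate_of(std::uint32_t cp) {
    return static_cast<std::uint16_t>(0xD800u + ((cp - 0x10000u) >> 10));
}

constexpr std::uint16_t low_surrogate_of(std::uint32_t cp) {
    return static_cast<std::uint16_t>(0xDC00u + ((cp - 0x10000u) & 0x3FFu));
}
```

Each high surrogate selects a block of 0x400 code points and the low surrogate selects the offset within that block, which is why consecutive rows of the table differ by 0x400.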

UTF-16 Surrogates

Supported compilers

Tested on following compilers:

Usage example

    // यूनिकोड, written out as UTF-8 bytes
    static char const u8s[] = "\xE0\xA4\xAF\xE0\xA5\x82\xE0\xA4\xA8\xE0\xA4\xBF\xE0\xA4\x95\xE0\xA5\x8B\xE0\xA4\xA1";
    using namespace ww898::utf;

    // convz: null-terminated input, here UTF-8 to UTF-16
    std::u16string u16;
    convz<utf_selector_t<decltype(*u8s)>, utf16>(u8s, std::back_inserter(u16));
    // conv: iterator range, here UTF-16 to UTF-32
    std::u32string u32;
    conv<utf16, utf_selector_t<decltype(u32)::value_type>>(u16.begin(), u16.end(), std::back_inserter(u32));
    // Null-terminated UTF-32 back to UTF-8
    std::vector<char> u8;
    convz<utf32, utf8>(u32.data(), std::back_inserter(u8));
    // UTF-8 to wchar_t; sizeof(u8s) counts the terminating NUL, so it is converted too
    std::wstring uw;
    conv<utf8, utfw>(u8s, u8s + sizeof(u8s), std::back_inserter(uw));
    // Forms that return the converted result by value
    auto u8r = conv<char>(uw);
    auto u16r = conv<char16_t>(u16);
    auto uwr = convz<wchar_t>(u8s);

    auto u32r = conv<char32_t>(std::string_view(u8r.data(), u8r.size())); // C++17 only

    // char selects the same UTF whether it comes from an array element or a container
    static_assert(std::is_same<utf_selector<decltype(*u8s)>, utf_selector<decltype(u8)::value_type>>::value, "Fail");
    // wchar_t selects exactly one of UTF-16 or UTF-32, depending on the platform
    static_assert(
        std::is_same<utf_selector_t<decltype(u16)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value !=
        std::is_same<utf_selector_t<decltype(u32)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value, "Fail");

UTF-8 Conversion table

UTF-8/32 table
