c++ - How to parse a UTF-8 Chinese string

I am trying to parse a std::string that may contain Chinese characters. For example, given a string containing

哈囉hi你好hello

I want to split it into 6 strings: 哈, 囉, hi, 你, 好, hello. Right now the string is read from a text file with getline(). Following the post "How to use boost::spirit to parse UTF-8?", this is my current code:

#include <boost/regex/pending/unicode_iterator.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/range.hpp>
#include <iterator>
#include <iostream>
#include <ostream>
#include <cstdint>
#include <string>

using namespace boost;
using namespace std;
using namespace std::string_literals; 

int main()
{
    string str = u8"哈囉hi你好hello"; //actually got from getline()
    auto &&utf8_text = str;

    u8_to_u32_iterator<const char*>
        tbegin(begin(utf8_text)), tend(end(utf8_text));

    vector<uint32_t> result;
    spirit::qi::parse(tbegin, tend, *spirit::standard_wide::char_, result);
    for(auto &&code_point : result) {
        cout << code_point << ";";
    }
}

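But I get an error: the calls to 'begin' and 'end' are ambiguous. It works when I declare auto &&utf8_text = u8"哈囉hi你好hello" directly, but I can't write it that way because the content of the string is determined by getline().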

I also tried this:

auto str = u8"你好，世界！";
auto &&utf8_text = str;

But I still get an error: no matching function for call to 'begin' and 'end'.

Best answer

Using auto with a string literal deduces a char pointer (const char*), not a std::string. If you want a std::string, you have to spell the type out.
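To illustrate both the deduction issue and the splitting the question asks for, here is a minimal sketch that uses only the standard library instead of Boost. The helper names utf8_sequence_length and split_utf8_tokens are hypothetical, made up for this example. It assumes the input is UTF-8 and does not validate it: malformed bytes are simply passed through as single-byte tokens. As for the question's own code, the ambiguity most likely comes from having both `using namespace boost;` and `using namespace std;` in scope, so an unqualified begin() can match either namespace. Something along the lines of `u8_to_u32_iterator<std::string::const_iterator>` built from `str.cbegin()` and `str.cend()` should sidestep that, though I have not compiled that variant.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// `auto x = "hi";` deduces by value, so the array type of the literal decays
// to a pointer. std::decay mirrors that rule: the result is const char*,
// which is why begin()/end() have nothing to work with.
static_assert(std::is_same<std::decay<decltype("hi")>::type, const char*>::value,
              "a narrow string literal decays to const char*");

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 1 for ASCII and also for invalid lead bytes, so malformed
// input is passed through one byte at a time instead of being rejected.
inline std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx (most CJK characters)
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Hypothetical helper: splits UTF-8 text so that each run of ASCII bytes
// becomes one token and each multi-byte character becomes its own token.
inline std::vector<std::string> split_utf8_tokens(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string ascii_run;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        std::size_t len = utf8_sequence_length(lead);
        if (len > text.size() - i) len = text.size() - i;  // truncated sequence at the end

        if (lead < 0x80) {
            ascii_run += text[i];  // keep collecting the current ASCII run
        } else {
            if (!ascii_run.empty()) {  // flush the ASCII run before the multi-byte character
                tokens.push_back(ascii_run);
                ascii_run.clear();
            }
            tokens.push_back(text.substr(i, len));
        }
        i += len;
    }
    if (!ascii_run.empty()) tokens.push_back(ascii_run);
    return tokens;
}
```

Because split_utf8_tokens takes a const std::string&, a line read with std::getline() can be passed to it directly. The tokens stay UTF-8 encoded, so they can be written to a UTF-8 terminal or file without converting back from code points.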

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/41084664/
