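Scenario: I want to read an Arabic dataset that is encoded in UTF-8. The words on each line are separated by spaces.
Problem: When I read each line, the output is:
??????? ?? ???? ?? ???
Question: How can I read the file and print each line? For more information, here is my Arabic dataset, and the part of my source code that reads the data is shown below: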
private ContextCountsImpl extractContextCounts(Map<Integer, String> phraseMap) throws IOException {
    Reader reader;
    reader = new InputStreamReader(new FileInputStream(inputFile), "utf-8");
    BufferedReader rdr = new BufferedReader(reader);
    while (rdr.ready()) {
        String line = rdr.readLine();
        System.out.println(line);
        List<String> phrases = splitLineInPhrases(line);
        //any process on this file
    }
}
Best answer
I can read the file using UTF-8. Could you try it this way?
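import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadArabic {
    public static void main(String[] args) {
        try {
            String line;
            InputStream fileInputStream = new FileInputStream("arabic.txt");
            // Decode the bytes explicitly as UTF-8; leaving the charset out uses the platform default
            Reader reader = new InputStreamReader(fileInputStream, "UTF-8");
            BufferedReader bufferedReader = new BufferedReader(reader);
            // readLine() returns null once the end of the file is reached
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (Exception e) {
            System.err.println(e.getMessage()); // handle all exceptions
        }
    }
}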
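If the file is decoded correctly and you still see question marks, the output side may be the cause: System.out encodes text with the platform default charset, and characters it cannot represent are printed as "?". That is an assumption about your environment, not something the question confirms. Below is a minimal sketch that reads the file as UTF-8 and also prints through a UTF-8 PrintStream. The class name ReadArabicUtf8 and the helper methods readLines and printLines are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadArabicUtf8 {

    // Reads every line of the file, decoding the bytes as UTF-8.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) automatically
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null at end of file
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes the lines to the given stream, encoding them as UTF-8
    // instead of relying on the platform default charset.
    static void printLines(List<String> lines, OutputStream out) throws IOException {
        PrintStream utf8Out = new PrintStream(out, true, "UTF-8");
        for (String line : lines) {
            utf8Out.println(line);
        }
        utf8Out.flush();
    }

    public static void main(String[] args) throws IOException {
        // "arabic.txt" is the file name used in the answer above; pass another path as the first argument
        String path = args.length > 0 ? args[0] : "arabic.txt";
        printLines(readLines(path), System.out);
    }
}
```

Even with this change, the terminal or IDE console itself has to be set to UTF-8 and use a font with Arabic glyphs, otherwise the text can still be displayed incorrectly.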
Regarding "java - How to read an Arabic dataset correctly in Java?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56684167/