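I have a long list of entries like the ones below, and I want to split each line at its first space into the word and its meaning. I am using Apache POI for this because I have to read a docx file and then pull the data out of it.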
abash humiliate, embarrass
abdicate relinquish power or position
aberrant abnormal
abet aid, encourage (typically of crime)
abeyance postponement
aboriginal indigenous
abridge shorten
abstemious moderate
...
So what regular expression would suit my purpose, so that I can display it like this:
word :abash
meaning : humiliate, embarrass
...
My code is:
import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordFileReader {
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("E:\\important.docx");
            // Extract the document's plain text with Apache POI
            XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
--Edit-- Based on the suggested answer, I am now using this:
public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("E:\\Words.docx");
        org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor = new XWPFWordExtractor(new XWPFDocument(fis));
        //System.out.print(oleTextExtractor.getText());
        Scanner sc = new Scanner(oleTextExtractor.getText());
        while(sc.hasNextLine()) {
            String line = sc.nextLine();
            int i = line.indexOf(' ');
            String word = line.substring(0, i);
            String meaning = line.substring(i).trim();
            System.out.println("word "+word);
            System.out.println("meaning "+meaning);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
But I get:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(Unknown Source)
at WordFileReader.main(WordFileReader.java:25)
Best answer
I would use java.util.Scanner to extract the lines from the text:
Scanner sc = new Scanner(oleTextExtractor.getText());
while(sc.hasNextLine()) {
    String line = sc.nextLine();
    ...
Then I would split each line into the word and its meaning:
int i = line.indexOf(' ', 2); // start searching at index 2 so the space after a leading article "a" is skipped
String word = line.substring(0, i);
String meaning = line.substring(i).trim();
or
String[] parts = line.split("(?<!^a)\\s+", 2);
String word = parts[0];
String meaning = parts[1];
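
The exception in the question comes from lines that contain no space at all (an empty line, for example): indexOf then returns -1 and substring(0, -1) throws. Below is a minimal sketch of one way to guard against that, assuming it is acceptable to skip such lines. The class name WordMeaningSplitter and the helpers splitLine and format are made up for this illustration, the text is passed in as a plain string instead of being read through POI, and the leading-article handling from the snippets above is left out for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordMeaningSplitter {

    /**
     * Splits a line at its first run of whitespace.
     * Returns {word, meaning}, or null when the line is blank
     * or has no meaning part (so there is nothing to split).
     */
    public static String[] splitLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // Limit 2: split only at the first whitespace run
        String[] parts = trimmed.split("\\s+", 2);
        return parts.length < 2 ? null : parts;
    }

    /** Turns the whole text into "word : ..." / "meaning : ..." lines. */
    public static List<String> format(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                String[] parts = splitLine(sc.nextLine());
                if (parts == null) {
                    continue; // skip blank or single-token lines
                }
                out.add("word : " + parts[0]);
                out.add("meaning : " + parts[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample text standing in for oleTextExtractor.getText()
        String sample = "abash humiliate, embarrass\n\nabdicate relinquish power or position\n";
        for (String s : format(sample)) {
            System.out.println(s);
        }
    }
}
```

In the real program, the string returned by oleTextExtractor.getText() would be passed to format in place of the sample text.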
Regarding java - splitting a sentence into two strings and displaying them iteratively, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17019776/