I have 2 text files:
File 1 - the format of this file is user_id tweet_id tweet_text
File 1
60730027 6298443824 thank you echo park. you've changed A LOT, but as long as I'm getting paid to make you move, I'm still with it! 2009-12-03 02:54:10
60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22
File 2
The format of this file is genome_id name ascii_name
4045417 Southwest Indent Southwest Indent
4045418 Southeast Point Southeast Point
Below is the code snippet that reads file 1:
public void readfromFile() throws FileNotFoundException {
Scanner inputStream;
String source=null;
FileInputStream file = new FileInputStream("file1.txt");
String regex = "/[a-zA-Z ]+/";
Scanner fileScan = new Scanner(file);
while(fileScan.hasNextLine()){
word = fileScan.nextLine();
word = word.replaceAll(regex, "").toLowerCase();
PrintWriter outputStreamName = new PrintWriter(new FileOutputStream("temp.txt"));
outputStreamName.printf("%s",word);
}
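My goal is first to replace the data in user_id, tweet_id, and genome_id with an empty string, and then to convert uppercase letters to lowercase. However, whenever this code processes file1, nothing changes in the text file, and I would like to know why. When I print to the console instead, I do get output.
Expected output: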
thank you echo park youve changed a lot but as long as im getting paid to make you move im still with it
fat albert einstein goin in right now over here
Accepted answer
Based on the expected output, you want to remove everything except letters, dots, and the spaces between words.
[^a-zA-Z. ]+|(?<=\d)\s*(?=\d)|(?<=\D)\s*(?=\d)|(?<=\d)\s*(?=\D)
Here is an online demo.
Or try it without lookaround:
[^a-zA-Z. ]+|\d\s+\d|\D\s+\d|\d\s+\D
Here \s matches any whitespace character [\r\n\t\f ].
Sample code:
String regex = "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";
// Strings are immutable, so the result of replaceAll must be assigned
str = str.replaceAll(regex, "");
Output:
thank you echo park. youve changed A LOT but as long as Im getting paid to make you move Im still with it
fat Albert Einstein goin in right now over here
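The output above still contains uppercase letters, while the question also asks for lowercase. As a minimal sketch, the lookaround regex from this answer can be combined with toLowerCase in one helper. The class name TweetCleaner and the method name clean are hypothetical and only serve the illustration; Locale.ROOT is an added assumption to keep lowercasing independent of the default locale.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original answer.
class TweetCleaner {
    // Regex taken from the answer: removes everything except letters, dots and
    // spaces, plus the whitespace that sits next to a digit.
    static final String REGEX =
        "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";

    static String clean(String line) {
        // replaceAll returns a new String; the input is left untouched.
        return line.replaceAll(REGEX, "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String tweet = "60730027 6297282530 fat Albert Einstein goin in right now "
            + "over here!!! 2009-12-03 01:35:22";
        System.out.println(clean(tweet));
    }
}
```

Note that this regex keeps dots, so "park." stays "park." in the result. Removing the dot from the character class would presumably bring the result closer to the expected output shown in the question.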
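To keep ' in the output (that is, to exclude it from removal), use [^a-zA-Z.' ]+; otherwise I'm and you've are changed to Im and youve.
Better still, use [a-zA-Z']+ to match only the words. Here is a demo.
Sample code: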
String str = "60730027 6297282530 fat Albert Einstein goin in right now over here!!! 2009-12-03 01:35:22 ";
Pattern p = Pattern.compile("[a-zA-Z']+");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.print(m.group()+" ");
}
Output:
fat Albert Einstein goin in right now over here
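The loop above prints a trailing space after the last word. A small variation, sketched here under the assumption that a single space-separated, lowercased string is wanted, collects the matches first and joins them afterwards. The class name WordExtractor and the method name words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class WordExtractor {
    // Same pattern as in the answer: runs of letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z']+");

    static String words(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            found.add(m.group().toLowerCase(Locale.ROOT));
        }
        // Joining avoids the trailing space produced by print(word + " ").
        return String.join(" ", found);
    }

    public static void main(String[] args) {
        String str = "60730027 6297282530 fat Albert Einstein goin in right now "
            + "over here!!! 2009-12-03 01:35:22 ";
        System.out.println(words(str));
    }
}
```

Because the apostrophe is part of the pattern, contractions such as you've are kept intact rather than collapsed to youve.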
Note: you are checking for the next line (hasNextLine()), so read a whole line as well.
Change:
source = inputStream.next();
to:
source = inputStream.nextLine();
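The question also asks why temp.txt never changes. A likely reason is that the snippet creates a new PrintWriter inside the loop and never closes or flushes it: each new FileOutputStream("temp.txt") truncates the file, and the buffered text is never written out. The following is a sketch of one way to restructure the loop, not the original author's code: it opens the writer once and closes it with try-with-resources. The class name FileCleaner, the method name cleanFile, and the UTF-8 encoding are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the original answer.
class FileCleaner {
    // Regex taken from the answer above.
    static final String REGEX =
        "[^a-zA-Z. ]+|(?<=\\d)\\s*(?=\\d)|(?<=\\D)\\s*(?=\\d)|(?<=\\d)\\s*(?=\\D)";

    static void cleanFile(Path in, Path out) throws IOException {
        // Both resources are opened once and closed automatically, which
        // flushes the writer's buffer to disk. UTF-8 is an assumption.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(
                 Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.println(line.replaceAll(REGEX, "").toLowerCase(Locale.ROOT));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FileCleaner <input> <output>");
            return;
        }
        cleanFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

For the files in the question this could be invoked as java FileCleaner file1.txt temp.txt. Writing to a separate output file, as the question does, avoids overwriting the input while it is still being read.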
This Q&A is based on a similar question on Stack Overflow (java - removing all numeric and alphanumeric characters from a text file): https://stackoverflow.com/questions/25337621/