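I wrote a parser in Java that extracts several features from a text file. The idea is to collect the block of lines that belongs to each header.
For example, if I have this: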
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathways. Binds ...
I need to get this:
Function : Adapter protein implicated in the regulation of a large spectrum of both general and specialized signaling pathways. Binds ....
I can do this without any problem for every feature in this type of text file.
The problem appears when I run into this:
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathway ...
CC -!- SUBUNIT: Homodimer. Interacts with SAMSN1 and PRKCE (By
CC similarity). Interacts with SSH1 and TORC2/CRTC2. Interacts ..
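When I finish the "function" block, my parser always skips one line at the end and exits, so I can no longer get the line with "SUBUNIT" :(
Here is a sample of the file I need to parse: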
CC -!- FUNCTION: Adapter protein implicated in the regulation of a large
CC spectrum of both general and specialized signaling pathways. Binds...
CC -!- SUBUNIT: Homodimer. Interacts with SAMSN1 and PRKCE (By
CC similarity). Interacts with SSH1 and TORC2/CRTC2. Interacts with ...
CC -!- SUBUNIT: Homodimer. Interacts with SAMSN1 and PRKCE salut(By
CC similarity). Interacts with SSH1 and TORC2/CRTC2. salutInteracts with
CC -!- INTERACTION:
CC Q76353:- (xeno); NbExp=3; IntAct=EBI-359815, EBI-6248077;
CC Q9P0K1-3:ADAM22; NbExp=2; IntAct=EBI-359815, EBI-1567267; ...
CC -!- SUBCELLULAR LOCATION: Cytoplasm. Melanosome. Note=Identified by
CC mass spectrometry in melanosome fractions from stage I to stage
CC IV. ....
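Here is part of what I wrote. I tried to mark the current position in the file while reading it, but the parsing does not work well when I do that. What am I missing here?
Cheers, any help would be greatly appreciated :)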
// Function
if (line.startsWith("CC -!- FUNCTION")) {
String data[] = line.split("CC -!- FUNCTION:");
function = function + data[1];
while ((line = bReader.readLine()) != null && ( (line.startsWith("CC ")) || (line.startsWith("CC -!- FUNCTION")) ) ) {
if (line.startsWith("CC ")) {
String dataOther[] = line.split("CC ");
function = function + dataOther[1];
prot.setFunction(function);
bReader.mark(size);
}
else if (line.startsWith("CC -!- FUNCTION")) {
String dataOther[] = line.split("CC -!- FUNCTION:");
function = function + "-!-"+ dataOther[1];
prot.setFunction(function);
bReader.mark(size);
}
}
bReader.reset();
}
// Subunit
if (line.startsWith("CC -!- SUBUNIT")) {
String data[] = line.split("CC -!- SUBUNIT:");
subunit = subunit + "-|-"+ data[1];
while ((line = bReader.readLine()) != null && ( (line.startsWith("CC ")) ) ) {
if (line.startsWith("CC ")) {
String dataOther[] = line.split("CC ");
subunit = subunit + dataOther[1];
prot.setSubunit(subunit);
}
}
//bReader.reset();
}
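Best answer
.mark() and .reset() are for more advanced techniques of reading from the buffer. I think in your case you just need to revisit how you read data from the file. I see several bReader.readLine() calls in your code; each call reads one line from the buffer and consumes it, so a line read by an inner loop is no longer available to the code that follows. Usually you want to call .readLine() once per iteration and then process that line.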
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
if (line.startsWith("CC -!- FUNCTION")) {
String line2 = br.readLine();
//do some stuff
}
if (line.startsWith("CC -!- SUBUNIT")) {
String line2 = br.readLine();
//do some stuff
}
}
br.close();
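To make the "read once, then process" idea concrete, here is a minimal sketch of a single-pass parser. It is only an illustration under some assumptions: the class and method names (CommentBlockParser, parse, parseString, flush) are made up for this example, and it assumes that every block starts with a line of the form "CC -!- TOPIC: text" and that every following "CC" line without "-!-" continues the current block. Whitespace after "CC" is trimmed, so the exact number of spaces does not matter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
class CommentBlockParser {

    /** Groups the text of each "-!- TOPIC:" block under its topic name. */
    static Map<String, List<String>> parse(BufferedReader reader) throws IOException {
        Map<String, List<String>> blocks = new LinkedHashMap<>();
        String topic = null;                       // topic of the block being collected
        StringBuilder text = new StringBuilder();  // text collected so far for that block
        String line;
        while ((line = reader.readLine()) != null) {   // the only readLine() call
            // Text after the "CC" line code, or null if this is not a CC line.
            String body = line.startsWith("CC") ? line.substring(2).trim() : null;
            int colon = (body == null) ? -1 : body.indexOf(':');
            if (body != null && body.startsWith("-!-") && colon > 0) {
                // A new header: store the previous block, then start a new one.
                flush(blocks, topic, text);
                topic = body.substring(3, colon).trim();
                text = new StringBuilder(body.substring(colon + 1).trim());
            } else if (body != null && topic != null) {
                // A continuation line: append it to the current block.
                if (text.length() > 0 && !body.isEmpty()) {
                    text.append(' ');
                }
                text.append(body);
            } else {
                // Any other line ends the current block.
                flush(blocks, topic, text);
                topic = null;
            }
        }
        flush(blocks, topic, text);                // store the last block, if any
        return blocks;
    }

    /** Adds the collected text to the list for its topic (no-op when no block is open). */
    private static void flush(Map<String, List<String>> blocks, String topic, StringBuilder text) {
        if (topic != null) {
            blocks.computeIfAbsent(topic, k -> new ArrayList<>()).add(text.toString());
        }
    }

    /** Convenience wrapper for in-memory input; wraps the checked IOException. */
    static Map<String, List<String>> parseString(String input) {
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample =
              "CC -!- FUNCTION: Adapter protein implicated in the regulation of a large\n"
            + "CC spectrum of both general and specialized signaling pathways.\n"
            + "CC -!- SUBUNIT: Homodimer. Interacts with SAMSN1 and PRKCE (By\n"
            + "CC similarity).\n";
        parseString(sample).forEach((topic, texts) -> System.out.println(topic + " : " + texts));
    }
}
```

Because the loop remembers which block it is in, the header line of the next block is never consumed by an inner loop, so no mark()/reset() is needed. A topic that occurs more than once (like the two SUBUNIT blocks in the sample file) simply gets several entries in its list.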
Did I understand you correctly?
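If you do want to keep mark() and reset(), the usual pattern is to mark just before reading a line you may want to "un-read", and to reset immediately after looking at it. The following is a rough sketch of that pattern, not a drop-in fix for your code: PeekLineDemo, peekLine, readBlock and the READ_AHEAD_LIMIT value are all made up for this example, and it assumes no single line is longer than that limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question or answer.
class PeekLineDemo {

    // Assumed upper bound on the length of one line, in characters.
    static final int READ_AHEAD_LIMIT = 8192;

    /** Returns the next line without consuming it, or null at end of input. */
    static String peekLine(BufferedReader reader) throws IOException {
        reader.mark(READ_AHEAD_LIMIT);   // remember the position before the line
        String next = reader.readLine();
        reader.reset();                  // go back so the line can be read again
        return next;
    }

    /** Reads one header line plus its continuation lines, or null at end of input. */
    static String readBlock(BufferedReader reader) throws IOException {
        String first = reader.readLine();
        if (first == null) {
            return null;
        }
        StringBuilder block = new StringBuilder(first.trim());
        String next;
        // Keep going while the upcoming line is a "CC" line that is not a new "-!-" header.
        while ((next = peekLine(reader)) != null
                && next.startsWith("CC")
                && !next.substring(2).trim().startsWith("-!-")) {
            reader.readLine();           // now actually consume the peeked line
            block.append(' ').append(next.substring(2).trim());
        }
        return block.toString();
    }

    /** Convenience wrapper for in-memory input; wraps the checked IOException. */
    static List<String> readAllBlocks(String input) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String block;
            while ((block = readBlock(reader)) != null) {
                result.add(block);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "CC -!- FUNCTION: first part\nCC second part\nCC -!- SUBUNIT: Homodimer.\n";
        readAllBlocks(sample).forEach(System.out::println);
    }
}
```

The difference from the code in the question is that the mark is always set right before the line that might need to be re-read, so reset() never runs without a preceding mark() and the next header is never lost or processed twice.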
Regarding "java - How to mark a position when reading a file in Java?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22514695/