python - Unable to parse text blocks with Python: incomplete parsing of text file

Tags: python python-3.x file parsing replace

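I'm a chemist and still new to programming. I'm trying to write programs that make my life easier when processing data. After searching StackOverflow for a whole day, I was finally able to write a short Python script that parses a text file containing similar blocks of data separated by blank lines. My code works well, but it doesn't parse the last block, and I'm not sure why. I looked for an answer but couldn't find anything useful.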

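A typical text file has 361 data blocks. Each block contains the information needed to build a molecule in 3-D space, with a different torsion angle for one set of four atoms. Here is a sample of the text file I'm trying to parse, containing only the first two blocks.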

!Coordinate: -51.45857  Energy: *****
6 0.006074 0.000915 0.000760
6 0.003070 -0.004811 1.496641
6 1.065644 -0.015789 2.367841
6 2.500078 -0.010542 1.993114
6 3.043633 -0.885454 1.109936
6 2.319723 -2.061360 0.571949
6 1.651211 -3.009615 1.308815
16 0.964940 -4.223294 0.280714
6 1.598121 -3.476004 -1.156548
6 2.300403 -2.353600 -0.830192
1 2.774538 -1.713316 -1.566133
6 1.370973 -4.039010 -2.492108
6 2.306097 -3.847669 -3.514857
6 2.051238 -4.378854 -4.772466
7 0.959825 -5.084236 -5.080872
6 0.075629 -5.271691 -4.098835
6 0.226680 -4.776825 -2.808825
1 -0.547454 -4.952070 -2.067650
1 -0.811208 -5.846075 -4.358490
1 2.771093 -4.237936 -5.576037
1 3.231185 -3.312215 -3.327250
6 1.484740 -3.110171 2.791981
1 2.271126 -2.537323 3.291578
1 0.521994 -2.699519 3.116631
1 1.545489 -4.149268 3.130100
6 4.425208 -0.728995 0.567929
6 5.293981 -1.825349 0.536092
6 6.575924 -1.699782 0.012540
6 7.002467 -0.480078 -0.506308
6 6.138453 0.611969 -0.498798
6 4.860426 0.488085 0.033453
1 4.189564 1.341843 0.040929
1 6.459401 1.563510 -0.912065
1 8.000697 -0.382509 -0.922563
1 7.242127 -2.557541 0.005802
1 4.957135 -2.781274 0.928240
6 3.298894 1.044689 2.682189
6 2.806965 2.352662 2.756428
6 3.525634 3.346796 3.410575
6 4.740700 3.044040 4.018965
6 5.230033 1.741208 3.969123
6 4.514468 0.749369 3.308424
1 4.901734 -0.264238 3.270300
1 6.171693 1.494548 4.450468
1 5.300110 3.817950 4.536063
1 3.131670 4.358007 3.451132
1 1.851909 2.586965 2.294231
6 0.644628 0.032167 3.735978
6 -0.708788 0.041750 3.903716
16 -1.501825 0.018225 2.355367
6 -1.460523 0.074589 5.163238
6 -0.916630 -0.463354 6.334489
6 -1.645855 -0.393694 7.514376
7 -2.861426 0.150339 7.612820
6 -3.380262 0.652483 6.490232
6 -2.733763 0.643955 5.260195
1 -3.211536 1.093615 4.394957
1 -4.369681 1.095963 6.579511
1 -1.232419 -0.806908 8.432018
1 0.055022 -0.946356 6.323493
1 1.348290 0.078304 4.560069
1 -0.126732 -1.007882 -0.406234
1 -0.790297 0.637669 -0.396423
1 0.964526 0.378020 -0.366958

!Coordinate: -52.45859  Energy: *****
6 0.016006 0.016117 -0.001167
6 0.008091 0.004202 1.494640
6 1.068924 -0.017801 2.367520
6 2.503392 -0.009246 1.992562
6 3.048080 -0.887580 1.113704
6 2.322345 -2.062968 0.576734
6 1.653555 -3.010561 1.314091
16 0.963790 -4.222595 0.286393
6 1.595670 -3.475347 -1.151441
6 2.300257 -2.354228 -0.825550
1 2.774156 -1.714212 -1.561877
6 1.365619 -4.037046 -2.487061
6 2.299829 -3.846714 -3.510831
6 2.042363 -4.376373 -4.768547
7 0.949180 -5.079357 -5.076134
6 0.065835 -5.265841 -4.093142
6 0.219443 -4.772314 -2.802916
1 -0.554143 -4.946542 -2.060928
1 -0.822495 -5.838195 -4.352173
1 2.761473 -4.236208 -5.572914
1 3.226175 -3.313192 -3.323941
6 1.489754 -3.111703 2.797517
1 2.276398 -2.538063 3.295797
1 0.527124 -2.702199 3.123917
1 1.552391 -4.150812 3.135284
6 4.429609 -0.733119 0.571119
6 5.297405 -1.830292 0.541209
6 6.579288 -1.706863 0.016976
6 7.006698 -0.488561 -0.504432
6 6.143617 0.604255 -0.498839
6 4.865654 0.482526 0.034036
1 4.195506 1.336862 0.039937
1 6.465258 1.554673 -0.914138
1 8.004858 -0.392683 -0.921241
1 7.244728 -2.565225 0.011647
1 4.959792 -2.785154 0.935276
6 3.299443 1.049518 2.679214
6 2.802410 2.355625 2.752465
6 3.517994 3.353520 3.404255
6 4.735140 3.056448 4.011255
6 5.229631 1.755519 3.962371
6 4.517166 0.759897 3.304042
1 4.908495 -0.252167 3.266846
1 6.172959 1.513307 4.442708
1 5.292160 3.833294 4.526530
1 3.119965 4.363158 3.444095
1 1.845729 2.585480 2.291417
6 0.646126 0.024358 3.735114
6 -0.707598 0.038379 3.900610
16 -1.498235 0.027194 2.350780
6 -1.461222 0.067452 5.159060
6 -0.920246 -0.476558 6.328874
6 -1.650941 -0.410067 7.508021
7 -2.865451 0.136232 7.607012
6 -3.381624 0.644001 6.485729
6 -2.733382 0.639158 5.256592
1 -3.209059 1.093310 4.392548
1 -4.370246 1.089176 6.575392
1 -1.239629 -0.827839 8.424551
1 0.050370 -0.961638 6.317360
1 1.348629 0.063341 4.560569
1 -0.118307 -0.990869 -0.412258
1 -0.776594 0.657152 -0.398933
1 0.977453 0.391011 -0.363519

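Each block contains the following information:

  1. A header line containing the torsion angle.
  2. Each line after the header contains 4 columns: atomic number, x, y, z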
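
As a rough illustration of this layout, the two kinds of line could be parsed as sketched below. The function names `parse_header` and `parse_atom` are hypothetical, and the sketch assumes the angle is always the second whitespace-separated token on the header line, as in the sample above.

```python
def parse_header(line):
    """Return the torsion angle from a '!Coordinate: <angle>  Energy: ...' line."""
    # Assumption: the angle is the second whitespace-separated token.
    return float(line.split()[1])


def parse_atom(line):
    """Return (atomic_number, x, y, z) from a four-column atom line."""
    number, x, y, z = line.split()
    return int(number), float(x), float(y), float(z)


print(parse_header("!Coordinate: -51.45857  Energy: *****"))
print(parse_atom("16 0.964940 -4.223294 0.280714"))
```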

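I need to do the following for each block:

  1. Extract the torsion angle, then delete the header line.
  2. Change each atomic number to the corresponding element symbol.
  3. Write a separate *.xyz file that has element symbols instead of atomic numbers, with the number of atoms at the top.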
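
Steps 2 and 3 amount to producing text in the XYZ layout: the atom count on the first line, a comment line (which may be blank), then one "symbol x y z" line per atom. A minimal sketch follows, assuming the atoms are already available as numeric tuples; `SYMBOLS` and `to_xyz` are illustrative names, and the table only covers the four elements that occur in the sample.

```python
# Hypothetical lookup table; extend it for any other elements in the data.
SYMBOLS = {1: "H", 6: "C", 7: "N", 16: "S"}


def to_xyz(atoms, comment=""):
    """Format (atomic_number, x, y, z) tuples as the text of an XYZ file."""
    # Line 1: number of atoms. Line 2: free-form comment (may be empty).
    lines = [str(len(atoms)), comment]
    for number, x, y, z in atoms:
        lines.append("{} {:.6f} {:.6f} {:.6f}".format(SYMBOLS[number], x, y, z))
    return "\n".join(lines) + "\n"


atoms = [(6, 0.006074, 0.000915, 0.000760), (16, 0.964940, -4.223294, 0.280714)]
print(to_xyz(atoms, comment="torsion angle -51.45857"), end="")
```

Taking the count from `len(atoms)` avoids hard-coding 64, so the same function would work for molecules of other sizes.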

Here is my code:

import os
import re

#I just paste the file path for now. And change \ to \\ 
filepath = os.path.normpath("file.xyz") 

#Dictionary for atomic number and element
replacements = {'1': 'H', '6': 'C', '7': 'N', '16':'S'} 

#Open read and write files
originalFile = open(filepath, 'r') 
writeEditedFile = open('output_all(edited).txt', 'w')
readEditedFile = open('output_all(edited).txt', 'r')

#Replace atomic numbers with element symbol
for lines in originalFile:
    writeEditedFile.write(re.sub('(^\d+)', lambda m: replacements[m.group()], lines)) 

#Extract torsion angle and append to array
with open('output_all(edited).txt', 'r') as wEF: 
    torsionAngles = []
    for line in wEF:
        if '!' in line:
            for number in line.split():
                try:
                    torsionAngles.append(str(float(number)))
                except ValueError:
                    pass

#Write each line into a new file until a blank line
#The file is closed and a new one is opened
#This should continue until the last block
with readEditedFile as rEF:
    record = 0
    separateFile = open('Step_' + str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
    separateFile.write('64 \n \n')
    for lines in rEF:
        if lines == "\n":
            record += 1
            separateFile.close()
            separateFile = open('Step_'+ str(record+1) + '_TorsionAngle_' + torsionAngles[record] + '.xyz', 'w')
            separateFile.write('64 \n \n')
        else:
            if '!' in lines:
                lines = ''
            else:
                separateFile.write(lines)

Sorry for the sloppy code! Here is a sample of the first two files it outputs:

File name: Step_1_TorsionAngle_-51.45857.xyz

64 

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114
C 3.043633 -0.885454 1.109936
C 2.319723 -2.061360 0.571949
C 1.651211 -3.009615 1.308815
S 0.964940 -4.223294 0.280714
C 1.598121 -3.476004 -1.156548
C 2.300403 -2.353600 -0.830192
H 2.774538 -1.713316 -1.566133
C 1.370973 -4.039010 -2.492108
C 2.306097 -3.847669 -3.514857
C 2.051238 -4.378854 -4.772466
N 0.959825 -5.084236 -5.080872
C 0.075629 -5.271691 -4.098835
C 0.226680 -4.776825 -2.808825
H -0.547454 -4.952070 -2.067650
H -0.811208 -5.846075 -4.358490
H 2.771093 -4.237936 -5.576037
H 3.231185 -3.312215 -3.327250
C 1.484740 -3.110171 2.791981
H 2.271126 -2.537323 3.291578
H 0.521994 -2.699519 3.116631
H 1.545489 -4.149268 3.130100
C 4.425208 -0.728995 0.567929
C 5.293981 -1.825349 0.536092
C 6.575924 -1.699782 0.012540
C 7.002467 -0.480078 -0.506308
C 6.138453 0.611969 -0.498798
C 4.860426 0.488085 0.033453
H 4.189564 1.341843 0.040929
H 6.459401 1.563510 -0.912065
H 8.000697 -0.382509 -0.922563
H 7.242127 -2.557541 0.005802
H 4.957135 -2.781274 0.928240
C 3.298894 1.044689 2.682189
C 2.806965 2.352662 2.756428
C 3.525634 3.346796 3.410575
C 4.740700 3.044040 4.018965
C 5.230033 1.741208 3.969123
C 4.514468 0.749369 3.308424
H 4.901734 -0.264238 3.270300
H 6.171693 1.494548 4.450468
H 5.300110 3.817950 4.536063
H 3.131670 4.358007 3.451132
H 1.851909 2.586965 2.294231
C 0.644628 0.032167 3.735978
C -0.708788 0.041750 3.903716
S -1.501825 0.018225 2.355367
C -1.460523 0.074589 5.163238
C -0.916630 -0.463354 6.334489
C -1.645855 -0.393694 7.514376
N -2.861426 0.150339 7.612820
C -3.380262 0.652483 6.490232
C -2.733763 0.643955 5.260195
H -3.211536 1.093615 4.394957
H -4.369681 1.095963 6.579511
H -1.232419 -0.806908 8.432018
H 0.055022 -0.946356 6.323493
H 1.348290 0.078304 4.560069
H -0.126732 -1.007882 -0.406234
H -0.790297 0.637669 -0.396423
H 0.964526 0.378020 -0.366958

File name: Step_2_TorsionAngle_-52.45859.xyz

64 

C 0.016006 0.016117 -0.001167
C 0.008091 0.004202 1.494640
C 1.068924 -0.017801 2.367520
C 2.503392 -0.009246 1.992562
C 3.048080 -0.887580 1.113704
C 2.322345 -2.062968 0.576734
C 1.653555 -3.010561 1.314091
S 0.963790 -4.222595 0.286393
C 1.595670 -3.475347 -1.151441
C 2.300257 -2.354228 -0.825550
H 2.774156 -1.714212 -1.561877
C 1.365619 -4.037046 -2.487061
C 2.299829 -3.846714 -3.510831
C 2.042363 -4.376373 -4.768547
N 0.949180 -5.079357 -5.076134
C 0.065835 -5.265841 -4.093142
C 0.219443 -4.772314 -2.802916
H -0.554143 -4.946542 -2.060928
H -0.822495 -5.838195 -4.352173
H 2.761473 -4.236208 -5.572914
H 3.226175 -3.313192 -3.323941
C 1.489754 -3.111703 2.797517
H 2.276398 -2.538063 3.295797
H 0.527124 -2.702199 3.123917
H 1.552391 -4.150812 3.135284
C 4.429609 -0.733119 0.571119
C 5.297405 -1.830292 0.541209
C 6.579288 -1.706863 0.016976
C 7.006698 -0.488561 -0.504432
C 6.143617 0.604255 -0.498839
C 4.865654 0.482526 0.034036
H 4.195506 1.336862 0.039937
H 6.465258 1.554673 -0.914138
H 8.004858 -0.392683 -0.921241
H 7.244728 -2.565225 0.011647
H 4.959792 -2.785154 0.935276
C 3.299443 1.049518 2.679214
C 2.802410 2.355625 2.752465
C 3.517994 3.353520 3.404255
C 4.735140 3.056448 4.011255
C 5.229631 1.755519 3.962371
C 4.517166 0.759897 3.304042
H 4.908495 -0.252167 3.266846
H 6.172959 1.513307 4.442708
H 5.292160 3.833294 4.526530
H 3.119965 4.363158 3.444095
H 1.845729 2.585480 2.291417
C 0.646126 0.024358 3.735114
C -0.707598 0.038379 3.900610
S -1.498235 0.027194 2.350780
C -1.461222 0.067452 5.159060
C -0.920246 -0.476558 6.328874
C -1.650941 -0.410067 7.508021
N -2.865451 0.136232 7.607012
C -3.381624 0.644001 6.485729
C -2.733382 0.639158 5.256592
H -3.209059 1.093310 4.392548
H -4.370246 1.089176 6.575392
H -1.239629 -0.827839 8.424551
H 0.050370 -0.961638 6.317360
H 1.348629 0.063341 4.560569
H -0.118307 -0.990869 -0.412258
H -0.776594 0.657152 -0.398933
H 0.977453 0.391011 -0.363519

This simple code does what I want for every block except the last one! Any suggestions or hints would be greatly appreciated. Thanks for reading my post!

Best answer

This script reads the sample input data from file.txt (as written in the question) and writes two files, Step_1_TorsionAngle_-51.45857.xyz and Step_2_TorsionAngle_-52.45859.xyz:

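import re

# Atomic number (as a string) -> element symbol
replacements = {'1': 'H', '6': 'C', '7': 'N', '16': 'S'}

with open('file.txt', 'r') as f_in:
    data = f_in.read()

# One torsion angle per "!Coordinate: <angle>  Energy: ..." header line
torsion_angles = re.findall(r'!Coordinate:\s+(.*?)\s+Energy', data)
# One block per run of atom lines, ending at the next header or at end of file;
# strip() drops trailing blank lines so the last block has no empty entries
blocks = [b.strip().splitlines() for b in re.findall(r'^(\d.*?)(?=\s*!|\Z)', data, flags=re.DOTALL|re.M)]

for step, (angle, block) in enumerate(zip(torsion_angles, blocks), 1):
    with open('Step_{}_TorsionAngle_{}.xyz'.format(step, angle), 'w') as f_out:
        # XYZ header: atom count, then a blank comment line
        f_out.write(str(len(block)) + '\n\n')
        lines = [' '.join([replacements[s[0]], *s[1:]]) for s in [v.split() for v in block]]
        f_out.write('\n'.join(lines))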

The contents of each output file look like this:

64

C 0.006074 0.000915 0.000760
C 0.003070 -0.004811 1.496641
C 1.065644 -0.015789 2.367841
C 2.500078 -0.010542 1.993114

...etc.
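
One plausible reason the line-by-line script in the question loses its last block is that it never closes its file handles, so buffered text may not reach disk before it is read back, and the final output file is left open. This is an inference from the posted code, not something confirmed in the thread. The sketch below is an alternative that streams the input, splits on the `!` header lines rather than on blank lines, and lets `with` close every file. The names `split_blocks` and `write_xyz_files` are hypothetical.

```python
SYMBOLS = {"1": "H", "6": "C", "7": "N", "16": "S"}


def split_blocks(lines):
    """Yield (angle, atom_lines) for each block, keyed on the '!' header lines."""
    angle, atoms = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank separator lines carry no information
        if fields[0].startswith("!"):
            if angle is not None:
                yield angle, atoms  # a new header ends the previous block
            angle, atoms = fields[1], []
        else:
            # Replace the atomic number with its element symbol
            atoms.append(" ".join([SYMBOLS[fields[0]]] + fields[1:]))
    if angle is not None:
        yield angle, atoms  # the last block has no header after it


def write_xyz_files(in_path):
    """Write one Step_<n>_TorsionAngle_<angle>.xyz per block; return the file names."""
    names = []
    with open(in_path) as f_in:
        for step, (angle, atoms) in enumerate(split_blocks(f_in), 1):
            name = "Step_{}_TorsionAngle_{}.xyz".format(step, angle)
            # 'with' closes (and flushes) each output file, including the last one
            with open(name, "w") as f_out:
                f_out.write("{}\n\n{}\n".format(len(atoms), "\n".join(atoms)))
            names.append(name)
    return names
```

Calling `write_xyz_files('file.txt')` would write the files into the current directory. Because the final block is emitted after the loop ends, the result does not depend on whether the input finishes with a blank line.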

Source: a similar question on Stack Overflow, "python - incomplete parsing of text file": https://stackoverflow.com/questions/57194998/
