python - 在python中使用正则表达式提取科学记数法数字

标签 python qregexp

我正在尝试从 python 中的大量文本文件中提取值。我需要的数字采用科学计数法形式。我的结果文本文件如下

    ADDITIONAL DATA

Tip Rotation (degrees)
Node    , UR[x] , UR[y] , UR[z]
21  , 1.0744    , 1.2389    , -4.3271
22  , -1.0744   , -1.2389   , -4.3271
53  , 0.9670    , 1.0307    , -3.8990
54  , -0.0000   , -0.0000   , -3.5232
55  , -0.9670   , -1.0307   , -3.8990

Mean rotation variation along blade
Region  , Rotation (degrees)
Partition line 0,   7.499739E-36
Partition line 1,   -3.430092E-01
Partition line 2,   -1.019287E+00
Partition line 3,   -1.499808E+00
Partition line 4,   -1.817651E+00
Partition line 5,   -2.136372E+00
Partition line 6,   -2.448321E+00
Partition line 7,   -2.674414E+00
Partition line 8,   -2.956737E+00
Partition line 9,   -3.457806E+00
Partition line 10,  -3.995106E+00

我过去一直成功地使用正则表达式,但它似乎不想获取数字。我的结果文件中的节点数量发生变化,因此无法按行搜索。我的Python脚本如下。

import re
from pylab import *
from scipy import *
import matplotlib
from numpy import *
import numpy as np
from matplotlib import pyplot as plt
import csv

########################################

minTheta        = -90
maxTheta        = 0
thetaIncrements = 10

numberOfPartitions = 10

########################################

numberOfThetas = ((maxTheta - minTheta)/thetaIncrements)+1
print 'Number of thetas = '+str(numberOfThetas)

thetas              = linspace(minTheta,maxTheta,numberOfThetas)
print 'Thetas = '+str(thetas)
part                = linspace(1,numberOfPartitions,numberOfPartitions)
print 'Parts = '+str(part)

meanRotations       = np.zeros((numberOfPartitions+1,numberOfThetas))
#print meanRotations

theta = minTheta
n=0
m=0
while theta <= maxTheta:

        fileName = str(theta)+'.0.txt'
        #print fileName

        regexp = re.compile(r'Partition line 0, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[0,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 1, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[1,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 2, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[2,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 3, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[3,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 4, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[4,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 5, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[5,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 6, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[6,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 7, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[7,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 8, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[8,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 9, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[9,m]=(float((match.group(1))))

        regexp = re.compile(r'Partition line 10, .*?([-+0-9.E]+)')
        with open(fileName) as f:
            for line in f:
                match = regexp.match(line)
                if match:
                    print (float((match.group(1))))
                    meanRotations[10,m]=(float((match.group(1))))

        m=m+1
        theta = theta+thetaIncrements

print 'Mean rotations on partition lines = '
print meanRotations

任何帮助将不胜感激!!

最佳答案

这种文件格式是标准格式吗?如果是这样?您可以使用另一种技术获得所有浮点值。 所以,代码如下:

str = """    ADDITIONAL DATA

Tip Rotation (degrees)
Node    , UR[x] , UR[y] , UR[z]
21  , 1.0744    , 1.2389    , -4.3271
22  , -1.0744   , -1.2389   , -4.3271
53  , 0.9670    , 1.0307    , -3.8990
54  , -0.0000   , -0.0000   , -3.5232
55  , -0.9670   , -1.0307   , -3.8990

Mean rotation variation along blade
Region  , Rotation (degrees)
Partition line 0,   7.499739E-36
Partition line 1,   -3.430092E-01
Partition line 2,   -1.019287E+00
Partition line 3,   -1.499808E+00
Partition line 4,   -1.817651E+00
Partition line 5,   -2.136372E+00
Partition line 6,   -2.448321E+00
Partition line 7,   -2.674414E+00
Partition line 8,   -2.956737E+00
Partition line 9,   -3.457806E+00
Partition line 10,  -3.995106E+00
"""

arr = str.split()
for index in enumerate(arr):
    print index  # just to see the list
start = 59  # from this position the numbers begin
step = 4    # current number is each fourth
ar = []
for j in range(start, len(arr), step):
    ar.append(arr[j])

floatAr = []
# or you can use this expression instead of the following loop 
# floatAr = [float(x) for x in ar]
for n in range(len(ar)):
    floatAr.append(float(ar[n]))

print floatAr

最后,您将收到一个名为 floatAr 的列表,其中包含所有浮点值。您可以添加try- except block 以获得更好的可用性。

或者,如果您想使用正则表达式,这里是代码:

<!--language:python -->

str = """   ADDITIONAL DATA

Tip Rotation (degrees)
Node    , UR[x] , UR[y] , UR[z]
21  , 1.0744    , 1.2389    , -4.3271
22  , -1.0744   , -1.2389   , -4.3271
53  , 0.9670    , 1.0307    , -3.8990
54  , -0.0000   , -0.0000   , -3.5232
55  , -0.9670   , -1.0307   , -3.8990

Mean rotation variation along blade
Region  , Rotation (degrees)
Partition line 0,   7.499739E-36
Partition line 1,   -3.430092E-01
Partition line 2,   -1.019287E+00
Partition line 3,   -1.499808E+00
Partition line 4,   -1.817651E+00
Partition line 5,   -2.136372E+00
Partition line 6,   -2.448321E+00
Partition line 7,   -2.674414E+00
Partition line 8,   -2.956737E+00
Partition line 9,   -3.457806E+00
Partition line 10,  -3.995106E+00"""

regex = '\s-?[1-9]+[0-9]*.?[0-9]*E-?\+?[0-9]+\s?'

import re

values = re.findall(regex, str)
floatAr = [float(x) for x in values]
print floatAr

顺便说一句,这是一个很好的 python 在线正则表达式检查器 pythex

关于python - 在python中使用正则表达式提取科学记数法数字,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/19546631/

相关文章:

python - QRegExp 和 QSyntaxHighlighter 的单引号文本

将函数应用于链表的 Pythonic 方式

python - 如何更改或禁用 python 的核心转储的路径?

python - 在 GridSearchCV 的管道中交替使用不同的模型

python - re.search()和群组无法以与正常格式相同的文件格式工作。 python

python - 在 python 中导入 vim 返回错误

c++ - QRegularExpression 是否删除了反向引用(因此搜索速度更快)

c++ - Qt C++ : Line edit accept only alphanumeric characters, 破折号和下划线

python - QRegExp 和 QSyntaxHighlighter 的单引号文本