Python nested loops for processing a data file

Tags: python

I have a file containing data, and I want to find the maximum reading for each hour.

def maximum():
    source = open ( 'dataV.csv', 'r' )
    result = open ( 'dataV-max.csv', 'w' )
    line = source.readline()

    max_num = '0'

    while line != '' :
        for time in range(0, 24):
            line = source.readline()
            if time == line [ 12:14 ] and line [22:len(line)] <= max_num :
                max_num = line [ 22:len(line) ]
                print ( max_num )
                result.write ( str(max_num) )

    source.close()
    result.close()

OK, I updated the code, but it only keeps one hour.

def maximum():
    source = open ( 'dataV.csv', 'r' )
    result = open ( 'dataV-max.csv', 'w' )
    line = source.readline()

    line = source.readline()

    max_hour = line[23:]
    hour = line[12:14]

    while line != '':
        hour = line[12:14]
        line = source.readline()
        if hour == line[12:14]:
            if line[23:] > max_hour:
                max_hour = line[23:]
            result.write(line)

    source.close()
    result.close()

I think something is wrong with the nested loop. I don't understand how to make it go through the whole file.

Here is part of the file:

'time PST', saturn03.820.A.AlgaeWatch [microg/l]
'2014-04-25 00:04:48',3.35
'2014-04-25 00:04:54',3.225
'2014-04-25 00:05:00',3.15
'2014-04-25 00:07:48',3.4
'2014-04-25 00:07:54',3.4
'2014-04-25 00:08:00',3.375
'2014-04-25 00:10:48',3.45
'2014-04-25 00:10:54',3.325
'2014-04-25 00:11:00',3.425
'2014-04-25 00:13:49',3.45
'2014-04-25 00:13:54',3.5
'2014-04-25 00:14:00',3.525
'2014-04-25 00:16:48',3.725

Best answer

Given the input:

'time PST', saturn03.820.A.AlgaeWatch [microg/l]
'2014-04-25 00:04:48',3.35
'2014-04-25 00:04:54',3.225
'2014-04-25 00:05:00',3.15
'2014-04-25 00:07:48',3.4
'2014-04-25 00:07:54',3.4
'2014-04-25 00:08:00',3.375
'2014-04-25 00:10:48',3.45
'2014-04-25 00:10:54',3.325
'2014-04-25 00:11:00',3.425
'2014-04-25 00:13:49',3.45
'2014-04-25 00:13:54',3.5
'2014-04-25 01:14:00',3.525
'2014-04-25 02:16:48',3.725

The program:

#! /usr/bin/env python
"""Usually a ready-made file parser such as the csv module, or even
pandas et al. for a more complete service, is the way to go here, but
one may want to know how to iterate and parse a little by oneself.
The same goes for the date-time parsing, which one typically also
delegates to the datetime module or the like."""
from __future__ import print_function
import sys


def hourly_maxima(in_file, out_file):
    """Extract calendar hourly maximum readings from in_file,
    write to out_file. If files do not exist or are
    not accessible exceptions will happily raise ;-).
    Input is expected to be ordered ascending by time
    stamp."""

    field_sep = ','
    with open(in_file, 'rt') as f_i, open(
            out_file, 'wt') as f_o:  # May raise here
        f_i.readline()  # Ignore header, be optimistic

        ts_raw = None
        hourly_maximum = None
        current_hour = None  # Group by calendar hour stored in tuples
        date_sep = '-'
        # Expect sample data line to document flow:
        # '2014-04-25 00:04:48',3.35
        for line in f_i:  # Digest rest of lines
            if not line.strip():
                break  # stop on first blank line
            ts, reading = line.strip().split(field_sep)  # May raise ...
            r_float = float(reading)  # May raise ...

            # Map timestamp ts to calendar hour
            ts_raw = ts.strip("'")
            year, month, day = ts_raw[:10].split(date_sep)
            hour = ts_raw[11:13]
            cand_hour = (year, month, day, hour)
            if current_hour is None:
                current_hour = cand_hour

            if cand_hour == current_hour:  # We seek the maximum
                if hourly_maximum is None or r_float > hourly_maximum:
                    hourly_maximum = r_float
            else:  # report hourly maximum of previous hour and reset
                print(ts_raw, hourly_maximum)  # Also report matching hour?
                f_o.write('%s\n' % (str(hourly_maximum)))
                current_hour = cand_hour
                hourly_maximum = r_float

        # Flush the last result kept in hourly_maximum:
        print(ts_raw, hourly_maximum)  # Also report matching hour?
        f_o.write('%s\n' % (str(hourly_maximum)))


def main():
    """Drive the extraction."""
    in_file = 'dataV.csv' if len(sys.argv) < 2 else sys.argv[1]
    out_file = 'dataV-max.csv' if len(sys.argv) < 3 else sys.argv[2]

    hourly_maxima(in_file, out_file)

if __name__ == '__main__':
    sys.exit(main())

Yields:

2014-04-25 01:14:00 3.5
2014-04-25 02:16:48 3.525
2014-04-25 02:16:48 3.725

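on standard output (the timestamp printed next to each maximum is that of the line being read when the hour rolled over, so only the last one falls in the hour it reports), and in the file: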

3.5
3.525
3.725
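The program above converts each reading with float() before comparing, whereas the attempts in the question compare slices of the line as text. The following is only a small sketch of why that matters; the list of readings is made up for illustration, and the sample line is copied from the input shown earlier.

```python
# One data line in the format shown in the question.
sample = "'2014-04-25 00:04:48',3.35\n"

hour_text = sample[12:14]            # the two hour digits, as text
reading_text = sample[22:].strip()   # the reading starts at index 22
shifted_text = sample[23:].strip()   # index 23 already skips the first digit

# Hypothetical readings, kept as text the way slicing returns them.
readings = ['9.8', '10.5', '3.35']

# Text is compared character by character, so '9.8' ranks above '10.5'.
max_as_text = max(readings)

# Converting to float first compares the numeric values.
max_as_number = max(float(r) for r in readings)

# An int from range(0, 24) never equals the two-character text slice.
hour_matches = (0 == hour_text)

print(hour_text, reading_text, shifted_text)
print(max_as_text, max_as_number, hour_matches)
```

In other words, both the hour and the reading need an explicit conversion (or a consistent text comparison for the hour) before any comparison is meaningful.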

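Is this what you wanted? Probably. There is, however, still plenty of room for improvement, hardening, and extra elegance.

Keep on learning Python.

PS: Sorry, I will be offline for a while.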
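As one possible direction for such an improvement, here is a minimal sketch that delegates the parsing to the csv and datetime modules mentioned in the program's docstring and groups the rows with itertools.groupby. The function name hourly_maxima_csv and the in-memory sample are assumptions made for illustration; like the program above, it expects a header line, single-quoted timestamps, and input sorted by time.

```python
import csv
import io
from datetime import datetime
from itertools import groupby


def hourly_maxima_csv(lines):
    """Yield (hour_label, maximum) pairs from an iterable of CSV lines.

    Hypothetical helper: assumes a header line, timestamps quoted with
    single quotes, and rows in ascending time order.
    """
    reader = csv.reader(lines, quotechar="'")
    next(reader, None)  # skip the header line
    rows = (
        (datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S'), float(row[1]))
        for row in reader
        if row  # ignore blank lines
    )
    # groupby only merges adjacent rows, hence the ordering assumption.
    by_hour = groupby(rows, key=lambda row: row[0].strftime('%Y-%m-%d %H'))
    for hour_label, group in by_hour:
        yield hour_label, max(reading for _, reading in group)


# A shortened copy of the input above, held in memory for the demo.
sample = io.StringIO(
    "'time PST', saturn03.820.A.AlgaeWatch [microg/l]\n"
    "'2014-04-25 00:13:49',3.45\n"
    "'2014-04-25 00:13:54',3.5\n"
    "'2014-04-25 01:14:00',3.525\n"
    "'2014-04-25 02:16:48',3.725\n"
)
results = list(hourly_maxima_csv(sample))
for hour_label, maximum in results:
    print(hour_label, maximum)
```

Because each maximum is paired with the label of its own hour, the reported hour and the reported value always belong together. For a real file, an open file object (ideally opened with newline='') could be passed in place of the StringIO sample.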

A similar question about processing a data file with nested loops in Python can be found on Stack Overflow: https://stackoverflow.com/questions/37606687/
