python - Parsing DD MM YY HH MM SS columns from a TXT file using Python's pandas

Thanks in advance for your time. I have a number of space-delimited text files in the following format:

    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

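Each line holds DD MM YY HH MM SS followed by my result value. I am trying to read the txt file using Python's pandas. I did a fair amount of research before posting this question, so hopefully I am not covering old ground.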

Based on trial and error and research, this is what I have come up with:

    import pandas as pd
    from cStringIO import StringIO
    def parse_all_fields(day_col, month_col, year_col, hour_col, minute_col, second_col):
        day_col = _maybe_cast(day_col)
        month_col = _maybe_cast(month_col)
        year_col = _maybe_cast(year_col)
        hour_col = _maybe_cast(hour_col)
        minute_col = _maybe_cast(minute_col)
        second_col = _maybe_cast(second_col)
        return lib.try_parse_datetime_components(day_col, month_col, year_col, hour_col, minute_col, second_col)
    ##Read the .txt file
    data1 = pd.read_table('0132_3.TXT', sep='\s+', names=['Day','Month','Year','Hour','Min','Sec','Value'])
    data1[:10]

    Out[21]: 

    Day,Month,Year,Hour, Min, Sec, Value
    29 04 13 18 15 00    7.667
    29 04 13 18 30 00    7.000
    29 04 13 18 45 00    7.000
    29 04 13 19 00 00    7.333
    29 04 13 19 15 00    7.000

    data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-8ee408dc19c3> in <module>()
    ----> 1 data2 = pd.read_table(StringIO(data1), parse_dates={'datetime':   ['Day','Month','Year','Hour''Min','Sec']}, date_parser=parse_all_fields, dayfirst=True)

    TypeError: expected read buffer, DataFrame found

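At this point I am stuck. First, the "expected read buffer" error confuses me. Do I need to preprocess the .txt file further to get the dates into a readable format? Note: read_table's own date parsing does not handle this date format by itself.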

I am a beginner and trying hard to learn. Apologies if the code is wrong, basic, or confusing. Any help would be greatly appreciated. Many thanks.

Best answer

I think it is easier to parse the dates while reading the csv:

In [1]: df = pd.read_csv('0132_3.TXT', header=None, sep=r'\s+\s', parse_dates=[[0]])

In [2]: df
Out[2]:
                    0      1
0 2013-04-29 00:00:00  7.667
1 2013-04-29 00:00:00  7.000
2 2013-04-29 00:00:00  7.000
3 2013-04-29 00:00:00  7.333
4 2013-04-29 00:00:00  7.000
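The separator `\s+\s` only matches runs of two or more whitespace characters, which is why the six date fields stay together in column 0 while the value lands in column 1. A small sketch with the standard `re` module illustrates this; the variable names are illustrative, and it assumes the value is separated from the date by at least two spaces, as in the sample data:

```python
import re

line = "29 04 13 18 15 00    7.667"

# "\s+\s" needs one or more whitespace characters followed by one more,
# so single spaces inside the date are not treated as separators.
date_part, value_part = re.split(r"\s+\s", line)

# Splitting on any whitespace run instead yields all seven fields separately.
all_fields = re.split(r"\s+", line)

print(repr(date_part), repr(value_part))
print(all_fields)
```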

Since you are using an unusual date format, you also need to specify a date parser:

In [11]: def date_parser(ss):
             day, month, year, hour, min, sec = ss.split()
             return pd.Timestamp('20%s-%s-%s %s:%s:%s' % (year, month, day, hour, min, sec))

In [12]: df = pd.read_csv('0132_3.TXT', header=None, sep=r'\s+\s', parse_dates=[[0]], date_parser=date_parser)

In [13]: df
Out[13]:
                    0      1
0 2013-04-29 18:15:00  7.667
1 2013-04-29 18:30:00  7.000
2 2013-04-29 18:45:00  7.000
3 2013-04-29 19:00:00  7.333
4 2013-04-29 19:15:00  7.000
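As an alternative sketch, not part of the original answer: the DataFrame returned by the question's first `read_table` call is already in memory, so it does not need to be read a second time through `StringIO` (which expects text, hence the `TypeError`). Assuming a pandas version where `pd.to_datetime` accepts a `format` string, the six columns can be read as strings, joined, and parsed in one step. Newer pandas releases deprecate `date_parser`, so this route may also age better. The function name `read_readings`, the column names, and the inline sample are illustrative assumptions:

```python
import io

import pandas as pd

# Inline copy of the sample rows; in practice pass the file path instead.
SAMPLE = """\
29 04 13 18 15 00    7.667
29 04 13 18 30 00    7.000
29 04 13 18 45 00    7.000
29 04 13 19 00 00    7.333
29 04 13 19 15 00    7.000
"""

DATE_COLS = ["Day", "Month", "Year", "Hour", "Min", "Sec"]


def read_readings(source):
    """Read the whitespace-delimited file and build one datetime column."""
    raw = pd.read_csv(
        source,
        sep=r"\s+",
        header=None,
        names=DATE_COLS + ["Value"],
        # Keep the date fields as text so leading zeros such as "04" survive.
        dtype={col: str for col in DATE_COLS},
    )
    # Join the six fields back into "DD MM YY HH MM SS" strings.
    stamp = raw["Day"].str.cat([raw[col] for col in DATE_COLS[1:]], sep=" ")
    # An explicit format avoids any day/month guessing.
    raw["datetime"] = pd.to_datetime(stamp, format="%d %m %y %H %M %S")
    return raw[["datetime", "Value"]]


df = read_readings(io.StringIO(SAMPLE))
print(df)
```

Calling `read_readings('0132_3.TXT')` should work the same way on the file itself, provided every line follows the sample layout.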

This question and answer on "python - Parsing DD MM YY HH MM SS columns from a TXT file using Python's pandas" come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/17301589/
