python - Pandas Read_CSV reads numbers incorrectly

Tags: python python-3.x python-2.7 pandas

Method 1

import pandas as pd

def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'])
    return dfEurUsd2017

Method 2

def getAndBuildDatafrmeFromCsv(filename):
    df = pd.read_csv(filename, header=None)
    df.columns = ['date', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.date = pd.to_datetime(df.date, format='%d.%m.%Y %H:%M:%S.%f')
    df.index = df['date']
    df = df[['Open', 'High', 'Low', 'Close', 'Volume']]
    return df

Result of method 1

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.06672  1.06699  1.06636  1.06698  1889.56

Result of method 2

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46

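Why does method 1 parse the Open, High, Low, Close and Volume values incorrectly? Method 2 produces the correct output. What worries me is that the two methods return completely different numbers for the same timestamp, and even the Volume differs, although the CSV file is the same.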

Rows from the CSV

04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003

Best answer

I added dayfirst=True and your code works fine.

Which pandas version are you using? And where does the bogus data come from?
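
One likely source of the odd numbers, offered as an assumption because the thread never confirms it: without dayfirst=True, pandas reads an ambiguous string such as 04.12.2017 month first, i.e. as 12 April 2017, and would read 12.04.2017 as 4 December 2017. The row labelled 2017-12-04 in the result of method 1 may therefore be the file's 12 April row, carrying that day's prices. A minimal sketch on a single timestamp taken from the CSV (the variable names are only for illustration):

```python
import pandas as pd

# One timestamp from the CSV; day and month are both <= 12, so it is ambiguous.
stamp = "04.12.2017 23:00:00.000"

# Default behaviour: the first number is taken as the month.
month_first = pd.to_datetime(stamp)

# dayfirst=True: the first number is taken as the day.
day_first = pd.to_datetime(stamp, dayfirst=True)

# Explicit format, as in method 2: nothing is left to inference.
explicit = pd.to_datetime(stamp, format="%d.%m.%Y %H:%M:%S.%f")

print(month_first)  # 2017-04-12 23:00:00
print(day_first)    # 2017-12-04 23:00:00
print(explicit)     # 2017-12-04 23:00:00
```

The price and volume columns themselves are parsed the same way by both methods; only the date labels attached to the rows differ.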

import pandas as pd

data = '''\
Gmt time,Open,High,Low,Close,Volume
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
'''

with open('test.csv', 'w') as f:
    f.write(data)

def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    # dayfirst=True: read 04.12.2017 as 4 December 2017, not 12 April 2017
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'], dayfirst=True)
    return dfEurUsd2017

print(getAndBuildDatafrmeFromCsvBasic('test.csv'))

Returns:

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46
2017-12-04 23:30:00  1.18682  1.18706  1.18652  1.18681  1242.68
2017-12-05 00:00:00  1.18681  1.18691  1.18639  1.18653  2666.81
2017-12-05 00:30:00  1.18653  1.18726  1.18650  1.18709  3567.24
2017-12-05 01:00:00  1.18708  1.18750  1.18707  1.18738  3105.47
2017-12-05 01:30:00  1.18738  1.18744  1.18691  1.18732  3561.50
2017-12-05 02:00:00  1.18732  1.18766  1.18704  1.18740  2706.64
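
If you would rather not rely on date inference at all, another option (a sketch, not part of the original answer) is to read the date column as plain text and convert it with the explicit format string from method 2. The helper name read_with_explicit_format and the in-memory sample are made up for illustration:

```python
import io

import pandas as pd

# A few rows in the same layout as the CSV in the question, with a header line.
csv_text = """Gmt time,Open,High,Low,Close,Volume
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
"""


def read_with_explicit_format(source):
    """Read the CSV, then parse the index with a fixed day-first format."""
    # No parse_dates here: the index is still plain strings after this call.
    df = pd.read_csv(source, index_col="Gmt time")
    # Convert the strings with the exact layout used in the file.
    df.index = pd.to_datetime(df.index, format="%d.%m.%Y %H:%M:%S.%f")
    return df


df = read_with_explicit_format(io.StringIO(csv_text))
print(df.index[0])  # 2017-12-04 23:00:00
```

With a fixed format, a string that does not match the layout raises an error instead of being silently reinterpreted, which makes this kind of mix-up easier to notice.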

Regarding python - Pandas Read_CSV reads numbers incorrectly, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48104001/
