Method 1
def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'])
    return dfEurUsd2017
Method 2
def getAndBuildDatafrmeFromCsv(filename):
    df = pd.read_csv(filename, header=None)
    df.columns = ['date', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.date = pd.to_datetime(df.date, format='%d.%m.%Y %H:%M:%S.%f')
    df.index = df['date']
    df = df[['Open', 'High', 'Low', 'Close', 'Volume']]
    return df
Result of method 1
                        Open     High      Low    Close   Volume
Gmt time
2017-12-04 23:00:00  1.06672  1.06699  1.06636  1.06698  1889.56
Result of method 2
                        Open     High      Low    Close   Volume
Gmt time
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46
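Why does method 1 parse the Open, High, Low, Close, and Volume values incorrectly? Method 2 produces the correct output. What worries me is that the two methods return completely different values for the same timestamp, even for Volume, although the CSV file is the same.
Rows from the CSV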
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
Best answer
I added dayfirst=True and your code works fine.
Which version of pandas are you using? Where do those bogus values come from?
import pandas as pd
data = '''\
Gmt time,Open,High,Low,Close,Volume
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
'''
with open('test.csv', 'w') as f:
    f.write(data)
def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'], dayfirst=True)
    return dfEurUsd2017
print(getAndBuildDatafrmeFromCsvBasic('test.csv'))
Returns:
                        Open     High      Low    Close   Volume
Gmt time
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46
2017-12-04 23:30:00  1.18682  1.18706  1.18652  1.18681  1242.68
2017-12-05 00:00:00  1.18681  1.18691  1.18639  1.18653  2666.81
2017-12-05 00:30:00  1.18653  1.18726  1.18650  1.18709  3567.24
2017-12-05 01:00:00  1.18708  1.18750  1.18707  1.18738  3105.47
2017-12-05 01:30:00  1.18738  1.18744  1.18691  1.18732  3561.50
2017-12-05 02:00:00  1.18732  1.18766  1.18704  1.18740  2706.64
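A likely explanation for the "bogus" values in the question, offered here as an assumption rather than something the thread confirms: the numbers are read correctly, but they are attached to the wrong timestamps. The file stores dates as DD.MM.YYYY, while read_csv without dayfirst=True reads an ambiguous date such as 12.04.2017 month-first, as 4 December. The row that method 1 labels 2017-12-04 23:00:00 would then really be the row for 12 April. The sketch below reproduces that effect with two rows. The first row's date is hypothetical (its values are the ones method 1 printed), and read_prices is an illustrative helper, not part of the original code.

```python
import io

import pandas as pd

# Two rows in the file's day-first format (DD.MM.YYYY).
# ASSUMPTION: the date of the first row (12 April 2017) is hypothetical;
# its values are the ones method 1 printed in the question.
csv_text = """Gmt time,Open,High,Low,Close,Volume
12.04.2017 23:00:00.000,1.06672,1.06699,1.06636,1.06698,1889.56
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
"""


def read_prices(text, dayfirst):
    """Illustrative helper: parse the CSV text with or without dayfirst."""
    return pd.read_csv(
        io.StringIO(text),
        index_col="Gmt time",
        parse_dates=["Gmt time"],
        dayfirst=dayfirst,
    )


# The timestamp shown in both result tables of the question.
label = pd.Timestamp("2017-12-04 23:00:00")

month_first = read_prices(csv_text, dayfirst=False)  # like method 1
day_first = read_prices(csv_text, dayfirst=True)     # like the answer

# Same label, different rows: only the date interpretation changed.
open_month_first = month_first.loc[label, "Open"]
open_day_first = day_first.loc[label, "Open"]
print("dayfirst=False:", open_month_first)
print("dayfirst=True: ", open_day_first)
```

Under this assumption, the month-first read attaches the April row to the December label, and the day-first read attaches the December row. Method 2 sidesteps the ambiguity because it passes an explicit format string to pd.to_datetime, so pandas never has to guess the day and month order.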
This is based on a similar question found on Stack Overflow, "python - Pandas Read_CSV reads numbers incorrectly": https://stackoverflow.com/questions/48104001/