python - Can a data-processing error be fixed by copy-pasting?

Tags: python csv matplotlib graph

I ran into a very strange problem while processing data with Python 2.7 on Linux 16.04. I use this function to create a .csv file:

from ast import literal_eval

# Each line of the file holds one tuple literal; parse them one by one.
with open('logs.csv') as f:
    data = [literal_eval(line) for line in f]

The file is created correctly and looks like this:

('2017-04-01 12:05:00','0.01770001','0.0177887','0.01780275','0.01770001')
('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
('2017-04-01 12:15:00','0.01773','0.01780092','0.01780092','0.01773')
('2017-04-01 12:20:00','0.0178','0.01781212','0.01784922','0.01774015')
('2017-04-01 12:25:00','0.01781212','0.01774528','0.01782994','0.01774528')
('2017-04-01 12:30:00','0.01774529','0.0178732','0.01788145','0.01774509')
('2017-04-01 12:35:00','0.01788145','0.01793318','0.01793318','0.01788145')
('2017-04-01 12:40:00','0.01794','0.01780093','0.01799984','0.01780092')
('2017-04-01 12:45:00','0.01785694','0.01806699','0.01807519','0.01785694')
('2017-04-01 12:50:00','0.01807999','0.01819687','0.01827573','0.018027')
('2017-04-01 12:55:00','0.01819687','0.01825402','0.0184','0.01800011')
('2017-04-01 13:00:00','0.01822416','0.01830994','0.01835554','0.0181777')
('2017-04-01 13:05:00','0.01825415','0.01810171','0.01830986','0.01810008')
('2017-04-01 13:10:00','0.01810174','0.01818991','0.01818991','0.01810173')
('2017-04-01 13:15:00','0.01818991','0.01818002','0.01819687','0.01818001')
('2017-04-01 13:20:00','0.01818002','0.01821999','0.01822','0.01818001')

Then I pass it through this code to plot the chart:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import dates, ticker
import matplotlib as mpl
from mpl_finance import candlestick_ohlc
from ast import literal_eval

mpl.style.use('default')


data = []
ohlc_data = [] 

with open('logsXMR.csv') as f:
    data = [literal_eval(line) for line in f]


for line in data:
        #ohlc_data.append((np.float64(line[0]), np.float64(line[1]), np.float64(line[2]), np.float64(line[3]), np.float64(line[4])))
        ohlc_data.append((dates.datestr2num(line[0]), np.float64(line[1]), np.float64(line[2]), np.float64(line[3]), np.float64(line[4])))

fig, ax1 = plt.subplots()
candlestick_ohlc(ax1, ohlc_data, width = 0.5/((24*60)/5), colorup = 'g', colordown = 'r', alpha = 0.8)

#ax1.xaxis.set_major_formatter(dates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.xaxis.set_major_locator(ticker.MaxNLocator(10))

plt.xticks(rotation = 30)
plt.grid()
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Historical Data XMRUSD')
plt.tight_layout()
plt.show()

But every time I get this error:

Traceback (most recent call last):
  File "CSVing.py", line 15, in <module>
    data = [literal_eval(line) for line in f]
  File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 2
    ('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
    ^

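I don't understand why this error occurs, because if I simply copy and paste the data into another file, everything works and I can plot the chart perfectly. I just don't get it, since the two data files are identical, with no added spaces or anything.

What could cause this error? How can I keep using the generated data file directly, without having to copy and paste the data into another file?

Thanks in advance,

Pixel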
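
One possibility that would be consistent with the `line 2` in the traceback is that a single string handed to `literal_eval` contains more than one row, for example because the generated file uses bare `\r` line terminators that the Python 2 default text mode does not split on. This is only a guess, not something the post confirms. The Python 3 sketch below counts line-ending styles in raw bytes and parses rows with `str.splitlines()`, which accepts `\n`, `\r\n` and `\r` alike. The helper names `describe_line_endings` and `parse_rows` are hypothetical, and the `generated` bytes are an assumed stand-in for the real file.

```python
from ast import literal_eval


def describe_line_endings(raw):
    """Count line-ending styles and detect a UTF-8 BOM in raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "bom": raw.startswith(b"\xef\xbb\xbf"),
        "crlf": crlf,
        "lone_cr": raw.count(b"\r") - crlf,
        "lone_lf": raw.count(b"\n") - crlf,
    }


def parse_rows(raw):
    """Parse one tuple literal per line, whatever the line terminator is."""
    text = raw.decode("utf-8-sig")  # also drops a UTF-8 BOM if one is present
    return [literal_eval(line) for line in text.splitlines() if line.strip()]


# Assumed file contents: the first two rows from the question, ending in bare "\r".
generated = (
    b"('2017-04-01 12:05:00','0.01770001','0.0177887','0.01780275','0.01770001')\r"
    b"('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')\r"
)
# What a copy-paste into a new file might produce: the same rows with "\n".
pasted = generated.replace(b"\r", b"\n")

print(describe_line_endings(generated))
print(describe_line_endings(pasted))
print(parse_rows(generated) == parse_rows(pasted))
```

For a real file, the bytes could be read with `open('logsXMR.csv', 'rb').read()` and passed to both helpers. If the two files turn out to differ only in their terminators, that would explain why they look identical in an editor.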

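Best answer

I would suggest reconsidering the data format you have. I don't know where the data comes from, but it would be sensible to store it in a way that does not include parentheses and the like.

If you really need to use this data format, you can still use, for example, pandas and clean up the format by removing the unneeded characters.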
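
As a minimal sketch of what such a storage format could look like, the rows can be written as plain comma-separated values with the standard `csv` module, so that no parentheses or quote characters need stripping later. This is Python 3, it assumes the rows are already available as tuples of strings, and it uses an in-memory buffer in place of a real file.

```python
import csv
import io

# Two rows from the question, assumed to be available as tuples of strings.
rows = [
    ("2017-04-01 12:05:00", "0.01770001", "0.0177887", "0.01780275", "0.01770001"),
    ("2017-04-01 12:10:00", "0.0177887", "0.01771308", "0.01785263", "0.01771039"),
]

# Write plain CSV with an explicit "\n" terminator (in-memory stand-in for a file).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading it back needs no literal_eval and no character clean-up.
parsed = [tuple(record) for record in csv.reader(io.StringIO(text))]
print(parsed == rows)
```

With a real file, `open(path, "w", newline="")` would take the place of the buffer when writing in Python 3.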

u = """('2017-04-01 12:05:00','0.01770001','0.0177887','0.01780275','0.01770001')
('2017-04-01 12:10:00','0.0177887','0.01771308','0.01785263','0.01771039')
('2017-04-01 12:15:00','0.01773','0.01780092','0.01780092','0.01773')
('2017-04-01 12:20:00','0.0178','0.01781212','0.01784922','0.01774015')
('2017-04-01 12:25:00','0.01781212','0.01774528','0.01782994','0.01774528')
('2017-04-01 12:30:00','0.01774529','0.0178732','0.01788145','0.01774509')
('2017-04-01 12:35:00','0.01788145','0.01793318','0.01793318','0.01788145')
('2017-04-01 12:40:00','0.01794','0.01780093','0.01799984','0.01780092')
('2017-04-01 12:45:00','0.01785694','0.01806699','0.01807519','0.01785694')
('2017-04-01 12:50:00','0.01807999','0.01819687','0.01827573','0.018027')
('2017-04-01 12:55:00','0.01819687','0.01825402','0.0184','0.01800011')
('2017-04-01 13:00:00','0.01822416','0.01830994','0.01835554','0.0181777')
('2017-04-01 13:05:00','0.01825415','0.01810171','0.01830986','0.01810008')
('2017-04-01 13:10:00','0.01810174','0.01818991','0.01818991','0.01810173')
('2017-04-01 13:15:00','0.01818991','0.01818002','0.01819687','0.01818001')
('2017-04-01 13:20:00','0.01818002','0.01821999','0.01822','0.01818001')"""

import io
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
from mpl_finance import candlestick_ohlc

# Raw strings keep the regex escapes \( and \) from being read as string escapes.
replace = {r"\(" : "", r"\)" : "", "'" : ""}
df = pd.read_csv(io.StringIO(u), sep=",",  header=None).replace(replace, regex=True)
# use pd.read_csv("myfilename.txt", ...)  here for your real file

df[0] = dates.datestr2num(df[0])
df.iloc[:,1:] = df.iloc[:,1:].astype(float)

fig, ax1 = plt.subplots()
candlestick_ohlc(ax1, df.values, width = 0.5/((24*60)/5), 
                 colorup = 'g', colordown = 'r', alpha = 0.8)

ax1.xaxis.set_major_formatter(dates.DateFormatter('%d/%m/%Y %H:%M'))
ax1.xaxis.set_major_locator(dates.MinuteLocator((0,15,30,45)))

plt.xticks(rotation = 30)
plt.grid()
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Historical Data XMRUSD')
plt.tight_layout()
plt.show()


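Note that the data does not seem to be in OHLC format either, so the chart looks odd. But knowing nothing about the data, you will need to work out the correct column order yourself.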
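
One way to work out the order is to check which column holds each row's maximum and which its minimum. In the sample rows the third value is always the largest and the fourth the smallest, which would fit an order of open, close, high, low. That is an inference from the numbers, not something the data source confirms. The sketch below tests that assumption on three of the rows and reorders them to (date, open, high, low, close), the order `candlestick_ohlc` expects. The function names are hypothetical.

```python
# Three rows from the question, with the numeric fields already converted to float.
rows = [
    ("2017-04-01 12:05:00", 0.01770001, 0.0177887, 0.01780275, 0.01770001),
    ("2017-04-01 12:10:00", 0.0177887, 0.01771308, 0.01785263, 0.01771039),
    ("2017-04-01 12:15:00", 0.01773, 0.01780092, 0.01780092, 0.01773),
]


def looks_like_open_close_high_low(rows):
    """True if the third value is each row's maximum and the fourth its minimum."""
    return all(r[3] == max(r[1:]) and r[4] == min(r[1:]) for r in rows)


def to_ohlc(rows):
    """Reorder (date, open, close, high, low) into (date, open, high, low, close)."""
    return [(d, o, h, l, c) for d, o, c, h, l in rows]


print(looks_like_open_close_high_low(rows))
ohlc_rows = to_ohlc(rows)
print(ohlc_rows[0])
```

The check cannot tell open from close, since both lie between the low and the high. Which of the first two values is which remains an assumption to verify against the data source.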

For "python - Can a data-processing error be fixed by copy-pasting?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53373652/
