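python pd.read raises an error when run in Rodeo

Tags: python pandas

I have checked the Python documentation and, as far as I can tell, there is nothing wrong with this code. However, when I run it in the Rodeo IDE it raises an error. Can someone point me in the right direction or post an answer that solves this?

My code: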

#%matplotlib inline
import pandas as pd

import numpy as np
import matplotlib.pyplot as plt

#pd.set_option("display.max_rows", 16)

#LARGE_FIGSIZE = (12, 8)

#C:\\Users\\User\\Documents\\dataScience\\pandas_tutorial\\climate_timeseries\\data\\temperatures\\xxxx.txt"

#https://github.com/jonathanrocher/pandas_tutorial/blob/master/climate_timeseries/data/temperatures/GLB.Ts+dSST.txt

giss_temp = pd.read_table("C:\\Users\\User\\Documents\\dataScience\\pandas_tutorial\\climate_timeseries\\data\\temperatures\\xxxx.txt",sep="\s+",skiprows=7,skipfooter=7,engine="python")

print(giss_temp)

My error message:

ValueError: Expected 2 fields in line 160, saw 87
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-02b52eb96870> in <module>()
     11 #C:\\Users\\User\\Documents\\dataScience\\pandas_tutorial\\climate_timeseries\\data\\temperatures\\xxxx.txt"
     12 #https://github.com/jonathanrocher/pandas_tutorial/blob/master/climate_timeseries/data/temperatures/GLB.Ts+dSST.txt
---> 13 giss_temp = pd.read_table("https://github.com/jonathanrocher/pandas_tutorial/blob/master/climate_timeseries/data/temperatures/GLB.Ts+dSST.txt",sep="\s+",skiprows=7,skipfooter=7,engine="python")
     14 print(giss_temp)
C:\Users\User\AppData\Local\rodeo\app-2.5.2\resources\conda\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name
C:\Users\User\AppData\Local\rodeo\app-2.5.2\resources\conda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    398         return parser
    399 
--> 400     data = parser.read()
    401     parser.close()
    402     return data
C:\Users\User\AppData\Local\rodeo\app-2.5.2\resources\conda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    936                 raise ValueError('skipfooter not supported for iteration')
    937 
--> 938         ret = self._engine.read(nrows)
    939 
    940         if self.options.get('as_recarray'):
C:\Users\User\AppData\Local\rodeo\app-2.5.2\resources\conda\lib\site-packages\pandas\io\parsers.py in read(self, rows)
   1990             content = content[1:]
   1991 
-> 1992         alldata = self._rows_to_cols(content)
   1993         data = self._exclude_implicit_index(alldata)
   1994 
C:\Users\User\AppData\Local\rodeo\app-2.5.2\resources\conda\lib\site-packages\pandas\io\parsers.py in _rows_to_cols(self, content)
   2505             msg = ('Expected %d fields in line %d, saw %d' %
   2506                    (col_len, row_num + 1, zip_len))
-> 2507             raise ValueError(msg)
   2508 
   2509         if self.usecols:
ValueError: Expected 2 fields in line 160, saw 87

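Best answer

You should use the raw file URL. The github.com/.../blob/... address returns GitHub's HTML page for the file rather than the text file itself, so the parser ends up reading HTML and finds an unexpected number of fields. Note also that the "+" in the file name is percent-encoded as %2B: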

In [390]: pd.options.display.max_rows = 10

In [391]: url = 'https://raw.githubusercontent.com/jonathanrocher/pandas_tutorial/master/climate_timeseries/data/temperatures/GLB.Ts%2BdSST.txt'

In [392]: pd.read_csv(url, skiprows=7, delim_whitespace=True, skipfooter=12, error_bad_lines=False, engine='python')
Out[392]:
     Year  Jan  Feb  Mar  Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   J-D  D-N   DJF   MAM   JJA   SON Year.1
0    1880  -34  -27  -22  -30   -16   -24   -19   -12   -20   -19   -16   -21   -22  ***  ****   -23   -18   -18   1880
1    1881  -13  -16   -2   -3    -3   -27   -12    -8   -18   -23   -28   -18   -14  -14   -17    -3   -15   -23   1881
2    1882    3    4   -2  -24   -20   -32   -27   -11   -11   -25   -25   -37   -17  -16    -4   -15   -23   -20   1882
3    1883  -38  -38  -12  -20   -20    -8    -3   -13   -19   -19   -28   -21   -20  -21   -38   -18    -8   -22   1883
4    1884  -20  -14  -31  -36   -33   -36   -31   -24   -29   -25   -29   -25   -28  -28   -18   -33   -31   -28   1884
..    ...  ...  ...  ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...  ...   ...   ...   ...   ...    ...
137  2011   45   44   57   60    47    54    70    69    52    60    50    48    55   55    45    55    64    54   2011
138  2012   38   43   52   62    71    59    50    56    68    73    69    46    57   57    43    62    55    70   2012
139  2013   62   52   60   48    56    61    53    61    73    61    75    61    60   59    53    55    58    70   2013
140  2014   68   44   71   72    79    62    50    74    81    78    64    74    68   67    58    74    62    74   2014
141  2015   75   80   84   71  ****  ****  ****  ****  ****  ****  ****  ****  ****  ***    76  ****  ****  ****   2015

[142 rows x 20 columns]
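
The URL rewrite used above can also be done programmatically. The following is a minimal sketch, not part of the original answer: the helper name github_blob_to_raw is hypothetical, and it assumes the page URL has the layout github.com/<owner>/<repo>/blob/<branch>/<path> with a branch name that contains no slash.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a github.com 'blob' page URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration. Assumes the layout
    <owner>/<repo>/blob/<branch>/<path...> and a branch name without '/'.
    URLs that do not match this layout are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url  # not a GitHub page URL

    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        return url  # not a file ("blob") page

    owner, repo, _blob, *rest = segments
    # Percent-encode each remaining segment so that '+' becomes '%2B'.
    rest = [quote(unquote(segment), safe="") for segment in rest]
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


page_url = (
    "https://github.com/jonathanrocher/pandas_tutorial/blob/master/"
    "climate_timeseries/data/temperatures/GLB.Ts+dSST.txt"
)
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

The resulting raw_url can be passed to pd.read_csv exactly as in the answer. As a side note, in recent pandas releases error_bad_lines has been replaced by on_bad_lines, and delim_whitespace is deprecated in favour of sep=r"\s+", so the call may need adjusting depending on the installed version.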

Regarding the python pd.read error when running in Rodeo, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48254732/
