python - Opening a zipped Excel file with pandas

Tags: python pandas python-3.6 zip

I am trying to open a zipped Excel file with pandas.

When I try:

import pandas as pd
import zipfile
from urllib.request import urlopen
import io

url = 'https://www.cftc.gov/files/dea/history/fut_disagg_xls_2020.zip'
file =zipfile.ZipFile((io.BytesIO(urlopen(url).read())))
file_name = file.namelist()[0]
pd.read_excel(file.open(file_name))

I get an UnsupportedOperation: seek error. Any ideas on how to read this file?

Edit

Here is the traceback:

UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-1-874d52ab10ad> in <module>
      7 file =zipfile.ZipFile((io.BytesIO(urlopen(url).read())))
      8 file_name = file.namelist()[0]
----> 9 pd.read_excel(file.open(file_name))

~/anaconda3/envs/p/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

~/anaconda3/envs/p/lib/python3.6/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

~/anaconda3/envs/p/lib/python3.6/site-packages/pandas/io/excel/_base.py in __init__(self, path_or_buffer, engine)
    849             engine = "xlrd"
    850             if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):
--> 851                 if _is_ods_stream(path_or_buffer):
    852                     engine = "odf"
    853             else:

~/anaconda3/envs/p/lib/python3.6/site-packages/pandas/io/excel/_base.py in _is_ods_stream(stream)
    798         Boolean indication that this is indeed an ODS file or not
    799     """
--> 800     stream.seek(0)
    801     is_ods = False
    802     if stream.read(4) == b"PK\003\004":

UnsupportedOperation: seek

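Best answer

The problem is the Python version. As the traceback shows, pandas calls seek(0) on the stream it is given (in _is_ods_stream), and in Python 3.6 the object returned by file.open(file_name) does not support seek. In Python 3.8 the script works as is. In Python 3.6, add .read() to the argument of pd.read_excel() so that pandas receives the file contents as bytes instead of a stream: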

import pandas as pd
import zipfile
from urllib.request import urlopen
import io

url = 'https://www.cftc.gov/files/dea/history/fut_disagg_xls_2020.zip'
file =zipfile.ZipFile((io.BytesIO(urlopen(url).read())))
file_name = file.namelist()[0]
df = pd.read_excel(file.open(file_name).read())  # <-- add .read()
print(df)

This prints:

                              Market_and_Exchange_Names  As_of_Date_In_Form_YYMMDD Report_Date_as_MM_DD_YYYY  ...                Contract_Units CFTC_SubGroup_Code  FutOnly_or_Combined
0                    WHEAT-SRW - CHICAGO BOARD OF TRADE                     201013                2020-10-13  ...  (CONTRACTS OF 5,000 BUSHELS)                A10              FutOnly
1                    WHEAT-SRW - CHICAGO BOARD OF TRADE                     201006                2020-10-06  ...  (CONTRACTS OF 5,000 BUSHELS)                A10              FutOnly
2                    WHEAT-SRW - CHICAGO BOARD OF TRADE                     200929                2020-09-29  ...  (CONTRACTS OF 5,000 BUSHELS)                A10              FutOnly
3                    WHEAT-SRW - CHICAGO BOARD OF TRADE                     200922                2020-09-22  ...  (CONTRACTS OF 5,000 BUSHELS)                A10              FutOnly
4                    WHEAT-SRW - CHICAGO BOARD OF TRADE                     200915                2020-09-15  ...  (CONTRACTS OF 5,000 BUSHELS)                A10              FutOnly
...                                                 ...                        ...                       ...  ...                           ...                ...                  ...
8475  MINI JAPAN C&F NAPHTHA - NEW YORK MERCANTILE E...                     200901                2020-09-01  ...             (100 METRIC TONS)                N10              FutOnly
8476  MINI JAPAN C&F NAPHTHA - NEW YORK MERCANTILE E...                     200825                2020-08-25  ...             (100 METRIC TONS)                N10              FutOnly
8477  MINI JAPAN C&F NAPHTHA - NEW YORK MERCANTILE E...                     200818                2020-08-18  ...             (100 METRIC TONS)                N10              FutOnly
8478  MINI JAPAN C&F NAPHTHA - NEW YORK MERCANTILE E...                     200811                2020-08-11  ...             (100 METRIC TONS)                N10              FutOnly
8479  MINI JAPAN C&F NAPHTHA - NEW YORK MERCANTILE E...                     200728                2020-07-28  ...             (100 METRIC TONS)                N10              FutOnly

[8480 rows x 188 columns]
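
If you would rather hand pandas a file-like object than raw bytes, one possible variation is to copy the archive member into an io.BytesIO buffer, which supports seek on any Python version. The following is only a minimal sketch under that assumption: the helper name read_member_to_buffer is hypothetical (it is not part of pandas or zipfile), and the demonstration uses a tiny in-memory archive with made-up contents instead of the CFTC download.

```python
import io
import zipfile


def read_member_to_buffer(archive: zipfile.ZipFile, member: str) -> io.BytesIO:
    """Copy one archive member into a seekable in-memory buffer.

    Hypothetical helper for illustration; not part of pandas or zipfile.
    """
    with archive.open(member) as fh:
        return io.BytesIO(fh.read())


# Self-contained demonstration with an in-memory archive.
# The member name and contents are made up for illustration.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    zf.writestr("example.bin", b"hello")

with zipfile.ZipFile(raw) as zf:
    buf = read_member_to_buffer(zf, zf.namelist()[0])

buf.seek(0)  # BytesIO supports seek, unlike the Python 3.6 zip member object
print(buf.read())
```

With the variables from the question, the call would presumably look like pd.read_excel(read_member_to_buffer(file, file_name)). Like the .read() fix above, this loads the whole member into memory.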

Source: a similar question on Stack Overflow, "python - Opening a zipped Excel file with pandas": https://stackoverflow.com/questions/64426954/
