I am trying to unzip a CSV file and pass it to pandas so that I can work with the data.
The code I have tried so far is:
import pandas, requests, zipfile, StringIO
r = requests.get('http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
z = zipfile.ZipFile(StringIO.StringIO(r.content))
crime2013 = pandas.read_csv(z.read('crime_incidents_2013_CSV.csv'))
Python is able to fetch the file, but the last line raises an error that ends with "does not exist".
Can someone tell me what I am doing wrong?
Best answer
If you want to read a zipped or tar.gz file into a pandas DataFrame, the read_csv method already includes this functionality.
import pandas as pd
df = pd.read_csv('filename.zip')
Or in long form:
df = pd.read_csv('filename.zip', compression='zip', header=0, sep=',', quotechar='"')
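The calls above read a ZIP file from disk. In the question, the archive is already in memory as downloaded bytes, and the result of z.read(...) (the file's contents) is passed where read_csv expects a path or file-like object, which is probably why the error ends in "does not exist". A minimal sketch of reading such bytes directly is shown here; it assumes Python 3 and a reasonably recent pandas, and it builds a small stand-in archive (the name sample.csv and its rows are made up) instead of downloading anything.

```python
import io
import zipfile

import pandas as pd

# Build a tiny ZIP archive in memory to stand in for downloaded bytes
# (hypothetical data; in the question these bytes would be r.content).
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.csv", "offense,ward\nTHEFT,1\nROBBERY,2\n")
raw.seek(0)

# A buffer has no file extension to infer from, so name the compression explicitly.
df = pd.read_csv(raw, compression="zip")
print(df)
```

With a real download, wrapping the response bytes in io.BytesIO and passing that buffer should work the same way, provided the archive holds a single data file.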
Description of the compression parameter from the docs:
compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.
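The docs note that with 'zip' the archive must contain only one data file. When an archive holds several files, one possible workaround is to open the wanted member with the standard zipfile module and hand that file object to read_csv. The sketch below assumes Python 3; the archive contents and the names readme.txt and incidents.csv are invented for illustration.

```python
import io
import zipfile

import pandas as pd

# Stand-in archive with two members, only one of which is CSV data
# (hypothetical names and contents).
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("readme.txt", "not a data file")
    zf.writestr("incidents.csv", "offense,ward\nTHEFT,1\nROBBERY,2\n")
archive.seek(0)

with zipfile.ZipFile(archive) as zf:
    # List the members and keep only the CSV files.
    csv_names = [name for name in zf.namelist() if name.endswith(".csv")]
    # zf.open() returns a file-like object, which read_csv accepts;
    # zf.read() would return the raw contents instead.
    with zf.open(csv_names[0]) as member:
        incidents = pd.read_csv(member)

print(incidents)
```

Selecting the member by name keeps the choice explicit, at the cost of a few more lines than passing the archive path straight to read_csv.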
Source: "python - Read a zipped file as a pandas DataFrame", a similar question found on Stack Overflow: https://stackoverflow.com/questions/18885175/