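python - Python equivalent of R's read.table

Tags: python r read.table

I'm trying to move some of my processing work from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records into the right format. For example: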

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span multiple lines, and there are commas all over the place. In R, I just do:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

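Is there anything in Python that can do this equally well?

Thanks!

Best answer

You can use the csv module. It treats commas and line breaks inside a quoted field as part of that field, so a record that spans several lines still comes back as a single row.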

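from csv import reader

# newline="" leaves line breaks inside quoted fields for the csv module to handle
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar="\"")

    for row in csv_reader:
        print(row)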

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

The resulting row has length 4.
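The R call in the question also passes nrows=chunksize to read a limited number of records at a time. A minimal sketch of one way to get similar behaviour with the standard library is to wrap the reader in itertools.islice. The helper name read_chunks and the in-memory sample data are illustrative assumptions, not part of the original answer.

```python
import csv
import io
from itertools import islice


def read_chunks(handle, chunksize):
    """Yield lists of up to `chunksize` parsed records from an open text file.

    Hypothetical helper: it counts records, not physical lines, so a
    quoted field containing line breaks still counts as one record.
    """
    reader = csv.reader(handle, delimiter=",", quotechar='"')
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk


# In-memory stand-in for a real file (made-up data);
# the third field of the first record spans two lines and contains a comma.
sample = io.StringIO(
    '1,"first title","<p>line one</p>\n<p>line two, with a comma</p>","tag-a tag-b"\n'
    '2,"second title","<p>body</p>","tag-c"\n'
    '3,"third title","<p>body</p>","tag-d"\n',
    newline="",
)

chunks = list(read_chunks(sample, chunksize=2))
for chunk in chunks:
    for record in chunk:
        print(len(record), record[0])
```

With a real file, the handle would come from open(path, "r", newline="") instead of io.StringIO. Unlike read.table, this yields plain lists of strings, so any type conversion has to be done separately.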

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/19535840/
