python - Warning: multiple data types in column of very large dataframe

I read a fairly large pandas DataFrame from a csv (about 3 million rows and 72 columns), and I get a warning that some columns contain mixed data types:

DtypeWarning: Columns (1,2,3,15,16,17,18,19,20,21,22,23,31,32,33,35,37,38,39,40,41,42,43,44,45,46,47,48,50,51,52,55,57,58,60,71) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

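Given that I can't just eyeball the csv, what is the best way to deal with this? In particular, is there a way to get a list of all the data types that occur in a given column, together with the row numbers where each one appears?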
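For reference, the two remedies named in the warning text itself (the `dtype` option and `low_memory=False`) can be sketched as follows. This is only an illustration under stated assumptions: the tiny in-memory CSV and its column names `id` and `code` are hypothetical stand-ins for the real 72-column file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: "code" mixes number-like and text values.
csv_text = "id,code\n1,007\n2,abc\n3,42\n"

# Option 1: declare the dtype on import so the column is parsed uniformly as strings.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

# Option 2: low_memory=False lets the parser infer each column's type from the
# whole file rather than chunk by chunk, at the cost of more memory while parsing.
whole_file = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(as_text["code"].tolist())
print(whole_file.dtypes)
```

Neither option cleans the data: a column that really does mix text and numbers still ends up holding strings. They only make the parsed type consistent from the first row to the last.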

Best answer

Consider the following df

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(col1=[1, '1', False, np.nan, ['hello']],
                       col2=[2, 3.14, 'hello', (1, 2, 3), True]))
df = pd.concat([df for _ in range(2)], ignore_index=True)

df

You can investigate which types are present and how many values there are of each

df.col1.apply(type).value_counts()

<type 'float'>    2
<type 'int'>      2
<type 'list'>     2
<type 'bool'>     2
<type 'str'>      2
Name: col1, dtype: int64
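With 72 columns, the same count can be repeated for every column that pandas stored as `object`. The following is a minimal sketch rather than part of the original answer; the helper name `type_counts` is hypothetical, and it reports type names (`'float'`, `'str'`) instead of type objects so the result is easy to print or compare.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(col1=[1, '1', False, np.nan, ['hello']],
                       col2=[2, 3.14, 'hello', (1, 2, 3), True]))
df = pd.concat([df for _ in range(2)], ignore_index=True)


def type_counts(frame):
    """Return {column: {type name: count}} for every object-dtype column."""
    report = {}
    for name in frame.columns:
        if frame[name].dtype != object:
            continue  # uniformly typed columns cannot hold mixed Python types
        names = frame[name].map(lambda value: type(value).__name__)
        report[name] = names.value_counts().to_dict()
    return report


report = type_counts(df)
for column, counts in report.items():
    print(column, counts)
```

Missing values show up as `float` here because `np.nan` is a Python float, which is often the reason a column of strings is flagged as mixed.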

You can find out which rows of col1 hold a float like this

df[df.col1.apply(type) == float]
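To get the row numbers for every type at once, as the question asks, the index labels can be collected per type. This is a sketch under assumptions, not the answer author's method: `rows_by_type` is a hypothetical helper, and it uses a plain Python loop, which is simple but may be slow on millions of rows.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(col1=[1, '1', False, np.nan, ['hello']],
                       col2=[2, 3.14, 'hello', (1, 2, 3), True]))
df = pd.concat([df for _ in range(2)], ignore_index=True)


def rows_by_type(series):
    """Map each Python type name found in `series` to the index labels holding it."""
    rows = defaultdict(list)
    for label, value in series.items():
        rows[type(value).__name__].append(label)
    return dict(rows)


col1_rows = rows_by_type(df["col1"])
for type_name, labels in col1_rows.items():
    print(type_name, labels)
```

Because the frame was built with `ignore_index=True`, the labels are the positional row numbers. For a frame read from a csv, they correspond to data rows, not counting the header line.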

Regarding python - Warning: multiple data types in column of very large dataframe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38964819/
