python - Pandas read_excel : nan values forcing others in the same column to be converted to float

Suppose I want to read the following Excel file:

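What I want is a simple solution (preferably a one-liner) that reads the Excel file so that the dates are converted to str (or at least int) and the blank cells become nan, nat, or anything else that pd.isnull can detect.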

If I use df = pd.read_excel(file_path), I get

df
Out[8]: 
              001002.XY  600123.AB  123456.YZ   555555.GO
ipo_date     20100203.0   20150605        NaN  20090501.0
delist_date         NaN   20170801        NaN         NaN

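So pandas recognizes the blank cells as NaN, which is fine, but the annoying part is that all the other values in the same column are forced to float64, even though they are meant to be just str or int. (Edit: it seems that if a column, e.g. column [1], has no nan, its other values are not forced to float. In my case, however, delist_date is blank in most columns, because most stocks have an IPO date but have not been delisted yet.)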
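The coercion described above can be reproduced without an Excel file. The following is a minimal sketch, not the asker's code: the variable names are made up for illustration, and the values are borrowed from the output shown earlier.

```python
import numpy as np
import pandas as pd

# A column that mixes an integer with a missing value is stored as float64,
# because NaN is a float and the default integer dtype cannot hold it.
with_gap = pd.Series([20100203, np.nan])
print(with_gap.dtype)  # float64

# A column without missing values keeps its integer dtype.
no_gap = pd.Series([20150605, 20170801])
print(no_gap.dtype)  # int64

# Requesting object dtype keeps each cell as its own Python object,
# so the integer stays an int and the gap is still seen by pd.isnull.
as_object = pd.Series([20100203, np.nan], dtype=object)
print(type(as_object.iloc[0]).__name__, pd.isnull(as_object.iloc[1]))
```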

I also tried the dtype=str keyword argument, and it gave me

df
Out[10]: 
            001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date     20100203  20150605       nan  20090501
delist_date       nan  20170801       nan       nan

Looks good? Indeed, the dates are now str, but one thing is ridiculous: the nan values have now become literal strings! For example,

df.iloc[1, 0]
Out[12]: 
'nan'

That would force me to add something awkward later, such as df.replace.
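For reference, the df.replace cleanup mentioned above might look like the sketch below. The frame is built by hand to mimic the dtype=str result shown earlier (two of the four columns, chosen for illustration); it is not read from the real file.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the dtype=str result: the gap is the literal text 'nan'.
df = pd.DataFrame(
    {
        "001002.XY": ["20100203", "nan"],
        "600123.AB": ["20150605", "20170801"],
    },
    index=["ipo_date", "delist_date"],
)

# Swap the literal string for a real missing value so pd.isnull can detect it.
cleaned = df.replace("nan", np.nan)
print(pd.isnull(cleaned.iloc[1, 0]))
```

This works, but it is the extra step the question is trying to avoid, and it would also wipe out any cell whose genuine text happens to be "nan".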

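I have not tried converters, because they require specifying the data type column by column, and the actual Excel file I am working with is a very long spreadsheet (about 3k columns). I also do not want to transpose the spreadsheet in Excel itself.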

Can anyone help? Thanks in advance.

Best answer

Pass dtype=object as an argument.

There is a good explanation here: pandas distinction between str and object types
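A minimal sketch of how this suggestion could be used. The helper names read_dates and dates_to_str are hypothetical, and the second helper is an optional extra step that is not part of the answer: it assumes you want str rather than int for the non-missing cells. The sample frame is a hand-built stand-in for what the read might return, so the snippet runs without the original file.

```python
import numpy as np
import pandas as pd


def read_dates(file_path):
    """Read the sheet with dtype=object so columns are not coerced to float64."""
    return pd.read_excel(file_path, dtype=object)


def dates_to_str(df):
    """Hypothetical follow-up: stringify non-missing cells, keep gaps as missing."""
    # where() keeps the original cell wherever the condition is True (the gaps)
    # and takes the stringified value everywhere else.
    return df.where(df.isna(), df.astype(str))


# Stand-in for the result of read_dates(file_path): ints plus real NaN values.
sample = pd.DataFrame(
    {
        "001002.XY": [20100203, np.nan],
        "600123.AB": [20150605, 20170801],
    },
    index=["ipo_date", "delist_date"],
    dtype=object,
)

as_text = dates_to_str(sample)
print(as_text.iloc[0, 0], pd.isnull(as_text.iloc[1, 0]))
```

With object dtype every cell keeps its own Python type, so no per-column converters are needed even for a sheet with thousands of columns.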

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/47234997/
