python - How should I find numeric columns in a dataframe that also contain Null values?

The dataframe looks like:

          col1  col2   col3    col4    col5    col6    col7
points                                                    
x1         0.6  '0'   'first'  0.93   'lion'   0.34   0.98
x2         0.7  '1'  'second'  0.47    'cat'   0.43   0.76
x3         NaN  '0'   'third'  0.87  'tiger'   0.24   0.10
x4         0.6  '0'   'first'  0.93   'lion'   0.34   0.98
x5         0.5  '1'   'first'  0.32     NaN    0.09   NaN
x6         0.4  '0'   'third'  0.78  'tiger'   0.18   0.17
x7         0.5  '1'  'second'  0.98    'cat'   0.47   0.78 

numeric=df.select_dtypes(include=["number"])
others=df.select_dtypes(exclude=["number"])
print(numeric)

output:
          col4   col6
points                                                    
x1        0.93   0.34
x2        0.47   0.43   
x3        0.87   0.24   
x4        0.93   0.34   
x5        0.32   0.09   
x6        0.78   0.18   
x7        0.98   0.47

But I need output like this:

          col1  col4    col6    col7
points                                                    
x1         0.6  0.93    0.34   0.98
x2         0.7  0.47    0.43   0.76
x3         NaN  0.87    0.24   0.10
x4         0.6  0.93    0.34   0.98
x5         0.5  0.32    0.09   NaN
x6         0.4  0.78    0.18   0.17
x7         0.5  0.98    0.47   0.78

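I know that NaN is being treated as object and that those columns are moved to others. How can I detect columns based on the values they contain?

Best answer

Your question boils down to: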

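How can I convert columns which are meant to be numeric but currently have object dtype?

Once this is resolved, pd.DataFrame.select_dtypes will work as desired. This assumes you do not know in advance which series are meant to be numeric. What you can do is try to convert each column that currently has object dtype to numeric. If the converted result contains any non-null values, apply the conversion.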

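import pandas as pd

# Try to convert each object-dtype column; keep the result only if
# at least one value parsed as a number.
for col in df.select_dtypes(include=['object']):
    s = pd.to_numeric(df[col], errors='coerce')
    if s.notnull().any():
        df[col] = s

print(df.dtypes)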

points     object
col1      float64
col2       object
col3       object
col4      float64
col5       object
col6      float64
col7      float64
dtype: object
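
To tie the conversion back to the original goal, here is a minimal end-to-end sketch. It rebuilds part of the question's frame by hand (an assumption: col1 and col7 are forced to object dtype to mimic the situation described, and col2 is left out), runs the same loop, and then calls select_dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data (col2 omitted).
df = pd.DataFrame(
    {
        "col1": [0.6, 0.7, np.nan, 0.6, 0.5, 0.4, 0.5],
        "col3": ["first", "second", "third", "first", "first", "third", "second"],
        "col4": [0.93, 0.47, 0.87, 0.93, 0.32, 0.78, 0.98],
        "col5": ["lion", "cat", "tiger", "lion", np.nan, "tiger", "cat"],
        "col6": [0.34, 0.43, 0.24, 0.34, 0.09, 0.18, 0.47],
        "col7": [0.98, 0.76, 0.10, 0.98, np.nan, 0.17, 0.78],
    },
    index=pd.Index(["x1", "x2", "x3", "x4", "x5", "x6", "x7"], name="points"),
)

# Assumption: these numeric columns arrived with object dtype.
df = df.astype({"col1": object, "col7": object})

# Same loop as in the answer: convert object columns that contain
# at least one parseable number.
for col in df.select_dtypes(include=["object"]):
    s = pd.to_numeric(df[col], errors="coerce")
    if s.notnull().any():
        df[col] = s

numeric = df.select_dtypes(include=["number"])
others = df.select_dtypes(exclude=["number"])
print(numeric)
```

After the loop, numeric should hold col1, col4, col6 and col7, with the NaN entries preserved. Note that a column of plain digit strings such as "0" and "1" would also parse and be converted by this loop, so whether col2 stays object depends on how its values are actually stored.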

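This logic will work for the data you have provided. It will not work, for example, when you have a series that is mostly strings with a few numbers mixed in. In that case you will need to define more precise logic to determine which series should be considered numeric.

Regarding "python - How should I find numeric columns in a dataframe that also contain Null values?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52568152/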
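
One possible form of "more precise logic" is to convert a column only when most of its non-null values parse as numbers. The sketch below is an illustration, not part of the original answer: the helper name coerce_mostly_numeric and the min_fraction threshold are hypothetical, and the right cutoff depends on your data.

```python
import pandas as pd


def coerce_mostly_numeric(df, min_fraction=0.9):
    """Return a copy of df in which an object column is converted to numeric
    only if at least min_fraction of its non-null values parse as numbers."""
    out = df.copy()
    for col in out.select_dtypes(include=["object"]).columns:
        original = out[col]
        converted = pd.to_numeric(original, errors="coerce")
        n_present = original.notna().sum()
        if n_present == 0:
            continue  # nothing to judge the column by
        parsed_fraction = converted.notna().sum() / n_present
        if parsed_fraction >= min_fraction:
            out[col] = converted
    return out


# Made-up sample data: one column is mostly numbers, the other mostly text.
sample = pd.DataFrame(
    {
        "mostly_numbers": [0.6, 0.7, None, 0.5, "n/a", 0.4],
        "mostly_text": ["lion", "cat", "tiger", 3, "lion", None],
    }
)

result = coerce_mostly_numeric(sample, min_fraction=0.75)
print(result.dtypes)
```

Here mostly_numbers has 4 of 5 non-null values that parse, so it is converted and "n/a" becomes NaN, while mostly_text has only 1 of 5 and is left as object. With the simpler any-non-null rule, mostly_text would have been converted as well and its strings lost.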
