Python Dask: Searching for a value in a column and getting the value of a different column

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

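Imagine a pandas DataFrame like the one above. With pandas I can easily look up the value in one column based on the value in another, like this: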

df.loc[df["B"] == "three", "A"]

With dask, however, the same code gives me output that doesn't really help:

df.loc[df["ActionGeo_Lat"] == "42#.5", "SQLDATE"]

Executing this line returns only the following, which doesn't help me:

[Screenshot: output after executing my code]

The problem is that every time I try to run df.compute() I get

ValueError: could not convert string to float: '42#.5'.

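After dropping some columns I found that the error occurs somewhere in the ActionGeo_Lat column. I now want to edit the CSV file by hand to fix it, but I can't find the date on which the bad value occurs.

Thanks in advance for your help!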

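Best answer

It looks like your underlying problem is with how the data is loaded and typed. Here is an example showing that the same pandas syntax works fine on a dask DataFrame: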

import pandas as pd
import numpy as np
import dask.dataframe as dd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
ddf = dd.from_pandas(df, npartitions=2)

print(df.loc[df['B'] == "three", "A"])
print(ddf.loc[ddf['B'] == "three", "A"].compute())
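
To illustrate the loading/typing point: the error in the question is the kind of message a float conversion produces when it meets a malformed value. The following is a minimal sketch in plain pandas, not dask. The sample values are made up (only '42#.5' comes from the question), and it is an assumption that dask runs into an equivalent conversion when it treats ActionGeo_Lat as a float column.

```python
import pandas as pd

# Hypothetical sample: latitude values held as text, one of them malformed.
lat = pd.Series(["42.5", "42#.5", "13.0"], dtype=object)

try:
    lat.astype(float)      # fails on the malformed entry
    message = None
except ValueError as exc:
    message = str(exc)     # the message names the offending value

print(message)
```

If this is indeed what happens, reading the column as text instead of float should let the comparison with "42#.5" run without the conversion error.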

dask.dataframe is not a good tool for debugging CSV files, so the best approach is to use shell/bash utilities to inspect and edit the file, for example

grep -ai "42#.5" your_file_name_here.csv
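
If a shell is not available, the same lookup can be sketched in pandas by reading the file as text, so that no conversion can fail during loading, and then listing the dates of the rows whose latitude does not parse as a number. This is only a sketch under assumptions: the helper name find_bad_rows is hypothetical, and the file is assumed to be comma-separated with a header row containing SQLDATE and ActionGeo_Lat.

```python
import pandas as pd


def find_bad_rows(csv_path, value_col="ActionGeo_Lat", date_col="SQLDATE",
                  chunksize=100_000):
    """Return the (date, raw value) rows whose value_col is not a valid number.

    Hypothetical helper; the column names are taken from the question.
    """
    bad = []
    # Read both columns as text, chunk by chunk, so loading never has to
    # convert anything and large files do not need to fit in memory.
    reader = pd.read_csv(csv_path, usecols=[date_col, value_col],
                         dtype=str, chunksize=chunksize)
    for chunk in reader:
        as_number = pd.to_numeric(chunk[value_col], errors="coerce")
        # Cells that are present but turn into NaN are the malformed ones;
        # genuinely empty cells are skipped.
        mask = chunk[value_col].notna() & as_number.isna()
        if mask.any():
            bad.append(chunk.loc[mask, [date_col, value_col]])
    if not bad:
        return pd.DataFrame(columns=[date_col, value_col])
    return pd.concat(bad)
```

Calling find_bad_rows("your_file_name_here.csv") would then return the SQLDATE next to each unparseable ActionGeo_Lat entry, which is the information needed to fix the file by hand.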

A similar question, "Python Dask: Searching for a value in a column and get the value of a different column", can be found on Stack Overflow: https://stackoverflow.com/questions/65825948/
