python - String cleanup problem in Pandas

I have a pandas column containing rows of words that are wrapped in quotes, wrapped in brackets, or not wrapped at all, like this:

"cxxx"
[asdfasd]
asdfasdf
[asdf]
"asdf"

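My problem is that the code below also removes the first and last characters from elements that have no quotes or brackets, and I'm not sure why.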

def keyword_cleanup(x):
    if "\"" or "[" in x:
        return x[1:-1]
    else:
        return x


csv["Keyword"] = csv["Keyword"].apply(keyword_cleanup)

Best answer

if "\"" or "[" in x:

should be

if "\"" in x or "[" in x:    # x must contain a left bracket or double-quote.

or

if x.startswith(('"', '[')): # x must start with a left bracket or double-quote

because Python parses the former as

if ("\"") or ("[" in x):

since the in operator binds more tightly than or. (See Python operator precedence.)

Because any non-empty string, such as "\"", has a truth value of True, the condition of the if statement is always true, which is why keyword_cleanup always returns x[1:-1].
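As a minimal sketch of the point above, the following compares how the original condition and the corrected condition evaluate for a string with no quotes or brackets, then applies a corrected keyword_cleanup to a few of the sample values. The variable names original, fixed and cleaned are only illustrative.

```python
x = 'asdfasdf'  # a value with no quotes or brackets

# `in` binds tighter than `or`, so the original condition groups like this.
# `or` returns its first truthy operand, here the non-empty string '"'.
original = '"' or ('[' in x)

# The corrected condition tests each character against x separately.
fixed = ('"' in x) or ('[' in x)

print(repr(original))  # the string '"', which is truthy
print(fixed)           # False


def keyword_cleanup(x):
    # Drop one character from each end only when x starts with " or [.
    if x.startswith(('"', '[')):
        return x[1:-1]
    return x


cleaned = [keyword_cleanup(v) for v in ['"cxxx"', '[asdfasd]', 'asdfasdf']]
print(cleaned)
```

With the corrected condition, the unwrapped value 'asdfasdf' is returned unchanged.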


Note also, however, that Pandas has built-in string operators. Using them is typically faster than using apply to call a custom Python function on every item in the Series.

In [136]: s = pd.Series(['"cxxx"', '[asdfasd]', 'asdfasdf', '[asdf]', '"asdf"'])

In [137]: s.str.replace(r'^["[](.*)[]"]$', r'\1', regex=True)
Out[137]: 
0        cxxx
1     asdfasd
2    asdfasdf
3        asdf
4        asdf
dtype: object

If you want to remove all square brackets or double quotes from both ends of each string, you could instead use

In [144]: s.str.strip('["]')
Out[144]: 
0        cxxx
1     asdfasd
2    asdfasdf
3        asdf
4        asdf
dtype: object
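The two approaches give the same result on the sample data but differ on other inputs. The sketch below uses made-up values (not from the original question) to show the difference: the regex removes at most one wrapping character from each end, and only when both ends match, while strip removes every leading and trailing character in the given set. The brackets in the pattern are escaped here purely for readability.

```python
import pandas as pd

# Hypothetical inputs chosen to expose the difference between the two methods.
s = pd.Series(['[[asdf]]', '"[x]"', 'as"df', 'plain'])

# Regex: strips a single wrapping character from each end, only if both are present.
one_pair = s.str.replace(r'^["\[](.*)[\]"]$', r'\1', regex=True)

# strip: removes all leading and trailing characters that are [, ] or ".
all_ends = s.str.strip('["]')

print(one_pair.tolist())  # ['[asdf]', '[x]', 'as"df', 'plain']
print(all_ends.tolist())  # ['asdf', 'x', 'as"df', 'plain']
```

Neither method touches characters in the middle of a string, so 'as"df' is left as it is in both cases.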

Regarding python - String cleanup problem in Pandas, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23070302/
