python - How to classify a column of strings as True/False by comparing it against another column of strings

So I have a column of strings listed as "compounds"

Composition (column title)

ZrMo3

Gd(CuS)3

Ba2DyInTe5

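I also have another column that contains the metal elements of the periodic table as strings. I'll call that column "metals"

Elements (column title)

Li

Be

Na

The goal is to check each string in "compounds" against every string listed in "metals", and classify it as True if any string from "metals" is present. Any ideas on how I could code this?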

Example (if "metals" contains Zr, Ag, and Te):

ZrMo3 True

Gd(CuS)3 False

Ba2DyInTe5 True

I recently tried the code below, but ended up with all False

asd = subset['composition'].isin(metals['Elements'])
    
print(asd)

I also tried this code, and the results were all False as well

subset['Boolean'] = subset.apply(lambda x: True if any(word in x.composition for word in metals) else False, axis=1)

Best answer

Assuming you are using pandas, you can use a list comprehension inside a lambda, since you essentially need to iterate over all the items in the elements list

import pandas as pd

elements = ['Li', 'Be', 'Na', 'Te']
compounds = ['ZrMo3', 'Gd(CuS)3', 'Ba2DyInTe5']

df = pd.DataFrame(compounds, columns=['compounds'])
print(df)

Output

  compounds
0       ZrMo3
1    Gd(CuS)3
2  Ba2DyInTe5

df['boolean'] = df.compounds.apply(lambda x: any(el in x for el in elements))
print(df)

Output

    compounds  boolean
0       ZrMo3    False
1    Gd(CuS)3    False
2  Ba2DyInTe5     True
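
For reference, the two attempts in the question seem to fail for different reasons: isin compares whole cell values, and iterating over a DataFrame yields its column labels rather than its values. A minimal sketch, assuming subset and metals are DataFrames shaped as the question describes (the sample values below are stand-ins taken from the question's example, not the asker's real data):

```python
import pandas as pd

# Hypothetical stand-ins for the question's DataFrames
subset = pd.DataFrame({'composition': ['ZrMo3', 'Gd(CuS)3', 'Ba2DyInTe5']})
metals = pd.DataFrame({'Elements': ['Zr', 'Ag', 'Te']})

# isin() tests whole-cell equality, so 'ZrMo3' never equals 'Zr'
exact = subset['composition'].isin(metals['Elements'])

# Iterating over a DataFrame yields column labels, not cell values
labels = list(metals)

# Iterate over the values of the column instead
metal_list = metals['Elements'].tolist()
subset['Boolean'] = subset['composition'].apply(
    lambda comp: any(el in comp for el in metal_list)
)
print(subset)
```

With these sample values, the Boolean column matches the expected result given in the question.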

If you are not using pandas, you can use the map function to apply the lambda to a list

out = list(
    map(lambda x: any(el in x for el in elements), compounds)
)
print(out)

Output

[False, False, True]
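
One caveat of a plain substring test is that a one-letter symbol can match inside a two-letter one. A small illustration, with sample values chosen for the purpose:

```python
elements = ['Li', 'Be', 'Na', 'Te', 'S']
compounds = ['SiO2', 'Gd(CuS)3']

# 'S' (sulfur) is a substring of 'Si' (silicon), so SiO2 is flagged
# even though it contains no sulfur
out = [any(el in c for el in elements) for c in compounds]
print(out)
```

Both entries come out True here, although only Gd(CuS)3 actually contains one of the listed elements.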

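Here is a more sophisticated version, based on the regular-expression module re, that also addresses the potential bug mentioned by @Ezon. Because this approach loops not only over the elements to compare them against a single compound string, but also over each constituent of the compound, I created two helper functions to make it more readable.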

import re
import pandas as pd


def split_compounds(c):
    
    # remove all non-alphabet elements
    c_split = re.sub(r"[^a-zA-Z]", "", c)
    # split string at capital letters
    c_split = '-'.join(re.findall('[A-Z][^A-Z]*', c_split))
    return c_split

def compare_compound(compound, element):
    
    # split compound into list
    compound_list = compound.split('-')
    
    return element in compound_list
    
    
# build sample data
compounds = ['SiO2', 'Ba2DyInTe5', 'ZrMo3', 'Gd(CuS)3']
elements = ['Li', 'Be', 'Na', 'Te', 'S']
df = pd.DataFrame(compounds, columns=['compounds'])

# split compounds into elements
df['compounds_elements'] = [split_compounds(x) for x in compounds]

print(df)

Output

    compounds compounds_elements
0        SiO2               Si-O
1  Ba2DyInTe5        Ba-Dy-In-Te
2       ZrMo3              Zr-Mo
3    Gd(CuS)3            Gd-Cu-S


# check if any item from 'elements' is in the compounds
df['boolean'] = df.compounds_elements.apply(
    lambda x: any(compare_compound(x, el) for el in elements)
)

print(df)

Output

    compounds compounds_elements  boolean
0        SiO2               Si-O    False
1  Ba2DyInTe5        Ba-Dy-In-Te     True
2       ZrMo3              Zr-Mo    False
3    Gd(CuS)3            Gd-Cu-S     True
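
The same idea can be written more compactly by tokenising each formula into element symbols and testing against a set. This is only a sketch of an alternative, not the answer's own code: has_element is a hypothetical helper name, and it assumes every symbol is one capital letter followed by optional lowercase letters.

```python
import re
import pandas as pd


def has_element(compound, wanted):
    # Tokenise into symbols: a capital letter plus any lowercase letters
    symbols = re.findall(r'[A-Z][a-z]*', compound)
    # True when at least one symbol appears in the wanted set
    return not wanted.isdisjoint(symbols)


compounds = ['SiO2', 'Ba2DyInTe5', 'ZrMo3', 'Gd(CuS)3']
elements = {'Li', 'Be', 'Na', 'Te', 'S'}

df = pd.DataFrame(compounds, columns=['compounds'])
df['boolean'] = df['compounds'].apply(lambda c: has_element(c, elements))
print(df)
```

Because whole symbols are compared rather than substrings, 'S' no longer matches 'Si', and no intermediate joined string is needed.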

Regarding "python - How to classify a column of strings as True/False by comparing it against another column of strings", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74527231/
