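python - How to merge with a wildcard? - Pandas

I have two data frames that I want to merge. The join column of the right data frame may contain a wildcard value (e.g. "ALL") that should match every value in the join column of the left data frame.

Consider the following minimal example: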

import pandas

entities = pandas.DataFrame([
    { 'name' : 'Boson', 'type' : 'Material' },
    { 'name' : 'Atman', 'type' : 'Ideal' },
])

recommendations = pandas.DataFrame([
    { 'action' : 'recognize', 'entity_type' : 'ALL'},
    { 'action' : 'disdain', 'entity_type' : 'Material'},
    { 'action' : 'worship', 'entity_type' : 'Ideal'},
])

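recommendations can be read as a set of rules ("recognize all entities regardless of their type, disdain material entities, worship ideal entities"). I now want a data frame that lists every entity together with its recommended actions. So in this example, the resulting data frame should look like this: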

    name recommendation      type
0  Boson      recognize  Material
1  Boson        disdain  Material
2  Atman      recognize     Ideal
3  Atman        worship     Ideal

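Is there a pandas way to do this?

I know how to achieve this by building a data frame that contains the Cartesian product of the entities and the recommendations, and then cutting it down with a condition.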
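For reference, the Cartesian-product approach mentioned above could look roughly like the sketch below. The helper column `_key` is a hypothetical name introduced only for this illustration, and the column order is chosen to match the desired output shown earlier.

```python
import pandas as pd

entities = pd.DataFrame([
    {'name': 'Boson', 'type': 'Material'},
    {'name': 'Atman', 'type': 'Ideal'},
])
recommendations = pd.DataFrame([
    {'action': 'recognize', 'entity_type': 'ALL'},
    {'action': 'disdain', 'entity_type': 'Material'},
    {'action': 'worship', 'entity_type': 'Ideal'},
])

# Cartesian product: join both frames on a constant helper column
# (`_key` is a hypothetical name) and drop that column afterwards.
product = (entities.assign(_key=1)
           .merge(recommendations.assign(_key=1), on='_key')
           .drop(columns='_key'))

# Keep a pair if the recommendation is a wildcard or the types match.
keep = (product['entity_type'] == 'ALL') | (product['entity_type'] == product['type'])

result = (product[keep]
          .rename(columns={'action': 'recommendation'})
          [['name', 'recommendation', 'type']]
          .reset_index(drop=True))
print(result)
```

The intermediate frame has len(entities) * len(recommendations) rows, which is why this gets expensive on large inputs.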

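I can also think of a solution where I take the set of all types present in entities and, for each row in recommendations that has the wildcard type, create one row per type.

But in my real problem I actually have several columns that I want to join on with wildcard values. So a clever and efficient pandaic way would help me a lot.

Best answer

One possible solution is to replace the wildcard with all the values that actually occur in the other data frame and then merge, i.e.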
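To make the multi-column case concrete, here is a minimal sketch of one way to expand wildcards in several join columns before an ordinary merge. The function `expand_wildcards`, the extra `origin` / `entity_origin` columns, and their values are all hypothetical and exist only for this illustration. It assumes a pandas version that provides `DataFrame.explode`.

```python
import pandas as pd

def expand_wildcards(right, left, pairs, wildcard='ALL'):
    """Return a copy of `right` where each wildcard cell is replaced by one
    row per distinct value of the matching column in `left`.

    `pairs` maps a column name in `right` to its counterpart in `left`.
    """
    out = right.copy()
    for right_col, left_col in pairs.items():
        values = list(pd.unique(left[left_col]))
        # Wrap every cell in a list; wildcard cells get all known values.
        out[right_col] = out[right_col].map(
            lambda v: values if v == wildcard else [v])
        # One row per list element, then restore a unique index.
        out = out.explode(right_col).reset_index(drop=True)
    return out

# Hypothetical data with a second join column.
entities = pd.DataFrame([
    {'name': 'Boson', 'type': 'Material', 'origin': 'Physics'},
    {'name': 'Atman', 'type': 'Ideal', 'origin': 'Vedanta'},
])
recommendations = pd.DataFrame([
    {'action': 'recognize', 'entity_type': 'ALL', 'entity_origin': 'ALL'},
    {'action': 'disdain', 'entity_type': 'Material', 'entity_origin': 'ALL'},
    {'action': 'worship', 'entity_type': 'ALL', 'entity_origin': 'Vedanta'},
])

expanded = expand_wildcards(
    recommendations, entities,
    {'entity_type': 'type', 'entity_origin': 'origin'})

result = entities.merge(
    expanded,
    left_on=['type', 'origin'],
    right_on=['entity_type', 'entity_origin'])
print(result[['name', 'action', 'type', 'origin']])
```

Each wildcard column multiplies the rows of the affected recommendation, so a rule with wildcards in many columns still grows toward a Cartesian product of the distinct values; combinations that do not occur in `entities` are simply dropped by the inner merge.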

Data:

import numpy as np
import pandas as pd

edf = pd.DataFrame([
    { 'name' : 'Boson', 'type' : 'Material' },
    { 'name' : 'Atman', 'type' : 'Ideal' },
])

rdf = pd.DataFrame([
    { 'action' : 'recognize', 'entity_type' : 'ALL'},
    { 'action' : 'disdain', 'entity_type' : 'Material'},
    { 'action' : 'worship', 'entity_type' : 'Ideal'},
])

Preprocessing:

mask = rdf['entity_type']=='ALL'

# Join all the values of `edf['type']` with `;` rather than `,`, since a type
# may itself contain a comma. The set removes duplicates (thank you @John).
all_ = ';'.join(set(edf['type']))  # all_ : 'Material;Ideal' (set order is not guaranteed)

# Replace 'ALL' with the newly obtained string
rdf['entity_type'] = np.where(mask, all_, rdf['entity_type'])

rdf
      action     entity_type
0  recognize  Material;Ideal
1    disdain        Material
2    worship           Ideal

# Split and stack so we can make `entity_type` one dimensional
rdf = rdf.set_index('action')['entity_type'].str.split(';',expand=True)\
        .stack().reset_index('action').rename(columns={0:'type'})

rdf
          action     type
 0  recognize    Material
 1  recognize       Ideal
 0    disdain    Material
 0    worship       Ideal
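As an alternative to joining and re-splitting strings, the same expansion can be sketched with lists and `DataFrame.explode`, which avoids having to pick a separator that never occurs in a type. This is a variant, not part of the original answer, and it assumes a pandas version that provides `explode`; the name `all_types` is introduced here for illustration.

```python
import pandas as pd

edf = pd.DataFrame([
    {'name': 'Boson', 'type': 'Material'},
    {'name': 'Atman', 'type': 'Ideal'},
])
rdf = pd.DataFrame([
    {'action': 'recognize', 'entity_type': 'ALL'},
    {'action': 'disdain', 'entity_type': 'Material'},
    {'action': 'worship', 'entity_type': 'Ideal'},
])

# Distinct types in order of first appearance.
all_types = list(edf['type'].unique())

expanded = (
    rdf.assign(type=rdf['entity_type'].map(
        lambda t: all_types if t == 'ALL' else [t]))  # list per row
       .explode('type')                               # one row per list element
       .drop(columns='entity_type')
       .reset_index(drop=True)
)
print(expanded)
```

The resulting frame has the columns `action` and `type`, so it can be passed to `edf.merge(expanded, on='type')` just like the split-and-stack version.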

Merge:

ndf = edf.merge(rdf,on='type').rename(columns={'action':'recommendation'})

ndf

   name      type recommendation
0  Boson  Material      recognize
1  Boson  Material        disdain
2  Atman     Ideal      recognize
3  Atman     Ideal        worship

Example run on different data frames:

edf = pd.DataFrame([
    { 'name' : 'Boson', 'type' : 'Material' },
    { 'name' : 'Atman', 'type' : 'Ideal' },
    { 'name' : 'Chaos', 'type' : 'Void, but emphasized' },
    { 'name' : 'Tohuwabohu', 'type' : 'Void' },
])

rdf = pd.DataFrame([
    { 'action' : 'recognize', 'entity_type' : 'ALL'},
    { 'action' : 'disdain', 'entity_type' : 'Material'},
    { 'action' : 'worship', 'entity_type' : 'Ideal'},
    { 'action' : 'drink', 'entity_type' : 'ALL'}
])

Then:

mask = rdf['entity_type']=='ALL'
all_ =  ';'.join(set(edf['type']))
rdf['entity_type'] = np.where(mask,all_,rdf['entity_type'])

rdf = rdf.set_index('action')['entity_type'].str.split(';',expand=True)\
        .stack().reset_index('action').rename(columns={0:'type'})
ndf = edf.merge(rdf,on='type').rename(columns={'action':'recommendation'})

ndf

         name                  type recommendation
0       Boson              Material      recognize
1       Boson              Material        disdain
2       Boson              Material          drink
3       Atman                 Ideal      recognize
4       Atman                 Ideal        worship
5       Atman                 Ideal          drink
6       Chaos  Void, but emphasized      recognize
7       Chaos  Void, but emphasized          drink
8  Tohuwabohu                  Void      recognize
9  Tohuwabohu                  Void          drink

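Compared with the Cartesian product, this approach is fast and consumes less memory. Hope it helps :)

Regarding "python - How to merge with a wildcard? - Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47472207/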
