python - Mapping a dictionary of tuples onto multiple columns of a DataFrame

I have a PDB DataFrame with residue insertion codes. Here is a simplified example:

import pandas as pd

d = {'ATOM': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
     'residue_number': [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
     'insertion': ['', '', '', '', '', '', 'A', 'A', 'A', '', '', '']}

df = pd.DataFrame(data=d)

The DataFrame:

    ATOM  residue_number insertion
0      1               2          
1      2               2          
2      3               2          
3      4               3          
4      5               3          
5      6               3          
6      7               3     A
7      8               3     A
8      9               3     A
9     10               5          
10    11               5          
11    12               5

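I need to renumber the residues according to a different numbering and insertion scheme. The output of the renumbering script can be formatted as a dictionary of tuples, for example: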

my_dict = {(2,): 1, (3,): 2, (3, 'A') : 3, (5, ) : (4, 'A') }

Is it possible to map this dictionary of tuples onto the two columns ['residue_number'] and ['insertion']? The desired output is:

    ATOM  residue_number insertion
0      1               1          
1      2               1          
2      3               1          
3      4               2          
4      5               2          
5      6               2          
6      7               3         
7      8               3         
8      9               3         
9     10               4      A          
10    11               4      A          
11    12               4      A

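I have been searching and puzzling over this for days. I have tried map and MultiIndex, but I can't find a way to map a dictionary of tuples across multiple columns. I suspect I'm thinking about this the wrong way. Thanks for any advice!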

Best answer

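In this case, I think you need to define a function that takes your old residue_number and insertion as input and returns the new ones. I will work directly from the df, so to avoid extra coding I will redefine your my_dict so that a key such as (2,) becomes (2, ''), with the values padded to pairs in the same way.
Here is the code: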

import pandas as pd
d = {'ATOM' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 
    'residue_number' : [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
    'insertion' : ['', '', '', '', '', '', 'A', 'A', 'A', '', '', '']} 

df = pd.DataFrame(data = d)

# Our new dict with keys and values as tuples
my_new_dict = {(2,''): (1,''), (3,''): (2,''), (3,'A'): (3,''), (5,''): (4,'A') }

# We need a function that maps a tuple (residue_number, insertion) into your new_residue_number and new_insertion values
def new_residue_number(residue_number, insertion, my_new_dict):
    # keys are tuples
    key = (residue_number, insertion)
    # Return new residue_number and insertion values
    return my_new_dict[key]

# Example to see how this works
print(new_residue_number(2, '', my_new_dict)) # Output (1,'')
print(new_residue_number(5, '', my_new_dict)) # Output (4, 'A')
print(new_residue_number(3, 'A', my_new_dict)) # Output (3,'')

# Now we apply this to our df and save it in the same df in two new columns
df[['new_residue_number','new_insertion']] = df.apply(lambda row: pd.Series(new_residue_number(row['residue_number'], row['insertion'], my_new_dict)), axis=1)
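
The code above writes the result into two new columns and relies on a hand-edited dictionary. As a minimal sketch (not part of the original answer), the same lookup could also overwrite residue_number and insertion in place while accepting my_dict in the question's original format. The helper normalize is a hypothetical name introduced here, and the sketch assumes every (residue_number, insertion) pair in df has an entry in the dictionary; otherwise the lookup raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    "ATOM": list(range(1, 13)),
    "residue_number": [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
    "insertion": ["", "", "", "", "", "", "A", "A", "A", "", "", ""],
})

# Mapping in the question's original format: keys and values may omit the insertion code.
my_dict = {(2,): 1, (3,): 2, (3, "A"): 3, (5,): (4, "A")}


def normalize(entry):
    """Hypothetical helper: pad an int or a 1-tuple to a (number, insertion) pair."""
    if not isinstance(entry, tuple):
        entry = (entry,)
    return entry if len(entry) == 2 else (entry[0], "")


# Every key and value becomes a 2-tuple, e.g. (2,) -> (2, '') and 1 -> (1, '').
normalized = {normalize(k): normalize(v) for k, v in my_dict.items()}

# Look up each (residue_number, insertion) pair row by row.
# Assumption: every pair present in df is a key of the dictionary.
keys = zip(df["residue_number"].tolist(), df["insertion"].tolist())
pairs = [normalized[key] for key in keys]

# Overwrite the two original columns, matching the desired output in the question.
columns = ["residue_number", "insertion"]
df[columns] = pd.DataFrame(pairs, index=df.index, columns=columns)

print(df)
```

Building the list of pairs with a comprehension avoids creating one pd.Series per row, which may matter for a full PDB file with many atoms.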

I hope this solves your problem!

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/59804249/
