I have a PDB DataFrame with residue insertion codes. A simplified example:
import pandas as pd
d = {'ATOM': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
     'residue_number': [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
     'insertion': ['', '', '', '', '', '', 'A', 'A', 'A', '', '', '']}
df = pd.DataFrame(data=d)
The DataFrame:
    ATOM  residue_number insertion
0      1               2
1      2               2
2      3               2
3      4               3
4      5               3
5      6               3
6      7               3         A
7      8               3         A
8      9               3         A
9     10               5
10    11               5
11    12               5
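I need to renumber the residues according to a different numbering and insertion scheme. The output of the renumbering script can be formatted as a dictionary of tuples, for example:
my_dict = {(2,): 1, (3,): 2, (3, 'A'): 3, (5,): (4, 'A')}
Is it possible to map this dictionary of tuples onto the two columns residue_number and insertion? The desired output is: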
    ATOM  residue_number insertion
0      1               1
1      2               1
2      3               1
3      4               2
4      5               2
5      6               2
6      7               3
7      8               3
8      9               3
9     10               4         A
10    11               4         A
11    12               4         A
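I have been searching and puzzling over this for days. I have tried map and MultiIndex, but I can't find a way to map a dictionary of tuples across multiple columns. I feel like I'm thinking about this the wrong way. Thanks for any advice!
Best answer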
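In this case, I think you need to define a function that takes your old residue_number and insertion as input and returns the new ones. I will work directly from the df, so to avoid extra coding I will redefine the keys and values of your my_dict as full (residue_number, insertion) pairs, changing (2,) to (2,'') for example.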
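If the renumbering script can only emit the mixed-shape dictionary from the question, the padded form does not have to be typed by hand. The following is a minimal sketch, not part of the original answer: the helper name normalize is hypothetical, and it assumes every key or value is either a bare residue number, a 1-tuple (number,), or a 2-tuple (number, insertion).

```python
def normalize(value):
    """Pad a bare number or a 1-tuple to a (number, insertion) pair."""
    if not isinstance(value, tuple):
        value = (value,)
    # A 1-tuple such as (2,) carries no insertion code, so pad it with ''.
    return value if len(value) == 2 else (value[0], '')


# The dictionary exactly as given in the question.
my_dict = {(2,): 1, (3,): 2, (3, 'A'): 3, (5,): (4, 'A')}

# Normalize both keys and values so every entry is a 2-tuple.
my_new_dict = {normalize(k): normalize(v) for k, v in my_dict.items()}
print(my_new_dict)
# {(2, ''): (1, ''), (3, ''): (2, ''), (3, 'A'): (3, ''), (5, ''): (4, 'A')}
```

With every key and value in the same shape, a lookup on (residue_number, insertion) never has to special-case a missing insertion code.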
Here is the full code:
import pandas as pd
d = {'ATOM' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
'residue_number' : [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
'insertion' : ['', '', '', '', '', '', 'A', 'A', 'A', '', '', '']}
df = pd.DataFrame(data = d)
# Our new dict with keys and values as tuples
my_new_dict = {(2,''): (1,''), (3,''): (2,''), (3,'A'): (3,''), (5,''): (4,'A') }
# We need a function that maps a tuple (residue_number, insertion) into your new_residue_number and new_insertion values
def new_residue_number(residue_number, insertion, my_new_dict):
    # keys are tuples
    key = (residue_number, insertion)
    # Return new residue_number and insertion values
    return my_new_dict[key]
# Example to see how this works
print(new_residue_number(2, '', my_new_dict)) # Output (1,'')
print(new_residue_number(5, '', my_new_dict)) # Output (4, 'A')
print(new_residue_number(3, 'A', my_new_dict)) # Output (3,'')
# Now we apply this to our df and save it in the same df in two new columns
df[['new_residue_number','new_insertion']] = df.apply(lambda row: pd.Series(new_residue_number(row['residue_number'], row['insertion'], my_new_dict)), axis=1)
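The row-wise apply above writes the results into two new columns and calls a Python function once per row. As a possible alternative (a sketch, not the method used in this answer), the same dictionary can be turned into a small lookup table and joined onto the DataFrame with merge, then used to overwrite the original columns so the result matches the layout requested in the question. The names lookup, merged, and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ATOM': list(range(1, 13)),
    'residue_number': [2, 2, 2, 3, 3, 3, 3, 3, 3, 5, 5, 5],
    'insertion': ['', '', '', '', '', '', 'A', 'A', 'A', '', '', ''],
})
my_new_dict = {(2, ''): (1, ''), (3, ''): (2, ''), (3, 'A'): (3, ''), (5, ''): (4, 'A')}

# Build a lookup table with one row per (old number, old insertion) pair.
lookup = pd.DataFrame(
    [(*old, *new) for old, new in my_new_dict.items()],
    columns=['residue_number', 'insertion', 'new_residue_number', 'new_insertion'],
)

# A left merge keeps every atom row, in the original row order.
merged = df.merge(lookup, on=['residue_number', 'insertion'], how='left')

# Overwrite the old columns with the renumbered values.
result = merged[['ATOM']].assign(
    residue_number=merged['new_residue_number'],
    insertion=merged['new_insertion'],
)
print(result)
```

One caveat with this approach: a (residue_number, insertion) pair that is missing from the dictionary produces NaN in the new columns instead of raising a KeyError, so it may be worth checking for missing values after the merge.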
I hope this solves your problem!
Source: "python - Mapping a dictionary of tuples to multiple columns of a DataFrame" on Stack Overflow: https://stackoverflow.com/questions/59804249/