python - Mapping graph/relationship-based values in a DataFrame in Python

I have an input DataFrame in the following format:

import pandas as pd

input_data = [[1000, 1002], [1002, 1003], [1004, 1000], [1010, 1050], [1060, 1002], [1050, 1100], [1200, 1250], [1300, 1200]]
input_df = pd.DataFrame(input_data, columns=['Value1', 'Value2'])
print(input_df)

Ignoring the index for readability:

Value1  Value2
1000    1002
1002    1003
1004    1000
1010    1050
1060    1002
1050    1100
1200    1250
1300    1200

My expected output is shown below. I need to map all related values (whether value1 -> value2 or value2 -> value1) and collect them under an ordered index (starting from 1), as follows:

Index Value
1   1000
1   1002
1   1003
1   1004
1   1060
2   1010
2   1050
2   1100
3   1200
3   1250
3   1300
3   1200

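What have I tried? I tried looping over the rows of the input. I was able to tell whether the values within a single row are related, but I found this logic hard to apply when a relationship spans multiple rows and both columns (Value1 and Value2).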

Best answer

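First use convert_matrix.from_pandas_edgelist together with connected_components, then build a mapping dictionary, reshape with DataFrame.melt, map each value to its group with Series.map, remove duplicates with DataFrame.drop_duplicates, and finally sort: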

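import networkx as nx

# Build an undirected graph whose edges are the (Value1, Value2) pairs
g = nx.from_pandas_edgelist(input_df, 'Value1', 'Value2')

# Each connected component is one set of related values
connected_components = nx.connected_components(g)

# Map every node to the id of its component (ids start at 0)
node2id = {}
for cid, component in enumerate(connected_components):
    for node in component:
        node2id[node] = cid

# Stack both columns into one 'value' column, label each value with its
# component id, then drop repeated values and sort by group and value
df = input_df.melt()
df['g'] = df['value'].map(node2id)
df = df.drop_duplicates(['value', 'g']).sort_values(['g', 'value'])
print(df)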
   variable  value  g
0    Value1   1000  0
1    Value1   1002  0
9    Value2   1003  0
2    Value1   1004  0
4    Value1   1060  0
3    Value1   1010  1
5    Value1   1050  1
13   Value2   1100  1
6    Value1   1200  2
14   Value2   1250  2
7    Value1   1300  2
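
The question asks for groups numbered from 1, while the component ids above start at 0. If installing networkx is not an option, the same grouping can be sketched in plain Python with a union-find (disjoint-set) structure. This is a different technique from the answer above, not a restatement of it. The function name group_related is hypothetical, and the sketch assumes groups should be numbered in the order their first value appears in the input.

```python
def group_related(pairs):
    """Return sorted (group, value) tuples with 1-based group numbers.

    Values linked directly or through a chain of pairs share a group.
    (Hypothetical helper for illustration; not part of any library.)
    """
    parent = {}

    def find(x):
        # Register unseen values as their own root, then walk up to the root
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    # Merge the two sets that each pair connects
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number groups from 1 in order of first appearance (assumption)
    group_of_root = {}
    rows = []
    for value in parent:  # dicts preserve insertion order
        root = find(value)
        group = group_of_root.setdefault(root, len(group_of_root) + 1)
        rows.append((group, value))
    return sorted(rows)


input_data = [[1000, 1002], [1002, 1003], [1004, 1000], [1010, 1050],
              [1060, 1002], [1050, 1100], [1200, 1250], [1300, 1200]]
rows = group_related(input_data)
for group, value in rows:
    print(group, value)
```

If pandas is available, pd.DataFrame(rows, columns=['Index', 'Value']) turns these pairs into the two-column layout requested in the question.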

Regarding python - Mapping graph/relationship-based values in a DataFrame in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/65212488/
