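I am trying to traverse a hierarchical dataframe and record every possible route in another dataframe. The routes can have variable depth.
The original dataframe (df). The highest column indicates that the value in the parent column is not the child of any other row:
The final target dataframe:
This is what I have so far: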
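def search(parent):
    # Visit every row whose parent matches, then recurse into its child
    for i in range(df.shape[0]):
        if df.iloc[i, 0] == parent:
            search(df.iloc[i, 1])

# Start a traversal from every row flagged as highest (a top-level parent)
for i in range(df.shape[0]):
    if df.iloc[i, 2] == 1:
        search(df.iloc[i, 0])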
I can traverse the hierarchy, but I don't know how to save it in the format I want.
Best answer
You can solve this with networkx. Note that if you use networkx, you don't need the highest column. The main function for finding all the paths is all_simple_paths:
# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
import pandas as pd

# Create network from your dataframe
# G = nx.from_pandas_edgelist(df, source='parent', target='child',
#                             create_using=nx.DiGraph)

# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
    G.add_edge(source, target)

# Find roots of your graph (a root is a node with no incoming edge)
roots = [node for node, degree in G.in_degree() if degree == 0]

# Find leaves of your graph (a leaf is a node with no outgoing edge)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths from each root to each leaf
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(G, root, leaf):
            paths.append(path)

# Create a new dataframe; shorter paths are padded with empty strings
out = pd.DataFrame(paths).fillna('')
# Label the columns 'level N' down to 'level 0', left to right
out.columns = reversed(out.add_prefix('level ').columns)
Output:
>>> out
  level 3 level 2 level 1 level 0
0       a       b       c
1       a       b       d       e
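If you would rather keep the recursive approach from the question, one option is to pass the path built so far into each call and save a copy whenever a leaf is reached. The following is only a minimal sketch, not part of the original answer: the function name all_root_to_leaf_paths and the edge list are hypothetical, with the edges reconstructed from the output above, and it works on plain (parent, child) pairs rather than on the dataframe directly.

```python
from collections import defaultdict


def all_root_to_leaf_paths(edges):
    """Return every root-to-leaf path in a list of (parent, child) pairs."""
    edges = list(edges)
    children = defaultdict(list)
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    # A root is a parent that never appears as a child (first-seen order)
    roots = [p for p in dict.fromkeys(p for p, _ in edges)
             if p not in has_parent]

    paths = []

    def walk(node, path):
        path = path + [node]  # new list, so sibling branches stay independent
        kids = children.get(node, [])
        if not kids:
            # Leaf reached: record the complete route
            paths.append(path)
            return
        for kid in kids:
            # Skip nodes already on the path so a cycle cannot recurse forever
            if kid not in path:
                walk(kid, path)

    for root in roots:
        walk(root, [])
    return paths


# Hypothetical edge list, reconstructed from the output shown above
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]
paths = all_root_to_leaf_paths(edges)
print(paths)
```

With a dataframe, the pairs could come from df[['parent', 'child']].itertuples(index=False, name=None), and the resulting list can be turned into a table with pd.DataFrame(paths).fillna('') exactly as in the answer. For very deep hierarchies, Python's recursion limit may become a concern, in which case the networkx version is the safer choice.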
Regarding "python - How to record all routes in a parent-child hierarchy using recursion?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69342255/