Python - Create a hierarchy file (find the paths from root to leaves in a tree represented as a table)

Tags: python python-3.x

Given the following unordered tab-delimited file:

Asia    Srilanka
Srilanka    Colombo
Continents  Europe
India   Mumbai
India   Pune
Continents  Asia
Earth   Continents
Asia    India

The goal is to generate the following output (tab-delimited):

Earth   Continents  Asia    India   Mumbai
Earth   Continents  Asia    India   Pune
Earth   Continents  Asia    Srilanka    Colombo
Earth   Continents  Europe

I wrote the following script to achieve this:

root = {}  # will finally contain only the ROOT member from which all the nodes emanate
link = {}  # maps every node to a dict of its immediate children
# `f` is assumed to be the input file, already opened for reading
for line in f:
    line = line.strip()
    cols = line.split('\t')
    parent = cols[0]
    child = cols[1]
    if parent not in link:
        # first time this node is seen at all: treat it as a root candidate
        root[parent] = 1
    if child in root:
        # a node that has a parent cannot be the root
        del root[child]
    if child not in link:
        link[child] = {}
    if parent not in link:
        link[parent] = {}
    link[parent][child] = 1

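Now I want to print the desired output using the two dictionaries created above (root and link). I am not sure how to do this in Python, but I know that in Perl we could write the following to get the result: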

print_links($_) for sort keys %root;

sub print_links
{
  my @path = @_;

  my %children = %{$link{$path[-1]}};
  if (%children)
  {
    print_links(@path, $_) for sort keys %children;
  } 
  else 
  {
    say join "\t", @path;
  }
}

Could you help me produce the desired output in Python 3.x?

Best answer

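I see the following subproblems here:

  • reading the relations from a file;
  • building the hierarchy from the relations;
  • writing the hierarchy to a file.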

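Assuming that the height of the hierarchy tree is less than the default recursion limit (1000 in most cases), let's define a utility function for each of these separate tasks.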
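
The recursion-limit assumption can be checked at run time with the standard `sys` module. A minimal sketch; the helper name `fits_recursion_limit` and the `margin` value are illustrative assumptions, not part of the original answer:

```python
import sys


def fits_recursion_limit(tree_height, margin=50):
    """Return True if a recursion `tree_height` levels deep should fit.

    `margin` is an arbitrary, assumed allowance for the frames that are
    already on the call stack when the recursion starts.
    """
    return tree_height + margin < sys.getrecursionlimit()


print(sys.getrecursionlimit())  # commonly 1000, but it can be changed
print(fits_recursion_limit(5))
```

If the check fails, the limit can be raised with `sys.setrecursionlimit`, or the traversal can be written without recursion.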

Utilities

  1. Parsing the relations can be done with

    def parse_relations(lines):
        relations = {}
        splitted_lines = (line.split() for line in lines)
        for parent, child in splitted_lines:
            relations.setdefault(parent, []).append(child)
        return relations
    
  2. Building the hierarchy can be done with

    • Python >=3.5

      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield (parent, *element)
                      except TypeError:
                          # we've tried to unpack `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None
      
    • Python <3.5: the additional unpacking generalizations used above were added by PEP 448 in Python 3.5, but the unpacking can be replaced with itertools.chain, like

      import itertools
      
      
      def flatten_hierarchy(relations, parent='Earth'):
          try:
              children = relations[parent]
              for child in children:
                  sub_hierarchy = flatten_hierarchy(relations, child)
                  for element in sub_hierarchy:
                      try:
                          yield tuple(itertools.chain([parent], element))
                      except TypeError:
                          # `itertools.chain` tried to iterate over a `None` value,
                          # it means that no successors left
                          yield (parent, child)
          except KeyError:
              # we've reached end of hierarchy
              yield None
      
  3. Exporting the hierarchy to a file can be done with

    def write_hierarchy(hierarchy, path, delimiter='\t'):
        with open(path, mode='w') as file:
            for row in hierarchy:
                file.write(delimiter.join(row) + '\n')
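
Two details of the utilities above may matter for other inputs: `parse_relations` splits on any whitespace, so a name containing a space would break the two-value unpacking, and `flatten_hierarchy` takes the root as a hard-coded default of 'Earth'. The following is a minimal sketch of one way to handle both, assuming every non-empty line holds exactly one tab; the names `parse_relations_tabbed` and `find_roots` are hypothetical and not part of the original answer:

```python
def parse_relations_tabbed(lines, delimiter='\t'):
    """Like parse_relations, but split on the delimiter only and skip blank lines."""
    relations = {}
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # ignore empty lines
        # raises ValueError if the line does not contain exactly two fields
        parent, child = line.split(delimiter)
        relations.setdefault(parent, []).append(child)
    return relations


def find_roots(relations):
    """Return, in sorted order, the parents that never appear as a child."""
    children = {child for kids in relations.values() for child in kids}
    return sorted(parent for parent in relations if parent not in children)


sample = ["Asia\tSri Lanka\n", "\n", "Earth\tAsia\n"]
relations = parse_relations_tabbed(sample)
print(relations)
print(find_roots(relations))
```

Each name returned by `find_roots` could then be passed as the `parent` argument of `flatten_hierarchy` instead of relying on the default.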
    

Usage

Assuming that the file path is 'relations.txt':

with open('relations.txt') as file:
    relations = parse_relations(file)

gives us

>>> relations
{'Asia': ['Srilanka', 'India'],
 'Srilanka': ['Colombo'],
 'Continents': ['Europe', 'Asia'],
 'India': ['Mumbai', 'Pune'],
 'Earth': ['Continents']}

our hierarchy is

>>> list(flatten_hierarchy(relations))
[('Earth', 'Continents', 'Europe'),
 ('Earth', 'Continents', 'Asia', 'Srilanka', 'Colombo'),
 ('Earth', 'Continents', 'Asia', 'India', 'Mumbai'),
 ('Earth', 'Continents', 'Asia', 'India', 'Pune')]

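and finally we export it to a file called 'hierarchy.txt':

>>> hierarchy = flatten_hierarchy(relations)
>>> write_hierarchy(sorted(hierarchy), 'hierarchy.txt')

(we use sorted to get the hierarchy in the same order as in your desired output file)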
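
For comparison, the Perl routine from the question can also be ported fairly directly so that it works on the question's own `root` and `link` dictionaries. This is a sketch rather than part of the original answer: the name `link_paths` is hypothetical, it returns the paths instead of printing them inside the recursion, and the two dictionaries are typed in by hand as the question's script would build them for the sample input:

```python
def link_paths(path, link):
    """Return all root-to-leaf paths that extend `path` (a list of names)."""
    children = link.get(path[-1], {})
    if not children:
        return [path]  # leaf reached: the path is complete
    paths = []
    for child in sorted(children):
        paths.extend(link_paths(path + [child], link))
    return paths


# Hand-written copies of the dictionaries from the question's script.
root = {'Earth': 1}
link = {
    'Earth': {'Continents': 1},
    'Continents': {'Europe': 1, 'Asia': 1},
    'Asia': {'Srilanka': 1, 'India': 1},
    'India': {'Mumbai': 1, 'Pune': 1},
    'Srilanka': {'Colombo': 1},
    'Colombo': {}, 'Europe': {}, 'Mumbai': {}, 'Pune': {},
}

lines = ['\t'.join(path)
         for node in sorted(root)
         for path in link_paths([node], link)]
print('\n'.join(lines))
```

Because the children are visited in sorted order, the lines come out in the order requested in the question without a separate sorting step.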

P. S.

If you are not familiar with Python generators, we can define the flatten_hierarchy function like this

  • Python >= 3.5

    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append((parent, *element))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result
    
  • Python < 3.5

    import itertools
    
    
    def flatten_hierarchy(relations, parent='Earth'):
        try:
            children = relations[parent]
        except KeyError:
            # we've reached end of hierarchy
            return None
        result = []
        for child in children:
            sub_hierarchy = flatten_hierarchy(relations, child)
            try:
                for element in sub_hierarchy:
                    result.append(tuple(itertools.chain([parent], element)))
            except TypeError:
                # we've tried to iterate through `None` value,
                # it means that no successors left
                result.append((parent, child))
        return result
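
If the tree could be deeper than the recursion limit, the same traversal can be written with an explicit stack. The following is a sketch under the assumption that the relations really form a tree (a cycle would make it loop forever); the name `flatten_hierarchy_iterative` is hypothetical, and unlike the versions above it yields `(root,)` rather than `None` when the root has no children:

```python
def flatten_hierarchy_iterative(relations, root='Earth'):
    """Yield every root-to-leaf path as a tuple, without recursion."""
    stack = [(root,)]
    while stack:
        path = stack.pop()
        children = relations.get(path[-1])
        if not children:
            yield path  # the last node has no children: the path is complete
            continue
        # push in reverse so that children are visited in their original order
        for child in reversed(children):
            stack.append(path + (child,))


relations = {'Asia': ['Srilanka', 'India'],
             'Srilanka': ['Colombo'],
             'Continents': ['Europe', 'Asia'],
             'India': ['Mumbai', 'Pune'],
             'Earth': ['Continents']}
paths = list(flatten_hierarchy_iterative(relations))
print(paths)
```

For the sample relations this produces the same four tuples, in the same order, as the recursive versions.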
    

Regarding "Python - Create a hierarchy file (find the paths from root to leaves in a tree represented as a table)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44236188/
