python - Creating a list from a list of lists in Python

Tags: python nlp text-mining

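I am building a tree structure in which every leaf node holds 5 documents. To get the documents of a parent node, all the documents of its child nodes are assigned to it.

For example, A is a parent node and B and C are its child nodes, each holding 5 documents. A's documents therefore number 5+5=10. Likewise, A's parent gets A's 10 documents plus the documents of A's sibling nodes. This is repeated until the root node is reached.

I want to store A's documents as a list of size 10, and likewise store for A's parent the total of its children's documents. Instead, A's documents are stored as a list of size 2, with 5 documents in each sublist. A's parent likewise ends up with a list of size 3 rather than the 3*5=15 documents I want.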
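The symptom can be reproduced without the tree. The following is a minimal sketch that uses made-up placeholder strings (`b_docs` and `c_docs` are hypothetical names standing in for the documents of B and C); it contrasts the nested result with the flat one that is wanted.

```python
# Hypothetical placeholder documents; the real ones come from the fvc DataFrame.
b_docs = [f"B_doc{i}" for i in range(1, 6)]
c_docs = [f"C_doc{i}" for i in range(1, 6)]

# What currently happens: each child's list becomes ONE element of the parent list.
nested = []
nested.append(b_docs)
nested.append(c_docs)

# What is wanted: the children's items concatenated into a single flat list.
flat = b_docs + c_docs

print(len(nested))  # 2
print(len(flat))    # 10
```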

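How can I store the documents on each node as one flat list of all the documents rather than as a list of lists? Below is the code I am using.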

from anytree import Node, RenderTree
import pandas as pd
import numpy as np

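class Node(Node):
    # Class-level defaults, so nodes created without these keyword
    # arguments still have the attributes (set to None).
    documents = None
    vector = None

### Creating tree by giving documents to leaf ###
### Tree Creation ###
# L1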
Finance = Node("Finance")
# L2
Credit_and_Lending = Node("Credit and Lending", parent=Finance)
# L3
Credit_Cards = Node("Credit Cards", parent=Credit_and_Lending)

Loans = Node("Loans", parent=Credit_and_Lending)

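# L4 (fvc is a pandas DataFrame with 'keyword', 'organic_rank' and 'vocab' columns; its definition is not shown)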
Low_Interest_and_No_Interest_Credit_Cards = Node("Low Interest & No Interest Credit Cards", parent=Credit_Cards, documents=[(fvc.loc[(fvc['keyword']=='low interest & no interest credit cards') & (fvc['organic_rank']==1)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='low interest & no interest credit cards') & (fvc['organic_rank']==2)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='low interest & no interest credit cards') & (fvc['organic_rank']==3)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='low interest & no interest credit cards') & (fvc['organic_rank']==4)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='low interest & no interest credit cards') & (fvc['organic_rank']==5)])['vocab'].tolist()[0]])

Rewards_Cards = Node("Rewards Cards", parent=Credit_Cards, documents=[(fvc.loc[(fvc['keyword']=='rewards cards') & (fvc['organic_rank']==1)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='rewards cards') & (fvc['organic_rank']==2)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='rewards cards') & (fvc['organic_rank']==3)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='rewards cards') & (fvc['organic_rank']==4)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='rewards cards') & (fvc['organic_rank']==5)])['vocab'].tolist()[0]])

Student_Credit_Cards = Node("Student Credit Cards", parent=Credit_Cards, documents=[(fvc.loc[(fvc['keyword']=='student credit cards') & (fvc['organic_rank']==1)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='student credit cards') & (fvc['organic_rank']==2)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='student credit cards') & (fvc['organic_rank']==3)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='student credit cards') & (fvc['organic_rank']==4)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='student credit cards') & (fvc['organic_rank']==5)])['vocab'].tolist()[0]])

Auto_Financing = Node("Auto Financing", parent=Loans, documents=[(fvc.loc[(fvc['keyword']=='auto financing') & (fvc['organic_rank']==1)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='auto financing') & (fvc['organic_rank']==2)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='auto financing') & (fvc['organic_rank']==3)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='auto financing') & (fvc['organic_rank']==4)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='auto financing') & (fvc['organic_rank']==5)])['vocab'].tolist()[0]])
Commercial_Lending = Node("Commercial Lending", parent=Loans, documents=[(fvc.loc[(fvc['keyword']=='commercial lending') & (fvc['organic_rank']==1)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='commercial lending') & (fvc['organic_rank']==2)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='commercial lending') & (fvc['organic_rank']==3)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='commercial lending') & (fvc['organic_rank']==4)])['vocab'].tolist()[0]
                            , (fvc.loc[(fvc['keyword']=='commercial lending') & (fvc['organic_rank']==5)])['vocab'].tolist()[0]])

##### Visualizing the created tree #####
for pre, fill, node in RenderTree(Finance):
    print("%s%s" % (pre, node.name))

##### Getting documents for parent nodes #####
def get_documents(node):    
    if node.documents is not None:
        return node.documents
    else:
        child_nodes = node.children
        lis = []
        for child in child_nodes:
            child_docs = get_documents(child)
            lis.append(child_docs)
        node.documents = lis
        return lis


get_documents(Finance)

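Best answer

You can use the following:

lis = lis + child_docs

instead of

lis.append(child_docs)

append adds the child's whole list as a single element, whereas + concatenates the child's items onto lis.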
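As a rough sketch of how the change fits into the recursive function, the example below uses a small hypothetical `SimpleNode` class in place of the anytree-based `Node` from the question, plus made-up document strings, so that it runs on its own. It uses `extend`, which for this purpose has the same effect as `lis = lis + child_docs`.

```python
class SimpleNode:
    """Hypothetical stand-in for the anytree-based Node in the question."""

    def __init__(self, name, parent=None, documents=None):
        self.name = name
        self.children = []
        self.documents = documents
        if parent is not None:
            parent.children.append(self)


def get_documents(node):
    """Return a flat list of all documents under node, caching it on the node."""
    if node.documents is not None:
        return node.documents
    docs = []
    for child in node.children:
        # extend adds the child's items one by one, so docs stays flat
        docs.extend(get_documents(child))
    node.documents = docs
    return docs


# Same shape as the tree in the question, with placeholder documents.
root = SimpleNode("Finance")
credit = SimpleNode("Credit and Lending", parent=root)
cards = SimpleNode("Credit Cards", parent=credit)
loans = SimpleNode("Loans", parent=credit)
for name in ("Low Interest", "Rewards Cards", "Student Credit Cards"):
    SimpleNode(name, parent=cards, documents=[f"{name} doc {i}" for i in range(1, 6)])
for name in ("Auto Financing", "Commercial Lending"):
    SimpleNode(name, parent=loans, documents=[f"{name} doc {i}" for i in range(1, 6)])

all_docs = get_documents(root)
print(len(cards.documents), len(loans.documents), len(all_docs))  # 15 10 25
```

Because each internal node builds a fresh list, the leaf lists are not modified when their items are copied upward.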

Regarding "python - Creating a list from a list of lists in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49377971/
