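I have a list of "keys" and "paragraphs". Each "key" is associated with one "paragraph".
My goal is to split each paragraph into individual sentences, with every sentence still assigned to the "key" its paragraph originally belonged to. For example, given this input: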
(['2925729', 'Patrick came outside and greeted us promptly.'], ['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.'], ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'])
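So far I have been able to write code that splits paragraphs into sentences and counts, for each sentence, the number of hits against a dictionary of terms. I now want to associate an ID with each sentence.
This is the code that processes sentences without any "key". To save space, I have omitted steps 1 and 2 (which build `sentences`):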
dictionary = ['book', 'should have', 'open']
#### Step 3 ####
# Create a blank list to collect the final output
final_out = []
# Find matches: count how often each dictionary term occurs in each sentence
for sent in sentences:
    final_out.append((sent, sum(sent.count(col) for col in dictionary)))
# Drop duplicates and store the output as a dict of {sentence: hit count}
final_out = dict(sorted(set(final_out)))
# Rank the sentences by hit count, highest first
import operator
# .items() works in both Python 2 and 3 (the original used .iteritems())
sorted_final_out = sorted(final_out.items(), key=operator.itemgetter(1), reverse=True)
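The output is: (['johny ate the antelope', 80], ['sally has a friend', 20]) and so on. I then pick the top X by magnitude. What I want to achieve now is something like this: (['12222', 'johny ate the antelope', 80], ['22332', 'sally has a friend', 20]). So basically I want to make sure every sentence stays assigned to a "key" when it is parsed. Sorry, this is complicated. It is also why John's earlier solution worked for the simpler case.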
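To make the desired output concrete, here is a minimal sketch of one way to carry the key through both the sentence split and the hit count. The sample data, the helper name `score_with_keys`, and the naive `'. '` split are assumptions for illustration, not part of the original code.

```python
# Hypothetical sample data in the same [key, paragraph] shape as the question.
paragraphs = (
    ['2925729', 'Patrick came outside and greeted us promptly.'],
    ['2925728', 'He should have an open book. The book was open.'],
)
dictionary = ['book', 'should have', 'open']


def score_with_keys(pairs, terms):
    """Return [key, sentence, hits] rows, highest hit count first."""
    rows = []
    for key, paragraph in pairs:
        # Naive split: assumes sentences are separated by a period and a space.
        for sent in paragraph.split('. '):
            hits = sum(sent.count(term) for term in terms)
            rows.append([key, sent, hits])
    # sorted() is stable, so rows with equal counts keep their input order.
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = score_with_keys(paragraphs, dictionary)
for row in ranked:
    print(row)
```

Because the key travels with each row, taking the top X afterwards is just a slice such as `ranked[:X]`.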
Best answer
from itertools import chain
list(chain(*[[[y[0],z] for z in y[1].split('. ')] for y in x]))
produces
[['2925729', 'Patrick came outside and greeted us promptly.'],
['2925729', 'Patrick did not shake our hands nor ask our names'],
['2925729', 'He greeted us promptly and politely, but it seemed routine.'],
['2925728', 'Patrick sucks'],
['2925728', 'He farted politely, but it seemed routine.']]
list(chain(*...)) flattens the nested list generated by [[[y[0],z] for z in y[1].split('. ')] for y in x].
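As the output shows, splitting on `'. '` drops the period from every sentence except the last one in a paragraph. The sketch below is one possible variation, not the answer's own code: it uses `chain.from_iterable` for the flattening step and a regular expression with a lookbehind so each sentence keeps its period. The name `sentence_break` and the shortened sample data are assumptions for illustration.

```python
import re
from itertools import chain

# Shortened sample in the same shape as the question's input.
x = (
    ['2925729', 'Patrick came outside and greeted us promptly.'],
    ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'],
)

# Split on whitespace that follows a period; the lookbehind leaves the
# period attached to the sentence before the break.
sentence_break = re.compile(r'(?<=\.)\s+')

# chain.from_iterable flattens one level of nesting without unpacking
# the whole outer sequence into arguments the way chain(*...) does.
flat = list(chain.from_iterable(
    ([key, sent] for sent in sentence_break.split(paragraph))
    for key, paragraph in x
))

for pair in flat:
    print(pair)
```

This still treats every period followed by whitespace as a sentence boundary, so abbreviations such as "Mr. Smith" would be split incorrectly.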
If you want to change the list "in place", you can use
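xl = list(x)  # you gave us a tuple
for i, y in enumerate(xl):
    xx = xl[i]
    # Build one [key, sentence] pair per sentence of this paragraph
    xx = [[xx[0], s] for s in xx[1].split('. ')]
    # Replace the single paragraph entry with its sentence entries
    xl[i:i+1] = xx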
I am not sure which of the two is faster or better when the dataset is very large.
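One way to settle the question for a particular dataset is to time both approaches with `timeit`. The following is only a sketch under assumptions: the function names, the synthetic data, and the repeat count are made up for illustration, and `split_in_place` is a variant of the in-place loop that skips past the pieces it has just inserted rather than re-splitting them.

```python
import timeit
from itertools import chain


def split_flat(pairs):
    """Build a new flat list of [key, sentence] pairs."""
    return list(chain.from_iterable(
        [[key, sent] for sent in paragraph.split('. ')]
        for key, paragraph in pairs
    ))


def split_in_place(pairs):
    """Copy the input once, then replace each paragraph by slice assignment."""
    out = list(pairs)
    i = 0
    while i < len(out):
        key, paragraph = out[i]
        pieces = [[key, sent] for sent in paragraph.split('. ')]
        out[i:i + 1] = pieces
        i += len(pieces)  # jump over the sentences just inserted
    return out


# Synthetic data; substitute a sample of the real dataset here.
data = [['k%d' % n, 'One. Two. Three.'] for n in range(1000)]

# Both approaches should agree before their timings are compared.
assert split_flat(data) == split_in_place(data)

t_flat = timeit.timeit(lambda: split_flat(data), number=20)
t_in_place = timeit.timeit(lambda: split_in_place(data), number=20)
print('flat: %.4fs, in place: %.4fs' % (t_flat, t_in_place))
```

Timings depend on the machine, the Python version, and the shape of the data, so the numbers are only meaningful when measured on input that resembles the real workload.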
Regarding "python - Convert list to sublists while keeping the 'key'", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20861203/