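Basically, how would one organize the content by speaker?
Excerpt from: Robert Louis Stevenson, "The Strange Case of Dr. Jekyll and Mr. Hyde."
Example input: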
But Lanyon's face changed, and he held up a trembling hand. "I wish to see or hear no more of Dr. Jekyll," he said in a loud, unsteady voice. "I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead."
Example output:
[
"Narrator": "But Lanyon's face changed, and he held up a trembling hand.",
"Lanyon": "I wish to see or hear no more of Dr. Jekyll",
"Narrator": "he said in a loud, unsteady voice.",
"Lanyon": "I am quite done with that person; and I beg that you will spare me any allusion to one whom I regard as dead."
]
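Best answer
I haven't heard of an algorithm that does this. But there are two well-known problems that may be useful: named entity recognition (to find all potential speakers) and anaphora resolution (to decide who "he" or "she" refers to in each case).
You would also need to train a classifier that decides, for each quoted block of text, whether it is direct speech. You would probably need another classifier that estimates, for each identified piece of speech and each speaker identified in the surrounding context, how likely it is that the speech actually belongs to that speaker.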
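To make the pipeline above concrete, here is a minimal rule-based baseline in Python. It is only a sketch, not the trained classifiers described in the answer: a regex over capitalized words stands in for named entity recognition, "the most recently named character" stands in for anaphora resolution, and every double-quoted span is assumed to be direct speech. The function names and the NOT_NAMES stop list are hypothetical, and the code assumes straight double quotes rather than typographic ones.

```python
import re

# Hypothetical stop list standing in for a real NER model: capitalized
# words that should not be treated as character names.
NOT_NAMES = {"But", "And", "The", "He", "She", "I", "It", "Dr", "Mr", "Mrs"}


def split_segments(text):
    """Split text into (is_quote, content) pairs by toggling on double quotes."""
    segments = []
    for i, part in enumerate(text.split('"')):
        part = part.strip()
        if part:
            # Odd-indexed parts follow an opening quote, so they are speech.
            # An unclosed final quote is still treated as speech.
            segments.append((i % 2 == 1, part))
    return segments


def candidate_names(narration):
    """Naive stand-in for NER: capitalized words outside the stop list."""
    words = re.findall(r"\b[A-Z][a-z]+\b", narration)
    return [w for w in words if w not in NOT_NAMES]


def attribute_speakers(text):
    """Label each segment with 'Narrator' or the most recently named character."""
    last_name = "Unknown"
    labeled = []
    for is_quote, content in split_segments(text):
        if is_quote:
            # Crude anaphora stand-in: a later "he said" resolves to the
            # last name seen in the narration so far.
            labeled.append((last_name, content))
        else:
            names = candidate_names(content)
            if names:
                last_name = names[-1]
            labeled.append(("Narrator", content))
    return labeled
```

On the example passage this labels the segments Narrator, Lanyon, Narrator, Lanyon. It fails in the cases the answer's classifiers are meant to handle: a speaker named only after the quote ("...," said Utterson), several characters in one scene, or quoted text that is not speech at all.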
About machine-learning - How would someone create a machine learning algorithm to extract speakers from books/novels? A similar question was found on Stack Overflow: https://stackoverflow.com/questions/50660361/