Python - How do I extract sentences that contain a citation marker?

Tags: python regex text-segmentation citations

import re

text = "Trondheim is a small city with a university and 140000 inhabitants. Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. T h a t gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. The starting point is to automate the function (Garry Weber, 2005) of a route information agent."
print(re.findall(r"([^.]*?\(.+ [0-9]+\)[^.]*\.)", text))

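I am using the code above to extract the sentences that contain a citation. As you can see, the last sentence contains the citation (Garry Weber, 2005).

But I get this result: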

[' Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. T h a t gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. The starting point is to automate the function (Garry Weber, 2005) of a route information agent.']

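The result should be only the sentence that contains the citation, like this:
The starting point is to automate the function (Garry Weber, 2005) of a route information agent.

I guess the problem is caused by other text in parentheses: as you can see, the second sentence contains (departures per), so the match starts there and runs on to the citation. Is there a way to fix my code?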

Best answer

My attempt:

\b[^.]+\([^()]+\b(\d{2}|\d{4})\s*\)[^.]*\.

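It captures exactly the sentence, and it is more specific about the year than your pattern: [^()]+ keeps the match inside a single pair of parentheses, and the year must be two or four digits immediately before the closing parenthesis.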
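As a rough check, the pattern can be run from Python as sketched below. Two things here are assumptions, not part of the answer: the year alternation is written as a non-capturing group `(?:...)`, because `re.findall` would otherwise return only the captured year, and the helper name `citation_sentences` is made up for illustration. The sample text is a shortened version of the one in the question.

```python
import re

# Shortened version of the text from the question.
text = (
    "Trondheim is a small city with a university and 140000 inhabitants. "
    "Its central bus systems has 42 bus lines, serving 590 stations, "
    "with 1900 (departures per) day in average. "
    "The starting point is to automate the function (Garry Weber, 2005) "
    "of a route information agent."
)

# The answer's pattern, with the year group made non-capturing (assumption)
# so that the whole sentence is returned rather than just the year.
CITATION_SENTENCE = re.compile(
    r"\b[^.]+\([^()]+\b(?:\d{2}|\d{4})\s*\)[^.]*\."
)


def citation_sentences(s):
    """Return every sentence in s that contains a '(... year)' citation."""
    # Hypothetical helper: group(0) is the full matched sentence.
    return [m.group(0) for m in CITATION_SENTENCE.finditer(s)]


print(citation_sentences(text))
# ['The starting point is to automate the function (Garry Weber, 2005) of a route information agent.']
```

Note that the pattern treats every period as a sentence boundary, so an abbreviation or a decimal number outside the parentheses would likely cut the matched sentence short.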

For "Python - How do I extract sentences that contain a citation marker?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45661257/
