python - Delete a specific entry from a bibtex file by citation key using Python

Tags: python regex bibtex

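How can I delete a specific entry from a bibtex file by its citation key using Python? I basically want a function that takes two arguments (the path to the bibtex file and the citation key) and deletes the entry corresponding to that key from the file. I played around with regular expressions but without success. I also looked briefly at bibtex parsers, but that seems like overkill. In the skeleton function below, the decisive part is content_modified =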

def deleteEntry(path, key):
  # get content of bibtex file
  with open(path, 'r') as f:
    content = f.read()
  # delete entry from content string
  content_modified = ...  # <- the missing piece this question asks about

  # rewrite file
  with open(path, 'w') as f:
    f.write(content_modified)

Here is a sample bibtex file (with whitespace in the abstract):

@article{dai2008thebigfishlittlepond,
    title = {The {Big-Fish-Little-Pond} Effect: What Do We Know and Where Do We Go from Here?},
    volume = {20},
    shorttitle = {The {Big-Fish-Little-Pond} Effect},
    url = {http://dx.doi.org/10.1007/s10648-008-9071-x},
    doi = {10.1007/s10648-008-9071-x},
    abstract = {The big-fish-little-pond effect {(BFLPE)} refers to the theoretical prediction that equally able students will have lower academic
self-concepts in higher-achieving or selective schools or programs than in lower-achieving or less selective schools or programs,
largely due to social comparison based on local norms. While negative consequences of being in a more competitive educational
setting are highlighted by the {BFLPE}, the exact nature of the {BFLPE} has not been closely scrutinized. This article provides
a critique of the {BFLPE} in terms of its conceptualization, methodology, and practical implications. Our main argument is that
of the {BFLPE.}},
    number = {3},
    journal = {Educational Psychology Review},
    author = {Dai, David Yun and Rinn, Anne N.},
    year = {2008},
    keywords = {education, composition by performance, education, peer effect, education, school context, education, social comparison/big-fish{\textendash}little-pond effect},
    pages = {283--317},
    file = {Dai_Rinn_2008_The Big-Fish-Little-Pond Effect.pdf:/Users/jpl2136/Documents/Literatur/Dai_Rinn_2008_The Big-Fish-Little-Pond Effect.pdf:application/pdf}
}

@book{coleman1966equality,
    title = {Equality of Educational Opportunity},
    shorttitle = {Equality of educational opportunity},
    publisher = {{U.S.} Dept. of Health, Education, and Welfare, Office of Education},
    author = {Coleman, James},
    year = {1966},
    keywords = {\_task\_obtain, education, school context, soz. Ungleichheit, education}
}

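Edit: Here is the solution I came up with. It is not based on matching the whole bibtex entry. Instead, it finds all entry openings such as @article{dai2008thebigfishlittlepond, and then deletes the corresponding entry by slicing the content string.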

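import re
content_keys = [(m.group(1), m.start(0)) for m in re.finditer(r"@\w{1,20}\{([\w\d-]+),", content)]
idx = [k[0] for k in content_keys].index(key)
# the entry ends where the next one starts, or at the end of the file for the last entry
end = content_keys[idx + 1][1] if idx + 1 < len(content_keys) else len(content)
content_modified = content[0:content_keys[idx][1]] + content[end:]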

Best answer

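As Beni Cherniavsky-Paskin mentioned in the comments, you will have to rely on the fact that each of your BibTeX entries starts and ends at the very beginning of a line (without any leading tabs or spaces). Then you can do this: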

import re
pattern = re.compile(r"^@\w+\{" + re.escape(key) + r",.*?^\}", re.S | re.M)
content_modified = re.sub(pattern, "", content)

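Note the two modifiers. S makes . match newlines as well. M makes ^ match at the beginning of every line, not only at the beginning of the string.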
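Put into the function from the question, this could look roughly like the sketch below. It relies on the same assumption as above: every entry starts with @ in the first column and ends with a } at the start of a line. The names delete_entry_text and delete_entry, and the trailing \n* that swallows the blank lines after the removed entry, are illustrative additions and not part of the pattern above.

```python
import re

def delete_entry_text(content, key):
    """Return content with the entry for key removed (hypothetical helper)."""
    # re.escape guards against keys that contain regex metacharacters.
    # The trailing \n* also removes the newlines that follow the entry.
    pattern = re.compile(r"^@\w+\{" + re.escape(key) + r",.*?^\}\n*", re.S | re.M)
    return pattern.sub("", content)

def delete_entry(path, key):
    """Read the file at path, drop the entry for key, and write it back."""
    with open(path, "r") as f:
        content = f.read()
    with open(path, "w") as f:
        f.write(delete_entry_text(content, key))

# Small made-up sample; the nested braces in the title are not a problem here
# because the match only stops at a } that sits at the start of a line.
sample = """@article{a1,
    title = {The {Nested} Title},
    year = {2008}
}

@book{b2,
    title = {Other}
}
"""
print(delete_entry_text(sample, "a1"))
```

If the key does not occur in the file, the text is returned unchanged, so the file is simply rewritten with the same content.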

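If you cannot rely on this fact, then the BibTeX format is simply not a regular language (because it allows nested {}, which have to be counted to get the correct result). There are regex flavors that may still make this task possible (using recursion or balancing groups), but I don't think Python's built-in re module supports these features. So you would actually have to use a BibTeX parser (which, I guess, would also make your code more robust).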
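Short of a full parser, the counting that a regular expression cannot do can be done by hand. The following is only a sketch under stated assumptions: remove_entry_balanced is a hypothetical name, the braces inside the entry are assumed to be balanced, and BibTeX details such as escaped braces, quoted strings containing unbalanced braces, or @comment blocks are ignored. For those cases a real parser is the safer choice.

```python
import re

def remove_entry_balanced(content, key):
    """Remove the entry for key by counting braces (hypothetical helper)."""
    # Find the entry header, e.g. "@article{key,", anywhere in the text.
    header = re.search(r"@\w+\s*\{\s*" + re.escape(key) + r"\s*,", content)
    if header is None:
        return content  # key not found: leave the text unchanged
    depth = 0
    for pos in range(header.start(), len(content)):
        ch = content[pos]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # pos is the brace that closes the entry; drop the entry
                # and any whitespace that directly follows it
                rest = content[pos + 1:].lstrip()
                return content[:header.start()] + rest
    return content  # unbalanced braces: do nothing rather than guess

# Made-up sample with indented entries that do not end at the start of a line.
sample = "  @article{a1, title = {The {Nested} Title}, note = {x} } @book{b2, title = {Other}}\n"
print(remove_entry_balanced(sample, "a1"))
```

The design choice here is to give up (return the input unchanged) whenever the key is missing or the braces never balance, rather than delete a guessed range from the file.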

For "python - Delete a specific entry from a bibtex file by citation key using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13506155/
