I have a JSON file with hundreds of entries, for example:
{
"url":"http://example.com/10618/",
"metatag.eprints.publication":"Journal of Corporate Real Estate",
"metatag.eprints.title":"Corporate Real Estate Strategy",
"metatag.eprints.citation":"Adair, P, McGrogan, WS, and Webb, JR (2006) Corporate Real Estate Strategy. Journal of Corporate Real Estate"}
{
"url":"http://example.com/23552/",
"metatag.eprints.publication":"European Journal of Cardio-Thoracic Surgery",
"metatag.eprints.title":"Long-term survival from coronary endarterectomies in coronary artery disease",
"metatag.eprints.citation":"Aaron, P, Jones, K, Pallin, C, and Nash, R (2012) Long-term survival from coronary endarterectomies in coronary artery disease. European Journal of Cardio-Thoracic Surgery"}
Can someone help me write a jq or Python script that, for each block, changes "metatag.eprints.citation" so that all text after the date is removed?
So the blocks above would become:
{
"url":"http://example.com/10618/",
"metatag.eprints.publication":"Journal of Corporate Real Estate",
"metatag.eprints.title":"Corporate Real Estate Strategy",
"metatag.eprints.citation":"Adair, P, McGrogan, WS, and Webb, JR (2006)"}
{
"url":"http://example.com/23552/",
"metatag.eprints.publication":"European Journal of Cardio-Thoracic Surgery",
"metatag.eprints.title":"Long-term survival from coronary endarterectomies in coronary artery disease",
"metatag.eprints.citation":"Aaron, P, Jones, K, Pallin, C, and Nash, R (2012)"}
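The sample input is a sequence of top-level objects rather than a single JSON array, so Python's json.load would stop on it with an "Extra data" error. One way to read such a stream in Python is a loop over json.JSONDecoder.raw_decode. This is a minimal sketch that assumes the objects are separated only by whitespace; the function name iter_json_objects is hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values.

    Assumes values are separated only by whitespace (as in the sample file).
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            return
        # raw_decode returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

For example, with open("entries.json") as f: records = list(iter_json_objects(f.read())) would load every block, where entries.json stands in for the real file name.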
Best answer
jq '.["metatag.eprints.citation"] |= match(".*?\\)").string//.'
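This requires jq 1.5. It sets the value of metatag.eprints.citation
to the result of matching that value against the regular expression .*?\)
which matches everything up to and including the first closing parenthesis. If for some reason there is no closing parenthesis, the alternative operator //
sets the value back to what it was.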
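For the Python side of the question, the same logic can be sketched with the standard re module: search for the lazy pattern .*?\) and fall back to the original string when there is no match, as the // operator does in jq. The names trim_citation and trim_record are hypothetical, and this sketch also leaves records without a citation key untouched, which is a choice made here rather than something the jq filter does.

```python
import re

KEY = "metatag.eprints.citation"

# Lazy pattern: the shortest text ending in ")", i.e. everything up to and
# including the first closing parenthesis (same regex as the jq filter).
CITATION_PREFIX = re.compile(r".*?\)")


def trim_citation(citation):
    """Return the citation up to the first ')', or unchanged if there is none."""
    m = CITATION_PREFIX.search(citation)
    # Fall back to the original value, like `// .` in the jq version.
    return m.group(0) if m else citation


def trim_record(record):
    """Return a copy of record with its citation trimmed, if the key exists."""
    out = dict(record)
    if KEY in out:
        out[KEY] = trim_citation(out[KEY])
    return out
```

Each trimmed record can then be written back out with json.dumps(record), one object per line, to get output in the same shape as the expected result in the question.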
Source: the Stack Overflow question "python - jq or python script to remove text after date in json field": https://stackoverflow.com/questions/33107524/