I have a list of entities extracted from text.
For example, here is my text:
"text": "Anarchism is an anti-authoritarian political and social philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions."
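And here are the entities extracted from it (in each pair, the first element is the entity and the second is how that entity is mentioned in the text):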
"anchored_et": [["Anti-authoritarianism", "anti-authoritarian"], ["Political philosophy", "political"], ["Social philosophy", "social philosophy"], ["Hierarchy", "hierarchies"], ["Workers' self-management", "self-managed"], ["Self-governance", "self-governed"], ["cooperative", "cooperative"]]
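I also have a list of triples whose subjects and objects are Wikidata QIDs.
So I first need to convert the extracted entities to their QIDs, then find the triples in which those QIDs are the subject, and finally convert the object QIDs of those triples back to entities.
That means I need a way to convert Wikidata QIDs to entities, and vice versa, in Python.
How can I do this?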
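The workflow described in the question (entity → QID → triples with that subject → object labels) can be sketched as follows. This is a minimal sketch under assumptions not stated in the question: the triples are assumed to be `(subject_qid, predicate, object_qid)` string tuples, and the two converters are passed in as plain functions (for instance, the two functions in the answer), so the glue logic itself does not touch the network. `expand_entities` and `index_by_subject` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed layout: (subject QID, predicate, object QID)
Triple = Tuple[str, str, str]


def index_by_subject(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Group triples by subject QID so each lookup is a dict access, not a scan."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        index[triple[0]].append(triple)
    return index


def expand_entities(
    anchored_et: List[List[str]],
    triples: Iterable[Triple],
    title_to_qid: Callable[[str], Optional[str]],
    qid_to_label: Callable[[str], List[str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """For each resolvable entity, return its (predicate, object label) pairs."""
    index = index_by_subject(triples)
    result: Dict[str, List[Tuple[str, str]]] = {}
    for entity, _mention in anchored_et:
        qid = title_to_qid(entity)
        if qid is None:
            continue  # entity could not be mapped to a QID; skip it
        pairs = []
        # .get() avoids inserting empty lists into the defaultdict
        for _subject, predicate, obj in index.get(qid, []):
            labels = qid_to_label(obj)
            # Fall back to the raw QID when the object has no English label
            pairs.append((predicate, labels[0] if labels else obj))
        result[entity] = pairs
    return result
```

Since the same object QID tends to recur across many triples, it may be worth wrapping the QID-to-label function in `functools.lru_cache` so each QID is queried only once.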
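Best answer
Here are two functions I wrote for this: the first queries the Wikidata SPARQL endpoint for a QID's English label, and the second asks the English Wikipedia API for the QID linked to a page title.
I used SPARQLWrapper from PyPI.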
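from typing import List, Optional

import requests
from SPARQLWrapper import JSON, SPARQLWrapper


def wikidata_id_to_enwiki_title(qid: str) -> List[str]:
    """Return the English label(s) of a Wikidata item via the SPARQL endpoint."""
    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setReturnFormat(JSON)
    # The QID (e.g. "Q42") is spliced into the query as wd:<QID>
    sparql.setQuery(
        "SELECT DISTINCT ?label WHERE { wd:" + qid + " rdfs:label ?label . "
        'FILTER (langMatches(lang(?label), "EN")) }'
    )
    try:
        data = sparql.query().convert()
    except Exception:  # network or endpoint error
        return []
    return [res["label"]["value"] for res in data["results"]["bindings"]]


def enwiki_title_to_wikidata_id(title: str) -> Optional[str]:
    """Return the Wikidata QID of an English Wikipedia page, or None."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "format": "json",
        "redirects": 1,  # follow redirects, e.g. alternate or plural titles
        "titles": title,  # requests URL-encodes spaces and apostrophes
    }
    try:
        response = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=10)
        response.raise_for_status()
        pages = response.json()["query"]["pages"]
    except (requests.RequestException, KeyError, ValueError):
        return None
    for page in pages.values():
        # Missing pages have no "pageprops", so look keys up defensively
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None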
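When many object QIDs need labels, one SPARQL request per QID gets slow. One alternative, not part of the original answer, is the Wikidata API's `wbgetentities` module, which returns labels for up to 50 IDs per request. The sketch below uses only the standard library; the function names and the User-Agent string are hypothetical (Wikimedia asks clients to send a descriptive User-Agent, so replace it with your own).

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable

API = "https://www.wikidata.org/w/api.php"
BATCH = 50  # wbgetentities accepts at most 50 ids per request for ordinary clients


def labels_from_response(payload: dict, lang: str = "en") -> Dict[str, str]:
    """Map QID -> label from a wbgetentities JSON payload; items without a label are skipped."""
    out: Dict[str, str] = {}
    for qid, entity in payload.get("entities", {}).items():
        label = entity.get("labels", {}).get(lang)
        if label is not None:
            out[qid] = label["value"]
    return out


def qids_to_labels(qids: Iterable[str], lang: str = "en") -> Dict[str, str]:
    """Resolve many QIDs with one HTTP request per batch of 50."""
    unique = list(dict.fromkeys(qids))  # de-duplicate while keeping order
    result: Dict[str, str] = {}
    for start in range(0, len(unique), BATCH):
        params = urllib.parse.urlencode({
            "action": "wbgetentities",
            "ids": "|".join(unique[start:start + BATCH]),
            "props": "labels",
            "languages": lang,
            "format": "json",
        })
        request = urllib.request.Request(
            f"{API}?{params}",
            headers={"User-Agent": "qid-label-example/0.1 (hypothetical contact)"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            result.update(labels_from_response(json.load(response), lang))
    return result
```

Parsing is kept separate from fetching so the response handling can be checked without network access; missing QIDs come back as entries with a `missing` key and no `labels`, and are simply left out of the result.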
Source: Stack Overflow question "sparql - How to convert Wikidata QID to entity and vice versa in Python": https://stackoverflow.com/questions/72704205/