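I want to use a SPARQL query to extract from the DBpedia endpoint all subject/predicate/object statements whose predicate is a date/time property.
I tried parsing DBpedia's infobox properties from the dump and filtering the statements using this question. However, many of the objects have malformed date/time values (for example, 200 BC, ...).
How can I query the dump files or the DBpedia endpoint to get all valid date/time-based statements?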
Best answer
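The answer to the earlier question shows how to retrieve the properties that have a given datatype. It is easy to extend it to get the statements that use such a property. That query binds ?p; now you only need to add ?s ?p ?o to the query. For example: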
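PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

select ?s ?p ?o where {
  # ?p ranges over datatype properties declared with range xsd:date
  ?p a owl:DatatypeProperty ;
     rdfs:range xsd:date .
  # every statement that uses such a property
  ?s ?p ?o .
}
limit 100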
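The query above only covers properties whose range is xsd:date. If you also want other temporal datatypes, the same pattern can be generated for several ranges at once with a SPARQL 1.1 VALUES clause. The following is a minimal sketch, not a definitive implementation. The function name build_query and its parameters are hypothetical, and whether the DBpedia ontology actually declares properties with ranges such as xsd:dateTime or xsd:gYear is an assumption you should check against the endpoint.

```python
# Prefix declarations so the generated query does not rely on endpoint defaults.
PREFIXES = """\
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
"""


def build_query(datatypes=("date",), limit=100):
    """Return a SPARQL query string (hypothetical helper).

    The query selects statements whose predicate is an owl:DatatypeProperty
    with one of the given XSD datatypes as its rdfs:range.
    """
    if not datatypes:
        raise ValueError("at least one datatype is required")
    # Turn ("date", "gYear") into "xsd:date xsd:gYear" for the VALUES clause.
    ranges = " ".join(f"xsd:{name}" for name in datatypes)
    return (
        PREFIXES
        + "SELECT ?s ?p ?o WHERE {\n"
        + f"  VALUES ?range {{ {ranges} }}\n"
        + "  ?p a owl:DatatypeProperty ;\n"
        + "     rdfs:range ?range .\n"
        + "  ?s ?p ?o .\n"
        + "}\n"
        + f"LIMIT {int(limit)}\n"
    )


if __name__ == "__main__":
    # The datatype list here is an assumption, chosen only for illustration.
    print(build_query(("date", "dateTime", "gYear"), limit=10))
```

The printed text can be pasted into the endpoint's query form or sent with any SPARQL client. Called with the defaults, it matches the same statements as the query above.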
Note that the DBpedia 3.8 Downloads page describes the "Raw Infobox Properties" and "Ontology Infobox Properties" datasets:
Raw Infobox Properties
Information that has been extracted from Wikipedia infoboxes. Note that this data is in the less clean /property/ namespace. The Ontology Infobox Properties (/ontology/ namespace) should always be preferred over this data.
Ontology Infobox Properties
High-quality data extracted from Infoboxes using the ontology-based extraction. The predicates in this dataset are in the /ontology/ namespace. Used to be called Mapping Based Properties in previous releases.
Note that this data is of much higher quality than the Raw Infobox Properties in the /property/ namespace. For example, there are three different raw Wikipedia infobox properties for the birth date of a person. In the /ontology/ namespace, they are all mapped onto one relation http://dbpedia.org/ontology/birthDate. It is a strong point of DBpedia to unify these relations.
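If you end up with strange data values from the "Raw Infobox Properties" dataset, that is not surprising. You really should use the "Ontology Infobox Properties" dataset instead.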
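If you work from a dump file and still need to discard malformed values, one option is to check each typed literal yourself. The sketch below assumes the dump is in N-Triples format with one statement per line, and it accepts only plain YYYY-MM-DD values that are real calendar dates. That is a deliberate simplification: xsd:date also allows time zones, negative years and years with more than four digits, all of which this check rejects. The names date_statements and parse_valid_date are hypothetical, and the sample lines are invented for illustration.

```python
import re
from datetime import date

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Matches an N-Triples statement whose object is a typed literal:
#   <subject> <predicate> "lexical form"^^<datatype> .
TRIPLE_RE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"\^\^<([^>]*)>\s*\.\s*$'
)


def parse_valid_date(lexical):
    """Return a datetime.date for a plain YYYY-MM-DD value, else None.

    Simplification: time zones, negative years and years beyond four
    digits are valid in xsd:date but are rejected here.
    """
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lexical)
    if m is None:
        return None
    try:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. month 13, February 30, or year 0000
        return None


def date_statements(lines):
    """Yield (subject, predicate, date) for statements with a valid xsd:date."""
    for line in lines:
        m = TRIPLE_RE.match(line.strip())
        if m is None:
            continue  # not a typed-literal statement
        subject, predicate, lexical, datatype = m.groups()
        if datatype != XSD_DATE:
            continue
        value = parse_valid_date(lexical)
        if value is not None:
            yield subject, predicate, value


if __name__ == "__main__":
    # Invented sample lines; only the first carries a usable date.
    sample = [
        '<http://dbpedia.org/resource/A> <http://dbpedia.org/ontology/birthDate> '
        '"1950-02-03"^^<http://www.w3.org/2001/XMLSchema#date> .',
        '<http://dbpedia.org/resource/B> <http://dbpedia.org/property/date> '
        '"200 BC"^^<http://www.w3.org/2001/XMLSchema#date> .',
    ]
    for statement in date_statements(sample):
        print(statement)
```

To process a real dump, pass an open file object as lines so the file is streamed one line at a time rather than loaded into memory.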
Source: the Stack Overflow question "rdf - SPARQL query to extract all date/time data from DBpedia": https://stackoverflow.com/questions/18834426/