XQuery/DBpedia with SPARQL - Football teams
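DBpedia is a project that converts the contents of Wikipedia into RDF so that it can be linked to other datasets and enrich the Semantic Web. It provides a SPARQL endpoint for querying this database.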
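This application uses DBpedia to create a KML file showing the birthplaces of the members of a selected British football team. Data quality is limited by several factors:
- the age of the Wikipedia extract on which DBpedia is based
- whether an individual Wikipedia page exists for each player
- the consistency of property labels in Wikipedia infoboxes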
declare variable $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.
  OPTIONAL {?player p:cityofbirth ?city}.
  OPTIONAL {?player p:dateOfBirth ?dob}.
  OPTIONAL {?player p:clubnumber ?no}.
  OPTIONAL {?player p:position ?position}.
  OPTIONAL {?player p:image ?image}.
  OPTIONAL {
    { ?city geo:long ?long. }
    UNION
    { ?city p:redirect ?city2. ?city2 geo:long ?long. }.
  }.
  OPTIONAL {
    { ?city geo:lat ?lat. }
    UNION
    { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }.
  }.
}
";
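The query is complicated by the need to handle possible redirects of city names. (Can this be improved? Is it a general problem?) To obtain more complete data, the query should also handle the multiple synonyms used for place of birth and date of birth.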
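Changes to DBpedia mean that queries based on its data model and vocabulary have a short lifespan. As of January 2011 this query was being updated. At present, the following query appears to work for retrieving the birthplace and date of birth of current Arsenal players.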
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX p: <http://dbpedia.org/property/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  <http://dbpedia.org/resource/Arsenal_F.C.> p:name ?player.
  ?player dbpedia-owl:birthPlace ?city;
          dbpedia-owl:birthDate ?dob.
  ?city geo:long ?long;
        geo:lat ?lat.
}
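However, this yields multiple geocoded locations for each player. It may be reasonable to assume that the first location is the most specific one (but can this not be filtered in SPARQL?).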
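The prototype SPARQL query targets Arsenal F.C. This team name has to be replaced by the supplied team name; the query is then URI-encoded and passed to the DBpedia SPARQL endpoint.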
let $club := request:get-parameter("club", "Arsenal_F.C.")
let $queryx := replace($query, "Arsenal_F.C.", $club)
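Aside: originally the query was written with a generic placeholder ($team) rather than a prototype value (Arsenal_F.C.). The prototype style has the advantage of providing an executable SPARQL query that needs no editing, and it is more expressive and less error-prone: the $ in $team has to be escaped in the replace expression because the second argument is a regular expression.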
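The escaping issue mentioned in the aside can be illustrated with a minimal sketch. The `$template` variable and the club name `Chelsea_F.C.` are made up for this example and are not part of the application.

```xquery
xquery version "1.0";

(: Hypothetical template that uses a $team placeholder
   instead of a prototype value such as Arsenal_F.C. :)
declare variable $template :=
  "?player p:currentclub <http://dbpedia.org/resource/$team>.";

(: The second argument of replace() is a regular expression, in which
   an unescaped "$" anchors the end of the string. To match a literal
   dollar sign it has to be written as "\$". :)
replace($template, "\$team", "Chelsea_F.C.")
```

Note that the third argument is not a plain string either: `$` and `\` have special meaning in the replacement, so a club name containing those characters would need escaping as well.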
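This query uses the SPARQL endpoint provided by the Virtuoso engine. The result format is requested as XML, that is, the SPARQL Query Results XML Format. A function tidies up this interface: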
declare function local:execute-sparql($query as xs:string) {
  (: "&" must be written as "&amp;" inside an XQuery string literal :)
  let $sparql := concat(
    "http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
    encode-for-uri($query)
  )
  return doc($sparql)
};
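To see what is actually sent to the endpoint, the request URL can be built without fetching it. This is only a sketch: the short query below is a made-up example, not the query used by the application.

```xquery
xquery version "1.0";

(: Illustrative query only; any SPARQL text is handled the same way :)
let $query := "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
return concat(
  (: "&amp;" in the literal yields a plain "&" in the resulting string :)
  "http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
  (: encode-for-uri percent-encodes spaces, braces, "?" and "*" :)
  encode-for-uri($query)
)
```

Returning the URL rather than calling doc() on it makes it easy to paste the result into a browser and inspect the raw response.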
The result is in the SPARQL Query Results XML Format. It is more convenient for later processing to convert it into tuples with named elements.
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
    {
      for $binding in $result/r:binding
      return
        if ($binding/r:uri)
        then element {$binding/@name} {
               attribute type {"uri"},
               string($binding/r:uri)
             }
        else element {$binding/@name} {
               attribute type {$binding/r:literal/@datatype},
               string($binding/r:literal)
             }
    }
    </tuple>
};
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
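The effect of the conversion is easiest to see on a small input. The following sketch applies the same function to a hand-written result document; the player URI and the latitude value are invented for illustration and are not DBpedia data.

```xquery
xquery version "1.0";
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
    {
      for $binding in $result/r:binding
      return
        if ($binding/r:uri)
        then element {$binding/@name} {
               attribute type {"uri"},
               string($binding/r:uri)
             }
        else element {$binding/@name} {
               attribute type {$binding/r:literal/@datatype},
               string($binding/r:literal)
             }
    }
    </tuple>
};

(: Hand-written sample in the SPARQL Query Results XML Format :)
let $sample :=
  <r:sparql>
    <r:head>
      <r:variable name="player"/>
      <r:variable name="lat"/>
    </r:head>
    <r:results>
      <r:result>
        <r:binding name="player">
          <r:uri>http://dbpedia.org/resource/Example_Player</r:uri>
        </r:binding>
        <r:binding name="lat">
          <r:literal datatype="http://www.w3.org/2001/XMLSchema#float">51.5</r:literal>
        </r:binding>
      </r:result>
    </r:results>
  </r:sparql>
return local:sparql-to-tuples($sample)
```

Each r:result becomes one tuple element whose children are named after the SPARQL variables (here player and lat), with a type attribute recording either "uri" or the literal's datatype. Simple path expressions such as $tuples[lat]/player then work directly on the result.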
Since we are generating KML, we need to set the media type and the file name and create a Document node, each at the appropriate place in the script.
declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

(: the header value carries only the disposition, not the header name :)
let $x := response:set-header('Content-disposition', concat('inline; filename=', $club, '.kml'))
return
  <Document>
    <name>Birthplaces of players in the {$club} squad</name>
    <Style id="player">
      <IconStyle>
        <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
      </IconStyle>
    </Style>
    .....
  </Document>
The icon is one of the stock Google Earth icons, a footballer.
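Because some properties have multiple values (cityofbirth, for example, is often expressed as an address path), there are several tuples for each player. These need to be grouped and condensed. Here we use an XQuery idiom that obtains the set of player names with distinct-values and then uses each name as a key to access the corresponding group of rows. The script takes the simple approach of using only the first tuple that contains a latitude value, pending a better solution for multiple cityofbirth values.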
We are only interested in players whose birthplace has been geocoded, so we select only the tuples that contain a lat element.
{
for $playername in distinct-values($tuples[lat]/player)
let $player := $tuples[player = $playername][lat][1]
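The grouping idiom can be tried in isolation on a few hand-made tuples. In this sketch the player keys A and B, the place names and the coordinates are placeholders, and the chosen element is introduced only to display the outcome.

```xquery
xquery version "1.0";

(: Three rows for player A, only two of them geocoded; one row for B without a latitude :)
let $tuples := (
  <tuple><player>A</player><city>Town</city></tuple>,
  <tuple><player>A</player><city>Town, Region</city><lat>51.5</lat></tuple>,
  <tuple><player>A</player><city>Region</city><lat>52.0</lat></tuple>,
  <tuple><player>B</player><city>Elsewhere</city></tuple>
)
(: distinct-values yields each geocoded player once :)
for $playername in distinct-values($tuples[lat]/player)
(: the player name is the key; keep the first tuple of the group that has a lat :)
let $player := $tuples[player = $playername][lat][1]
return <chosen player="{$playername}" lat="{$player/lat}"/>
```

With this input only player A is returned, with the latitude 51.5 taken from the first geocoded row; player B is dropped because none of its tuples has a lat element.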
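The Wikipedia data needs some cleaning before it can be used in the KML. A generic cleaning function decodes URI-encoded characters, removes some irrelevant text and replaces underscores with spaces. (This hack needs improving.)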
declare function local:clean($text) {
  (: decode percent-encoded characters :)
  let $text := util:unescape-uri($text, "UTF-8")
  (: strip the resource prefix and any parenthesised qualifier :)
  let $text := replace($text, "http://dbpedia.org/resource/", "")
  let $text := replace($text, "\(.*\)", "")
  let $text := replace($text, "Football__positions#", "")
  let $text := replace($text, "#", ",")
  let $text := replace($text, "_", " ")
  return $text
};
let $name := local:clean($player/player)
let $city := local:clean($player/city)
let $position := local:clean($player/position)
The date of birth is in xs:date format but is optional. If the value is a valid date, an eXist function converts it to a more readable format.
let $dob :=
  if ($player/dob castable as xs:date)
  then datetime:format-date(xs:date($player/dob), "dd MMM, yyyy")
  else ""
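The same guard can be exercised outside eXist. The sketch below is an assumption-laden variant, not the script's own code: it swaps the eXist-specific datetime:format-date for the standard format-date function, so it requires an XQuery 3.0 processor, and the helper name local:format-dob and the sample values are hypothetical.

```xquery
xquery version "3.0";

(: Hypothetical helper: returns a readable date, or "" when the value is not a valid xs:date :)
declare function local:format-dob($raw as xs:string?) as xs:string {
  if ($raw castable as xs:date)
  (: picture: two-digit day, month name cut to three letters, four-digit year :)
  then format-date(xs:date($raw), "[D01] [MNn,*-3], [Y0001]")
  else ""
};

(: one valid date, one year-only value and one empty value :)
for $raw in ("1977-08-17", "1977", "")
return concat("'", $raw, "' -> '", local:format-dob($raw), "'")
```

Only the first value is castable as xs:date; the year-only and empty values fall through to the empty string, which is how the script copes with incomplete infobox data.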
The same applies to the squad number, which should be an xs:integer.
let $no :=
  if ($player/no castable as xs:integer)
  then concat(" [# ", xs:integer($player/no), "] ")
  else ""
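Latitude and longitude should be xs:decimal. Because several players in a team sometimes come from the same place, the mapped positions are jittered slightly, by at most 0.005 degrees in each direction, so that the placemarks do not sit exactly on top of one another.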
let $lat := xs:decimal($player/lat) + (math:random() - 0.5) * 0.01
let $long := xs:decimal($player/long) + (math:random() - 0.5) * 0.01
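The body of the placemark description contains XHTML markup to display the player's image, where one exists, and to link to the DBpedia page. The XML has to be serialized to a string so that Google Maps renders the description in the popup window.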
let $description :=
  <div>
    {concat($position, $no, " born ", $dob, " in ", $city)}
    <div>
      <a href="{$player/player}">DBpedia</a>
      <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
    </div>
    {if ($player/image != "") then <div><img src="{$player/image}" height="200"/></div> else ()}
  </div>
order by $name
return
  <Placemark>
    <name>{$name}</name>
    <description>{util:serialize($description, "method=xhtml")}</description>
    <Point>
      <coordinates>{concat($long, ",", $lat, ",0")}</coordinates>
    </Point>
    <styleUrl>#player</styleUrl>
  </Placemark>
}
Arsenal player map
Note that the q parameter is URI-encoded.
(: generate a sparql query on the dbpedia server
   This takes a team name and generates a kml file showing the birth place of the players
:)
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare variable $query := "
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?player p:currentclub <http://dbpedia.org/resource/Arsenal_F.C.>.
  OPTIONAL {?player p:cityofbirth ?city}.
  OPTIONAL {?player p:birth ?dob}.
  OPTIONAL {?player p:clubnumber ?no}.
  OPTIONAL {?player p:position ?position}.
  OPTIONAL {?player p:image ?image}.
  OPTIONAL {
    { ?city geo:long ?long. }
    UNION
    { ?city p:redirect ?city2. ?city2 geo:long ?long. }.
  }.
  OPTIONAL {
    { ?city geo:lat ?lat. }
    UNION
    { ?city p:redirect ?city3. ?city3 geo:lat ?lat. }.
  }.
}
";

declare function local:execute-sparql($query as xs:string) {
  (: "&" must be written as "&amp;" inside an XQuery string literal :)
  let $sparql := concat(
    "http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
    encode-for-uri($query)
  )
  return doc($sparql)
};

declare function local:sparql-to-tuples($rdfxml) {
  for $result in $rdfxml//r:result
  return
    <tuple>
    {
      for $binding in $result/r:binding
      return
        if ($binding/r:uri)
        then element {$binding/@name} {
               attribute type {"uri"},
               string($binding/r:uri)
             }
        else element {$binding/@name} {
               (: the datatype attribute is on r:literal, not on r:binding :)
               attribute type {$binding/r:literal/@datatype},
               string($binding/r:literal)
             }
    }
    </tuple>
};

declare function local:clean($text) {
  let $text := util:unescape-uri($text, "UTF-8")
  let $text := replace($text, "http://dbpedia.org/resource/", "")
  let $text := replace($text, "\(.*\)", "")
  let $text := replace($text, "Football__positions#", "")
  let $text := replace($text, "#", ",")
  let $text := replace($text, "_", " ")
  return $text
};

declare option exist:serialize "method=xhtml media-type=application/vnd.google-earth.kml+xml highlight-matches=none";

let $club := request:get-parameter("club", "Arsenal_F.C.")
let $queryx := replace($query, "Arsenal_F.C.", $club)
let $result := local:execute-sparql($queryx)
let $tuples := local:sparql-to-tuples($result)
(: the header value carries only the disposition, not the header name :)
let $x := response:set-header('Content-disposition', concat('inline; filename=', $club, '.kml'))
return
  <Document>
    <name>Birthplaces of {local:clean($club)} players</name>
    <Style id="player">
      <IconStyle>
        <Icon><href>http://maps.google.com/mapfiles/kml/pal2/icon49.png</href></Icon>
      </IconStyle>
    </Style>
    {$result}
    {
      (: one placemark per player, taken from the first geocoded tuple :)
      for $playername in distinct-values($tuples[lat]/player)
      let $player := $tuples[player = $playername][lat][1]
      let $name := local:clean($player/player)
      let $city := local:clean($player/city)
      let $position := local:clean($player/position)
      let $dob :=
        if ($player/dob castable as xs:date)
        then datetime:format-date(xs:date($player/dob), "dd MMM, yyyy")
        else ""
      let $no :=
        if ($player/no castable as xs:integer)
        then concat(" [# ", xs:integer($player/no), "] ")
        else ""
      let $lat :=
        if ($player/lat castable as xs:decimal)
        then xs:decimal($player/lat) + (math:random() - 0.5) * 0.01
        else ""
      let $long :=
        if ($player/long castable as xs:decimal)
        then xs:decimal($player/long) + (math:random() - 0.5) * 0.01
        else ""
      let $description :=
        <div>
          {concat($position, $no, " born ", $dob, " in ", $city)}
          <div>
            <a href="{$player/player}">DBpedia</a>
            <a href="http://images.google.co.uk/images?q={$name}">Google Images</a>
          </div>
          {if ($player/image != "") then <div><img src="{$player/image}" height="200"/></div> else ()}
        </div>
      order by $name
      return
        <Placemark>
          <name>{$name}</name>
          <description>{util:serialize($description, "method=xhtml")}</description>
          <Point>
            <coordinates>{concat($long, ",", $lat, ",0")}</coordinates>
          </Point>
          <styleUrl>#player</styleUrl>
        </Placemark>
    }
  </Document>
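We also need an index page listing all the clubs in the main English and Scottish leagues. This script follows the same lines as the more complex script above, except that, because the data is simpler, the raw SPARQL result is used directly without conversion.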
The index is sorted alphabetically by club name and provides links to the player map and to the underlying DBpedia data.
declare namespace r = "http://www.w3.org/2005/sparql-results#";

declare option exist:serialize "method=xhtml media-type=text/html";

declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/>
SELECT * WHERE {
  ?club p:league ?league.
  { ?club p:league :Premier_League. }
  UNION { ?club p:league :Football_League_One. }
  UNION { ?club p:league :Football_League_Two. }
  UNION { ?club p:league :Scottish_Premier_League. }
  UNION { ?club p:league :Football_League_Championship. }
}
";

declare function local:execute-sparql($query as xs:string) {
  (: "&" must be written as "&amp;" inside an XQuery string literal :)
  let $sparql := concat(
    "http://dbpedia.org/sparql?format=xml&amp;default-graph-uri=http://dbpedia.org&amp;query=",
    encode-for-uri($query)
  )
  return doc($sparql)
};

declare function local:clean($string as xs:string) as xs:string {
  let $string := util:unescape-uri($string, "UTF-8")
  let $string := replace($string, "\(.*\)", "")
  let $string := replace($string, "_", " ")
  return $string
};

<html>
  <body>
    <h1>English and Scottish Football Clubs</h1>
    <table border="1">
    {
      for $tuple in local:execute-sparql($query)//r:result
      let $club := $tuple/r:binding[@name = "club"]/r:uri
      let $club := substring-after($club, "/resource/")
      let $clubx := local:clean($club)
      let $league := $tuple/r:binding[@name = "league"]/r:uri
      let $league := local:clean(substring-after($league, "/resource/"))
      (: the URL of the kml script is itself passed, URI-encoded, as the q parameter :)
      let $mapurl := concat(
        "http://maps.google.co.uk/maps?q=",
        encode-for-uri(concat("http://www.cems.uwe.ac.uk/xmlwiki/RDF/club2kml.xq?club=", $club))
      )
      order by $club
      return
        <tr>
          <td>{$clubx}</td>
          <td>{$league}</td>
          <td><a href="{$mapurl}">Player Map</a></td>
          <td><a href="http://dbpedia.org/resource/{$club}">DBpedia</a></td>
        </tr>
    }
    </table>
  </body>
</html>