XQuery/全球气温记录
最近,英国气象局发布了全球约 1600 个站点的温度记录。每个站点记录都可以在线以文本文件形式获取,例如 斯托诺韦.
本案例研究描述了一个将这些数据作为 XML 提供的项目。主页是 http://www.cems.uwe.ac.uk/xmlwiki/Climate/index.html
第一步是将纯文本转换为 XML。主页解释了此文本文件的格式。代码 030260 是世界气象组织定义的站点代码。这些文件似乎存储在国家代码目录中。(实际上,这些在世界气象组织术语中被称为区块。)
使用 HTTP 获取远程数据文件的任务是常见任务,XQuery 模块中已经存在用于执行此任务的函数。
此模块声明了一个用于解析的常量
declare variable $csv:newline:= " ";
以及用于获取文本的基本函数,该函数可以是纯文本或 Base64 编码
declare function csv:get-data ($uri as xs:string , $binary as xs:boolean) as xs:string? {
(:~
: Get a file via HTTP and convert the body of the HTTP response to text
: force the script to get the latest version using the HTTP Pragma header
: @param uri - URI of the text file to read
: @param binary - true if data is base64 encoded
: @return - the body of the response as text or null
:)
let $headers := element headers { element header {attribute name {"Pragma" }, attribute value {"no-cache"}}}
let $response := httpclient:get(xs:anyURI($uri), true(), $headers)
return
if ($response/@statusCode eq "200")
then
let $raw := $response/httpclient:body
return
if ($binary)
then util:binary-to-string($raw)
else xmldb:decode($raw)
else ()
};
我们将创建一个包含用于执行解析的函数的 XQuery 模块
module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met";
需要导入 csv 模块
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";
现在是用于解析 MET 气候数据的函数
declare function met:station-to-xml ($station as xs:string) as element(TemperatureRecord)? {
(:~
: GET and parse a MET office temperature record as documented in
: http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html
: @param the station number
: @return the temperature record as an adhoc XML structure matched closely to the terms used in the original record
:)
let $country := substring($station,1,2) (: this is the directory for all temperature records in a country :)
(: construct the URI for the corresponding record :)
let $uri := concat("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/",$country,"/",$station)
(:GET and convert to plain text :)
let $data := csv:get-data($uri,false())
return
if (empty($data)) then ()
else
(: split into two sections :)
let $headertext := substring-before($data,"Obs:")
(: the first section contains the meta data in the form of name=value statements :)
let $headers := tokenize($headertext,$csv:nl)
(: the second section is the temperature record, year by year :)
let $temperatures := substring-after ($data,"Obs:")
let $years := tokenize($temperatures, $csv:nl)
return
element TemperatureRecord {
element sourceURI {$uri}, (: the original temperature record :)
for $header in $headers (: split each line into a name and its value :)
let $name := replace(substring-before($header,"=")," ","") (: to create a valid XML name, just remove any spaces :)
let $value := normalize-space(substring-after ($header,"="))
where $name ne ""
return
element {$name} { (:create an XML element with the name :)
if ($name = ("Normals","Standarddeviations")) (: these names have values which are a list of temperatures :)
then
for $temp in tokenize($value,"\s+") (: temperatures are space-separated :)
return
element temp_C {$temp}
else if ($name = ("Name","Country")) (: these names contain redundant hyphens :)
then replace ($value,"-","")
else if ($name = "Long") (: the convention for signing longitudes in this data is the reverse of the usual E +, W - convention :)
then - xs:decimal($value)
else
$value
},
for $year in $years
let $value := tokenize($year,"\s+")
where $year ne ""
return
element monthlyAverages {
attribute year {$value[1]}, (: the first value in the row is the year :)
for $i in (2 to 13) (: the remainder are the temperatures for the months Jan to Dec :)
let $temp := $value[$i]
return
element temp_C {
if ($temp ne '-99.0') (: generate all months, but those with no reading indicated by -99 will be empty :)
then $temp
else ()
}
}
}
};
主脚本使用这些函数来转换给定站点的记录
(:~
: convert climate file to XML
: @param station id of station
:)
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $station := request:get-parameter("station",())
return local:station-to-xml($station,false())
站点 ID 基于世界气象组织定义的 ID。可以在线以 文本文件形式获取所有站点的完整列表,并附有 支持文档.
典型的记录是
00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;
这些记录的格式是
- 区块号 表示世界气象组织分配的区块的 2 位数字。
- 站点号 表示世界气象组织分配的站点的 3 位数字。
- ICAO 地点指示符 4 个字母数字字符,并非所有此文件中的站点都已分配地点指示符。对于未分配地点指示符的站点,使用“----”。
- 地点名称 站点位置的通用名称。
- 州 2 个字符的缩写(仅适用于位于美国的站点)。
- 国家名称 国家名称为 ISO 简短英文形式。
- 世界气象组织区域 1 到 6 位数字,表示相应的世界气象组织区域,7 代表世界气象组织南极洲区域。
- 站点纬度 DD-MM-SSH,其中 DD 为度,MM 为分,SS 为秒,H 为 N 代表北半球或 S 代表南半球。对于秒值未知的站点,秒值被省略。
- 站点经度 DDD-MM-SSH,其中 DDD 为度,MM 为分,SS 为秒,H 为 E 代表东半球或 W 代表西半球。对于秒值未知的站点,秒值被省略。
- 高空纬度 DD-MM-SSH,其中 DD 为度,MM 为分,SS 为秒,H 为 N 代表北半球或 S 代表南半球。对于秒值未知的站点,秒值被省略。
- 高空经度 DDD-MM-SSH,其中 DDD 为度,MM 为分,SS 为秒,H 为 E 代表东半球或 W 代表西半球。对于秒值未知的站点,秒值被省略。
- 站点海拔(Ha) 站点海拔(米)。如果未知,则省略该值。
- 高空海拔(Hp) 高空海拔(米)。如果未知,则省略该值。
- RBSN 指示符 如果站点由世界气象组织定义为属于区域基本天气观测网,则为 P,否则省略。
需要一个函数来将纬度和经度从 DD-MM-SSH 格式转换。这会因该格式的变体而变得复杂。这些变体全部出现在数据中
- DD-MMH
- DD-MH
- DD-MM-SH
- DD-MM-SSH
由于这种格式出现在其他数据中,因此已将其添加到通用地理函数模块中。
declare function geo:lz ($n as xs:string?) as xs:integer {
xs:integer(concat (string-pad("0",2 - string-length($n)),$n))
};
declare function geo:dms-to-decimal($s as xs:string) as xs:decimal {
(:~
: @param $s - input string in the format of DD-MMH, DD-MH, DD-MM-SH,* DD-MM-SSH
: where H is NSE or W
: @return decimal degrees
:)
let $hemi := substring($s,string-length($s),1)
let $rest := substring($s,1, string-length($s)-1)
let $f := tokenize($rest,"-")
let $deg := geo:lz($f[1])
let $min:= geo:lz($f[2])
let $sec := geo:lz($f[3])
let $dec :=$deg + ($min + $sec div 60) div 60
let $dec := round-half-to-even($dec,6)
return
if ($hemi = ("S","W"))
then - $dec
else $dec
};
必须导入 geo 模块
import module namespace geo = "http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";
解析站点数据。
declare function met:WMO-to-xml ($station as xs:string ) as element (station) {
(:~
: @param $station string describing a station
: Upper Air data is ignored at present.
:)
let $f := tokenize(normalize-space($station),";")
let $cid := concat($f[1],$f[2],"0") (: this constructs the equivalent id used in the temperature records :)
return
element station{
element block {$f[1]},
element number {$f[2]},
element id {$cid},
if ($f[3] eq "----") then () else element ICAO {$f[3]},
element placeName {$f[4]},
if ($f[5] ne "") then element state {$f[5]} else (),
element country {$f[6]},
element WMORegion {$f[7]},
element latitude {geo:dms-to-dec($f[8])},
element longitude {geo:dms-to-dec($f[9])},
if ($f[12] ne "") then element elevation {$f[12]} else (),
if ($f[14] = "P") then element RBSN {} else ()
}
};
XQuery 脚本获取文本文件并将每行转换为 XML 站点元素。然后将这些元素逐个插入到一个空的 XML 文件中。
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../csv.xqm";
<results>
{
(: create the empty XML document :)
let $login := xmldb:login("/db/Wiki/Climate","user","password")
let $store := xmldb:store("/db/Wiki/Climate/Stations","metstations.xml",<stations/>)
let $doc := doc($store)/stations
(: get the text list of stations and convert :)
let $station-list := "http://weather.noaa.gov/data/nsd_bbsss.txt"
let $csv := csv:get-data($station-list,false())
for $data in tokenize($csv,$nl)
where $station ne ""
return
let $station := met:WMO-station-to-xml($data)
let $update := update insert $station into $doc
return
<station>{$xml/id}</station>
}
</results>
总共有 11000 多个站点。需要对其进行索引以实现高效访问。在 eXist 中,索引在每个集合(目录)的配置文件中定义。对于将写入站点 XML 文档的集合,配置文件为
<collection xmlns="http://exist-db.org/collection-config/1.0">
<index>
<create qname="id" type="xs:string"/>
<create qname="country" type="xs:string"/>
</index>
</collection>
这意味着集合中的所有 XML 文档都将在 qname id 和 country 中被索引,无论这些 qname 出现在 XML 结构中的何处。索引将在将文档添加到集合或更新现有文档时执行。如果需要,可以强制重新索引。
如果站点数据存储在集合 /db/Wiki/Climate/Stations 中,那么此配置文件将存储在 /db/system/config/db/Wiki/Climate/Stations 中,名为 configuration.xconf
由于代码将在多个地方引用此集合,因此我们向库模块添加一个常量来引用车站集
declare variable $met:WMOStations := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station;
需要完整列出车站才能提供索引。此数据不以简单文件的形式提供,而是以 JavaScript 数组的形式编码在 HTML 页面上。
locations[1]=["409380|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Herat"",409480|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kabul Airport","409900|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kandahar Airport"]; ...
但是这里没有位置数据,因此我们将从 WMO 车站列表中获取这些数据
将此数据转换为 XML 的方法是
- 查看 HTML 页面的源代码
- 找到车站列表
- 复制文本
- 将文本文件保存在 eXist 数据库中
- 脚本读取此文件并将其解析为 XML
- 结果 XML 通过 WMO 车站数据增加了经度和纬度。
- 最终 XML 文档存储在数据库中的同一 Station 目录中
(:~
: convert the text representation of MET stations from the WMO list to XML
:)
<stationList>
{
(: get the raw data from a text file stored as base64 in the eXist dataabse :)
let $text := util:binary-to-string(util:binary-doc("/db/Wiki/Climate/cstations.txt"))
(: ; separates the stations in each country :)
for $country in tokenize($text,";")
(: the station list is the array element content i.e. the string between =[ and ] :)
let $stationlist := substring-before(substring-after($country,"=["),"]")
(: The stations in each country are comma-separated, but commas are also used within the names of countries and stations. However a comma followed by a double quote is the required separator. :)
let $stations := tokenize($stationlist,',"')
for $station in $stations
(: some cleanup of names is needed :)
let $data :=replace ( replace($station,'"',""),"
","")
(: Each station is in the format of
Stationid | English name / French name
:)
let $f := tokenize($data,"\|")
let $id := $f[1]
let $country := tokenize($f[2],"/")
let $WMOStation := $met:WMOStations[id=$id]
(: create a station element containing the id , country and english station name :)
return
element station {
element id {$f[1]},
element country {normalize-space($country[1])},
element location {$f[3]},
$WMOStation/latitude,
$WMOStation/longitude
}
}
</stationList>
将此文件存储在同一 Stations 集合中意味着它将根据与完整 WMO 车站数据相同的元素名称、id 和 country 进行索引。
此车站集也将引用多个地方,因此我们定义了一个变量
declare variable $met:tempStations := doc ("/db/Wiki/Climate/Stations/tempstations.xml")//station;
我们将使用 XSLT 将此 XML 转换为车站位置和温度图表的形式。最初的样式表由 Dave Challender 开发。
(将添加解释)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" exclude-result-prefixes="msxsl">
<!-- Authored by Dave Callender, minor mods by Chris Wallace -->
<xsl:output method="html"/>
<xsl:param name="start-year" select="1000"/>
<xsl:param name="end-year" select="3000"/>
<xsl:template match="Station">
<html>
<head>
<script type="text/javascript" src="http://www.google.com/jsapi"/>
<title>
<xsl:value-of select="station/placeName"/>
<xsl:text> </xsl:text>
<xsl:value-of select="station/country"/>
</title>
</head>
<body>
<xsl:apply-templates select="station"/>
<xsl:apply-templates select="TemperatureRecord" mode="googlevis"/>
<xsl:apply-templates select="TemperatureRecord" mode="table"/>
<xsl:apply-templates select="TemperatureRecord" mode="smoothed"/>
</body>
</html>
</xsl:template>
<!-- Visualization of the full temperature record -->
<xsl:template match="TemperatureRecord" mode="googlevis">
<p/>
<p>Google visualization timeline (takes no account of standard deviation etc.)</p>
<div id="chart_div" style="width: 700px; height: 440px;"/>
<p/>
<script type="text/javascript">
google.load('visualization', '1', {'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = new google.visualization.DataTable();
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
<xsl:apply-templates select="monthlyAverages[@year][@year >= $start-year][@year <= $end-year]" mode="googlevis"/>
[null,null]
]);
var chart = new google.visualization.AnnotatedTimeLine(document.getElementById('chart_div'));
chart.draw(data, {displayAnnotations: true});
}
</script>
</xsl:template>
<xsl:template match="temp_C" mode="googlevis">
<xsl:if test="(node())">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="../@year"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="position() - 1 "/>
<!-- Google viz uses 0-based arrays -->
<xsl:text>,15),</xsl:text>
<xsl:value-of select="."/>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!-- Vizualisation of the smoothed data -->
<xsl:template match="TemperatureRecord" mode="smoothed">
<p/>
<p>Almost totally meaningless - sum all temps for a year and divide by 12 (only do if all 12
data points) but shows a bit of playing with data</p>
<p/>
<div id="smoothed_chart_div" style="width: 700px; height: 440px;"/>
<script type="text/javascript">
google.load('visualization', '1', {'packages':['annotatedtimeline']});
google.setOnLoadCallback(drawChartSmoothed);
function drawChartSmoothed()
{
var data = new google.visualization.DataTable();
data.addColumn('date', 'Date');
data.addColumn('number', 'temp');
data.addRows([
<xsl:apply-templates select="monthlyAverages[@year][@year >= $start-year][@year <=$end-year]" mode="smoothed"/>
[null,null]
]);
var chart = new google.visualization.AnnotatedTimeLine(document.getElementById('smoothed_chart_div'));
chart.draw(data, {displayAnnotations: true});
}
</script>
</xsl:template>
<xsl:template match="monthlyAverages" mode="smoothed">
<xsl:if test="count(temp_C[node()])=12">
<xsl:text>[new Date(</xsl:text>
<xsl:value-of select="@year"/>
<xsl:text>,5,15),</xsl:text>
<xsl:value-of select="sum(temp_C[node()]) div 12"/>
<xsl:text>],
</xsl:text>
</xsl:if>
</xsl:template>
<!-- Data tabulated -->
<xsl:template match="TemperatureRecord" mode="table">
<table border="1">
<tr>
<td>Year</td>
<td>Jan</td>
<td>Feb</td>
<td>Mar</td>
<td>Apr</td>
<td>May</td>
<td>Jun</td>
<td>Jul</td>
<td>Aug</td>
<td>Sep</td>
<td>Oct</td>
<td>Nov</td>
<td>Dec</td>
<tr/>
</tr>
<xsl:apply-templates
select="monthlyAverages[@year][@year >= $start-year][@year < $end-year]"
mode="table"/>
</table>
</xsl:template>
<xsl:template match="monthlyAverages" mode="table">
<tr>
<td>
<xsl:value-of select="@year"/>
</td>
<xsl:apply-templates select="temp_C" mode="table"/>
</tr>
</xsl:template>
<xsl:template match="temp_C" mode="table">
<td>
<xsl:value-of select="."/>
</td>
</xsl:template>
<xsl:template match="Number">
<p> Station Number:  <xsl:value-of select="."/>
</p>
</xsl:template>
<xsl:template match="station">
<h1>
<xsl:value-of select="placeName"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="country"/>
<xsl:text> </xsl:text>
</h1>
<a href="http://maps.google.com/maps?q={latitude},{longitude}">
<img
src="http://maps.google.com/maps/api/staticmap?zoom=11&maptype=hybrid&size=400x300&sensor=false&key=ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ&markers=color:blue|{latitude},{longitude}"
alt="{placeName}"/>
</a>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
我们希望呈现原始 XML 或 HTML 可视化页面。我们可以使用两个脚本,或者将它们组合成一个脚本,并使用参数来指示如何呈现输出。eXist 函数允许动态设置输出的序列化和 mime 类型。
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
let $id := request:get-parameter("station",())
let $render := request:get-parameter("render",())
let $station := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station[id = $id]
let $tempStation := doc("/db/Wiki/Climate/Stations/tempstations.xml")//station[id = $id]
let $temp :=
if ($tempStaion)
then met:station-to-xml($id)
else ()
let $station :=
<Station>
{$station}
{$temp}
</Station>
return
if ($render="HTML")
then
let $ss := doc("/db/Wiki/Climate/FullHTMLMet-V2.xsl")
let $options := util:declare-option("exist:serialize","method=xhtml media-type=text/html")
let $start-year := request:get-parameter("start","1000")
let $end-year := request:get-parameter("end","2100")
let $params :=
<parameters>
<param name="start-year" value="{$start-year}"/>
<param name="end-year" value="{$end-year}"/>
</parameters>
return
transform:transform($station,$ss,$params)
else
let $header := response:set-header("Access-Control-Allow-Origin","*")
return $station
我们可以使用存储的车站列表来创建一个简单的 HTML 索引。
import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare option exist:serialize "method=xhtml media-type=text/html";
<html>
<head>
<title>Index of Temperature Record Stations </title>
</head>
<body>
<h1>Index of Temperature Record Stations </h1>
{
for $country in distinct-values($met:tempStations/country)
order by $country
return
<div>
<h3>{$country} </h3>
{for $station in $met:tempStations[country=$country]
let $id := $station/id
order by $station/location
return
<span><a href="station.xq?station={$id}&render=HTML">{string($station/location)}</a>
</span>
}
</div>
}
</body>
</html>
我们还可以生成一个(大型)KML 叠加层,其中包含指向每个车站页面的链接。
我们需要一个函数将车站转换为具有指向 HTML 车站页面的链接的 PlaceMark
declare function met:station-to-placemark ($station) {
let $description :=
<div>
<a href="http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station={$station/id}&render=HTML">Temperature Record</a>
</div>
return
<Placemark>
<name>{string($station/location)}, {string($station/country)}</name>
<description>{util:serialize($description,"method=xhtml")} </description>
<Point>
<coordinates>{string($station/longitude)},{string($station/latitude)},0</coordinates>
</Point>
</Placemark>
};
然后主脚本遍历所有气温站以生成完整的 KML 文件。
import module namespace met ="http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";
declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes omit-xml-declaration=yes";
let $x := response:set-header('Content-Disposition','attachment;filename=country.kml')
return
<kml xmlns="http://www.opengis.net/kml/2.2">
<Folder>
<name>Stations</name>
{ for $station in $met:tempStations
return met:station-to-placemark($station)
}
</Folder>
</kml>
- 资源 URI
- RDF