Working with XML

Converting XML into lists

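XML files are widely used these days, and you might have noticed that the highly organized tree structure of an XML file is similar to the nested list structures we've met in newLISP. So wouldn't it be good if you could work with XML files as easily as you work with lists?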

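You've already met the two functions that do most of the work once XML has become a list. (See ref and ref-all.) The xml-parse and xml-type-tags functions are all you need to convert XML into a newLISP list. (xml-error is used for diagnosing errors.) xml-type-tags determines how xml-parse labels the different kinds of XML node, and xml-parse does the actual processing of the XML text, turning it into a list.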
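As a minimal sketch of the error-reporting side, here is what happens with a deliberately malformed string (made up for illustration, not taken from the feed). xml-parse returns nil when it can't parse its input, and xml-error then returns a list holding a message and the last scan position:

```newlisp
; A deliberately broken XML string (hypothetical input)
(set 'broken "<atag>hello</atag><fin")

(set 'result (xml-parse broken))    ; nil, because parsing failed
(if (nil? result)
    (println "parse failed: " (xml-error))
    (println "parsed: " result))
```

Checking for nil like this before using the result is a reasonable habit whenever the XML comes from somewhere you don't control.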

To illustrate the use of these functions, we'll fetch the news feed from the newLISP forum:

(set 'xml (get-url "http://newlispfanclub.alh.net/forum/feed.php"))

and store the retrieved XML in a file, to avoid hitting the server repeatedly:

(save {/Users/me/Desktop/newlisp.xml} 'xml)  ; save symbol in file
(load {/Users/me/Desktop/newlisp.xml})       ; load symbol from file

The XML starts like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb">
<link rel="self" type="application/atom+xml" href="http://newlispfanclub.alh.net/forum/feed.php" />

<title>newlispfanclub.alh.net</title>
<subtitle>Friends and Fans of NewLISP</subtitle>
<link href="http://newlispfanclub.alh.net/forum/index.php" />
<updated>2010-01-11T09:51:39+00:00</updated>

<author><name><![CDATA[newlispfanclub.alh.net]]></name></author>
<id>http://newlispfanclub.alh.net/forum/feed.php</id>
<entry>
<author><name><![CDATA[kosh]]></name></author>
<updated>2010-01-10T12:17:53+00:00</updated>
...

If you use xml-parse to parse the XML without calling xml-type-tags first, the output looks like this:

(xml-parse xml)
(("ELEMENT" "feed" (("xmlns" "http://www.w3.org/2005/Atom") 
  ("xml:lang" "en-gb")) 
  (("TEXT" "\n") ("ELEMENT" "link" (("rel" "self") 
   ("type" "application/atom+xml") 
     ("href" "http://newlispfanclub.alh.net/forum/feed.php")) 
    ()) 
   ("TEXT" "\n\n") 
   ("ELEMENT" "title" () (("TEXT" "newlispfanclub.alh.net"))) 
   ("TEXT" "\n") 
   ("ELEMENT" "subtitle" () (("TEXT" "Friends and Fans of NewLISP"))) 
   ("TEXT" "\n") 
   ("ELEMENT" "link" (("href" "http://newlispfanclub.alh.net/forum/index.php")) ()) 
; ...

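Although it already looks a bit like LISP, you can see that the nodes have been labelled "ELEMENT", "TEXT", and so on. For now we can do without these labels, and that's basically what xml-type-tags is for. It lets you choose the labels for the four types of XML node: TEXT, CDATA, COMMENTS, and ELEMENTS. We'll hide them all by passing four nils. We'll also use some of xml-parse's options to tidy up the output further.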

(xml-type-tags nil nil nil nil)
(set 'sxml (xml-parse xml 15))        ; options: 15 (see below)
;->
((feed ((xmlns "http://www.w3.org/2005/Atom") (xml:lang "en-gb")) (link ((rel "self") 
    (type "application/atom+xml") 
    (href "http://newlispfanclub.alh.net/forum/feed.php"))) 
  (title "newlispfanclub.alh.net") 
  (subtitle "Friends and Fans of NewLISP") 
  (link ((href "http://newlispfanclub.alh.net/forum/index.php"))) 
  (updated "2010-01-11T09:51:39+00:00") 
  (author (name "newlispfanclub.alh.net")) 
  (id "http://newlispfanclub.alh.net/forum/feed.php") 
  (entry (author (name "kosh")) 
   (updated "2010-01-10T12:17:53+00:00") 
   (id "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471") 
   (link ((href "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471"))) 
   (title ((type "html")) "newLISP in the real world \226\128\162 Re: suggesting the newLISP manual as CHM") 
   (category ((term "newLISP in the real world") 
    (scheme "http://newlispfanclub.alh.net/forum/viewforum.php?f=16") 
     (label "newLISP in the real world"))) 
   (content ((type "html") 
   (xml:base "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471")) 
    "\nhello kukma.<br />\nI tried to make the newLISP.chm, and this is it.<br />\n
  <br />\n<!-- m --><a class=\"postlink\" 
; ...

This is now a useful newLISP list, although a fairly complicated one, stored in a symbol called sxml. (This way of representing XML is called S-XML.)

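If you're wondering what that 15 is doing in the xml-parse expression, it's just a way of controlling how much of the auxiliary XML information is translated. The options are as follows: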

1 - suppress whitespace text tags

2 - suppress empty attribute lists

4 - suppress comment tags

8 - translate string tags into symbols

16 - add SXML (S-expression XML) attribute tags

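You add these up to get the option code number, so 15 (+ 1 2 4 8) uses the first four options: suppress the unwanted material, and translate string tags into symbols. As a result, new symbols have been added to newLISP's symbol table: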

(author category content entry feed href id label link rel scheme subtitle term 
 title type updated xml:base xml:lang xmlns)

These correspond to the string tags in the XML file, and they'll be useful almost immediately.
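To see what the individual options do, it can help to try them on something much smaller than the feed. The following is a sketch only: the string tiny is made up for illustration, and the idea is simply to run it with and without option 8 and compare the printed lists yourself.

```newlisp
(xml-type-tags nil nil nil nil)     ; hide the four type labels

; A tiny made-up document containing a comment, some whitespace,
; an attribute, and an empty element
(set 'tiny "<a x=\"1\"><!-- note --><b>hi</b> <c/></a>")

(set 'opts (+ 1 2 4 8))             ; the same 15 used for the feed
(println (xml-parse tiny opts))     ; tag names become symbols
(println (xml-parse tiny (+ 1 2 4)))  ; without 8, tag names stay strings
```

Leaving out option 8 avoids adding symbols to the symbol table, at the cost of having to search for strings rather than symbols later on.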

Now what?

The story so far is basically this:

(set 'xml (get-url "http://newlispfanclub.alh.net/forum/feed.php"))
; we stored this in a temporary file while exploring
(xml-type-tags nil nil nil nil)
(set 'sxml (xml-parse xml 15))

which has given us a list version of the news feed, stored in the sxml symbol.

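Because this list has a complicated nested structure, it's best to use ref and ref-all rather than find to look for things. ref finds the first occurrence of an expression in a list and returns its address: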

(ref 'entry sxml)
;-> (0 9 0)

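These numbers are the address in the list of the first occurrence of the symbol entry: (0 9 0) means start at item 0 of the whole list, then go to item 9 of that item, then go to item 0 of that one. (It's 0-based indexing, of course!)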

To find the higher-level, or enclosing, item, use chop to remove the last level of the address:

(chop (ref 'entry sxml))
;-> (0 9)

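This now points to the level that contains the first entry. It's like removing the house number from an address, leaving just the street name.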

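Now you can use this address with other expressions that accept a list of indices. The most convenient and compact form is probably implicit addressing, which is just the name of the source list followed by the set of indices in a list: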

(sxml (chop (ref 'entry sxml)))            ; a (0 9) slice of sxml

(entry (author (name "kosh")) (updated "2010-01-10T12:17:53+00:00") (id "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471") 
 (link ((href "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471"))) 
 (title ((type "html")) "newLISP in the real world \226\128\162 Re: suggesting the newLISP manual as CHM") 
 (category ((term "newLISP in the real world") (scheme "http://newlispfanclub.alh.net/forum/viewforum.php?f=16") 
   (label "newLISP in the real world"))) 
 (content ((type "html") (xml:base "http://newlispfanclub.alh.net/forum/viewtopic.php?t=3447&amp;p=17471#p17471")) 
...

This found the first occurrence of entry and returned the enclosing portion of the SXML.
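The same three steps (ref, chop, implicit addressing) are easier to follow on a list small enough to read at a glance. This sketch uses a made-up list called tree that is only shaped like the feed:

```newlisp
; A small made-up list shaped like the feed (hypothetical data)
(set 'tree
  '((feed
      (title "top")
      (entry (title "first")  (id "1"))
      (entry (title "second") (id "2")))))

(set 'addr (ref 'entry tree))    ; address of the first entry symbol
(set 'parent (chop addr))        ; address of the list that contains it
(println addr)                   ; (0 2 0)
(println parent)                 ; (0 2)
(println (tree parent))          ; (entry (title "first") (id "1"))
```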

Another technique available to you is to treat sections of the list as association lists:

(lookup 'title (sxml (chop (ref 'entry sxml))))
;-> 
newLISP in the real world • Re: suggesting the newLISP manual as CHM

Here we've found the first entry, as before, and then used lookup to find the first occurrence of title inside it.
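By default lookup returns the last element of the sublist it finds, which is why the attribute list in a title is skipped. A small sketch, using a made-up entry, shows the difference between lookup, lookup with an index, and assoc:

```newlisp
; A made-up entry in the same shape as the ones in the feed
(set 'an-entry
  '(entry (id "1") (title ((type "html")) "Hello") (updated "2010-01-10")))

(println (lookup 'title an-entry))     ; "Hello", the last element
(println (lookup 'title an-entry 1))   ; ((type "html")), the element at index 1
(println (assoc 'title an-entry))      ; (title ((type "html")) "Hello")
```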

Use ref-all to find all occurrences of a symbol in a list. It returns a list of addresses:

(ref-all 'title sxml)
;-> 
((0 3 0) 
 (0 9 5 0) 
 (0 10 5 0) 
 (0 11 5 0) 
 (0 12 5 0) 
 (0 13 5 0) 
 (0 14 5 0) 
 (0 15 5 0) 
 (0 16 5 0) 
 (0 17 5 0) 
 (0 18 5 0))

With a simple list traversal, you can quickly show all the titles in the file, at whatever level they are:

(dolist (el (ref-all 'title sxml)) 
    (println (rest (rest (sxml (chop el))))))

;->
()
("newLISP in the real world \226\128\162 Re: suggesting the newLISP manual as CHM")
("newLISP newS \226\128\162 Re: newLISP Advocacy")
("newLISP in the real world \226\128\162 Re: newLISP-gs opens only splash picture")
("So, what can you actually DO with newLISP? \226\128\162 Re: Conception of Adaptive Programming Languages")
("Dragonfly \226\128\162 Re: Dragonfly 0.60 Released!")
("So, what can you actually DO with newLISP? \226\128\162 Takuya Mannami, Gauche Newlisp Library")
; ...

Without the two rests, you would have seen this:

(title "newlispfanclub.alh.net")
(title ((type "html")) "newLISP in the real world \226\128\162 Re: suggesting the newLISP manual as CHM")
(title ((type "html")) "newLISP newS \226\128\162 Re: newLISP Advocacy")
; ...
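The empty list at the top of the earlier output appears because the feed's own title has no attribute list, so the two rests strip everything. One possible alternative, sketched here on made-up data, is to take the last element of each title sublist, which works whether or not attributes are present:

```newlisp
; Made-up data: one title without attributes, two with
(set 'tree
  '((feed
      (title "top")
      (entry (title ((type "html")) "first"))
      (entry (title ((type "html")) "second")))))

; For each address of the symbol title, fetch the enclosing
; sublist and keep only its last element (the text)
(set 'titles
  (map (fn (addr) (last (tree (chop addr))))
       (ref-all 'title tree)))
(println titles)    ; ("top" "first" "second")
```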

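As you can see, there are many different ways to get at the information in the SXML data. To produce a concise summary of the news in the XML file, one approach is to go through all the entries and extract the author and content of each one. Because the content element is a mass of escaped entities and HTML markup, we'll also write a quick and dirty tidying routine: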

(define (cleanup str)
 (let (replacements 
  '(({&amp;amp;}  {&amp;})
    ({&amp;gt;}   {>})
    ({&amp;lt;}   {<})
    ({&amp;nbsp;} { })
    ({&amp;apos;} {'})
    ({&amp;quot;} {"})
    ({&amp;#40;}  {(})
    ({&amp;#41;}  {)})
    ({&amp;#58;}  {:})
    ("\n"      "")))
 (and
  (!= str "")
  (map 
   (fn (f) (replace (first f) str (last f))) 
   replacements)
  (join (parse str {<.*?>} 4) " "))))

(set 'entries (sxml (chop (chop (ref 'title sxml)))))

(dolist (e (ref-all 'entry entries))
   (set 'entry (entries (chop e)))
   (set 'author (lookup 'author entry))
   (println "Author: " (last author))
   (set 'content (lookup 'content entry))
   (println "Post: " (0 60 (cleanup content)) {...}))

Author: kosh
Post: hello kukma. I tried to make the newLISP.chm, and this is it...
Author: Lutz
Post: ... also, there was a sign-extension error in the newLISP co...
Author: kukma
Post: Thank you Lutz and welcome home again, the principle has bec...
Author: Kazimir Majorinc
Post: Apparently,  Aparecido Valdemir de Freitas  completed his Dr...
Author: cormullion
Post: Upgrade seemed to go well - I think I found most of the file...
Author: Kazimir Majorinc
Post:   http://github.com/mtakuya/gauche-nl-lib   Statistics: Post...
Author: itistoday
Post: As part of my work on Dragonfly, I've updated newLISP's SMTP...
Author: Tim Johnson
Post:   itistoday wrote:     Tim Johnson wrote:  Have you done any...
; ...
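The last line of cleanup does the tag stripping: parse splits the string wherever the regular expression matches a tag, and join glues the remaining pieces back together. Here is that step on its own, as a sketch with a made-up string:

```newlisp
; A made-up fragment of post content containing HTML tags
(set 'html "hello kukma.<br />I tried <a href=\"x\">this</a> today")

; Split at anything that looks like <...>, then rejoin with spaces.
; The non-greedy .*? stops at the first > so each tag is matched separately.
(set 'plain (join (parse html {<.*?>} 4) " "))
(println plain)
```

Because every tag is replaced by a space, doubled spaces can appear where a tag sat next to existing whitespace, which is visible in some of the summaries.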

Changing SXML

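You can use similar techniques to modify data held in XML format. For example, suppose you keep the periodic table of elements in an XML file, and you want to change the data about the elements' melting points, currently stored in kelvins, to degrees Celsius. The XML data looks like this: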

<?xml version="1.0"?>
<PERIODIC_TABLE>
  <ATOM>
  ...
  </ATOM>
  <ATOM>
    <NAME>Mercury</NAME>
    <ATOMIC_WEIGHT>200.59</ATOMIC_WEIGHT>
    <ATOMIC_NUMBER>80</ATOMIC_NUMBER>
    <OXIDATION_STATES>2, 1</OXIDATION_STATES>
    <BOILING_POINT UNITS="Kelvin">629.88</BOILING_POINT>
    <MELTING_POINT UNITS="Kelvin">234.31</MELTING_POINT>
    <SYMBOL>Hg</SYMBOL>
...

With the table loaded into the symbol sxml, using (set 'sxml (xml-parse xml 15)) (where xml contains the XML source), we want to change each sublist that has the following form:

(MELTING_POINT ((UNITS "Kelvin")) "234.31")

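You can use the set-ref-all function to find and replace elements in a single expression. First, here's a function to convert a temperature from kelvins to degrees Celsius: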

(define (convert-K-to-C n)
 (sub n 273.15))

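Now the set-ref-all function can be called just once to find all the references and modify them in place, so that every melting point is converted to Celsius. The form is: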

(set-ref-all key list replacement function)

where function is the method used to find list elements given the key.

(set-ref-all 
  '(MELTING_POINT ((UNITS "Kelvin")) *) 
  sxml 
  (list 
    (first $0) 
    '((UNITS "Celsius")) 
    (string (convert-K-to-C (float (last $0)))))
  match)

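Here the match function searches the SXML list using a wildcard construction, (MELTING_POINT ((UNITS "Kelvin")) *), to find every occurrence. The replacement expression builds a replacement sublist from the matched expression stored in $0. After this expression has been evaluated, the SXML changes from this: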

; ...
(ATOM
    (NAME "Mercury")
    (ATOMIC_WEIGHT "200.59")
    (ATOMIC_NUMBER "80")
    (OXIDATION_STATES "2, 1")
    (BOILING_POINT
        ((UNITS "Kelvin")) "629.88")
    (MELTING_POINT
        ((UNITS "Kelvin")) "234.31")
;  ...

to this:

; ...
(ATOM
    (NAME "Mercury")
    (ATOMIC_WEIGHT "200.59")
    (ATOMIC_NUMBER "80")
    (OXIDATION_STATES "2, 1")
    (BOILING_POINT
        ((UNITS "Kelvin")) "629.88")
    (MELTING_POINT
        ((UNITS "Celsius")) "-38.84")
; ...
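To see why only the melting point was changed and the boiling point was left alone, it helps to try match by itself. This sketch applies the same wildcard pattern to the two Mercury sublists and then repeats the arithmetic for the one that matched:

```newlisp
; The wildcard pattern used as the key for set-ref-all
(set 'pattern '(MELTING_POINT ((UNITS "Kelvin")) *))

; match returns a list of what the wildcards matched, or nil if nothing matched
(println (match pattern '(MELTING_POINT ((UNITS "Kelvin")) "234.31")))
(println (match pattern '(BOILING_POINT ((UNITS "Kelvin")) "629.88")))  ; nil

; The conversion itself: 234.31 K less 273.15
(println (format "%.2f" (sub (float "234.31") 273.15)))   ; -38.84
```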

XML isn't always as easy to manipulate as this: there are attributes, CDATA sections, and so on.

Outputting SXML to XML

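If you want to go the other way and convert a newLISP list to XML, the following function suggests one possible approach. It works through the list recursively: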

(define (expr2xml expr (level 0))
 (cond 
   ((or (atom? expr) (quote? expr))
      (print (dup " " level))
      (println expr))
   ((list? (first expr))
      (expr2xml (first expr) (+ level 1))
      (dolist (s (rest expr)) (expr2xml s (+ level 1))))
   ((symbol? (first expr))
      (print (dup " " level))
      (println "<" (first expr) ">")
      (dolist (s (rest expr)) (expr2xml s (+ level 1)))
      (print (dup " " level))
      (println "</" (first expr) ">"))
   (true
    (print (dup " " level)) 
    (println "<error>" (string expr) "</error>"))))

(expr2xml sxml)

 <rss>
   <version>
   0.92
   </version>
  <channel>
   <docs>
   http://backend.userland.com/rss092
   </docs>
   <title>
   newLISP Fan Club
   </title>
   <link>
   http://www.alh.net/newlisp/phpbb/
   </link>
   <description>
     Friends and Fans of newLISP                                                         
   </description>
   <managingEditor>
   newlispfanclub-at-excite.com
   </managingEditor>
...

This is almost where we started!

A simple practical example

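The following example was originally set in the shipping department of a small business. I've changed the items to be pieces of fruit. The XML data file contains an entry for every item sold, together with the charge for it. We want to produce a report that lists how many items were sold at each price, and the total value.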

Here's an extract from the XML data:

<FRUIT>
  <NAME>orange</NAME>
  <charge>0</charge>
  <COLOR>orange</COLOR>
</FRUIT>
<FRUIT>
  <NAME>banana</NAME>
  <COLOR>yellow</COLOR>
  <charge>12.99</charge>
</FRUIT>
<FRUIT>
  <NAME>banana</NAME>
  <COLOR>yellow</COLOR>
  <charge>0</charge>
</FRUIT>
<FRUIT>
  <NAME>banana</NAME>
  <COLOR>yellow</COLOR>
  <charge>No Charge</charge>
</FRUIT>

This is the main function that defines and organizes the tasks:

(define (work-through-files file-list)
 (dolist (fl file-list)
   (set 'table '())
   (scan-file fl)
   (write-report fl)))

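Two functions are called: scan-file, which scans an XML file and stores the required information in a table (which will be a newLISP list of some kind), and write-report, which scans this table and outputs a report.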

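The scan-file function receives a pathname, converts the file to SXML, finds all the charge items (using ref-all), and keeps a count of how many times each value occurs. We allow for the fact that some of the free items are marked as No Charge, no charge, or nocharge: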

(define (scan-file f)
  (xml-type-tags nil nil nil nil)
  (set 'sxml (xml-parse (read-file f) 15))
  (set 'r-list (ref-all 'charge sxml)) 
  (dolist (r r-list)
    (set 'charge-text (last (sxml (chop r))))
    (if (= (lower-case (replace " " charge-text "")) "nocharge")
        (set 'charge (lower-case charge-text))
        (set 'charge (float charge-text 0)))
    (if (set 'result (lookup charge table 1))
        ; if this price already exists in table, increment it
        (setf (assoc charge table) (list charge (inc result)))
        ; or create an entry for it
        (push (list charge 1) table -1))))
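The counting at the end of scan-file is the heart of the function, so here it is in isolation. This is a sketch under simplified assumptions: the charges are a made-up list of values that have already been converted, rather than text read from a file, and the table is called tally to keep it apart from the real one.

```newlisp
; Made-up charges, as they might look after conversion
(set 'charges '(0 12.99 0 "nocharge" 12.99 0))

(set 'tally '())
(dolist (charge charges)
  (if (set 'result (lookup charge tally 1))
      ; this charge is already in the table: replace its entry
      ; with one whose count is one higher
      (setf (assoc charge tally) (list charge (inc result)))
      ; first time this charge has been seen: add it with a count of 1
      (push (list charge 1) tally -1)))

(println tally)
```

Each entry in the table is a two-element list of charge and count, which is why lookup is given the index 1 to fetch the count.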

The write-report function sorts and analyses the table, keeping running totals as it goes:

(define (write-report fl)
 (set 'total-items 0 'running-total 0 'total-priced-items 0)
 (println "sorting")
 (sort table (fn (x y) (< (float (x 0)) (float (y 0)))))
 (println "sorted ")
 (println "File: " fl)
 (println " Charge           Quantity     Subtotal")
 (dolist (c table)
  (set 'price (float (first c)))
  (set 'quantity (int (last c)))
  (inc total-items quantity)
  (cond 
   ; do the No Charge items:
   ((= price nil)    (println (format " No charge  %12d" quantity)))
   ; do 0.00 items
   ((= price 0)      (println (format "    0.00    %12d" quantity)))
   ; do priced items:
   (true 
    (begin 
     (set 'subtotal (mul price quantity))
     (inc running-total subtotal)
     (if (> price 0) (inc total-priced-items quantity))
     (println (format "%8.2f        %8d  %12.2f" price quantity subtotal))))))
 ; totals
 (println (format "Total charged   %8d  %12.2f" total-priced-items  running-total))            
 (println (format "Grand Total     %8d  %12.2f"  total-items  running-total)))

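The report needed more fiddling than the scan-file function, particularly as the user wanted (for some reason) the 0 and no-charge items to be kept separate.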

 Charge           Quantity     Subtotal
 No charge           138
    0.00             145
    0.11               1          0.11
    0.29               1          0.29
    1.89              72        136.08
    1.99              17         33.83
    2.99              18         53.82
   12.99              55        714.45
   17.99               1         17.99
Total charged        165        956.57
Grand Total          448        956.57