Java XML && SAX && StAX && JAXP

Original
2014/07/01 13:24

XmlPull project

The XmlPull project is dedicated to being a site for:

  • general promotion of and education about pull parsing (including StAX), and in particular easy-to-reuse samples and code fragments

  • a resource for discussing new ideas and concepts related to pull parsing

  • a Java namespace (org.xmlpull.*) and project location for a free implementation of the event object API and factories, based on the StAX XMLStreamReader and the old Common API for XML Pull Parsing

  • a Java namespace and project location for StAX- and XmlPull-based utilities and samples, such as:

    - providing an XML stream from a DOM tree

    - a DOM builder

    - a SAX adapter

    - JUnit tests

  • a maintenance resource for the existing XmlPull interface

 

About kXML

kXML is a small XML pull parser, specially designed for constrained environments such as Applets, Personal Java or MIDP devices. In contrast to kXML 1, kXML 2 is based on the common XML pull API.

 

About SAX

SAX is the Simple API for XML, originally a Java-only API. SAX was the first widely adopted API for XML in Java, and is a "de facto" standard. The current version is SAX 2.0.1, and there are versions for several programming language environments other than Java.

The SAX2 core includes the org.xml.sax and org.xml.sax.helpers packages, but that's not all there is to SAX. The org.xml.sax.ext package includes standardized extensions, and anyone can define and implement nonstandard ones using the SAX2 core "feature flags" and "property objects" mechanisms.
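The feature-flag and property-object mechanisms can be sketched as follows. This is a minimal illustration, assuming the JDK's bundled parser, which recognizes the two standard SAX2 identifiers used here; the class name SaxFeatureSketch is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class SaxFeatureSketch {
    // Standard SAX2 feature flag: turns namespace processing on or off.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // Standard SAX2 property: holds an org.xml.sax.ext.LexicalHandler.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Sets one feature flag and one property object, then reads the flag back. */
    static boolean namespacesEnabled() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Feature flags are booleans identified by a URI.
        reader.setFeature(NAMESPACES, true);
        // Property objects carry arbitrary values; DefaultHandler2 (from
        // org.xml.sax.ext) is a no-op implementation of LexicalHandler.
        reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
        return reader.getFeature(NAMESPACES);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespaces feature enabled: " + namespacesEnabled());
    }
}
```

A parser that does not know an identifier throws SAXNotRecognizedException, so nonstandard flags are best set inside a try/catch.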

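SAX is an event-driven approach to XML processing, built around a set of callbacks. For example, the parser calls startElement() each time it encounters an element's start tag, characters() for character data, and endElement() for the element's end tag. Further callbacks cover document-level events, errors, and other lexical structures. A SAX programmer implements a SAX interface (such as ContentHandler) to define these callbacks. SAX also provides the DefaultHandler class (in the org.xml.sax.helpers package), which implements all of these callbacks with empty default methods.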

 

Parsing XML with SAX in Java

package xml;

import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * SAX handler that prints each parsing event to standard output.
 */
public class SaxHandler extends DefaultHandler {
    @Override
    public void startDocument() throws SAXException {
        System.out.println("========start parse document========");
    }

    @Override
    public void endDocument() throws SAXException {

        System.out.println("========end parse document========");
    }


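    /**
     * @param uri        namespace URI, or an empty string if there is none
     *                   or namespace processing is disabled
     * @param localName  local name without prefix; empty unless the parser
     *                   is namespace-aware
     * @param qName      qualified name, including the prefix if any
     * @param attributes the attributes attached to the element
     * @throws SAXException if the handler cannot process the event
     */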
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        System.out.println("========start parse element========");
        System.out.println("localName=" + localName);
        System.out.println("qName=" + qName);
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.println(attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"");
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        super.setDocumentLocator(locator);
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        super.endPrefixMapping(prefix);
    }


    /**
     * @param ch     character buffer supplied by the parser
     * @param start  index of the first character to read in the buffer
     * @param length number of characters to read from the buffer
     * @throws SAXException if the handler cannot process the event
     */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.println("start=" + start + ",length=" + length + ",length of ch=" + ch.length);
        String content = new String(ch, start, length);
        System.out.println("content = " + content); // just print the text
    }
}

 

package xml;

import org.junit.Test;
import org.xml.sax.SAXException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class HelloTest {
    @Test
    public void test00() throws ParserConfigurationException, SAXException, IOException {

        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> \n" +
                "<books> \n" +
                "   <book id=\"001\" isbn=\"1998\"> \n" +
                "      <title>Harry Potter</title> \n" +
                "      <author>J K. Rowling</author> \n" +
                "   </book> \n" +
                "   <book id=\"002\" isbn=\"1998\"> \n" +
                "      <title>Learning XML</title> \n" +
                "      <author>Erik T. Ray</author> \n" +
                "   </book> \n" +
                "</books> ";

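        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Create the parser
        SAXParser parser = factory.newSAXParser();

        // Encode with the charset named in the XML declaration rather than the platform default
        ByteArrayInputStream in = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        parser.parse(in, new SaxHandler());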
    }
}
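One detail the printing handler above makes visible: a SAX parser may deliver the text of a single element in several characters() calls, so a handler that needs the complete text usually accumulates it until endElement(). A minimal sketch of that pattern, with a hypothetical TitleCollector class that gathers every <title> value:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<String>();
    private StringBuilder text; // non-null only while inside a <title> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append instead of assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            text = null;
        }
    }

    /** Parses the given XML string and returns the text of every <title> element. */
    static List<String> titlesOf(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>Harry Potter</title></book>"
                + "<book><title>Learning XML</title></book></books>";
        System.out.println(titlesOf(xml));
    }
}
```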

 

The Streaming API for XML (StAX)

StAX is a standard XML processing API that allows you to stream XML data from and to your application. This StAX implementation is the standard pull parser implementation of the JSR-173 specification.

Features of StAX API:

  • The standard pull parser interface (included in JDK 1.6, downloadable separately for 1.4, 1.5)

  • Reader and Writer APIs: both with two levels, "raw" cursor access and object-based "event" access

  • Efficient XML access (especially using cursor API)

  • Gives the application control over parsing ("reverse Hollywood"): you can call us, instead of waiting for us to call you
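The writer half of the cursor API mentioned in the list above can be sketched like this. It is only an illustration: the class name StaxWriterSketch and the writeBook helper are hypothetical, while the XMLOutputFactory and XMLStreamWriter calls are the standard javax.xml.stream ones.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriterSketch {
    /** Writes a single <book> element with an id attribute and a <title> child. */
    static String writeBook(String id, String title) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("book");
        writer.writeAttribute("id", id);
        writer.writeStartElement("title");
        writer.writeCharacters(title); // special characters such as & are escaped
        writer.writeEndElement();      // closes <title>
        writer.writeEndElement();      // closes <book>
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeBook("001", "Harry Potter"));
    }
}
```

The application drives the writer call by call, which mirrors how it drives the reader with next().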

 

Parsing XML with StAX in Java

package xml;

import org.junit.Test;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.ByteArrayInputStream;

public class HelloTest {

    @Test
    public void test898() throws XMLStreamException {

        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?> \n" +
                "<books> \n" +
                "   <book id=\"001\" isbn=\"1998\"> \n" +
                "      <title>Harry Potter</title> \n" +
                "      <author>J K. Rowling</author> \n" +
                "   </book> \n" +
                "   <book id=\"002\" isbn=\"1998\"> \n" +
                "      <title>Learning XML</title> \n" +
                "      <author>Erik T. Ray</author> \n" +
                "   </book> \n" +
                "</books> ";


        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        ByteArrayInputStream in = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        int event = reader.getEventType();
        System.out.println("event=" + event);

        while (true) {
            switch (event) {
                case XMLStreamConstants.START_DOCUMENT:
                    System.out.println("Start Document.");
                    break;
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("Start Element: " + reader.getName());
                    for (int i = 0, n = reader.getAttributeCount(); i < n; ++i)
                        System.out.println("Attribute: " + reader.getAttributeName(i)
                                + "=" + reader.getAttributeValue(i));

                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (reader.isWhiteSpace())
                        break;

                    System.out.println("Text: " + reader.getText());
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    System.out.println("End Element:" + reader.getName());
                    break;

                case XMLStreamConstants.END_DOCUMENT:
                    System.out.println("End Document.");
                    break;
            }

            if (!reader.hasNext()) {
                break;
            }
            event = reader.next();
        }
    }

}

 

XML Parsing Models

  • Object (DOM, JDOM, etc.): the tree model

  • Push (SAX): the push model

  • Pull (StAX): the pull model

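The SAX and StAX listings earlier cover the push and pull models. For comparison, here is a minimal sketch of the tree model using the JDK's DOM API; the class name DomSketch and the lastBookId helper are hypothetical. Because the whole document is held in memory, nodes can be visited in any order, here starting from the last book.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {
    /** Builds the full tree, then jumps straight to the last <book> element. */
    static String lastBookId(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList books = doc.getElementsByTagName("book");
        if (books.getLength() == 0) {
            return null; // no <book> elements in the document
        }
        Element last = (Element) books.item(books.getLength() - 1);
        return last.getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title></book>"
                + "<book id=\"002\"><title>Learning XML</title></book></books>";
        System.out.println("last book id: " + lastBookId(xml));
    }
}
```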

The StAX pull model offers two approaches:

1. Cursor API: XMLStreamReader

2. Event API: XMLEventReader

StAX Events

1) Namespace

2) StartDocument

3) EndDocument

4) StartElement

5) EndElement

6) Attribute

7) EntityDeclaration

8) EntityReference

9) Notation

10) PI

11) DTD

12) Characters

13) Comment
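The earlier StAX listing uses the cursor API (XMLStreamReader). The event API returns each of the events above as an object instead. A minimal sketch, with a hypothetical EventApiSketch class that collects the names of all start elements:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class EventApiSketch {
    /** Pulls events one by one and records the local name of each StartElement. */
    static List<String> elementNames(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent(); // each event is an immutable object
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                names.add(start.getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title></book></books>";
        System.out.println(elementNames(xml));
    }
}
```

Event objects can be stored or passed to other code after the reader has moved on, which the cursor API does not allow; the cursor API in turn avoids allocating an object per event.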

 

The Java API for XML Processing (JAXP) 

The Java API for XML Processing (JAXP) enables applications to parse, transform, validate and query XML documents using an API that is independent of a particular XML processor implementation. JAXP provides a pluggability layer that lets vendors provide their own implementations without introducing dependencies in application code. Using this software, application and tool developers can build fully functional XML-enabled Java applications for e-commerce, application integration, and web publishing.
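The pluggability layer can be observed from application code: the program only names the abstract factories, and the concrete classes are chosen at run time. A small sketch, assuming nothing beyond the standard factories (the class name JaxpProviders is hypothetical, and the printed class names depend on the JDK and classpath in use):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.TransformerFactory;

public class JaxpProviders {
    /** Returns the names of the implementation classes the factories resolve to. */
    static String[] providerClassNames() {
        return new String[] {
            SAXParserFactory.newInstance().getClass().getName(),
            DocumentBuilderFactory.newInstance().getClass().getName(),
            TransformerFactory.newInstance().getClass().getName(),
            XMLInputFactory.newInstance().getClass().getName()
        };
    }

    public static void main(String[] args) {
        // The application code never mentions these classes directly.
        for (String name : providerClassNames()) {
            System.out.println(name);
        }
    }
}
```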

JAXP is a standard component of the Java platform. An implementation of JAXP 1.4 is included in Java SE 6 and OpenJDK 7, JAXP 1.5 in OpenJDK 7 update 40 as well as in Java SE 8, and JAXP 1.6 in Java SE 8. JAXP 1.4 is a maintenance release of JAXP 1.3 that adds support for the Streaming API for XML (StAX), JAXP 1.5 is a maintenance release of JAXP 1.4 with new security-related properties, and JAXP 1.6 is part of the preparation for modularization.

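JAXP stands for Java API for XML Processing. At the beginning (JAXP 1.0) it was called the Java API for XML Parsing, because at that time it only supported parsing XML. As JAXP evolved and its scope grew, the name was changed to Processing.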

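JAXP builds on the standard parsing APIs, the Simple API for XML (SAX) and the Document Object Model (DOM), so that we can choose between parsing data as a stream of events and building a document object model from it. JAXP also supports the Extensible Stylesheet Language Transformations (XSLT) standard, which lets us convert data into other XML documents or into other formats such as HTML. Starting with JAXP 1.4, the Streaming API for XML (StAX, JSR-173) joined the JAXP family.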
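The XSLT support can be sketched with the javax.xml.transform API. The stylesheet below is a made-up example that turns a books document into plain text, and the class name XsltSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Illustrative stylesheet: emits each book title followed by a semicolon.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:for-each select=\"books/book\"><xsl:value-of select=\"title\"/>;</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the text result. */
    static String titles(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>Harry Potter</title></book>"
                + "<book><title>Learning XML</title></book></books>";
        System.out.println(titles(xml));
    }
}
```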

 

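API overview

  • javax.xml.parsers: the JAXP APIs, which provide a common interface to SAX and DOM parsers from different vendors

  • org.w3c.dom: the interfaces of the Document Object Model (DOM)

  • org.xml.sax: the basic SAX APIs

  • javax.xml.transform: the XSLT APIs, which let us transform XML into other forms

  • javax.xml.stream: the StAX APIs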

 

Feature comparison of the JAXP-related APIs

Feature                       StAX             SAX              DOM             TrAX
API Type                      Pull, streaming  Push, streaming  In memory tree  XSLT Rule
Ease of Use                   High             Medium           High            Medium
XPath Capability              No               No               Yes             Yes
CPU and Memory Efficiency     Good             Good             Varies          Varies
Forward Only                  Yes              Yes              No              No
Read XML                      Yes              Yes              Yes             Yes
Write XML                     Yes              No               Yes             Yes
Create, Read, Update, Delete  No               No               Yes             No
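The XPath capability listed for DOM can be sketched with the javax.xml.xpath package. This is a minimal illustration with a hypothetical XPathSketch class: the document is first parsed into a DOM tree, and the expression is then evaluated against that tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathSketch {
    /** Evaluates an XPath expression against a DOM tree and returns the string result. */
    static String evaluate(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"001\"><title>Harry Potter</title></book>"
                + "<book id=\"002\"><title>Learning XML</title></book></books>";
        // Select the title of the book whose id attribute is 002.
        System.out.println(evaluate(xml, "/books/book[@id='002']/title"));
    }
}
```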

The parser used by JAXP can be updated.

=============END=============
