java - SAX parsing: Encountered mixed content within text element

Tags: java xml-parsing sax

I'm trying to parse an XML file like the one below (it represents a TV guide)...

<?xml version="1.0" encoding="utf-8"?>
<channels>
  <channel>
    <name>BBC ONE</name>
    <oid>10029</oid>
      ...
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:40:00</end_time>
        <episode>9</episode>
        <genres>Entertainment</genres>
        <oid>10583734</oid>
        <season>8</season>
        <start_time>2013-02-04 00:15:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
      <programme>
        ..
      </programme>
    </programmes>
  </channel>
  <channel>
    ...
  </channel>
</channels>

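I'm using two parsers, one for channels and one for programmes, but obviously this means I need to retrieve the whole <programmes>...</programmes> block and pass it to the "programme" parser.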
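With plain SAX there is no need to cut the <programmes> block out as a string. The sketch below uses the standard JDK API (javax.xml.parsers / org.xml.sax, not android.sax), and the class names DelegatingChannelHandler and ProgrammeSubHandler are hypothetical. The channel handler deals with channel-level elements itself and forwards every event that occurs between <programmes> and </programmes> to a separate programme handler, so the two "parsers" stay separate while sharing a single pass over the stream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "programme parser": only ever receives events from inside <programmes>.
class ProgrammeSubHandler extends DefaultHandler {
    final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("title")) {
            titles.add(text.toString().trim());
        }
    }
}

// Hypothetical "channel parser": handles channel-level elements and delegates
// everything nested inside <programmes> to a ProgrammeSubHandler.
class DelegatingChannelHandler extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    final ProgrammeSubHandler programmes = new ProgrammeSubHandler();
    private final StringBuilder text = new StringBuilder();
    private boolean inProgrammes = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (inProgrammes) {
            programmes.startElement(uri, localName, qName, atts);
        } else if (qName.equals("programmes")) {
            inProgrammes = true; // from now on, forward events to the programme handler
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inProgrammes) {
            programmes.characters(ch, start, length);
        } else {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("programmes")) {
            inProgrammes = false; // hand control back to the channel handler
        } else if (inProgrammes) {
            programmes.endElement(uri, localName, qName);
        } else if (qName.equals("name")) {
            names.add(text.toString().trim());
        }
    }

    // Parses an XML string; checked exceptions are wrapped for brevity.
    static DelegatingChannelHandler parse(String xml) {
        try {
            DelegatingChannelHandler handler = new DelegatingChannelHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The design choice here is a simple boolean switch; a deeper document would need a depth counter or a stack of handlers instead.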

In the "channel" parser I tried the following...

public List<XMLTVChannel> parse() {
    RootElement rootElement = new RootElement("channels");
    final List<XMLTVChannel> channelsList = new ArrayList<XMLTVChannel>();
    Element channelElement = rootElement.getChild("channel");

    ...

    // Set the EndTextElementListeners for the <channel> child elements
    channelElement.getChild(CHANNEL_OID).setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            currentChannel.setOid(body);
        }
    });

    ...

    // HERE'S THE PROBLEM
    channelElement.getChild("programmes").setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            // NEED TO INVOKE XMLTVProgrammeParser HERE
        }
    });
    try {
        Xml.parse(getInputStream(), Xml.Encoding.UTF_8, rootElement.getContentHandler());
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return channelsList;
}

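OK, so I googled it and I know exactly what the problem is: the String body parameter passed to the end(...) method is supposed to contain only text, whereas here it is a mixture of elements and their text.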
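The distinction can be made concrete with the standard JDK SAX API (android.sax itself only exists on Android). The hypothetical handler below records, for each element name, whether that element contained child elements. Leaf elements such as <title> carry only text, while <programmes> contains <programme> elements, which is the kind of element a text-only callback like EndTextElementListener cannot hand over as a plain string.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: maps each element name to true if it had child elements.
class MixedContentDetector extends DefaultHandler {
    final Map<String, Boolean> hasChildElements = new LinkedHashMap<>();
    // One flag per currently open element: "has a child element been seen yet?"
    private final Deque<boolean[]> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!open.isEmpty()) {
            open.peek()[0] = true; // the enclosing element now has an element child
        }
        open.push(new boolean[] {false});
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        boolean hadChildren = open.pop()[0];
        // logicalOr keeps true if any occurrence of this name had children
        hasChildElements.merge(qName, hadChildren, Boolean::logicalOr);
    }

    // Parses an XML string; checked exceptions are wrapped for brevity.
    static Map<String, Boolean> analyse(String xml) {
        try {
            MixedContentDetector d = new MixedContentDetector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), d);
            return d.hasChildElements;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```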

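I've read some similar Stack Overflow questions and articles suggesting that I need to define my own ContentHandler, but I haven't found anything close to what I'm trying to do. Is a custom ContentHandler my only option, or is there another way?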
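One alternative to a push-style ContentHandler is a pull parser, where the channel-level loop can call a separate programme method on the same reader when it reaches <programmes>, matching the two-parser design from the question. A minimal sketch, assuming a standard JVM: it uses StAX (javax.xml.stream), which ships with the JDK but is not available on Android, where XmlPullParser plays a similar role. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullTvGuide {

    // Outer (channel-level) loop: hands each <programmes> block to
    // readProgrammeTitles, which consumes it up to </programmes>.
    static List<String> readAllTitles(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<String> titles = new ArrayList<>();
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("programmes")) {
                    titles.addAll(readProgrammeTitles(r));
                }
            }
            return titles;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    // "Programme parser": called with the reader positioned on <programmes>;
    // returns once the matching </programmes> has been read.
    static List<String> readProgrammeTitles(XMLStreamReader r) throws XMLStreamException {
        List<String> titles = new ArrayList<>();
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("title")) {
                // getElementText() only accepts text-only elements and throws on mixed content
                titles.add(r.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("programmes")) {
                break;
            }
        }
        return titles;
    }
}
```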

Best answer

Do you mean you want this output:

BBC ONE
10029
------------------------
The Celebrity Apprentice USA
2013-02-04 00:15:00 - 2013-02-04 01:40:00
Entertainment
Season : 8 / Episode : 9
Description:
Blah blah blah
10583734
**********************
The Celebrity Apprentice USA
2013-02-04 01:45:00 - 2013-02-04 02:25:00
Entertainment
Season : 8 / Episode : 10
Description:
Blah blah blah
10583735
**********************
//////////////////////////
BBC TWO
10030
------------------------
American Dad
2013-02-04 00:30:00 - 2013-02-04 01:25:00
Cartoon
Season : 14 / Episode : 1
Description:
Blah blah blah
10583734
**********************
American Dad
2013-02-04 01:30:00 - 2013-02-04 02:15:00
Cartoon
Season : 14 / Episode : 2
Description:
Blah blah blah
10583735
**********************
//////////////////////////

I slightly modified your XML file:

<?xml version="1.0" encoding="utf-8"?>
<channels>
  <channel>
    <name>BBC ONE</name>
    <oid>10029</oid>
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:40:00</end_time>
        <episode>9</episode>
        <genres>Entertainment</genres>
        <oid>10583734</oid>
        <season>8</season>
        <start_time>2013-02-04 00:15:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 02:25:00</end_time>
        <episode>10</episode>
        <genres>Entertainment</genres>
        <oid>10583735</oid>
        <season>8</season>
        <start_time>2013-02-04 01:45:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
    </programmes>
  </channel>
  <channel>
    <name>BBC TWO</name>
    <oid>10030</oid>
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:25:00</end_time>
        <episode>1</episode>
        <genres>Cartoon</genres>
        <oid>10583734</oid>
        <season>14</season>
        <start_time>2013-02-04 00:30:00</start_time>
        <title>American Dad</title>
      </programme>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 02:15:00</end_time>
        <episode>2</episode>
        <genres>Cartoon</genres>
        <oid>10583735</oid>
        <season>14</season>
        <start_time>2013-02-04 01:30:00</start_time>
        <title>American Dad</title>
      </programme>
    </programmes>
  </channel>
</channels>

Java classes:

Channel

import java.util.ArrayList;

public class Channel {

    private String name;
    private String oid;
    private ArrayList<Programme> alProgrammes;

    public Channel(){}

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getOid() {
        return oid;
    }

    public void setOid(String oid) {
        this.oid = oid;
    }

    public ArrayList<Programme> getAlProgrammes() {
        return alProgrammes;
    }

    public void setAlProgrammes(ArrayList<Programme> alProgrammes) {
        this.alProgrammes = alProgrammes;
    }

}

Programme

public class Programme {

    private String description;
    private String end_time;
    private String episode;
    private String genres;
    private String oid;
    private String season;
    private String start_time;
    private String title;

    public Programme() {
    }

    //Getters / Setters
    public String getDescription() {
        return description;
    }
    public void setDescription(String description) {
        this.description = description;
    }
    public String getEnd_time() {
        return end_time;
    }
    public void setEnd_time(String end_time) {
        this.end_time = end_time;
    }
    public String getEpisode() {
        return episode;
    }
    public void setEpisode(String episode) {
        this.episode = episode;
    }
    public String getGenres() {
        return genres;
    }
    public void setGenres(String genres) {
        this.genres = genres;
    }
    public String getOid() {
        return oid;
    }
    public void setOid(String oid) {
        this.oid = oid;
    }
    public String getSeason() {
        return season;
    }
    public void setSeason(String season) {
        this.season = season;
    }
    public String getStart_time() {
        return start_time;
    }
    public void setStart_time(String start_time) {
        this.start_time = start_time;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }

}

XMLManager

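import java.io.File;
import java.io.IOException;
import java.util.ArrayList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public final class XMLManager {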

    public static ArrayList<Channel> getAlChannels(){

          DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
          DocumentBuilder db = null;
          Document doc = null;
          ArrayList<Channel> alChannels = new ArrayList<>();

          try {

            db = dbf.newDocumentBuilder();
            doc = db.parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
            NodeList ndListChannels = doc.getElementsByTagName("channel");

            Integer channelsCount = ndListChannels.getLength();
            NodeList ndListChannel = null;
            Integer ndListChannelLength = null;
            Channel channel = null;
            NodeList ndListProgrammes = null;
            for(int i=0;i<channelsCount;i++){

                ndListChannel = ndListChannels.item(i).getChildNodes();
                ndListChannelLength = ndListChannel.getLength();
                channel = new Channel();
                for(int j=0;j<ndListChannelLength;j++){

                    Node currentNode = ndListChannel.item(j);
                    String currentNodeName = currentNode.getNodeName();
                    String value = currentNode.getTextContent();

                    if(currentNodeName.equals("name")){
                        channel.setName(value);
                    }
                    if(currentNodeName.equals("oid")){
                        channel.setOid(value);
                    }
                    if(currentNodeName.equals("programmes")){
                        ndListProgrammes = currentNode.getChildNodes();
                        ArrayList<Programme> alProgrammes = new ArrayList<>();
                        for(int k=0;k<ndListProgrammes.getLength();k++){

                            Node ndProgrammes = ndListProgrammes.item(k);
                            if(ndProgrammes.hasChildNodes()){

                                NodeList ndListProgramme = ndProgrammes.getChildNodes();
                                Integer ndListProgrammeLength = ndListProgramme.getLength();
                                Programme programme = new Programme();
                                for(int l=0;l<ndListProgrammeLength;l++){

                                    Node  ndProgramme = ndListProgramme.item(l);
                                    String nodeProgrameName = ndProgramme.getNodeName();
                                    String nodeProgrameValue = ndProgramme.getTextContent();
                                    if(nodeProgrameName.equals("description")){
                                        programme.setDescription(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("end_time")){

                                        programme.setEnd_time(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("episode")){
                                        programme.setEpisode(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("genres")){
                                        programme.setGenres(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("oid")){
                                        programme.setOid(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("season")){
                                        programme.setSeason(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("start_time")){
                                        programme.setStart_time(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("title")){
                                        programme.setTitle(nodeProgrameValue);
                                    }

                                }

                                alProgrammes.add(programme);

                            }

                        }

                        channel.setAlProgrammes(alProgrammes);

                    }

                }

                alChannels.add(channel);

            }



          } catch (ParserConfigurationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

          return alChannels;

    }



}
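The nested loops above compare node names one child at a time. If lookups by name are preferred, a hypothetical helper such as directChildText below (not part of the original answer) returns the text of a direct child element. It deliberately avoids Element.getElementsByTagName, which searches all descendants and would therefore also match the <oid> inside each <programme>.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomChildText {

    // Returns the text of the first *direct* child element named tag, or null if none exists.
    static String directChildText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // Parses an XML string into a DOM Document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, directChildText(channelElement, "oid") yields the channel's own oid even though every programme below it also has an <oid>.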

Main

import java.util.ArrayList;

public class MyMain {

    /**
     * @param args
     */
    public static void main(String[] args) {


        ArrayList<Channel> alChannels = XMLManager.getAlChannels();
        for(Channel c:alChannels){
            System.out.println(c.getName());
            System.out.println(c.getOid());
            System.out.println("------------------------");
            for(Programme p:c.getAlProgrammes()){
                System.out.println(p.getTitle());
                System.out.println(p.getStart_time()+" - "+p.getEnd_time());
                System.out.println(p.getGenres());
                System.out.println("Season : "+p.getSeason()+" / Episode : "+p.getEpisode());
                System.out.println("Description:\n"+p.getDescription());
                System.out.println(p.getOid());
                System.out.println("**********************");
            }

            System.out.println("//////////////////////////");

        }

    }

}

Update

Here is an example of how I implemented it with SAX.

Important: I kept my Programme and Channel classes.

ChannelsHandler

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ChannelsHandler extends DefaultHandler{

    private ArrayList<Channel> tvGuide;
    private Channel channel;
    private ArrayList<Programme> alProgrammes;
    private Programme programme;
    // Text of the element being read; SAX may deliver it in several characters() calls
    private final StringBuilder reading = new StringBuilder();

    public ChannelsHandler(){
        super();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {

        // Each new element starts with an empty text buffer
        reading.setLength(0);

        if(qName.equals("channels")){
            tvGuide = new ArrayList<>();
        }
        else if(qName.equals("channel")){
            channel = new Channel();
        }
        else if(qName.equals("programmes")){
            alProgrammes = new ArrayList<>();
        }
        else if(qName.equals("programme")){
            programme = new Programme();
        }

    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Append instead of overwriting, so text split across calls is not lost
        reading.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {

        String text = reading.toString().trim();

        if(qName.equals("channel")){
            tvGuide.add(channel);
            channel = null;
        }
        else if(qName.equals("name")){
            channel.setName(text);
        }
        else if(qName.equals("programmes")){
            channel.setAlProgrammes(alProgrammes);
            alProgrammes = new ArrayList<>();
        }
        else if(qName.equals("programme")){
            alProgrammes.add(programme);
            programme = null;
        }
        else if(qName.equals("description")){
            programme.setDescription(text);
        }
        else if(qName.equals("end_time")){
            programme.setEnd_time(text);
        }
        else if(qName.equals("episode")){
            programme.setEpisode(text);
        }
        else if(qName.equals("genres")){
            programme.setGenres(text);
        }
        else if(qName.equals("season")){
            programme.setSeason(text);
        }
        else if(qName.equals("start_time")){
            programme.setStart_time(text);
        }
        else if(qName.equals("title")){
            programme.setTitle(text);
        }
        // <oid> is not handled: it appears both under <channel> and under <programme>

    }

    public ArrayList<Channel> getTVGuide(){
        return tvGuide;
    }

}
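SAX does not promise to deliver an element's text in a single characters() call; a parser may split it, for example around entity references such as &amp;, so a handler that overwrites a field on every call can end up with only the last piece. A minimal sketch, with the hypothetical class name TextCollector, that accumulates the chunks in a StringBuilder and reads them in endElement:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects the complete text of each element, keyed by element name (last occurrence wins).
class TextCollector extends DefaultHandler {
    final Map<String, String> texts = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // new element: discard text seen so far
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times for one element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        texts.put(qName, buffer.toString());
        buffer.setLength(0);
    }

    // Parses an XML string; checked exceptions are wrapped for brevity.
    static Map<String, String> collect(String xml) {
        try {
            TextCollector c = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), c);
            return c.texts;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```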

My new Main

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

public class MyMain {

    public static void main(String[] args) {

        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            File file = new File("D:\\Loic_Workspace\\TestSAX\\res\\test.xml");
            ChannelsHandler handler = new ChannelsHandler();
            parser.parse(file, handler);
            List<Channel> tvGuide = handler.getTVGuide();
            for(Channel c:tvGuide){
                System.out.println(c.getName());
                System.out.println("------------------------");
                for(Programme p:c.getAlProgrammes()){
                    System.out.println(p.getTitle());
                    System.out.println(p.getStart_time()+" - "+p.getEnd_time());
                    System.out.println(p.getGenres());
                    System.out.println("Season : "+p.getSeason()+" / Episode : "+p.getEpisode());
                    System.out.println("Description:\n"+p.getDescription());
                    System.out.println("**********************");
                }

                System.out.println("//////////////////////////");

            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Parser setup failure, malformed XML, or a problem reading the file
            e.printStackTrace();
        }

    }

}

Output in my console:

BBC ONE
------------------------
The Celebrity Apprentice USA
2013-02-04 00:15:00 - 2013-02-04 01:40:00
Entertainment
Season : 8 / Episode : 9
Description:
Blah blah blah
**********************
The Celebrity Apprentice USA
2013-02-04 01:45:00 - 2013-02-04 02:25:00
Entertainment
Season : 8 / Episode : 10
Description:
Blah blah blah
**********************
//////////////////////////
BBC TWO
------------------------
American Dad
2013-02-04 00:30:00 - 2013-02-04 01:25:00
Cartoon
Season : 14 / Episode : 1
Description:
Blah blah blah
**********************
American Dad
2013-02-04 01:30:00 - 2013-02-04 02:15:00
Cartoon
Season : 14 / Episode : 2
Description:
Blah blah blah
**********************
//////////////////////////

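This is the first time I've used SAX. There may well be a more efficient approach, but it works :-) Note that in this update I did not handle the duplicate oid tags of programmes and channels.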
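One way to handle the two kinds of <oid> in a SAX handler is to track whether a <programme> is currently open: an <oid> that ends while a programme is open belongs to that programme, otherwise it belongs to the channel. A minimal, self-contained sketch of that idea, where the class OidRouter and its lists are hypothetical. In ChannelsHandler, where programme is reset to null at </programme>, a programme != null check could serve the same purpose when choosing between programme.setOid(...) and channel.setOid(...).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Routes each <oid> to the channel or the programme depending on the enclosing element.
class OidRouter extends DefaultHandler {
    final List<String> channelOids = new ArrayList<>();
    final List<String> programmeOids = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inProgramme = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("programme")) {
            inProgramme = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("programme")) {
            inProgramme = false;
        } else if (qName.equals("oid")) {
            // Same tag name, different owner: decide by whether a programme is open
            if (inProgramme) {
                programmeOids.add(text.toString().trim());
            } else {
                channelOids.add(text.toString().trim());
            }
        }
    }

    // Parses an XML string; checked exceptions are wrapped for brevity.
    static OidRouter parse(String xml) {
        try {
            OidRouter router = new OidRouter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), router);
            return router;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```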

Regarding "java - SAX parsing: Encountered mixed content within text element", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/14679445/
