Possible Duplicate:
Using an NSXMLParser to parse HTML
I'm trying to parse the following XML data, but it is poorly structured and has no closing tags. I didn't create this XML file; it's a page I'm trying to parse from a web server:

    <FORM ACTION="/prod/bwckgens.p_proc_term_date" METHOD="POST" onSubmit="return checkSubmit()">
    <INPUT TYPE="hidden" NAME="p_calling_proc" VALUE="bwckschd.p_disp_dyn_sched">
    <TABLE CLASS="dataentrytable" summary="This layout table is used for term selection."width="100%"><CAPTION class="captiontext">Search by Term: </CAPTION>
    <TR>
    <TD CLASS="dedefault"><LABEL for=term_input_id><SPAN class="fieldlabeltextinvisible">Term</SPAN></LABEL>
    <SELECT NAME="p_term" SIZE="1" ID="term_input_id">
    <OPTION VALUE="">None
    <OPTION VALUE="201320">Spring 2013
    <OPTION VALUE="201315">STAR/BGR: New Admits Fall 2012 (View only)
    <OPTION VALUE="201310">Fall 2012 (View only)
    <OPTION VALUE="201230">Summer 2012 (View only)
    <OPTION VALUE="201220">Spring 2012 (View only)
    <OPTION VALUE="201210">Fall 2011 (View only)
    <OPTION VALUE="201130">Summer 2011 (View only)
    <OPTION VALUE="201120">Spring 2011 (View only)
    <OPTION VALUE="201110">Fall 2010 (View only)
    <OPTION VALUE="201030">Summer 2010 (View only)
    <OPTION VALUE="201020">Spring 2010 (View only)
    <OPTION VALUE="201010">Fall 2009 (View only)
    <OPTION VALUE="200930">Summer 2009 (View only)
    <OPTION VALUE="200920">Spring 2009 (View only)
    <OPTION VALUE="200910">Fall 2008 (View only)
    <OPTION VALUE="200830">Summer 2008 (View only)
    <OPTION VALUE="200820">Spring 2008 (View only)
    </SELECT>
    </TD>
    </TR>
    </TABLE>
    <BR>
    <BR>
    <INPUT TYPE="submit" VALUE="Submit">
    <INPUT TYPE="reset" VALUE="Reset">
    </FORM>
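The HTML file contains a lot of other content, but I've only included the relevant part. I want to get all of the numbers inside the quotes, OPTION VALUE="these numbers", along with the term name that follows each one, for example Spring 2013.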
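NSXMLParser only accepts well-formed XML, and this page is HTML with unclosed `<OPTION>` tags, so one workaround is to skip XML parsing for this part and match the options directly. The sketch below swaps in `NSRegularExpression` instead of NSXMLParser. `ExtractTermOptions` is a hypothetical helper name, and the pattern assumes the markup keeps the `<OPTION VALUE="...">Label` shape shown above, with the label running up to the next `<`. Regular expressions over HTML are brittle, so treat this as a stopgap for this one page rather than a general HTML parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns an array of @[value, label] pairs, one per
// <OPTION VALUE="...">label entry. Assumes VALUE is double-quoted and that the
// label text runs until the next '<', as in the snippet above.
static NSArray<NSArray<NSString *> *> *ExtractTermOptions(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"<OPTION\\s+VALUE=\"([^\"]*)\"[^>]*>([^<]*)"
                             options:NSRegularExpressionCaseInsensitive
                               error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSArray<NSString *> *> *result = [NSMutableArray array];
    [regex enumerateMatchesInString:html
                            options:0
                              range:NSMakeRange(0, html.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        // Capture group 1 is the VALUE attribute, group 2 the visible label.
        NSString *value = [html substringWithRange:[match rangeAtIndex:1]];
        NSString *label = [[html substringWithRange:[match rangeAtIndex:2]]
            stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        // Skip the placeholder "None" entry, whose VALUE is empty.
        if (value.length > 0) {
            [result addObject:@[value, label]];
        }
    }];
    return result;
}
```

Run on the snippet above, this should yield pairs such as `@[@"201320", @"Spring 2013"]` and `@[@"201310", @"Fall 2012 (View only)"]`, in document order.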
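Since there are no closing tags, how can I get these values with NSXMLParser? I tried printing out every element the parser encounters:

    NSLog(@"Current start element: %@\n", elementName);
    NSLog(@"Current attr:%@\n", attributeDict.description);

but I never see OPTION or VALUE anywhere. This is the output of those NSLog statements: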
    2012-10-28 13:58:47.638 Purdue Course Finder[32890:c07] Current start element: HTML
    2012-10-28 13:58:47.638 Purdue Course Finder[32890:c07] Current attr:{ lang = en; }
    2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current start element: HEAD
    2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current attr:{ }
    2012-10-28 13:58:47.639 Purdue Course Finder[32890:c07] Current start element: META
    2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current attr:{ content = "text/html; charset=UTF-8"; "http-equiv" = "Content-Type"; }
    2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current start element: META
    2012-10-28 13:58:47.640 Purdue Course Finder[32890:c07] Current attr:{ CONTENT = "no-cache"; "HTTP-EQUIV" = Pragma; NAME = "Cache-Control"; }
    2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current start element: META
    2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current attr:{ CONTENT = "no-cache"; "HTTP-EQUIV" = "Cache-Control"; NAME = "Cache-Control"; }
    2012-10-28 13:58:47.641 Purdue Course Finder[32890:c07] Current start element: LINK
    2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current attr:{ HREF = "/css/web_defaultapp.css"; REL = stylesheet; TYPE = "text/css"; }
    2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current start element: LINK
    2012-10-28 13:58:47.642 Purdue Course Finder[32890:c07] Current attr:{ HREF = "/css/web_defaultprint.css"; REL = stylesheet; TYPE = "text/css"; media = print; }
    2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current start element: TITLE
    2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current attr:{ }
    2012-10-28 13:58:47.643 Purdue Course Finder[32890:c07] Current end element: TITLE
    2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current start element: META
    2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current attr:{ CONTENT = "text/javascript"; "HTTP-EQUIV" = "Content-Script-Type"; NAME = "Default_Script_Language"; }
    2012-10-28 13:58:47.644 Purdue Course Finder[32890:c07] Current start element: SCRIPT
    2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current attr:{ LANGUAGE = JavaScript; TYPE = "text/javascript"; }
    2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current end element: SCRIPT
    2012-10-28 13:58:47.645 Purdue Course Finder[32890:c07] Current start element: SCRIPT
    2012-10-28 13:58:47.646 Purdue Course Finder[32890:c07] Current attr:{ LANGUAGE = JavaScript; TYPE = "text/javascript"; }
    2012-10-28 13:58:47.646 Purdue Course Finder[32890:c07] Current end element: SCRIPT
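NSXMLParser stops at the first well-formedness error, and it reports that error through the delegate method `parser:parseErrorOccurred:` (and through `parserError` after `parse` returns `NO`), not through the element callbacks. The log above ends right after the second SCRIPT element. One plausible cause, which is an assumption the log alone does not confirm, is that the `<META>` and `<LINK>` elements are never closed, so the parser meets a mismatched `</HEAD>` and gives up before it reaches the form. The sketch below, using the hypothetical names `LoggingParserDelegate` and `ParseAndLog`, records the start elements and logs the error with its line and column so the actual cause can be checked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate class: records every start element and keeps the
// error that made NSXMLParser abort (it stops at the first fatal error).
@interface LoggingParserDelegate : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSError *lastError;
@property (nonatomic, strong) NSMutableArray<NSString *> *startElements;
@end

@implementation LoggingParserDelegate

- (instancetype)init {
    if ((self = [super init])) {
        _startElements = [[NSMutableArray alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser
    didStartElement:(NSString *)elementName
       namespaceURI:(NSString *)namespaceURI
      qualifiedName:(NSString *)qName
         attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    [self.startElements addObject:elementName];
}

- (void)parser:(NSXMLParser *)parser parseErrorOccurred:(NSError *)parseError {
    self.lastError = parseError;
    // lineNumber/columnNumber point at where the parser gave up.
    NSLog(@"Parse error at line %ld, column %ld: %@",
          (long)parser.lineNumber, (long)parser.columnNumber,
          parseError.localizedDescription);
}

@end

// Hypothetical helper: parses `data` and returns the delegate for inspection.
static LoggingParserDelegate *ParseAndLog(NSData *data) {
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    LoggingParserDelegate *delegate = [[LoggingParserDelegate alloc] init];
    parser.delegate = delegate;  // the parser does not retain its delegate
    if (![parser parse]) {
        NSLog(@"Parsing stopped early: %@", parser.parserError);
    }
    return delegate;
}
```

Feeding it the page data and checking `lastError` should show whether the parse is failing on an unclosed tag rather than silently skipping the OPTION elements.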
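I even tried printing from everywhere in the - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string method, but it never finds these tags either. Could someone help me parse this poorly constructed XML file? Thanks!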