python - XML namespace parsing and searching with ElementTree and Python

Tags: python xml namespaces prefix

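I've searched SO (including here) and elsewhere, but I'm still stuck trying to extract specific information from XML when the elements carry a namespace prefix.
I'm trying to use ElementTree to pull the URL out of the "instance document" below. This is the line that contains the URL: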

<edgar:xbrlFile edgar:sequence="2" edgar:file="qcom-20090927.xml" edgar:type="EX-101.INS" edgar:size="1479637" edgar:description="EX-101 INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />

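I've tried many different approaches, but .findall always returns an empty list. I've tried walking down the tree before searching, and so on. Could someone help me get this information into a variable?
Thanks very much for your help.
Ethan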
<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="/rss/styles/shared_xsl_stylesheet_v2.xml"?>
<rss version="2.0">
  <channel>
    <title>All XBRL Data Submitted to the SEC for 2009-12</title>
    <link>http://www.sec.gov/spotlight/xbrl/filings-and-feeds.shtml</link>
    <atom:link href="http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2009-12.xml" rel="self" type="application/rss+xml" xmlns:atom="http://www.w3.org/2005/Atom"/>
    <description>This is a list all of the filings containing XBRL for 2009-12</description>
    <language>en-us</language>
    <pubDate>Tue, 25 Jun 2013 00:00:00 EDT</pubDate>
    <lastBuildDate>Tue, 25 Jun 2013 00:00:00 EDT</lastBuildDate>
    <item>
      <title>QUALCOMM INC/DE (0000804328) (Filer)</title>
      <link>http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-index.htm</link>
      <guid>http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-xbrl.zip</guid>
      <enclosure url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/0000950123-09-072780-xbrl.zip" length="126771" type="application/zip" />
      <description>10-K/A</description>
      <pubDate>Tue, 22 Dec 2009 17:23:59 EST</pubDate>
      <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
        <edgar:companyName>QUALCOMM INC/DE</edgar:companyName>
        <edgar:formType>10-K/A</edgar:formType>
        <edgar:filingDate>12/22/2009</edgar:filingDate>
        <edgar:cikNumber>0000804328</edgar:cikNumber>
        <edgar:accessionNumber>0000950123-09-072780</edgar:accessionNumber>
        <edgar:fileNumber>000-19528</edgar:fileNumber>
        <edgar:acceptanceDatetime>20091222172359</edgar:acceptanceDatetime>
        <edgar:period>20090927</edgar:period>
        <edgar:assistantDirector>11</edgar:assistantDirector>
        <edgar:assignedSic>3663</edgar:assignedSic>
        <edgar:fiscalYearEnd>0930</edgar:fiscalYearEnd>
        <edgar:xbrlFiles>
          <edgar:xbrlFile edgar:sequence="1" edgar:file="a54714e10vkza.htm" edgar:type="10-K/A" edgar:size="19974" edgar:description="10-K/A" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/a54714e10vkza.htm" />
          <edgar:xbrlFile edgar:sequence="2" edgar:file="qcom-20090927.xml" edgar:type="EX-101.INS" edgar:size="1479637" edgar:description="EX-101 INSTANCE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />
          <edgar:xbrlFile edgar:sequence="3" edgar:file="qcom-20090927.xsd" edgar:type="EX-101.SCH" edgar:size="18628" edgar:description="EX-101 SCHEMA DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xsd" />
          <edgar:xbrlFile edgar:sequence="4" edgar:file="qcom-20090927_cal.xml" edgar:type="EX-101.CAL" edgar:size="50670" edgar:description="EX-101 CALCULATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_cal.xml" />
          <edgar:xbrlFile edgar:sequence="5" edgar:file="qcom-20090927_lab.xml" edgar:type="EX-101.LAB" edgar:size="258068" edgar:description="EX-101 LABELS LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_lab.xml" />
          <edgar:xbrlFile edgar:sequence="6" edgar:file="qcom-20090927_pre.xml" edgar:type="EX-101.PRE" edgar:size="133865" edgar:description="EX-101 PRESENTATION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_pre.xml" />
          <edgar:xbrlFile edgar:sequence="7" edgar:file="qcom-20090927_def.xml" edgar:type="EX-101.DEF" edgar:size="21223" edgar:description="EX-101 DEFINITION LINKBASE DOCUMENT" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927_def.xml" />
        </edgar:xbrlFiles>
      </edgar:xbrlFiling>
    </item>
    <item>

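Best answer

Assume root is the root node of the parsed ElementTree (for example, the result of ElementTree.parse(...).getroot()).

The namespace URI comes from the 'xmlns:edgar' attribute on the 'edgar:xbrlFiling' node: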

xmlns:edgar="http://www.sec.gov/Archives/edgar"
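
If you would rather not copy the URI by hand, the prefix-to-URI mappings can be collected while parsing. This is a minimal sketch, assuming a trimmed-down stand-in for the feed (SAMPLE is made up for illustration). It uses iterparse with the "start-ns" event:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the SEC feed.
SAMPLE = b"""<rss version="2.0">
  <channel>
    <item>
      <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
        <edgar:formType>10-K/A</edgar:formType>
      </edgar:xbrlFiling>
    </item>
  </channel>
</rss>"""

# Each "start-ns" event carries a (prefix, uri) tuple for one declaration.
namespaces = {}
for event, (prefix, uri) in ET.iterparse(io.BytesIO(SAMPLE), events=("start-ns",)):
    namespaces[prefix] = uri

print(namespaces)
```

The resulting dictionary maps 'edgar' to the URI shown above, so the rest of the code does not depend on a hard-coded string.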



ElementTree represents a prefixed tag such as edgar:any_tag as the Python string:

ns + 'any_tag'



where ns is the following Python string:

ns = '{http://www.sec.gov/Archives/edgar}'
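
To see this representation for yourself, you can print the tags ElementTree stores after parsing. The sketch below assumes a small hand-written fragment (SAMPLE is not the real feed). It also shows why a search for the bare tag name comes back empty, which matches the symptom in the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on the feed in the question.
SAMPLE = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:formType>10-K/A</edgar:formType>
  </edgar:xbrlFiling>
</item>"""

root = ET.fromstring(SAMPLE)
ns = '{http://www.sec.gov/Archives/edgar}'

# Prefixed tags are stored as '{uri}localname'; unprefixed tags stay as-is.
tags = [elem.tag for elem in root.iter()]
print(tags)

# The bare local name does not match a namespaced element.
bare_matches = root.findall('.//formType')
# The fully qualified name does.
qualified_matches = root.findall('.//' + ns + 'formType')
print(len(bare_matches), len(qualified_matches))
```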



So, to find all xbrlFile nodes, you can use the following XPath expression:

xbrlFiles = root.findall('.//'+ns+'xbrlFile')
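
As an alternative to concatenating ns into the path, findall also accepts a prefix-to-URI mapping as its second argument, so the path can keep the edgar: prefix. This is a sketch under the assumption of a shortened sample document; the file names in SAMPLE are placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; attribute values are placeholders.
SAMPLE = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:file="a.htm" />
      <edgar:xbrlFile edgar:sequence="2" edgar:file="b.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item>"""

root = ET.fromstring(SAMPLE)

# Map the prefix used in the path to the namespace URI.
namespaces = {'edgar': 'http://www.sec.gov/Archives/edgar'}
files = root.findall('.//edgar:xbrlFile', namespaces)

# The same search written with the '{uri}' form gives the same elements.
ns = '{http://www.sec.gov/Archives/edgar}'
same_files = root.findall('.//' + ns + 'xbrlFile')
print(len(files), files == same_files)
```

The prefix in the dictionary only has to match the prefix in the path. It need not be the one used in the document.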



To get the URL, read the ns + 'url' attribute (here, from the second file):
myurl = xbrlFiles[1].attrib[ns + 'url']
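
Indexing with [1] relies on the instance document always being the second file. A slightly more defensive sketch filters on the edgar:type attribute instead. It assumes that the type "EX-101.INS" identifies the instance document, as in the sample line from the question, and uses a shortened, hand-made SAMPLE:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample built from two lines of the feed in the question.
SAMPLE = """<item>
  <edgar:xbrlFiling xmlns:edgar="http://www.sec.gov/Archives/edgar">
    <edgar:xbrlFiles>
      <edgar:xbrlFile edgar:sequence="1" edgar:type="10-K/A" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/a54714e10vkza.htm" />
      <edgar:xbrlFile edgar:sequence="2" edgar:type="EX-101.INS" edgar:url="http://www.sec.gov/Archives/edgar/data/804328/000095012309072780/qcom-20090927.xml" />
    </edgar:xbrlFiles>
  </edgar:xbrlFiling>
</item>"""

root = ET.fromstring(SAMPLE)
ns = '{http://www.sec.gov/Archives/edgar}'

# Namespaced attributes use the same '{uri}name' keys as tags.
instance_urls = [
    f.get(ns + 'url')
    for f in root.iter(ns + 'xbrlFile')
    if f.get(ns + 'type') == 'EX-101.INS'
]

# Take the first match, or None if the filing has no instance document.
myurl = instance_urls[0] if instance_urls else None
print(myurl)
```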

Regarding "python - XML namespace parsing and searching with ElementTree and Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19911085/
