xpath - Looping over multiple elements on the same page and storing each one separately

Tags: xpath scrapy

I want to scrape multiple product names from a single page using Scrapy. The relevant part of the page looks like this:

<!-- body_text //-->

    <td width="601" valign="top">

      <table border="0" width="100%" cellspacing="0" cellpadding="0">

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

       <tr>

         <td class="pageHeading">Pool (Pocket Billiards) Table</td>

        </tr>

        <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td class="main">A Victoria table is more than mere wood and slate. By paying attention to the details - the hidden differences - Victoria tables have become known name as masterpieces of original design and craftmanship, and most prestigious name in billiards.<br><br>

          

          These tables, available in two sizes  9’ X 4.5’ and 8’ X 4’, are made of frames with selected good quality solid wood and finely crafted rose wood legs with Mahagony polish.<br><br>

Slate Beds used are either Indian Bangalore Black Slate or Imported Slate. Slates are covered with worsted wool cloth optionally from Jupiter (China) or Strachan (West of England cloth, U.K.) to have proper speed, accuracy and responsiveness of the table to spin. Chrome nuts and adjusters  are used for leveling. It is surrounded with standard imported vulcanized 'L' shaped or 'V' shaped rubber cushions or Northern Cushions (Made in England) to cause billiard balls to rebound while minimizing the lose of kinetic energy.</td>

        </tr>

        

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20bbig.jpg')"><img src="images/products/vs-20b.jpg" alt="VS-20B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs20b"></a>VS-20C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.B. Frame</li><li><strong>Bangalore Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-20cbig.jpg')"><img src="images/products/vs-20c.jpg" alt="VS-20C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23B</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23bbig.jpg')"><img src="images/products/vs-23b.jpg" alt="VS-23B" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs23b"></a>VS-23C</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 8&lsquo; X 4&lsquo;</strong></li><li>Rose Wood Legs</li><li>Mahgony Polish</li><li>S.A.L. Frame</li><li><strong>Imported Slate</strong></li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-23cbig.jpg')"><img src="images/products/vs-23c.jpg" alt="VS-23C" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs9"></a>VS-9</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>Size: 9&lsquo; X 4.5&lsquo;</strong></li><li>Auto Ball Return System</li><li>Pro Speed Cloth</li><li>American Pocket Size</li><li>Standard Accessories</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-9big.jpg')"><img src="images/products/vs-9.jpg" alt="VS-9" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs7"></a>VS-7</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98"L X 54" W X 31" H</strong></li><li>Solid oak for top/brand rails, Dark cherry finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket.  Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-7big.jpg')"><img src="images/products/vs-7.jpg" alt="VS-7" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs8"></a>VS-8/Light Oak</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, Light oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-8big.jpg')"><img src="images/products/vs-8.jpg" alt="VS-8/Light Oak" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs12"></a>VS-12</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 99-3/4"L X 55 - 3/4" W X 31" H</strong></li><li>Black laminate, pedestal legs, with drop pocket, Steel frame Easy assembly. Accessories included.</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-12big.jpg')"><img src="images/products/vs-12.jpg" alt="VS-12" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs10"></a>VS-10</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 98" L X 54"W X 31"H</strong></li><li>Solid oak for top/brand rails, oak finish</li><li>Rams head solid rubber wood with # 6 leather drop pocket, Easy assembly</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-10big.jpg')"><img src="images/products/vs-10.jpg" alt="VS-10" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs11"></a>VS-11</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails</li><li>Mahogany finish</li><li>Rams head solid rubber with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-11big.jpg')"><img src="images/products/vs-11.jpg" alt="VS-11" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          

            <tr>

              <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

            </tr>

            <tr>

              <td>

                <table cellpadding="4" cellspacing="0" width="100%" border="0" class="product_box">

                  <tr>

                    <td width="50%" valign="top" class="product_name" colspan="2"><strong><a name="vs13"></a>VS-13</strong></td>

                  </tr>

                </table>

                <table cellpadding="4" cellspacing="4" width="100%" border="0" >

                  <tr>

                    <td width="60%" valign="top" class="product_text"><ul><li><strong>POOL TABLE : 8&lsquo; X 4&lsquo;</strong></li><li><strong>PLAYING AREA : 88" X 44"</strong></li><li><strong>OVERALL SIZE : 100" X 56"</strong></li><li>Solid wood for top/brand rails,</li><li>Dark cherry finish</li><li>Rams head solid rubber wood<br />
<br />
with # 6 leather drop pocket</li></ul></td>

                    <td width="40%" align="center"><a href="javascript:popupWindow('images/products/vs-13big.jpg')"><img src="images/products/vs-13.jpg" alt="VS-13" border="0" width="250px"></a></td>

                  </tr>

                </table>

              </td>                 

            </tr>

          
            <tr>

          <td><img src="images/pixel_trans.gif" border="0" alt="" width="100%" height="10"></td>

        </tr>

        <tr>

          <td>

            <table cellpadding="4" cellspacing="0" width="100%" border="0">

              <tr>

                <td width="50%" valign="top" class="product_name1" colspan="2"><strong>Standard Accessories for Pool</strong></td>

              </tr>

            </table>

            <table cellpadding="4" cellspacing="4" width="100%" border="0" class="product_box1">

              <tr>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Aramith Pool Ball 2.1/4" or 2.1/16"</li>

                  <li>Table Brush</li>

                  <li>60" Rest Stick C/W Brass Cross Head Rest</li>

                  <li>Wall Cue Rack</li>

                </ul></td>

                <td width="50%" valign="top" class="product_text">

                <ul>

                  <li>Plastic Triangle</li>

                  <li>Triangle Chalk X 12 Pcs.</li>

                  <li>Pool House Cue X 4 Pcs.</li>

                  <li>Table Cover</li>

                  <li>Round Type Lamp Shade X 2 Pcs.</li>

                </ul></td>

              </tr>

            </table>

          </td>                 

        </tr>

    </table></td>

<!-- body_text_eof //-->

     <td width="45" valign="top">

      <table border="0" width="45" cellspacing="0" cellpadding="0">

<!-- right_navigation //-->

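As you can see from the markup, the field I want to scrape is at the XPath: td[@class='product_name']/strong/a/@name

I also need to extract the image URL from this XPath: td[@align='center']/a/img/@src

I am exporting the data as CSV, and at the moment my scraper stores all of the product names in a single cell. I am trying to store each product name and image URL in its own cell in the CSV.

I tried to do this with a loop, but could not get it to work.
My code: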

  def parse(self, response):
   hxs = HtmlXPathSelector(response)  
   titles = hxs.select("//head")
   items = []
   item = item()
   
   for i in range(0,5):
     
     item ["productname"] = titles.select("//td[@class='product_name'][i]/strong").extract()
     item ["imgurl"] = titles.select("//td[@align='center'][i]/a/img/@src").extract()
     
     
     items.append(item)
     return(items)

Best answer

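# Extract plain strings, not selector objects, so the CSV cells contain text.
names = hxs.xpath('//td[@class="product_name"]/strong/text()').extract()
imageurls = hxs.xpath('//tr/td[@align="center"]/a/img/@src').extract()
for name, url in zip(names, imageurls):
    # ProductItem is a hypothetical name; use your project's own Item class.
    item = ProductItem()  # a fresh item per product, so rows are not overwritten
    item["productname"] = name
    item["imgurl"] = url
    yield item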

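This is the simplest approach, because the names and image URLs are extracted in document order and therefore line up with each other. It relies on every product having exactly one name and one image; if a product were missing either, the pairs would shift out of alignment.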
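
To see the pairing in isolation, here is a minimal sketch that uses only the Python standard library instead of Scrapy. The `SAMPLE` markup is a simplified, well-formed stand-in for the page in the question, and the helper names `extract_rows` and `to_csv` are made up for this illustration. It shows the idea only: two parallel lists are zipped into one record per product, and each record becomes its own CSV row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: a tiny, well-formed stand-in for the real page. The real HTML is
# messier and would be parsed by Scrapy's selectors, not by ElementTree.
SAMPLE = """
<table>
  <tr><td class="product_name"><strong><a name="vs20b"></a>VS-20B</strong></td></tr>
  <tr><td align="center"><a href="#"><img src="images/products/vs-20b.jpg" alt="VS-20B"/></a></td></tr>
  <tr><td class="product_name"><strong><a name="vs20b"></a>VS-20C</strong></td></tr>
  <tr><td align="center"><a href="#"><img src="images/products/vs-20c.jpg" alt="VS-20C"/></a></td></tr>
</table>
"""


def extract_rows(markup):
    """Return one dict per product by pairing names and image URLs in order."""
    root = ET.fromstring(markup)
    # The name text sits after the empty <a> inside <strong>, so collect all
    # text beneath <strong> rather than reading strong.text alone.
    names = ["".join(strong.itertext()).strip()
             for strong in root.findall(".//td[@class='product_name']/strong")]
    urls = [img.get("src")
            for img in root.findall(".//td[@align='center']/a/img")]
    # A new dict is built for every pair, so no row overwrites another.
    return [{"productname": n, "imgurl": u} for n, u in zip(names, urls)]


def to_csv(rows):
    """Serialize the rows as CSV text, one product per line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["productname", "imgurl"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE)
print(to_csv(rows))
```

In a Scrapy spider the same shape applies: yield one item per `(name, url)` pair and let the CSV feed exporter write one row per item.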

The original question, "xpath - Looping over multiple elements on the same page and storing each one separately", is on Stack Overflow: https://stackoverflow.com/questions/24183258/
