PHP cURL and DOM: how to extract content together with its styles

Tags: php jquery html css curl

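The title may be unclear. What I am trying to do is copy everything inside a specific div of an existing web page (one that does not belong to me). The code below already extracts the content successfully.

Extractor code: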

    // Fetch the page with cURL
    $url = 'http://au.creative.com/p/speakers/creative-t4-wireless';
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_URL, $url);
    curl_setopt($curl_handle, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4);
    curl_setopt($curl_handle, CURLOPT_POST, false);
    curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl_handle, CURLOPT_HEADER, 0);
    curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101');
    // With CURLOPT_RETURNTRANSFER set, curl_exec() returns the body as a string
    $html = curl_exec($curl_handle);
    if ($html === false) {
        die('cURL error: ' . curl_error($curl_handle));
    }
    curl_close($curl_handle);


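    // Display the required part
    $xml = new DOMDocument;
    @$xml->loadHTML($html);   // @ suppresses warnings about malformed markup
    $xpath = new DOMXPath($xml);
    $info = $xpath->query('//div[@class="wrapper features-contents"]')->item(0);
    if ($info === null) {
        die('Target div not found');
    }
    $fragment = $xml->saveXML($info);
    echo utf8_decode($fragment);   // note: utf8_decode() is deprecated as of PHP 8.2
    // Escape the markup so it is shown as text inside the textarea
    echo '<textarea rows="500" cols="100">' . htmlspecialchars($fragment) . '</textarea>';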

Extracted markup:

    <h3 class="feature-header">Pair and connect in so many ways</h3>
    <div class="row product-info-row">
    <div class="span12">
    <div id="slides-modes-21677" style="position:relative;">
    <a id="arrow-left-21677" class="slidesjs-previous slidesjs-navigation" href="#">
    <img src="//d287ku8w5owj51.cloudfront.net/inline/products/21430/arrow_left.jpg" border="0" alt="<" width="42" height="54"/></a> <div id="slide1">
    <img style="margin:0 20px 0 20px;" src="//d287ku8w5owj51.cloudfront.net/inline/products/21677/bluetooth.jpg.ashx?width=520&height=383" alt="Freedom without compromise" width="520" height="383" align="right"/>

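Clearly, only the class names are extracted, not the styles behind them. I remember that when you copy page content from Chrome and paste it into Firefox, the CSS is converted into inline styles. Can I do the same thing in PHP?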

Part of the page content as I got it in Firefox:

    <h3 class="feature-header" style="font-size: 2.2857em; margin: 20px 0px 30px; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.4; text-transform: uppercase; color: #666666; font-style: normal; font-variant: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">PAIR AND CONNECT IN SO MANY WAYS</h3>
    <div class="row product-info-row" style="margin-bottom: 60px; margin-left: -20px; color: #666666; font-family: proxima-nova, Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 21px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
    <div class="span12" style="float: left; min-height: 1px; margin-left: 20px; width: 940px;">
    <div id="slides-modes-21677" style="position: relative; overflow: hidden;">
    <div class="slidesjs-container" style="overflow: hidden; position: relative; width: 940px; height: 383px;">
    <div class="slidesjs-control" style="position: relative; left: 0px; width: 940px; height: 383px;">
    <div id="slide1" class="slidesjs-slide" style="position: absolute; top: 0px; left: 0px; width: 940px; z-index: 10; -webkit-backface-visibility: hidden;"><img style="border: 0px; vertical-align: middle; margin: 0px 20px;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/bluetooth.jpg.ashx?width=520&amp;height=383" alt="Freedom without compromise" width="520" height="383" align="right" />
    <h3 class="feature-subheader" style="font-size: 1.7142em; margin: 30px 0px 0.8em; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.25em; color: #252525; font-weight: normal;">Freedom without compromise</h3>
    <p style="margin: 0px 0px 1em;"><em>Bluetooth</em><span class="Apple-converted-space">&nbsp;</span>wireless connectivity gives you the freedom and convenience to move around your room with your smart device as you're not tied down by any wires.<sup style="line-height: 0; position: relative; vertical-align: baseline; top: -0.5em;">1</sup><span class="Apple-converted-space">&nbsp;</span>And with aptX, you're assured of uncompromised audio quality.</p>
    </div>
    <div id="slide2" class="slidesjs-slide" style="position: absolute; top: 0px; left: 940px; width: 940px; z-index: 0; display: block; -webkit-backface-visibility: hidden;">
    <div style="margin: 0px 20px; vertical-align: middle; float: left;"><img id="fea_nfc_2" style="border: 0px; vertical-align: middle;" src="http://img.creative.com/inline/products/21677/fea_nfc_2.jpg" alt="" /></div>
    <h3 class="feature-subheader" style="font-size: 1.7142em; margin: 30px 0px 0.8em; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.25em; color: #252525; font-weight: normal;">Just tap and pair</h3>
    <p style="margin: 0px 0px 1em;">With the NFC (Near Field Communication) receptor on the Audio Control Pod, you can simply tap your NFC-enabled device on it to pair and then you're all set to stream and enjoy your music.</p>
    </div>
    <div id="slide3" class="slidesjs-slide" style="position: absolute; top: 0px; left: -940px; width: 940px; z-index: 0; display: block; -webkit-backface-visibility: hidden;"><img style="border: 0px; vertical-align: middle; margin: 0px 20px;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/multipoint.png.ashx?width=520&amp;height=383" alt="Stay connected" width="520" height="383" align="right" />
    <h3 class="feature-subheader" style="font-size: 1.7142em; margin: 30px 0px 0.8em; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.25em; color: #252525; font-weight: normal;">Stay connected</h3>
    <p style="margin: 0px 0px 1em;">Connect with multiple<span class="Apple-converted-space">&nbsp;</span><em>Bluetooth</em><span class="Apple-converted-space">&nbsp;</span>devices! With Creative Multipoint, you can have two<span class="Apple-converted-space">&nbsp;</span><em>Bluetooth</em><span class="Apple-converted-space">&nbsp;</span>stereo devices paired to the speakers at any one time and easily toggle between them.<sup style="line-height: 0; position: relative; vertical-align: baseline; top: -0.5em;">2</sup></p>
    </div>
    </div>
    </div>
    <a id="arrow-right-21677" class="slidesjs-next slidesjs-navigation" style="color: #0cbdef; text-decoration: none; cursor: pointer; display: block; overflow: hidden; position: absolute; top: 164.5px; z-index: 30; right: 0px;" href="http://au.creative.com/p/speakers/creative-t4-wireless#"><img style="border: 0px; vertical-align: middle;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21430/arrow_right.jpg" alt="&lt;" width="42" height="54" border="0" /></a></div>
    </div>
    </div>
    <div class="row" style="margin-left: -20px; color: #666666; font-family: proxima-nova, Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 21px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
    <div class="span6" style="float: left; min-height: 1px; margin-left: 20px; width: 460px;">
    <div class="slides-perfect-audio" style="width: 460px; display: block; overflow: hidden;">
    <div class="slidesjs-container" style="overflow: hidden; position: relative; width: 460px; height: 327.8723404255319px;">
    <div class="slidesjs-control" style="position: relative; left: 0px; width: 460px; height: 327.8723404255319px;">
    <div class="slidesjs-slide" style="position: absolute; top: 0px; left: 0px; width: 460px; z-index: 10; -webkit-backface-visibility: hidden;"><img style="border: 0px; vertical-align: middle;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/optical.png" alt="Optical input" /></div>
    <div class="slidesjs-slide" style="position: absolute; top: 0px; left: 460px; width: 460px; z-index: 0; display: block; -webkit-backface-visibility: hidden;"><img style="border: 0px; vertical-align: middle;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/RCA.png" alt="RCA input" /></div>
    <div class="slidesjs-slide" style="position: absolute; top: 0px; left: -460px; width: 460px; z-index: 0; display: block; -webkit-backface-visibility: hidden;"><img style="border: 0px; vertical-align: middle;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/aux_in.png" alt="Aux in" /></div>
    </div>
    </div>
    <ul class="slidesjs-pagination" style="margin: 10px auto; padding: 0px; display: block; width: 60px; list-style: none;">
    <li class="slidesjs-pagination-item" style="display: inline; list-style: none; margin: 0px; padding: 0px;"><a class="active" style="color: #cccccc !important; text-decoration: none; cursor: pointer; padding: 0px; background-color: #999999; font-size: 1px; width: 8px; height: 8px; border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; border: 1px solid #999999; margin-right: 5px; display: inline-block; background-position: 100% 0%;" href="http://au.creative.com/p/speakers/creative-t4-wireless#" data-slidesjs-item="0">1</a></li>
    <li class="slidesjs-pagination-item" style="display: inline; list-style: none; margin: 0px; padding: 0px;"><a style="color: #ffffff; text-decoration: none; cursor: pointer; font-size: 1px; width: 8px; height: 8px; border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; border: 1px solid #999999; background-color: #ffffff; margin-right: 5px; display: inline-block;" href="http://au.creative.com/p/speakers/creative-t4-wireless#" data-slidesjs-item="1">2</a></li>
    <li class="slidesjs-pagination-item" style="display: inline; list-style: none; margin: 0px; padding: 0px;"><a style="color: #ffffff; text-decoration: none; cursor: pointer; font-size: 1px; width: 8px; height: 8px; border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; border: 1px solid #999999; background-color: #ffffff; margin-right: 5px; display: inline-block;" href="http://au.creative.com/p/speakers/creative-t4-wireless#" data-slidesjs-item="2">3</a></li>
    </ul>
    </div>
    </div>
    <div class="span6" style="float: left; min-height: 1px; margin-left: 20px; width: 460px;"><img style="border: 0px; vertical-align: middle;" src="http://d287ku8w5owj51.cloudfront.net/inline/products/21677/playing_games.jpg" alt="Switch to private listening" /></div>
    </div>
    <div class="row product-info-row" style="margin-bottom: 60px; margin-left: -20px; color: #666666; font-family: proxima-nova, Helvetica, Arial, sans-serif; font-size: 14px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 21px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">
    <div class="span6" style="float: left; min-height: 1px; margin-left: 20px; width: 460px;">
    <h4 class="feature-subheader" style="font-size: 1.7142em; margin: 30px 0px 0.8em; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.25em; color: #252525; font-weight: normal;">Even more connectivity options</h4>
    <p style="margin: 0px 0px 1em;">The Creative T4 Wireless comes with an optical input for digital signals, so you can directly send audio from sources such as your HD TV or sound cards without loss of resolution. It also has RCA analog inputs for connection to your video console or DVD player, as well as a 3.5mm input for connection to smart devices and portable media players.</p>
    </div>
    <div class="span6" style="float: left; min-height: 1px; margin-left: 20px; width: 460px;">
    <h4 class="feature-subheader" style="font-size: 1.7142em; margin: 30px 0px 0.8em; font-family: proxima-nova, Helvetica, Arial, sans-serif; line-height: 1.25em; color: #252525; font-weight: normal;">Switch to private listening</h4>
    <p style="margin: 0px 0px 1em;">For late-night gaming or movie-watching, there's no need to worry about waking up the household. The Creative T4 Wireless' Audio Control Pod is integrated with a dedicated headphone jack so that you can conveniently plug in your headphones when the need arises.</p>
    </div>
    </div>
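
The inline styles in the Firefox output are values the browser computed from the site's stylesheets (and the `slidesjs-*` wrappers appear to be added by JavaScript at runtime), so DOMDocument alone will not produce them. As a rough illustration of what inlining involves, here is a minimal sketch that copies declarations for simple class selectors into `style` attributes. The helper name `inlineClassStyles` and the `$rules` array are assumptions made for this example; real stylesheets need a proper CSS parser, selector matching, and specificity handling, none of which is attempted here.

```php
<?php
/**
 * Hypothetical helper (not from the original post): inline declarations for
 * simple class selectors into each matching element's style attribute.
 * $rules maps a class name to a declaration string, in stylesheet order.
 */
function inlineClassStyles(string $html, array $rules): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    // The XML prolog hints UTF-8 to libxml; the wrapper div lets us find the fragment again.
    $doc->loadHTML('<?xml encoding="utf-8" ?><div id="inline-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@class]') as $node) {
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
        $parts = [];
        // Apply rules in stylesheet order so later rules override earlier ones.
        foreach ($rules as $class => $declarations) {
            if (in_array((string) $class, $classes, true)) {
                $parts[] = rtrim(trim($declarations), ';');
            }
        }
        // Declarations that were already inline go last, so they still win.
        $inline = rtrim(trim($node->getAttribute('style')), ';');
        if ($inline !== '') {
            $parts[] = $inline;
        }
        if ($parts) {
            $node->setAttribute('style', implode('; ', $parts));
        }
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $root = $xpath->query('//div[@id="inline-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with one rule taken from the Firefox output above.
echo inlineClassStyles(
    '<h3 class="feature-header">Pair and connect in so many ways</h3>',
    ['feature-header' => 'color: #666666; text-transform: uppercase']
);
```

Inherited properties such as `font-family`, which Firefox also wrote into every element, would still have to be propagated from ancestors; this sketch leaves that out.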

Best answer

Why not use wget?

    wget \
         --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --convert-links \
         --restrict-file-names=windows \
         --domains website.org \
         --no-parent \
             www.website.org/tutorials/html/

http://www.linuxjournal.com/content/downloading-entire-web-site-wget
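
The `--page-requisites` option is what makes wget download the stylesheets and images a page needs. If you would rather stay in PHP, a first step in the same direction is finding which stylesheets the fetched page links to. The following is only a sketch: `findStylesheetUrls` is a hypothetical helper, and it resolves just protocol-relative and root-relative URLs (ignoring ports and `<base>` tags), leaving other relative paths untouched.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): list the stylesheet
 * URLs referenced by <link rel="stylesheet"> elements in a fetched page.
 */
function findStylesheetUrls(string $html, string $pageUrl): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $scheme = parse_url($pageUrl, PHP_URL_SCHEME);
    $host   = parse_url($pageUrl, PHP_URL_HOST);

    $urls  = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//link[@href]') as $link) {
        if (strtolower(trim($link->getAttribute('rel'))) !== 'stylesheet') {
            continue;   // skip icons, feeds, and other link types
        }
        $href = trim($link->getAttribute('href'));
        if (strpos($href, '//') === 0) {
            // Protocol-relative URL: reuse the page's scheme.
            $href = $scheme . ':' . $href;
        } elseif (strpos($href, '/') === 0) {
            // Root-relative URL: prepend the page's scheme and host.
            $href = $scheme . '://' . $host . $href;
        }
        // Absolute URLs and other relative paths are returned unchanged.
        $urls[] = $href;
    }
    return $urls;
}

// Usage with a small inline page; example.org and example.com are placeholder hosts.
$page = '<html><head>'
      . '<link rel="stylesheet" href="//cdn.example.com/a.css">'
      . '<link rel="stylesheet" href="/css/b.css">'
      . '<link rel="icon" href="/favicon.ico">'
      . '</head><body></body></html>';
print_r(findStylesheetUrls($page, 'http://www.example.org/tutorials/html/'));
```

Each returned URL could then be fetched with the same cURL setup used in the question.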

Source: a similar question on Stack Overflow about extracting styled content with PHP cURL and DOM: https://stackoverflow.com/questions/23168911/
