php - Parsing URLs from a website

Just wondering if anyone can help me further with the following. I want to parse the URLs from this website: http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr

I have the following code:

<?PHP  
$url = "http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr";
$input = @file_get_contents($url) or die("Could not access file: $url"); 
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>"; 
if(preg_match_all("/$regexp/siU", $input, $matches)) { 
// $matches[2] = array of link addresses 
// $matches[3] = array of link text - including HTML code
} 
?>

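At the moment it doesn't do anything. What I need to do is scrape all the URLs from the table across all 16 pages. I'd really appreciate any help on how to modify the above and output the URLs to a text file.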
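A minimal sketch of the loop described above, under a few assumptions: the site still serves pages 1 to 16 through the same `pg` and `sort` query parameters, the function names and the output file `urls.txt` are hypothetical, and links are pulled out with PHP's bundled DOMDocument instead of the regex. Each page URL is built by varying `pg`, the links from every page are collected and de-duplicated, and the result is written one URL per line.

```php
<?php
// Build the URL of one listing page; `pg` and `sort` come from the question's URL.
function listingPageUrl(int $page): string
{
    return 'http://www.directorycritic.com/free-directory-list.html?'
        . http_build_query(['pg' => $page, 'sort' => 'pr']);
}

// Return every non-empty href attribute of the <a> elements in an HTML string.
function extractHrefs(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href !== '') {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

// Fetch pages 1..$pageCount, collect their links and write them to $outFile.
// Returns the number of unique URLs written.
function scrapeAllPages(int $pageCount, string $outFile): int
{
    $all = [];
    for ($page = 1; $page <= $pageCount; $page++) {
        $html = @file_get_contents(listingPageUrl($page));
        if ($html === false) {
            error_log("Could not fetch page $page");
            continue; // skip this page, keep the others
        }
        $all = array_merge($all, extractHrefs($html));
    }
    $all = array_values(array_unique($all));
    file_put_contents($outFile, implode(PHP_EOL, $all) . PHP_EOL);
    return count($all);
}
```

Calling `scrapeAllPages(16, 'urls.txt');` would then write the links to `urls.txt`. Note that this collects every link on each page, navigation included, not only the ones in the directory table.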

Best answer

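Use an HTML DOM parser such as PHP Simple HTML DOM Parser:

// Simple HTML DOM is a third-party library; include its file first
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');

// Find all links
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}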

The $links array now contains all the URLs on the given page, which you can then use for further parsing.
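If a third-party library is not an option, PHP's bundled DOM extension can do the same job, and DOMXPath makes it easy to keep only the links inside the table. A sketch follows; the function name `tableLinks` is hypothetical, and the XPath expression `//table//a[@href]` is an assumption about the page markup (it selects links inside any table, not one specific table).

```php
<?php
// Collect href values of <a> elements that sit inside a <table>,
// using only PHP's bundled DOM extension (no third-party parser).
function tableLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // Assumed selector: any anchor with an href attribute inside any table.
    foreach ($xpath->query('//table//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

$html = '<a href="/nav">Home</a><table><tr><td><a href="http://dir.example/">Dir</a></td></tr></table>';
print_r(tableLinks($html)); // only the link inside the table is returned
```

Compared with collecting every `<a>`, restricting the query to the table skips navigation and footer links, which is usually what a directory scrape wants.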

Parsing HTML with regular expressions is not a good idea; there are several related posts on this topic.
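A small illustration of why, using the exact pattern from the question (written in a single-quoted PHP string, which yields the same pattern as `"...\\1..."` in double quotes). Its optional-quote group only knows about double quotes, so a single-quoted `href` comes back with the quote characters inside the captured URL, while a DOM parser returns the attribute value itself. The function names are hypothetical.

```php
<?php
// The question's pattern with the /siU modifiers.
const LINK_REGEX = '/<a\s[^>]*href=("??)([^" >]*?)\1[^>]*>(.*)<\/a>/siU';

// Capture group 2 holds the link address, as in the question.
function hrefsByRegex(string $html): array
{
    preg_match_all(LINK_REGEX, $html, $m);
    return $m[2];
}

// Read the href attribute through a real HTML parser.
function hrefsByDom(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $out = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $out[] = $a->getAttribute('href');
        }
    }
    return $out;
}

$html = "<a href='x.html'>X</a>";
var_dump(hrefsByRegex($html)); // the single quotes end up inside the captured URL
var_dump(hrefsByDom($html));   // the parser returns just x.html
```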

Edit:

Gordon mentioned some other HTML parsing tools in the comments below.

Original question on Stack Overflow: https://stackoverflow.com/questions/4461105/
