How do I write an Emacs Lisp function that finds every href in an HTML file and extracts all the links?
Input:
    <html>
    <a href="http://www.stackoverflow.com" _target="_blank">StackOverFlow</a>
    <h1>Emacs Lisp</h1>
    <a href="http://news.ycombinator.com" _target="_blank">Hacker News</a>
    </html>
Output:
    http://www.stackoverflow.com|StackOverFlow
    http://news.ycombinator.com|Hacker News
While searching, I kept running into the re-search-forward function. Based on what I have read so far, this is what I think I need to do:
    (defun extra-urls (file)
      ...
      (setq buffer (...
      (while (re-search-forward "http://" nil t)
        (when (match-string 0)
          ...
    ))
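Accepted answer
I took Heinzi's solution and worked it into the final version I needed. I can now take a list of files, extract every URL and title, and collect the results in a single output buffer.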
    (defun extract-urls (fname)
      "Extract HTML href URLs and titles from the file FNAME.
    Append them to the buffer \"new-urls.csv\" as url|title lines."
      (let ((urls '()))
        ;; Read the file into a temporary buffer, so no visiting buffer
        ;; is left behind and an already-open buffer is not killed.
        (with-temp-buffer
          (insert-file-contents fname)
          (goto-char (point-min))
          ;; Group 1 is the URL, group 2 is the link text.  There is no
          ;; leading "^.*": it is greedy and would skip every link on a
          ;; line except the last one.
          (while (re-search-forward
                  "<a href=\"\\([^\"]+\\)\"[^>]*>\\([^<]+\\)</a>" nil t)
            (push (concat (match-string 1) "|" (match-string 2) "\n") urls)))
        ;; Send the results to the output buffer, in document order.
        (with-current-buffer (get-buffer-create "new-urls.csv")
          (goto-char (point-max))
          (mapc #'insert (nreverse urls)))
        ;; Finally, show the output buffer.
        (switch-to-buffer "new-urls.csv")))

    ;; Process a list of files:
    ;; (mapc #'extract-urls '("/tmp/foo.html" "/tmp/bar.html"))
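For trying the regexp without touching any files, the same search loop can be wrapped in a function that takes the HTML as a string and returns a list. This is only a sketch: the name `my-extract-links-from-string` is hypothetical and not part of the answer above, and it assumes each link is written as `<a href="...">` with `href` as the first attribute and plain text (no nested tags) inside the anchor.

```elisp
(defun my-extract-links-from-string (html)
  "Return a list of \"url|title\" strings, one per link found in HTML.
Hypothetical helper for experimenting; assumes href is the first
attribute of each <a> tag and the link text contains no nested tags."
  (let ((links '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; Group 1 captures the URL, group 2 the link text.
      (while (re-search-forward
              "<a href=\"\\([^\"]+\\)\"[^>]*>\\([^<]+\\)</a>" nil t)
        (push (concat (match-string 1) "|" (match-string 2)) links)))
    ;; `push' builds the list back to front, so restore document order.
    (nreverse links)))

;; Usage with the input from the question:
;; (my-extract-links-from-string
;;  "<html> <a href=\"http://www.stackoverflow.com\" _target=\"_blank\">StackOverFlow</a> <h1>Emacs Lisp</h1> <a href=\"http://news.ycombinator.com\" _target=\"_blank\">Hacker News</a> </html>")
;; → ("http://www.stackoverflow.com|StackOverFlow" "http://news.ycombinator.com|Hacker News")
```

Returning a list instead of writing to a buffer leaves it to the caller to decide where the results go, for example `(mapconcat #'identity result "\n")` to get the output format shown in the question.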
This question, "elisp - Extracting URLs from an Emacs buffer?", comes from Stack Overflow: https://stackoverflow.com/questions/1642184/