java - Merging HTML files in Java using JSoup

Tags: java html for-loop jsoup bufferedwriter

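I am trying to merge several .html files into a single .html file using Jsoup. My idea is to get the list of .html files in a directory and store their names in an ArrayList. I would then loop through the ArrayList, passing each file name as a String to the Jsoup.parse() method.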

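I can populate the ArrayList without any problem, and my code works for one file at a time, but when I add the for loop below, the NEW_INFORMATION.html file is created with nothing in it. Any ideas on what I am missing?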

Here is the current code:

public class mergeFiles {

    public static void main(String[] args) throws IOException {

        File outputFile = new File ("C:\\Users\\1234\\Desktop\\PowerShellOutput\\NEW_INFORMATION.html");
        File dir = new File ("C:\\Users\\1234\\Desktop\\PowerShellOutput\\");
        File [] paths;
        //Only capture files with extension .html
        FilenameFilter fileNameFilter = new FilenameFilter(){
            public boolean accept(File dir, String name) {
                // TODO Auto-generated method stub
                if (name.lastIndexOf('.') > 0) {
                    int lastIndex = name.lastIndexOf('.');
                    String extension = name.substring(lastIndex);
                    if(extension.equals(".html")){
                        return true;
                    }
                }
                return false;
            }
        };      
        paths = dir.listFiles(fileNameFilter);
        List<String> list = new ArrayList<String>();
        for (File x : paths){
            list.add(x.toString());
        }
        System.out.print(list);
        for (String s : list){
            File input = new File(s);
            Document doc = Jsoup.parse(input, "UTF-8"); 
            Elements links = doc.select("table");
            @SuppressWarnings("resource")
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
            bw.append("<h2>" + s.toString() + "<h2>");
            bw.append(links.toString());
        }
    }
}

I also tried this variation, without converting the paths to strings (same result):

for (File x : paths){
        Document doc = Jsoup.parse(x, "UTF-8"); 
        Elements links = doc.select("table");
        @SuppressWarnings("resource")
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), "UTF-8"));
        bw.append("<h2>" + x.toString() + "<h2>");
        bw.append(links.toString());
    }

Here is the complete answer, for anyone who might want this in the future:

package htmlMerge;

import java.io.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class mergeFiles {

public static void main(String[] args) throws IOException {

    try {
        String outFileName = System.getProperty("user.home") + "/Desktop/<Insert The Directory/name.html>";
        File outputFile = new File(outFileName);
        String desktopDir = System.getProperty("user.home") + "/Desktop/<Insert Dir name>";
        File dir = new File(desktopDir);
        File[] paths;
        //create a file filter that will only worry about .html files if your folder contains other extensions
        FilenameFilter fileNameFilter = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                if (name.lastIndexOf('.') > 0) {
                    int lastIndex = name.lastIndexOf('.');
                    String extension = name.substring(lastIndex);
                    if (extension.equals(".html")) {
                        return true;
                    }
                }
                return false;
            }
        };
        paths = dir.listFiles(fileNameFilter);
        //use BufferedWriter to create the initial .html file with a header
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outputFile), "UTF-8"));
        bw.write("<h1>REPORT DATA</h1>");
        bw.close();
        /*Use file writer to append the .html file with additional .html files
        In this case, the .html files all contain One 'table', so this
        will append the tables to 'outputFile'.*/
        try {
            String file = outputFile.getAbsolutePath();
            FileWriter fw = new FileWriter(file, true);
            for (File x : paths) {
                Document doc = Jsoup.parse(x, "UTF-8");
                Elements links = doc.select("table");
                //adds the filename of the .html as a Level 2 heading
                fw.write("<h2>" + x.toString() + "</h2>");
                fw.write(links.toString());
            }
            fw.close();
        }catch (IOException ioe) {
            System.err.println(ioe.getMessage());
        } finally {
            bw.close();
        }
    } catch (IOException ioe) {
        System.out.println(ioe.getMessage());
    }
    System.out.println("\nMerge Completed Successfully");
  }
}

Best answer

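You must close the BufferedWriter for the changes to appear in the file. A BufferedWriter holds its output in memory until it is flushed or closed, and the loop in the question never closes the writer, so nothing reaches NEW_INFORMATION.html. Note also that the loop opens a new FileOutputStream on the same file in every iteration, which truncates the file each time; opening the writer once, before the loop, avoids this.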
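As a rough illustration of that advice, here is a minimal sketch that opens the writer once and lets try-with-resources close (and so flush) it. The class name MergeSketch and the method writeSections are hypothetical, not from the question. To keep the sketch runnable without the Jsoup library, each table is passed in as a plain string; in the question's code that string would come from doc.select("table").toString().

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeSketch {

    // Hypothetical helper: writes an <h2> heading and an HTML fragment
    // for each entry (file name -> fragment) into outputFile.
    public static void writeSections(Path outputFile, Map<String, String> sections)
            throws IOException {
        // The writer is opened once, before the loop, so the file is
        // truncated only once. try-with-resources closes the writer
        // when the block ends, which flushes the buffered text to disk.
        try (BufferedWriter bw = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, String> section : sections.entrySet()) {
                bw.write("<h2>" + section.getKey() + "</h2>");
                bw.newLine();
                bw.write(section.getValue());
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in data; with Jsoup the values would be the selected tables.
        Map<String, String> sections = new LinkedHashMap<>();
        sections.put("a.html", "<table><tr><td>1</td></tr></table>");
        sections.put("b.html", "<table><tr><td>2</td></tr></table>");

        Path out = Files.createTempFile("merged", ".html");
        writeSections(out, sections);

        // Read the file back to confirm both sections were written.
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

A LinkedHashMap is used here only so the sections are written in insertion order; any ordered collection of file names and fragments would do.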

Regarding "java - Merging HTML files in Java using JSoup", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25195397/
