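java - Downloading a file with Java, no API, strange error

Tags: java web-scraping

I am trying to download a file from this site by clicking the Excel icon. Right-clicking the icon gave me the link, which I pasted into my Java program as follows: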

public static void main(String[] args){

    BufferedReader br;
    String thisLine="";
    String file="";
    try {
      // connect and download the file
      ReadableByteChannel rbc;
      file="test.xls";
      URL website = new URL("http://www.anaptyxi.gov.gr/DesktopModules" +
                "/AVMap.ErgaReports_v2/SearchHandler.ashx?lang=el-GR" +
                "&pageMode=3&searchValue=&searchField=&dateFrom=&dateTo=" +
                "&perioxesMode=2&selectedPerioxes[]=01_36_514&ergaType[]=1" +
                "&ergaType[]=2&ergaType[]=3&enisx=&kad=&company=&includePollaplhs=1" +
                "&export=xls");
      rbc = Channels.newChannel(website.openStream());

      FileOutputStream xls = new FileOutputStream(file);  
      xls.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);             
      xls.close();               
    } catch (FileNotFoundException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }   
}

This creates an Excel file, but it contains only the string:

Error executing the request. Try limiting your criteria.

Any ideas?

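Best answer

The site is probably checking the User-Agent header and refusing to serve the file because the request identifies itself as coming from Java rather than from a browser.

This should fix it: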

public static void main(String[] args) {
    String file = "test.xls";
    try {
        URL website = new URL("http://www.anaptyxi.gov.gr/DesktopModules" +
                "/AVMap.ErgaReports_v2/SearchHandler.ashx?lang=el-GR" +
                "&pageMode=3&searchValue=&searchField=&dateFrom=&dateTo=" +
                "&perioxesMode=2&selectedPerioxes[]=01_36_514&ergaType[]=1" +
                "&ergaType[]=2&ergaType[]=3&enisx=&kad=&company=&includePollaplhs=1" +
                "&export=xls");

        // Add request headers to mimic a browser
        URLConnection con = website.openConnection();
        con.addRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        con.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36");

        // Read from the configured connection. Calling website.openStream() here
        // would open a second connection without the headers set above.
        try (ReadableByteChannel rbc = Channels.newChannel(con.getInputStream());
             FileOutputStream xls = new FileOutputStream(file)) {
            xls.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    } catch (IOException e) {
        // FileNotFoundException is a subclass of IOException, so one handler covers both
        e.printStackTrace();
    }
}
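
If you want the program to notice when the server still sends the error text instead of a spreadsheet, something along these lines might help. It is only a sketch: the class name `BrowserLikeDownload` and the helpers `openWithBrowserHeaders`, `looksLikeErrorBody` and `download` are hypothetical names made up for illustration, the timeouts are arbitrary, and the check assumes the error always contains the exact phrase quoted in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class BrowserLikeDownload {

    // Error text quoted in the question (assumed to be stable).
    static final String ERROR_PHRASE = "Error executing the request";

    // Hypothetical helper: configures a connection with browser-like headers.
    // Nothing is sent over the network until getInputStream() is called.
    static URLConnection openWithBrowserHeaders(URL url) throws IOException {
        URLConnection con = url.openConnection();
        con.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        con.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
                + "(KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36");
        con.setConnectTimeout(10_000); // arbitrary values, adjust as needed
        con.setReadTimeout(30_000);
        return con;
    }

    // Hypothetical helper: true if the first bytes of the body contain the error phrase.
    static boolean looksLikeErrorBody(byte[] body) {
        int n = Math.min(body.length, 256);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        return head.contains(ERROR_PHRASE);
    }

    // Downloads to target, then fails loudly if the file holds the error message.
    static void download(URL url, Path target) throws IOException {
        URLConnection con = openWithBrowserHeaders(url);
        try (InputStream in = con.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        // Reading the whole file back is acceptable for a small export.
        if (looksLikeErrorBody(Files.readAllBytes(target))) {
            throw new IOException("Server returned an error message instead of a spreadsheet");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BrowserLikeDownload <url> <output-file>");
            return;
        }
        download(new URL(args[0]), Paths.get(args[1]));
    }
}
```

Run it with the export link from the question as the first argument and `test.xls` as the second. If the exception is still thrown with the browser headers in place, the server is probably rejecting the request for some other reason (cookies, a session, or the query itself).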

This question, "java - Downloading a file with Java, no API, strange error", is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/29882197/
