java - Reading a website's HTML code

Tags: java android web-scraping screen-scraping

I am trying to read the HTML code of a website, so I use the following code. One of my fragments:

public class FragmentFavorites extends Fragment {
    View view;
    TextView text;
    Homescreen home = new Homescreen();
    public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
        view = inflater.inflate(R.layout.favorites,container, false);
        text = (TextView) view.findViewById(R.id.textView2);
        try {
            text.setText(home.getHtml("http://pastebin.com/u7jHeNwf"));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return view;
    }
}

This is the getHtml() I am referring to:

public static String getHtml(String url) throws IOException {
        URLConnection connection = (new URL(url)).openConnection();
        connection.setConnectTimeout(5000);
        connection.setReadTimeout(5000);
        connection.connect();

        InputStream in = connection.getInputStream();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        StringBuilder html = new StringBuilder();
        for (String line; (line = reader.readLine()) != null; ) {
            html.append(line);
        }
        in.close();

        return html.toString();
    }

Unfortunately, every time I scroll to this fragment (that is, whenever getHtml() is called), my app stops running. Does anyone know what I am doing wrong?

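Best answer

The crash is most likely a NetworkOnMainThreadException: onCreateView() runs on the UI thread, and Android does not allow network calls there. It is a RuntimeException, so the catch (IOException e) block does not handle it. Move the request to a background thread, for example with an AsyncTask: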

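    import android.content.Context;
    import android.os.AsyncTask;
    import android.os.Bundle;
    import android.view.LayoutInflater;
    import android.view.View;
    import android.view.ViewGroup;
    import android.widget.TextView;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.lang.ref.WeakReference;
    import java.net.URL;
    import java.net.URLConnection;
    // Fragment: import android.app.Fragment or the support library Fragment,
    // whichever the project already uses.

    public class FragmentFavorites extends Fragment {
        View view;
        TextView text;
        Homescreen home = new Homescreen();

        @Override
        public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
            view = inflater.inflate(R.layout.favorites, container, false);
            text = (TextView) view.findViewById(R.id.textView2);
            // Start the download on a background thread instead of blocking the UI thread.
            FetchHtml fetchHtml = new FetchHtml(getActivity().getApplicationContext(), FragmentFavorites.this);
            fetchHtml.executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR, "http://pastebin.com/u7jHeNwf");
            return view;
        }

        public static class FetchHtml extends AsyncTask<String, Void, String> {

            Context mContext;
            // Weak reference so a running task does not keep a destroyed fragment alive.
            WeakReference<FragmentFavorites> mClient;

            // The constructor name must match the class name (FetchHtml).
            public FetchHtml(Context context, FragmentFavorites client) {
                this.mContext = context;
                this.mClient = new WeakReference<>(client);
            }

            // Runs on a background thread; returns null if the download fails.
            @Override
            protected String doInBackground(String... params) {
                try {
                    return getHtml(params[0]);
                } catch (IOException e) {
                    e.printStackTrace();
                    return null;
                }
            }

            // Runs on the UI thread, so it is safe to touch the TextView here.
            @Override
            protected void onPostExecute(String html) {
                super.onPostExecute(html);
                FragmentFavorites client = mClient.get();
                if (client != null && client.text != null) {
                    if (html != null) {
                        client.text.setText(html);
                    } else {
                        client.text.setText("Error fetching html");
                    }
                }
            }

            private static String getHtml(String url) throws IOException {
                URLConnection connection = (new URL(url)).openConnection();
                connection.setConnectTimeout(5000);
                connection.setReadTimeout(5000);
                connection.connect();

                InputStream in = connection.getInputStream();
                try {
                    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                    StringBuilder html = new StringBuilder();
                    for (String line; (line = reader.readLine()) != null; ) {
                        html.append(line);
                    }
                    return html.toString();
                } finally {
                    // Close the stream even if reading fails.
                    in.close();
                }
            }

        }
    }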
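
AsyncTask has been deprecated since API level 30, so the same idea (do the blocking read on a background thread, then hand the result back) can also be sketched with plain java.util.concurrent classes. The following is only an illustration, not part of the original answer: the class name HtmlFetchSketch, the helpers readAll and fetchAsync, and the in-memory fake page are all assumptions made for the example. It also assumes the page is UTF-8 and, unlike the code above, keeps line breaks.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper class, not taken from the original answer.
final class HtmlFetchSketch {

    // Reads the whole stream as UTF-8 text (assumed encoding), keeping line breaks.
    // try-with-resources closes the reader and the stream even if reading fails.
    static String readAll(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            for (String line; (line = reader.readLine()) != null; ) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Opens and reads the source on the given executor, so the caller's thread
    // is not blocked. Completes with the fallback text if anything throws.
    static CompletableFuture<String> fetchAsync(Callable<InputStream> source,
                                                ExecutorService executor,
                                                String fallback) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return readAll(source.call());
            } catch (Exception e) {
                return fallback;
            }
        }, executor);
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for connection.getInputStream(); no network access happens here.
            Callable<InputStream> fakePage = () -> new ByteArrayInputStream(
                    "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
            // join() blocks only for this demo; UI code would attach a callback instead.
            String html = fetchAsync(fakePage, background, "Error fetching html").join();
            System.out.print(html);
        } finally {
            background.shutdown();
        }
    }
}
```

In an Android fragment the source would be something like () -> connection.getInputStream(), and the TextView update would still have to be posted back to the UI thread (for example with Activity.runOnUiThread), because views may only be touched from that thread. CompletableFuture is available on Android from API level 24; on older versions an ExecutorService combined with a Handler would be the closer equivalent.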

Regarding "java - Reading a website's HTML code", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35022571/
