Downloading an HTML page in Java: saving web page content as a local HTML file

March 3, 2024 · 249 views · Source: 张小凡vip

When we covered fetching web page content with HttpClient earlier, we usually took the page source (content) and stored it in a database.

See that earlier post for details.

But what if, besides getting the page source, we also want to save the page locally as an HTML file?

It is actually simple. First, let's look at the code that requests the page and returns its content:

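    // Imports needed by the two methods below (Apache HttpClient 4.x)
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.io.UnsupportedEncodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.UnsupportedCharsetException;
    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.ParseException;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.entity.DeflateDecompressingEntity;
    import org.apache.http.client.entity.GzipDecompressingEntity;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.params.CoreConnectionPNames;
    import org.apache.http.protocol.HTTP;
    import org.apache.http.util.CharArrayBuffer;

    private static String getUrlContent(DefaultHttpClient httpPostClient,
            String urlString) throws IOException, ClientProtocolException {
        // Set the timeouts before executing the request so that they take effect
        httpPostClient.getParams().setParameter(
                CoreConnectionPNames.CONNECTION_TIMEOUT, 10000);// connection timeout: 10 s
        httpPostClient.getParams().setParameter(
                CoreConnectionPNames.SO_TIMEOUT, 8000);// socket read timeout: 8 s
        HttpGet httpGet = new HttpGet(urlString);// HttpGet implements HttpUriRequest
        HttpResponse httpGetResponse = httpPostClient.execute(httpGet);
        if (httpGetResponse.getStatusLine().getStatusCode() == 200) {
            HttpEntity httpEntity = httpGetResponse.getEntity();
            // Unwrap compressed response bodies based on the Content-Encoding header
            if (httpEntity.getContentEncoding() != null) {
                if ("gzip".equalsIgnoreCase(httpEntity.getContentEncoding()
                        .getValue())) {
                    httpEntity = new GzipDecompressingEntity(httpEntity);
                } else if ("deflate".equalsIgnoreCase(httpEntity
                        .getContentEncoding().getValue())) {
                    httpEntity = new DeflateDecompressingEntity(httpEntity);
                }
            }
            // `encode` is assumed to be a class-level default Charset declared
            // elsewhere in the class; its definition is not shown in this post
            String result = enCodetoStringDo(httpEntity, encode);// decode the response body
            return result;
        }
        return "";
    }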
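
The gzip branch in getUrlContent relies on HttpClient's GzipDecompressingEntity. As a rough illustration of what that unwrapping amounts to, here is a minimal sketch that uses only JDK classes. The class GzipDemo and its methods gzip and gunzip are hypothetical names introduced for this example; they are not part of the original code or of HttpClient.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    /** Compresses bytes the way a server sending "Content-Encoding: gzip" would. */
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompresses a gzip body back into the original bytes. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] body = gzip("<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(gunzip(body), StandardCharsets.UTF_8));
    }
}
```

Decompression has to happen before the bytes are decoded into characters, which is why getUrlContent wraps the entity first and only then passes it to the decoding method.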

    public static String enCodetoStringDo(final HttpEntity entity,
            Charset defaultCharset) throws IOException, ParseException {
        if (entity == null) {
            throw new IllegalArgumentException("HTTP entity may not be null");
        }
        InputStream instream = entity.getContent();
        if (instream == null) {
            return null;
        }
        try {
            if (entity.getContentLength() > Integer.MAX_VALUE) {
                throw new IllegalArgumentException(
                        "HTTP entity too large to be buffered in memory");
            }
            int i = (int) entity.getContentLength();
            if (i < 0) {
                i = 4096;
            }
            Charset charset = null;
            try {
                // ContentType contentType = ContentType.get(entity);
                // if (contentType != null) {
                // charset = contentType.getCharset();
                // }
            } catch (final UnsupportedCharsetException ex) {
                throw new UnsupportedEncodingException(ex.getMessage());
            }
            if (charset == null) {
                charset = defaultCharset;
            }
            if (charset == null) {
                charset = HTTP.DEF_CONTENT_CHARSET;
            }
            Reader reader = new InputStreamReader(instream, charset);
            CharArrayBuffer buffer = new CharArrayBuffer(i);
            char[] tmp = new char[1024];
            int l;
            while ((l = reader.read(tmp)) != -1) {
                buffer.append(tmp, 0, l);
            }
            return buffer.toString();
        } finally {
            instream.close();
        }
    }
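
The core of enCodetoStringDo is the loop that reads the stream through an InputStreamReader with a chosen charset. The sketch below isolates that step with JDK classes only, to show why the charset matters: the same bytes decode to different text under a different charset. StreamDecoder and readAll are hypothetical names used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamDecoder {
    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] tmp = new char[1024];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(tmp)) != -1) {
                sb.append(tmp, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" encoded as UTF-8 takes two bytes for the last character
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Decoded with the matching charset, the text survives intact
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
        // Decoded as ISO-8859-1, each byte becomes its own character and the text is garbled
        System.out.println(readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1));
    }
}
```

If a saved page shows garbled characters, a mismatch between the page's real charset and the default charset passed to the decoding method is a likely cause.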

Once we have the content, we can write it straight to a local file.

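We can reuse the code from the earlier post "Reading and writing txt files in Java" (java读写txt) and simply change the .txt extension to .html: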

    // Requires: import java.io.*; import java.util.Calendar;
    // DATE_FORMAT is a class-level date formatter (e.g. a java.text.SimpleDateFormat)
    // defined elsewhere in the class; it is not shown in this post.
    public static void writeToFile(String fileName, String content) {
        String time = DATE_FORMAT.format(Calendar.getInstance().getTime());
        File dirFile = new File("e:\\" + time);
        try {
            if (!dirFile.exists()) {
                boolean created = dirFile.mkdirs();
                if (created) {
                    System.out.println("ok: directory created");
                } else {
                    System.out.println("err: failed to create directory");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Change ".txt" to ".html" to save the page as an HTML file
        String fullPath = dirFile + "/" + fileName + ".txt";
        write(fullPath, content);
    }

    /**
     * Appends content to the file at path, creating the file if it does not exist.
     *
     * @param path    full path of the file to write
     * @param content text to append
     * @return true on success, false if an I/O error occurred
     */
    public static boolean write(String path, String content) {
        File f = new File(path);
        try {
            if (!f.exists()) {
                System.out.println("File does not exist, creating it...");
                if (f.createNewFile()) {
                    System.out.println("File created.");
                } else {
                    System.out.println("Failed to create file.");
                }
            }
            // Read the existing content first so that the new content is appended to it.
            // Note: FileReader and FileWriter use the platform default charset.
            StringBuilder existing = new StringBuilder();
            try (BufferedReader input = new BufferedReader(new FileReader(f))) {
                String line;
                while ((line = input.readLine()) != null) {
                    existing.append(line).append("\n");
                }
            }
            System.out.println("Existing file content: " + existing);
            existing.append(content);
            // try-with-resources flushes and closes the writer even on failure
            try (BufferedWriter output = new BufferedWriter(new FileWriter(f))) {
                output.write(existing.toString());
            }
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
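
As an alternative to the read-then-rewrite approach above, here is a minimal sketch that saves the content with java.nio.file and an explicit charset. It is a different technique from the author's code: it overwrites the target file instead of appending to it. HtmlSaver, saveHtml and the "pages" directory are hypothetical names chosen for this example, and UTF-8 is an assumption. In practice the charset should match the one the page declares.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HtmlSaver {
    /**
     * Writes content to dir/fileName.html as UTF-8, creating dir if needed.
     * An existing file with the same name is overwritten.
     */
    static Path saveHtml(Path dir, String fileName, String content) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(fileName + ".html");
            Files.write(target, content.getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Save into a "pages" folder under the system temp directory
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "pages");
        Path saved = saveHtml(dir, "index", "<html><body>hello</body></html>");
        System.out.println("Saved to " + saved);
    }
}
```

In the context of this post, the string returned by getUrlContent would be passed as the content argument.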

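    Original author: 张小凡vip
    Original URL: https://blog.csdn.net/q383965374/article/details/44035901
    This article is reposted from the web to share knowledge; if it infringes your rights, please contact the blogger to have it removed.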