Scraping page data with Java HttpClient (fetching the image captcha and passing the cookie along)
Published: 2018-05-18 16:41
Tags: JAVA, HttpClient, cookie
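Sometimes a program needs data from a third party that offers no API. In my case, users enter their license plate and related details, and the program looks up their traffic violations. With no other option available, I had to fetch the data from the query page of the official website.
A plain AJAX request from the browser cannot do this: the same-origin policy alone rules it out.
A POST request sent from Java on the server side can. Sites like this are usually well protected and do not allow arbitrary access. The typical scheme is that the site sets a cookie in the response when you first open its home page, and on every later request the back end checks the cookie you send. If it matches, the site treats you as a browser that came in through the home page and responds normally. If the cookie differs or is missing, the site treats the request as coming from a program and rejects it.
So we first request the home page from Java to obtain the cookie, then set that cookie on every subsequent request.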
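To make the cookie round trip concrete, here is a minimal JDK-only sketch of what "set the cookie on every subsequent request" amounts to at the header level: the name=value pair from each Set-Cookie response header is joined into one Cookie request header. The class name, method name, and sample header values are made up for illustration; the article's own code relies on Commons HttpClient to collect the cookies instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: turn Set-Cookie response headers into one Cookie request header. */
final class CookieHeaderSketch {

    /**
     * Keeps only the leading name=value pair of each Set-Cookie value and
     * drops attributes such as Path or HttpOnly, which are instructions to
     * the client and are not sent back to the server.
     */
    static String toCookieHeader(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        for (String value : setCookieValues) {
            int semicolon = value.indexOf(';');
            String pair = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
            if (pair.contains("=")) {
                pairs.add(pair);
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        // Hypothetical header values, not taken from the real site.
        List<String> fromHomePage = Arrays.asList(
                "JSESSIONID=abc123; Path=/; HttpOnly",
                "token=xyz789; Path=/");
        // prints: JSESSIONID=abc123; token=xyz789
        System.out.println(toCookieHeader(fromHomePage));
    }
}
```

The resulting string is what goes into the Cookie header of the later requests.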
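Requesting the data also requires entering an image captcha. Sites generally return the captcha image as a byte stream that is set as the src of an img element. So we first fetch the captcha image and pass it through to our own front end. The user reads the captcha, fills it in along with the other fields, and submits. Java then sends the request to the official site with all the parameters and, not to forget, the cookie, and receives the violation records the site returns.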
The code is as follows:
Import the HttpClient-related packages
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.util.Date;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import net.sf.json.JSONObject;
import org.apache.commons.httpclient.Cookie;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.struts2.ServletActionContext;

// Step 1: fetch the captcha image from the official violation-query site.
public void getImage() {
    HttpServletRequest request = ServletActionContext.getRequest();
    HttpServletResponse response = ServletActionContext.getResponse();

    // Initialize HttpClient.
    HttpClient httpClient = new HttpClient();
    httpClient.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);

    // Open the Shanxi traffic police query page first so that the site
    // issues its cookies (they include the token and the session id).
    GetMethod getHome = new GetMethod("http://sx.122.gov.cn/views/inquiry.html?q=j");
    try {
        httpClient.executeMethod(getHome);
    } catch (IOException e) { // HttpException is a subclass of IOException
        e.printStackTrace();
    } finally {
        getHome.releaseConnection();
    }

    // Collect the cookies the site just set and store them in our own
    // session, so the query request can send them back later.
    StringBuffer tmpcookies = new StringBuffer();
    for (Cookie c : httpClient.getState().getCookies()) {
        tmpcookies.append(c.toString()).append(";");
    }
    request.getSession().setAttribute("cookie", tmpcookies.toString());

    // Add a timestamp to the URL so the captcha is never served from a cache.
    // The same httpClient is reused, so it sends the cookies automatically.
    GetMethod getCaptcha = new GetMethod("http://sx.122.gov.cn/captcha?t=" + new Date().getTime());
    byte[] responseBody = null;
    try {
        httpClient.executeMethod(getCaptcha);
        responseBody = getCaptcha.getResponseBody();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        getCaptcha.releaseConnection();
    }
    if (responseBody == null) {
        // The captcha request failed, so there is no image to send.
        response.setStatus(HttpServletResponse.SC_BAD_GATEWAY);
        return;
    }

    // Write the captcha image to the browser through the output stream.
    response.setContentType("image/jpeg");
    try {
        OutputStream out = response.getOutputStream();
        out.write(responseBody);
        out.flush();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
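The getImage method above always labels the body as image/jpeg. If the upstream site ever answers with a PNG, a GIF, or an HTML error page, that label is wrong. As a hedged sketch (the class and method names are hypothetical, and only three common formats are checked), the content type can be guessed from the first bytes of the body before it is written out:

```java
/** Sketch: guess an image content type from the leading "magic" bytes. */
final class ImageTypeSketch {

    /** Returns a MIME type, or null when the body is not a recognized image. */
    static String sniffImageType(byte[] body) {
        if (startsWith(body, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (startsWith(body, 0x89, 'P', 'N', 'G')) {
            return "image/png";
        }
        if (startsWith(body, 'G', 'I', 'F', '8')) {
            return "image/gif";
        }
        return null; // for example an HTML error page instead of a captcha
    }

    /** True when body begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] body, int... prefix) {
        if (body == null || body.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((body[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(sniffImageType(jpegStart));           // prints: image/jpeg
        System.out.println(sniffImageType("<html>".getBytes())); // prints: null
    }
}
```

A null result is a useful signal that the captcha request did not return an image at all, in which case it is better to report an error than to stream the bytes to the img element.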
Displaying and refreshing the captcha on the front end
Captcha: <input ... style="width:100px" maxlength="4" required="required"/>
<div class="yanzhengblock" style="width:130px;height:30px;border:1px solid #ddd;position:absolute;right:10px;top:8px;background:white;overflow:hidden;">
<img class="yanzhengimg" src="getImage.action" style="width:130px;height:30px;margin-left:-2px;margin-top:-2px;" onclick="changeimg(this)">
</div>
Finally, submit the parameters and request the data
// Step 2: submit the violation query to the Shanxi traffic police site.
// member, cars, car, page and memberService are fields of the enclosing action class.
public void trafficWeb() throws HttpException, IOException {
    HttpServletRequest request = ServletActionContext.getRequest();
    HttpServletResponse response = ServletActionContext.getResponse();

    // The site's token and session id (the cookie) were captured in
    // getImage(); now submit the data with a PostMethod.
    HttpClient httpClient = new HttpClient();
    PostMethod postMethod = new PostMethod("http://sx.122.gov.cn/m/publicquery/vio");

    // Cookie string stored when the captcha was fetched.
    String cookie = (String) request.getSession().getAttribute("cookie");
    if (cookie == null) {
        cookie = ""; // getImage() has not run in this session
    }
    postMethod.setRequestHeader("Cookie", cookie + "userpub=1;");

    // Referer says which page the request came from. Sites usually tell
    // browsers from programs by cookie, Referer and captcha, so send the
    // headers a browser would send.
    // Accept-Encoding is left out: Commons HttpClient 3.x does not decompress
    // gzip or deflate bodies, and the response is decoded as plain UTF-8 below.
    postMethod.setRequestHeader("Accept", "application/json, text/javascript, */*; q=0.01");
    postMethod.setRequestHeader("Accept-Language", "zh-CN,zh;q=0.8");
    postMethod.setRequestHeader("Connection", "keep-alive");
    postMethod.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
    postMethod.setRequestHeader("Host", "sx.122.gov.cn");
    postMethod.setRequestHeader("Origin", "http://sx.122.gov.cn");
    postMethod.setRequestHeader("Referer", "http://sx.122.gov.cn/views/inquiry.html");
    postMethod.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) "
            + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36");
    postMethod.setRequestHeader("X-Requested-With", "XMLHttpRequest");

    // Values the user filled in on the form.
    String hpzl = (String) request.getAttribute("hpzl");
    String hphm1b = (String) request.getAttribute("hphm1b");
    String hphm = (String) request.getAttribute("hphm");
    String fdjh = (String) request.getAttribute("fdjh");
    String qm = (String) request.getAttribute("qm");
    String captcha = (String) request.getAttribute("captcha");
    String page1 = (String) request.getAttribute("page1");

    // Save the vehicle record, updating it if the plate is already known.
    // Caution: hphm is concatenated into the query as-is; use a bound
    // parameter instead if memberService offers one.
    member = (TMember) request.getSession().getAttribute("MEMBER");
    String hqlWhere = " and carNumber = '" + hphm.trim() + "'";
    cars = memberService.findCars(hqlWhere, page);
    boolean exists = cars != null && cars.size() > 0;
    car = exists ? cars.get(0) : new TCar();
    car.setCarNumber(hphm);
    car.setCarMember(member.getMemberCode());
    car.setCarEngine(fdjh);
    car.setCarPlateType(hpzl);
    if (exists) {
        memberService.updateCar(car);
    } else {
        memberService.saveCar(car);
    }

    // Add the parameters the official site expects.
    postMethod.addParameter("hpzl", hpzl);
    postMethod.addParameter("hphm1b", hphm1b);
    postMethod.addParameter("hphm", hphm);
    postMethod.addParameter("fdjh", fdjh);
    postMethod.addParameter("qm", qm);
    postMethod.addParameter("captcha", captcha);
    postMethod.addParameter("page", page1);

    // Submit the request and read what the site returns.
    byte[] responseBody;
    try {
        httpClient.executeMethod(postMethod);
        responseBody = postMethod.getResponseBody();
    } finally {
        postMethod.releaseConnection();
    }
    if (responseBody == null) {
        throw new IOException("Empty response from the query endpoint");
    }

    // Convert to a string and return it to the page as JSON.
    String result = new String(responseBody, "UTF-8");
    System.out.println(result);
    JSONObject jsonObj = new JSONObject();
    jsonObj.put("data", JSONObject.fromObject(result));

    // Write the response.
    response.setCharacterEncoding("UTF-8");
    response.setContentType("application/json;charset=UTF-8");
    PrintWriter out = response.getWriter();
    try {
        out.print(jsonObj);
    } finally {
        out.flush();
        out.close();
    }
    request.setAttribute("weizhangresult", jsonObj);
}
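The Content-Type header in trafficWeb declares application/x-www-form-urlencoded with charset UTF-8, which matters because a plate number starts with a Chinese character. The following JDK-only sketch shows what such a body looks like on the wire. The class name, method names, and sample values are hypothetical, and this is an illustration of the encoding, not the code path Commons HttpClient itself runs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: build an application/x-www-form-urlencoded body in UTF-8. */
final class FormBodySketch {

    /** Joins the fields as key=value pairs separated by '&', in map order. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(field.getKey())).append('=').append(encode(field.getValue()));
        }
        return body.toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Made-up values: U+664B followed by A12345, and a captcha with a space.
        fields.put("hphm", "\u664bA12345");
        fields.put("captcha", "a b");
        // prints: hphm=%E6%99%8BA12345&captcha=a+b
        System.out.println(encodeForm(fields));
    }
}
```

If the charset were left off or set to something other than what the site expects, the Chinese character would be encoded to different bytes and the plate lookup would likely fail.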