
HttpClient4 SOCKS proxying via Shadowsocks-netty

ksfzhaohui
Published 2017/10/18 12:27

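Preface
Recently I wanted to batch-download some videos from foreign websites. I had previously written a proxy program, shadowsocks-netty, and planned to use it directly as the client-side proxy. Since HttpClient4 also supports SOCKS proxies, I decided to use HttpClient4 to access foreign websites and video resources.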

HttpClient4 version

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.6</version>
</dependency>

Accessing a website
The proxy IP and port are set to localhost and 1080 respectively.
The foreign host to access is www.google.com.
The code is as follows:

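import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.net.SocketTimeoutException;

import org.apache.http.HttpHost;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class ClientExecuteSOCKS {
 
    /** Proxy parameters: IP + PORT **/
    private static String PROXY_IP = "localhost";
    private static int PROXY_PORT = 1080;
 
    public static void main(String[] args) throws Exception {
        // Route plain "http" connections through the SOCKS-aware socket factory below
        Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", new MyConnectionSocketFactory()).build();
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
        try {
            InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
            HttpClientContext context = HttpClientContext.create();
            // The socket factory reads the SOCKS proxy address from this context attribute
            context.setAttribute("socks.address", socksaddr);
 
            HttpHost target = new HttpHost("www.google.com", 80, "http");
            HttpGet request = new HttpGet("/");
 
            System.out.println("Executing request " + request + " to " + target + " via SOCKS proxy " + socksaddr);
            CloseableHttpResponse response = httpclient.execute(target, request, context);
            try {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                String htmlStr = EntityUtils.toString(response.getEntity());
                System.out.println(htmlStr);
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
 
    static class MyConnectionSocketFactory implements ConnectionSocketFactory {
 
        // Create an unconnected socket that tunnels through the SOCKS proxy stored in the context
        public Socket createSocket(final HttpContext context) throws IOException {
            InetSocketAddress socksaddr = (InetSocketAddress) context.getAttribute("socks.address");
            Proxy proxy = new Proxy(Proxy.Type.SOCKS, socksaddr);
            return new Socket(proxy);
        }
 
        public Socket connectSocket(final int connectTimeout, final Socket socket, final HttpHost host,
                final InetSocketAddress remoteAddress, final InetSocketAddress localAddress, final HttpContext context)
                throws IOException, ConnectTimeoutException {
            Socket sock;
            if (socket != null) {
                sock = socket;
            } else {
                sock = createSocket(context);
            }
            if (localAddress != null) {
                sock.bind(localAddress);
            }
            try {
                // The SOCKS handshake with the proxy happens inside connect()
                sock.connect(remoteAddress, connectTimeout);
            } catch (SocketTimeoutException ex) {
                throw new ConnectTimeoutException(ex, host, remoteAddress.getAddress());
            }
            return sock;
        }
    }
}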

The code above is an example provided by HttpClient, slightly modified.
First start shadowsocks-netty,
then run ClientExecuteSOCKS.

1. The run fails with the following error:

I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://www.google.com:80: 
The target server failed to respond

On the server side of shadowsocks-netty (shadowsocks-netty-server), the following log appears:

org.netty.proxy.ClientProxyHandler$2 - connect fail host = 67.15.129.210,port = 80,inetAddress = /67.15.129.210

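The connection to the IP address obtained by resolving the domain name failed. Across repeated attempts the resolved IP changes, so the request sometimes succeeds and sometimes fails. The proxy receives an IP rather than a domain name because HttpClient resolves the host name locally and passes an already-resolved remoteAddress to connectSocket.

To work around this, let the proxy connect by domain name instead, and change the code as follows: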

sock.connect(remoteAddress, connectTimeout);

Change the line above to:

sock.connect(InetSocketAddress.createUnresolved(remoteAddress.getHostName(), remoteAddress.getPort()),connectTimeout);
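The following is a minimal JDK-only sketch of what the modified line does. It assumes, as in HttpClient 4.3, that the remoteAddress passed to connectSocket was resolved locally but still carries its host name. InetSocketAddress.createUnresolved keeps only the name and port. When the JDK's SOCKS socket connects to an unresolved address, it sends the name to the proxy (address type DOMAIN), so the proxy does the lookup. The class name UnresolvedAddressDemo and the sampleResolved helper are illustrative only. The IP is the one from the server log above.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class UnresolvedAddressDemo {

    /** Same conversion as the modified connectSocket: keep host name and port, drop the resolved IP. */
    static InetSocketAddress toUnresolved(InetSocketAddress remoteAddress) {
        return InetSocketAddress.createUnresolved(remoteAddress.getHostName(), remoteAddress.getPort());
    }

    /** Hypothetical stand-in for what HttpClient passes in: a resolved address that remembers its name. */
    static InetSocketAddress sampleResolved() {
        try {
            // getByAddress(host, addr) performs no DNS query; it just pairs the name with the IP
            InetAddress ip = InetAddress.getByAddress("www.google.com",
                    new byte[] {67, 15, (byte) 129, (byte) 210});
            return new InetSocketAddress(ip, 80);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal address length, which cannot happen with 4 bytes
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress resolved = sampleResolved();
        InetSocketAddress unresolved = toUnresolved(resolved);
        System.out.println(resolved + " unresolved=" + resolved.isUnresolved());
        System.out.println(unresolved + " unresolved=" + unresolved.isUnresolved());
    }
}
```

getHostString() returns the same name without ever triggering a reverse lookup. That makes it a slightly safer choice than getHostName() when the address might have been built from a bare IP.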

2. Running again produces the following error:

Caused by: org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:151)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    ... 2 more

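Stepping into the parseHead method of DefaultHttpResponseParser in a debugger shows that the status line read each time is " HTTP/1.1 200 OK", with two space-like characters in front of it, so the status-line check fails. Further analysis shows that the data of the four-message SOCKS handshake was not read completely. The leftover handshake bytes then end up as dirty data in front of the real response.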
Compare netty's SocksCmdResponse, which writes the reply, with the JDK's SocksSocketImpl class, which reads it:

public void encodeAsByteBuf(ByteBuf byteBuf) {
    byteBuf.writeByte(protocolVersion().byteValue());
    byteBuf.writeByte(cmdStatus.byteValue());
    byteBuf.writeByte(0x00);
    byteBuf.writeByte(addressType.byteValue());
    switch (addressType) {
        case IPv4: {
            byte[] hostContent = host == null ?
                    IPv4_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
            byteBuf.writeBytes(hostContent);
            byteBuf.writeShort(port);
            break;
        }
        case DOMAIN: {
            byte[] hostContent = host == null ?
                    DOMAIN_ZEROED : host.getBytes(CharsetUtil.US_ASCII);
            byteBuf.writeByte(hostContent.length);   // domain length
            byteBuf.writeBytes(hostContent);   // domain value
            byteBuf.writeShort(port);  // port value
            break;
        }
        case IPv6: {
            byte[] hostContent = host == null
                    ? IPv6_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
            byteBuf.writeBytes(hostContent);
            byteBuf.writeShort(port);
            break;
        }
    }
}
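The following JDK-only sketch makes the layout concrete. It reproduces the byte sequence that encodeAsByteBuf writes for a success reply when no bound host is supplied. It assumes the zero placeholders are four zero bytes for IPv4 and a single zero byte for DOMAIN, and that the port is 0. Check these constants against the netty version you use. The class name Socks5ReplyEncoder is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class Socks5ReplyEncoder {

    static final byte VERSION = 0x05;
    static final byte SUCCESS = 0x00;
    static final byte RESERVED = 0x00;
    static final byte ATYP_IPV4 = 0x01;
    static final byte ATYP_DOMAIN = 0x03;

    // Assumed placeholders for a reply without a bound host (host == null)
    static final byte[] IPV4_ZEROED = {0, 0, 0, 0};
    static final byte[] DOMAIN_ZEROED = {0};

    /** VER REP RSV ATYP, then the 4-byte IPv4 address, then the 2-byte port. */
    static byte[] ipv4Reply(int port) {
        ByteBuffer buf = ByteBuffer.allocate(4 + IPV4_ZEROED.length + 2);
        buf.put(VERSION).put(SUCCESS).put(RESERVED).put(ATYP_IPV4);
        buf.put(IPV4_ZEROED);
        buf.putShort((short) port); // ByteBuffer defaults to big-endian, like ByteBuf.writeShort
        return buf.array();
    }

    /** VER REP RSV ATYP, then a length byte, the domain bytes and the 2-byte port. */
    static byte[] domainReply(int port) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + DOMAIN_ZEROED.length + 2);
        buf.put(VERSION).put(SUCCESS).put(RESERVED).put(ATYP_DOMAIN);
        buf.put((byte) DOMAIN_ZEROED.length); // domain length
        buf.put(DOMAIN_ZEROED);               // domain value
        buf.putShort((short) port);           // port value
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println("IPv4:   " + Arrays.toString(ipv4Reply(0)));   // [5, 0, 0, 1, 0, 0, 0, 0, 0, 0]
        System.out.println("DOMAIN: " + Arrays.toString(domainReply(0))); // [5, 0, 0, 3, 1, 0, 0, 0]
    }
}
```

Under these assumptions the IPv4 reply is 10 bytes and the DOMAIN reply is 8 bytes. Byte 4 of the DOMAIN reply is the length of the one-byte placeholder domain, not part of the header.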

Part of the connect method of the SocksSocketImpl class:

switch (data[1]) {
        case REQUEST_OK:
            // success!
            switch(data[3]) {
            case IPV4:
                addr = new byte[4];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != 4)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case DOMAIN_NAME:
                len = data[1];
                byte[] host = new byte[len];
                i = readSocksReply(in, host, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
        ......
}

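shadowsocks-netty replies with addressType DOMAIN, and the two snippets show that the format written does not match the format read. netty writes a length byte before the domain. The JDK code instead takes the length from data[1], which is the status (REP) field and is 0x00 on success. The JDK therefore reads an empty host name and consumes the length byte and the first domain byte as the port. The rest of the reply is left in the stream as dirty data in front of the HTTP response.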
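The effect can be reproduced without a network. This sketch feeds an assumed DOMAIN success reply (05 00 00 03 01 00 00 00, meaning a one-byte zero placeholder name and port 0), followed by an HTTP status line, into two readers. The first reader mirrors the quoted JDK logic (len = data[1]). The second follows RFC 1928, where a DOMAIN address starts with its own length octet. The names are hypothetical, and readFully stands in for the JDK's private readSocksReply.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Socks5ReplyReader {

    static final String STATUS_LINE = "HTTP/1.1 200 OK\r\n";

    /** Assumed DOMAIN success reply (one-byte placeholder name, port 0) followed by the HTTP response. */
    static byte[] sampleStream() {
        byte[] reply = {0x05, 0x00, 0x00, 0x03, 0x01, 0x00, 0x00, 0x00};
        byte[] http = STATUS_LINE.getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[reply.length + http.length];
        System.arraycopy(reply, 0, all, 0, reply.length);
        System.arraycopy(http, 0, all, reply.length, http.length);
        return all;
    }

    /** Mirrors the quoted JDK snippet: the DOMAIN length is taken from data[1], the REP field. */
    static void readLikeQuotedJdk(DataInputStream in) throws IOException {
        byte[] data = new byte[4];
        in.readFully(data);          // VER REP RSV ATYP
        int len = data[1];           // REP is 0x00 on success, so len == 0
        in.readFully(new byte[len]); // reads no host bytes at all
        in.readFully(new byte[2]);   // takes the length byte and the name byte as the "port"
    }

    /** RFC 1928: for ATYP = DOMAIN, BND.ADDR starts with its own length octet. */
    static void readPerRfc1928(DataInputStream in) throws IOException {
        byte[] data = new byte[4];
        in.readFully(data);          // VER REP RSV ATYP
        int addrLen;
        switch (data[3]) {
            case 0x01: addrLen = 4; break;                     // IPv4
            case 0x03: addrLen = in.readUnsignedByte(); break; // DOMAIN
            case 0x04: addrLen = 16; break;                    // IPv6
            default: throw new IOException("Unknown ATYP " + data[3]);
        }
        in.readFully(new byte[addrLen]); // BND.ADDR
        in.readUnsignedShort();          // BND.PORT
    }

    /** Returns what is left in the stream after one of the two readers has consumed the reply. */
    static String leftover(boolean rfc) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(sampleStream()));
            if (rfc) {
                readPerRfc1928(in);
            } else {
                readLikeQuotedJdk(in);
            }
            byte[] rest = new byte[in.available()];
            in.readFully(rest);
            return new String(rest, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(leftover(false).startsWith("HTTP/")); // false: two stray 0x00 bytes remain
        System.out.println(leftover(true).startsWith("HTTP/"));  // true
    }
}
```

With the quoted logic, two 0x00 bytes remain in front of "HTTP/1.1", which matches the two space-like characters seen in the debugger. An IPv4 reply has a fixed 4-byte address, so both readers consume it identically.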

To fix this, change the addressType that shadowsocks-netty returns to IPv4. The relevant code is in SocksServerConnectHandler:

private SocksCmdResponse getSuccessResponse(SocksCmdRequest request) {
    return new SocksCmdResponse(SocksCmdStatus.SUCCESS, SocksAddressType.IPv4);
}

After this change, the request runs correctly and prints:

HTTP/1.1 200 OK
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's ...具体网页内容省略...</body></html>

Downloading videos
Part of the code for downloading a video:

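import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ClientExecuteSOCKS2 {
 
    /** Proxy parameters: IP + PORT **/
    private static String PROXY_IP = "localhost";
    private static int PROXY_PORT = 1080;
 
    public static void main(String[] args) throws Exception {
        // Reuse the SOCKS socket factory from ClientExecuteSOCKS (assumes both classes share a package)
        Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", new ClientExecuteSOCKS.MyConnectionSocketFactory()).build();
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
        try {
            InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
            HttpClientContext context = HttpClientContext.create();
            context.setAttribute("socks.address", socksaddr);
            // No HttpHost here: the target host is taken from the absolute URI
            HttpGet request = new HttpGet("http://xxxxxxx.mp4");
            CloseableHttpResponse response = httpclient.execute(request, context);
 
            InputStream is = null;
            OutputStream os = null;
            try {
                System.out.println(response.getStatusLine());
                is = response.getEntity().getContent();
                // Content length in bytes, or a negative number if the server did not send it
                System.out.println(response.getEntity().getContentLength());
                os = new FileOutputStream(new File("D:\\tmp.mp4"));
                byte tmp[] = new byte[1024];
                int l;
                // Stream the body to disk chunk by chunk instead of loading it into memory
                while ((l = is.read(tmp)) != -1) {
                    os.write(tmp, 0, l);
                }
                os.flush();
            } finally {
                // Close the file and the entity stream first, then release the connection
                if (os != null) {
                    os.close();
                }
                if (is != null) {
                    is.close();
                }
                if (response != null) {
                    response.close();
                }
            }
        } finally {
            httpclient.close();
        }
    }
}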
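The read/write loop in the download code can be factored into a small helper, which also makes it testable without a network. This is a sketch with hypothetical names (StreamCopy, copy, copyInMemory). It uses a larger 8 KB buffer and Java 7 try-with-resources, which closes the streams even when an exception is thrown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamCopy {

    /** Copies everything from in to out and returns the byte count; the caller closes both streams. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Pushes a zero-filled array of the given size through copy() and checks nothing was lost. */
    static long copyInMemory(int size) {
        try (InputStream in = new ByteArrayInputStream(new byte[size]);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            if (out.size() != size) {
                throw new IllegalStateException("short copy: " + out.size());
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyInMemory(20000)); // 20000
    }
}
```

In ClientExecuteSOCKS2 the loop would become copy(is, os). On Java 9 and later, InputStream.transferTo(OutputStream) does the same job. HttpClient's own HttpEntity.writeTo(OutputStream) is another option.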

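Summary
There are many ways to download videos from foreign websites, such as browser plugins. This article relies on a client-side SOCKS5 proxy program and uses HttpClient4 to download the resources, which makes the process easier to automate and control. This article is intended for learning purposes only.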

Personal blog: codingo.xyz

© Copyright belongs to the author
