SOCKS proxying for HttpClient4 based on shadowsocks-netty
ksfzhaohui, posted 3 months ago

Preface
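Recently I wanted to batch-download videos from some foreign websites. I had previously written a proxy program, shadowsocks-netty, and planned to use it directly as the client-side proxy. Since HttpClient4 also supports SOCKS proxies, I decided to use HttpClient4 to access the foreign sites and their video resources.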

HttpClient4 version

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.6</version>
</dependency>

Accessing a website
The proxy IP and port are localhost and 1080.
The foreign site to access has the hostname www.google.com.
The code is as follows:

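import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.net.SocketTimeoutException;

import org.apache.http.HttpHost;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class ClientExecuteSOCKS {

    /** Proxy parameters: IP + PORT of the local shadowsocks-netty client **/
    private static String PROXY_IP = "localhost";
    private static int PROXY_PORT = 1080;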
 
    public static void main(String[] args) throws Exception {
        // Route plain-HTTP connections through the SOCKS-aware socket factory below
        Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", new MyConnectionSocketFactory()).build();
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
        try {
            // The socket factory reads the proxy address from this context attribute
            InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
            HttpClientContext context = HttpClientContext.create();
            context.setAttribute("socks.address", socksaddr);
 
            HttpHost target = new HttpHost("www.google.com", 80, "http");
            HttpGet request = new HttpGet("/");
 
            System.out.println("Executing request " + request + " to " + target + " via SOCKS proxy " + socksaddr);
            CloseableHttpResponse response = httpclient.execute(target, request, context);
            try {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                String htmlStr = EntityUtils.toString(response.getEntity());
                System.out.println(htmlStr);
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }
 
    static class MyConnectionSocketFactory implements ConnectionSocketFactory {

        // Creates an unconnected socket that tunnels through the SOCKS proxy stored in the context
        @Override
        public Socket createSocket(final HttpContext context) throws IOException {
            InetSocketAddress socksaddr = (InetSocketAddress) context.getAttribute("socks.address");
            Proxy proxy = new Proxy(Proxy.Type.SOCKS, socksaddr);
            return new Socket(proxy);
        }

        @Override
        public Socket connectSocket(final int connectTimeout, final Socket socket, final HttpHost host,
                final InetSocketAddress remoteAddress, final InetSocketAddress localAddress, final HttpContext context)
                throws IOException, ConnectTimeoutException {
            Socket sock;
            if (socket != null) {
                sock = socket;
            } else {
                sock = createSocket(context);
            }
            if (localAddress != null) {
                sock.bind(localAddress);
            }
            try {
                // remoteAddress has already been resolved by HttpClient, so the proxy
                // receives an IP address rather than the host name
                sock.connect(remoteAddress, connectTimeout);
            } catch (SocketTimeoutException ex) {
                throw new ConnectTimeoutException(ex, host, remoteAddress.getAddress());
            }
            return sock;
        }
    }
}

The code above is the example provided by HttpClient, slightly modified.
First start shadowsocks-netty,
then run ClientExecuteSOCKS.

1. The run fails with the following error:

I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://www.google.com:80: The target server failed to respond

On the shadowsocks-netty server side (shadowsocks-netty-server), the following log appears:

org.netty.proxy.ClientProxyHandler$2 - connect fail host = 67.15.129.210,port = 80,inetAddress = /67.15.129.210

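The connection to the IP address obtained by resolving the domain name fails. Over repeated attempts the resolved IP address changes, so the request sometimes succeeds and sometimes fails. The proxy only sees an IP because HttpClient resolves the host name itself before calling connectSocket. As a result, remoteAddress already holds an IP address, and the JDK's SOCKS client forwards only that IP to the proxy.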

To avoid this, pass the domain name to the proxy and let the proxy resolve it. Modify the code as follows.

sock.connect(remoteAddress, connectTimeout);

Change the line above to:

sock.connect(InetSocketAddress.createUnresolved(remoteAddress.getHostName(), remoteAddress.getPort()),connectTimeout);
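The fix works because the JDK's SOCKS5 client picks the address type of its CONNECT request from the address it is given. A resolved InetSocketAddress goes out as raw IP bytes (ATYP 0x01 or 0x04). An unresolved one goes out as a length-prefixed domain name (ATYP 0x03), which the proxy then resolves itself.

The minimal sketch below rebuilds that request following RFC 1928 to make the difference visible. Socks5ConnectRequest, build and toHex are hypothetical names for illustration, not HttpClient or JDK APIs.

```java
import java.io.ByteArrayOutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not an HttpClient or JDK API): builds the SOCKS5 CONNECT
 * request from RFC 1928, section 4, choosing the address type from whether the
 * target address has been resolved.
 */
class Socks5ConnectRequest {

    static final int VERSION = 0x05;
    static final int CMD_CONNECT = 0x01;
    static final int ATYP_IPV4 = 0x01;
    static final int ATYP_DOMAIN = 0x03;
    static final int ATYP_IPV6 = 0x04;

    static byte[] build(InetSocketAddress target) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(VERSION);
        out.write(CMD_CONNECT);
        out.write(0x00); // RSV
        if (target.isUnresolved()) {
            // Only the name is known, so the proxy has to resolve it
            byte[] name = target.getHostString().getBytes(StandardCharsets.US_ASCII);
            if (name.length > 255) {
                throw new IllegalArgumentException("domain name longer than 255 bytes");
            }
            out.write(ATYP_DOMAIN);
            out.write(name.length); // single length byte
            out.write(name, 0, name.length);
        } else {
            // Resolved locally: the proxy only ever sees these IP bytes
            byte[] ip = target.getAddress().getAddress();
            out.write(ip.length == 4 ? ATYP_IPV4 : ATYP_IPV6);
            out.write(ip, 0, ip.length);
        }
        out.write((target.getPort() >> 8) & 0xFF); // DST.PORT, network byte order
        out.write(target.getPort() & 0xFF);
        return out.toByteArray();
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unresolved: ATYP 0x03 followed by the length and the name
        // prints: 05 01 00 03 0e 77 77 77 2e 67 6f 6f 67 6c 65 2e 63 6f 6d 00 50
        System.out.println(toHex(build(InetSocketAddress.createUnresolved("www.google.com", 80))));
        // A literal IP is parsed without a DNS lookup: ATYP 0x01 followed by 4 bytes
        // prints: 05 01 00 01 43 0f 81 d2 00 50
        System.out.println(toHex(build(new InetSocketAddress("67.15.129.210", 80))));
    }
}
```

The first request carries the 14 bytes of www.google.com, so the proxy does the lookup. The second carries only the four bytes of 67.15.129.210, which is what the server log above shows it trying to connect to.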

2. Running it again fails with a different error:

Caused by: org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:151)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    ... 2 more

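Stepping into the parseHead method of DefaultHttpResponseParser with a debugger shows that the status line read each time is " HTTP/1.1 200 OK". The two space-like characters in front of it cause the status-line check to fail.

Further analysis showed the cause: during the SOCKS5 handshake (greeting, method selection, CONNECT request, reply), the handshake data was not read completely. The leftover handshake bytes then appear as dirty data in front of the real response.

Compare Netty's SocksCmdResponse (its encodeAsByteBuf method) with the JDK's SocksSocketImpl: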

public void encodeAsByteBuf(ByteBuf byteBuf) {
        byteBuf.writeByte(protocolVersion().byteValue());
        byteBuf.writeByte(cmdStatus.byteValue());
        byteBuf.writeByte(0x00);
        byteBuf.writeByte(addressType.byteValue());
        switch (addressType) {
            case IPv4: {
                byte[] hostContent = host == null ?
                        IPv4_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
                byteBuf.writeBytes(hostContent);
                byteBuf.writeShort(port);
                break;
            }
            case DOMAIN: {
                byte[] hostContent = host == null ?
                        DOMAIN_ZEROED : host.getBytes(CharsetUtil.US_ASCII);
                byteBuf.writeByte(hostContent.length);   // domain length
                byteBuf.writeBytes(hostContent);   // domain value
                byteBuf.writeShort(port);  // port value
                break;
            }
            case IPv6: {
                byte[] hostContent = host == null
                        ? IPv6_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
                byteBuf.writeBytes(hostContent);
                byteBuf.writeShort(port);
                break;
            }
        }
    }

The relevant part of the connect method in SocksSocketImpl is:

switch (data[1]) {
        case REQUEST_OK:
            // success!
            switch(data[3]) {
            case IPV4:
                addr = new byte[4];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != 4)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case DOMAIN_NAME:
                len = data[1];
                byte[] host = new byte[len];
                i = readSocksReply(in, host, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
        ......
}

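shadowsocks-netty replies with addressType DOMAIN, and the format Netty writes does not match the format SocksSocketImpl reads. After the 4-byte header (VER, REP, RSV, ATYP), Netty writes a one-byte length, the domain bytes and a two-byte port. SocksSocketImpl, however, takes the domain length from data[1], which is the reply status byte (0 on success), instead of reading the length byte. It therefore reads no host bytes and consumes the length byte and the first domain byte as the "port". The rest of the reply stays in the stream, where it shows up as dirty data in front of the HTTP response.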
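To see the mismatch concretely, the sketch below replays both sides with plain byte arrays:

- reply() lays out a SOCKS5 reply in the field order of the encodeAsByteBuf excerpt.
- leftoverAfterRead() repeats the reading steps of the SocksSocketImpl excerpt and counts the bytes left behind.

The class and method names are hypothetical. The reply payloads (a one-byte zero placeholder domain, or 0.0.0.0, both with port 0) are assumptions for illustration, not values captured from shadowsocks-netty.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical demo (not JDK or Netty code): builds SOCKS5 replies and replays
 * the reading logic of the quoted SocksSocketImpl excerpt.
 */
class SocksReplyDemo {

    static final int IPV4 = 0x01;
    static final int DOMAIN_NAME = 0x03;

    /** VER, REP, RSV, ATYP, [length], address, port: the order used by encodeAsByteBuf. */
    static byte[] reply(int atyp, byte[] addr, int port) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x05); // VER
        out.write(0x00); // REP: succeeded
        out.write(0x00); // RSV
        out.write(atyp); // ATYP
        if (atyp == DOMAIN_NAME) {
            out.write(addr.length); // domain length byte
        }
        out.write(addr, 0, addr.length);
        out.write((port >> 8) & 0xFF); // port, network byte order
        out.write(port & 0xFF);
        return out.toByteArray();
    }

    /**
     * readLengthByte=false copies the excerpt (length taken from data[1], the REP byte);
     * readLengthByte=true reads the length byte that follows ATYP, as RFC 1928 specifies.
     * Returns the number of reply bytes left unread.
     */
    static int leftoverAfterRead(byte[] reply, boolean readLengthByte) {
        ByteBuffer in = ByteBuffer.wrap(reply);
        byte[] data = new byte[4];
        in.get(data); // VER, REP, RSV, ATYP
        switch (data[3] & 0xFF) {
            case IPV4:
                in.get(new byte[4]); // address
                in.get(new byte[2]); // port
                break;
            case DOMAIN_NAME: {
                int len = readLengthByte ? (in.get() & 0xFF) : data[1];
                in.get(new byte[len]); // host
                in.get(new byte[2]);   // port
                break;
            }
            default:
                throw new IllegalArgumentException("unsupported ATYP " + data[3]);
        }
        return in.remaining(); // these bytes would precede the HTTP response
    }

    public static void main(String[] args) {
        byte[] domainReply = reply(DOMAIN_NAME, new byte[] {0x00}, 0);
        byte[] ipv4Reply = reply(IPV4, new byte[4], 0);
        System.out.println("DOMAIN, excerpt logic: " + leftoverAfterRead(domainReply, false));    // 2
        System.out.println("DOMAIN, length byte read: " + leftoverAfterRead(domainReply, true)); // 0
        System.out.println("IPv4, excerpt logic: " + leftoverAfterRead(ipv4Reply, false));      // 0
    }
}
```

With the excerpt's logic, the DOMAIN reply leaves 2 bytes behind. Reading the length byte leaves none, and so does switching to an IPv4 reply (4 address bytes plus 2 port bytes). If the server really sends a one-byte placeholder domain, two leftover bytes would be consistent with the two extra characters seen before HTTP/1.1 in the debugger. With a real domain such as www.google.com, the same logic would leave 15 bytes.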

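This can be fixed by having shadowsocks-netty reply with addressType IPv4. Its layout (4 address bytes plus a 2-byte port) matches what SocksSocketImpl reads. The change goes in SocksServerConnectHandler: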

private SocksCmdResponse getSuccessResponse(SocksCmdRequest request) {
    return new SocksCmdResponse(SocksCmdStatus.SUCCESS, SocksAddressType.IPv4);
}

After this change, the request succeeds:

HTTP/1.1 200 OK
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's ...page content omitted...</body></html>

Downloading videos
The code for downloading a video is as follows:

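import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ClientExecuteSOCKS2 {

    /** Proxy parameters: IP + PORT **/
    private static String PROXY_IP = "localhost";
    private static int PROXY_PORT = 1080;

    public static void main(String[] args) throws Exception {
        // Reuses the SOCKS socket factory from ClientExecuteSOCKS (assumed to be in the same package),
        // including the createUnresolved fix described above
        Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
                .register("http", new ClientExecuteSOCKS.MyConnectionSocketFactory()).build();
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
        CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();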
        try {
            InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
            HttpClientContext context = HttpClientContext.create();
            context.setAttribute("socks.address", socksaddr);
            HttpGet request = new HttpGet("http://xxxxxxx.mp4");
            CloseableHttpResponse response = httpclient.execute(request, context);
 
            InputStream is = null;
            OutputStream os = null;
            try {
                System.out.println(response.getStatusLine());
                is = response.getEntity().getContent();
                // Prints a negative number if the content length is unknown
                System.out.println(response.getEntity().getContentLength());
                os = new FileOutputStream(new File("D:\\tmp.mp4"));
                // Stream the body to disk in 1 KB chunks instead of buffering it in memory
                byte[] tmp = new byte[1024];
                int l;
                while ((l = is.read(tmp)) != -1) {
                    os.write(tmp, 0, l);
                }
                os.flush();
            } finally {
                if (response != null) {
                    response.close();
                }
                if (is != null) {
                    is.close();
                }
                if (os != null) {
                    os.close();
                }
            }
        } finally {
            httpclient.close();
        }
    }
}

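Summary
There are many ways to download videos from foreign sites, for example browser plugins. This article instead relies on a client-side SOCKS5 proxy program and uses HttpClient4 to download the resources, which makes the process easier to automate and control. This article is intended for learning purposes only.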

Personal blog: codingo.xyz

Tags: HttpClient