Receiving mail in Python with poplib

2016/08/17 20:12

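There are two ways to retrieve mail: POP3 and IMAP. Both are protocols supported by the receiving mail server. When we use a client such as Foxmail, the send and receive flow is hidden from us, but sending and receiving are actually handled by different services: one server is dedicated to sending and another to receiving, and neither cares how the other works. That is why the two are tested separately. The services are often installed on the same machine, but what a test exercises is the protocol: SMTP, POP3, or IMAP.

Retrieving mail with Python's poplib is simple. The real work is parsing what comes back: messages travel in an encoded form (MIME, with transfer encodings such as base64), so they have to be decoded before we can read them. That is the job of the email module: on the SMTP side it builds the message to send, and on the POP3 side it parses the message received.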
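To make the two roles of the email module concrete, here is a minimal Python 3 sketch of a round trip: build a message the way a sender would, then parse the raw bytes the way a receiver would. The addresses and subject are borrowed from the sample mail later in this post; no server is contacted, and the variable names are only illustrative.

```python
# Python 3 sketch: the email module on both sides of a delivery.
from email import message_from_bytes
from email.mime.text import MIMEText

# Sending side: build the message. With a utf-8 charset, MIMEText
# base64-encodes the body.
outgoing = MIMEText("hello world\nI am terry", "plain", "utf-8")
outgoing["From"] = "hding@hding.com"
outgoing["To"] = "qa@ding.com"
outgoing["Subject"] = "test_mail"
wire_bytes = outgoing.as_bytes()      # the bytes that would be handed to SMTP

# Receiving side: parse the raw bytes and decode the body again.
incoming = message_from_bytes(wire_bytes)
body = incoming.get_payload(decode=True).decode("utf-8")

print(incoming["Subject"])
print(body)
```

The body is unreadable base64 inside wire_bytes, which is why a retrieved message has to go through the parser before it can be read.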

poplib

# List the messages: returns the number and size of every message
list(self, which=None):
	returns (response, ['msg_num octets', ...], octets); if which is given, only that message is listed
-----------------------------
('+OK 7 messages:', ['1 1080', '2 1080', '3 1079', '4 675265', '5 675506', '6 675534', '7 597'], 61)
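Each entry in that listing is a "number size" pair packed into one string. A small Python 3 sketch of unpacking it follows; parse_listing is a hypothetical helper name, and the sample tuple is the output above rewritten as bytes, which is what Python 3's poplib returns.

```python
def parse_listing(list_result):
    """Turn the tuple returned by POP3.list() into [(msg_num, size), ...]."""
    _response, lines, _octets = list_result
    pairs = []
    for line in lines:
        if isinstance(line, bytes):       # Python 3 poplib hands back bytes
            line = line.decode("ascii")
        num, size = line.split()[:2]
        pairs.append((int(num), int(size)))
    return pairs

# The listing shown above, as bytes.
sample = (b'+OK 7 messages:',
          [b'1 1080', b'2 1080', b'3 1079', b'4 675265',
           b'5 675506', b'6 675534', b'7 597'],
          61)

pairs = parse_listing(sample)
print(pairs[-1])
print(sum(size for _num, size in pairs))
```

The sizes add up to the mailbox size that stat() reports for the same mailbox further down.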

# Retrieve a whole message; message numbers start at 1
retr(self, which):
	returns message number which as (response, ['line', ...], octets)
	
# Authenticate
user(self, user)
pass_(self, pswd)

# Print debugging output
set_debuglevel(self, level)

# Return the message count and the mailbox size
stat(self)
	get mailbox status
	returns (message_count, mailbox_size)
-------------------------------------------
(7, 2030141)

# Return the message headers plus a chosen number of body lines
top(self, which, howmuch)
	returns the headers of message which, plus the first howmuch lines of its body
	

The original message, as stored in the mailbox:

From hding@hding.com  Tue Aug 16 20:06:02 2016
Return-Path: <hding@hding.com>
Received: from hding.com ([192.168.10.3])
    by ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800
Received: from 10.8.116.6 ([10.8.116.6])
    (authenticated bits=0)
    by hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721
    for qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800
Date: Tue, 16 Aug 2016 20:00:57 +0800
From: hding@hding.com
Message-Id: <201608161205.u7GC0v9x027721@hding.com>
X-UID: 71
Status: O

"hello world"
I am terry
please welcome me

top(7, 1) returns the headers of message 7 plus 1 line of its body, as a tuple:

('+OK', ['Return-Path: <hding@hding.com>', 'Received: from hding.com ([192.168.10.3])', '\tby ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429', '\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)', '\tfor <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800', 'Received: from 10.8.116.6 ([10.8.116.6])', '\t(authenticated bits=0)', '\tby hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721', '\tfor qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800', 'Date: Tue, 16 Aug 2016 20:00:57 +0800', 'From: hding@hding.com', 'Message-Id: <201608161205.u7GC0v9x027721@hding.com>', '', '"hello world"'], 566)
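Because top() transfers only the headers and a few body lines, it is a cheap way to look at a message before deciding whether to download it. Here is a Python 3 sketch of pulling a few headers out of that tuple; header_summary is a hypothetical helper, and the sample is an abbreviated bytes version of the output above.

```python
from email import message_from_bytes

def header_summary(top_result):
    """Pick a few headers out of the tuple returned by POP3.top()."""
    _response, lines, _octets = top_result
    msg = message_from_bytes(b"\r\n".join(lines))
    return {name: msg[name] for name in ("From", "Date", "Message-Id")}

# Abbreviated version of the top(7, 1) output above, as bytes.
sample_top = (b'+OK', [
    b'Return-Path: <hding@hding.com>',
    b'Date: Tue, 16 Aug 2016 20:00:57 +0800',
    b'From: hding@hding.com',
    b'Message-Id: <201608161205.u7GC0v9x027721@hding.com>',
    b'',
    b'"hello world"',
], 566)

summary = header_summary(sample_top)
print(summary)
```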

retr(7) returns the whole message, also as a tuple; the message lines are in retr(7)[1]:

('+OK 597 octets', ['Return-Path: <hding@hding.com>', 'Received: from hding.com ([192.168.10.3])', '\tby ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429', '\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)', '\tfor <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800', 'Received: from 10.8.116.6 ([10.8.116.6])', '\t(authenticated bits=0)', '\tby hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721', '\tfor qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800', 'Date: Tue, 16 Aug 2016 20:00:57 +0800', 'From: hding@hding.com', 'Message-Id: <201608161205.u7GC0v9x027721@hding.com>', '', '"hello world"', 'I am terry', 'please welcome me'], 597)

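To get only the body, split what retr returns at the first empty line, which is what separates the headers from the body. Comparing against the lines returned by top(7, 0) is less reliable, because it also drops blank lines and any body line that happens to be identical to a header line.

message = pop.retr(7)
lines = message[1]
body = lines[lines.index('') + 1:]    # everything after the first empty line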

Handling messages with attachments

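What retr() returns is just a list of lines, not yet a message object. It has to be parsed with email.parser; the parsed message then exposes the usual mail structure, such as the From and To headers.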

Retrieve the messages from number 7 onward:
messages = [pop_conn.retr(i) for i in range(7, pop_conn.stat()[0]+1)] 
--------------------------------------------------------------------------
[('+OK 597 octets', ['Return-Path: <hding@hding.com>', 'Received: from hding.com ([192.168.10.3])', '\tby ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429', '\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)', '\tfor <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800', 'Received: from 10.8.116.6 ([10.8.116.6])', '\t(authenticated bits=0)', '\tby hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721', '\tfor qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800', 'Date: Tue, 16 Aug 2016 20:00:57 +0800', 'From: hding@hding.com', 'Message-Id: <201608161205.u7GC0v9x027721@hding.com>', '', '"hello world"', 'I am terry', 'please welcome me'], 597)]
Join the lines of each message into one string, using '\n' as the separator:
messages = ["\n".join(msg[1]) for msg in messages]
---------------------------------------------------------------------------
['Return-Path: <hding@hding.com>\nReceived: from hding.com ([192.168.10.3])\n\tby ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429\n\t(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)\n\tfor <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800\nReceived: from 10.8.116.6 ([10.8.116.6])\n\t(authenticated bits=0)\n\tby hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721\n\tfor qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800\nDate: Tue, 16 Aug 2016 20:00:57 +0800\nFrom: hding@hding.com\nMessage-Id: <201608161205.u7GC0v9x027721@hding.com>\n\n"hello world"\nI am terry\nplease welcome me']
Parse each message:
from email import parser
messages = [parser.Parser().parsestr(msg) for msg in messages]
------------------------------------------------------------------------------
[<email.message.Message instance at 0x02C1C9B8>]           a list holding one Message instance
Print the content of each message:
for message in messages:
    print(message)
--------------------------------------------------------------------------------
From nobody Wed Aug 17 19:04:31 2016
Return-Path: <hding@hding.com>
Received: from hding.com ([192.168.10.3])
 by ding.com (8.13.8/8.13.8) with ESMTP id u7GC623I002429
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
 for <qa@ding.com>; Tue, 16 Aug 2016 20:06:02 +0800
Received: from 10.8.116.6 ([10.8.116.6]) (authenticated bits=0)
 by hding.com (8.13.8/8.13.8) with ESMTP id u7GC0v9x027721
 for qa@ding.com; Tue, 16 Aug 2016 20:05:13 +0800
Date: Tue, 16 Aug 2016 20:00:57 +0800
From: hding@hding.com
Message-Id: <201608161205.u7GC0v9x027721@hding.com>

"hello world"
I am terry
please welcome me
----------------------------------------------------------------------------------
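The listings in this post come from Python 2, where poplib returns str lines. Under Python 3 the lines are bytes, so the join and the parser call change slightly. The sketch below assumes Python 3.6 or later and uses policy.default, which yields an EmailMessage with a convenient get_content() method; parse_retr is a hypothetical helper, and the sample is an abbreviated bytes version of the retr(7) output.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def parse_retr(retr_result):
    """Convert the (response, lines, octets) tuple from POP3.retr() into a message."""
    _response, lines, _octets = retr_result
    return message_from_bytes(b"\r\n".join(lines), policy=policy.default)

# Abbreviated version of the retr(7) output, as bytes.
sample_retr = (b'+OK 597 octets', [
    b'Date: Tue, 16 Aug 2016 20:00:57 +0800',
    b'From: hding@hding.com',
    b'Message-Id: <201608161205.u7GC0v9x027721@hding.com>',
    b'',
    b'"hello world"',
    b'I am terry',
    b'please welcome me',
], 597)

msg = parse_retr(sample_retr)
print(msg["From"])
print(msg.get_content())
```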
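As the output above shows, the message parses cleanly. To tell attachments apart from the body, walk the parts of each message:
import os

for message in messages:
    for part in message.walk():                       # visit every part of the message
        fileName = part.get_filename()                # attachment file name, or None
        contentType = part.get_content_type()         # MIME type of this part
        if fileName:                                  # attachments normally carry a file name
            # write the decoded attachment to a new file;
            # basename() drops any directory the sender put in the name
            data = part.get_payload(decode=True)
            with open(os.path.basename(fileName), 'wb') as f_attach:
                f_attach.write(data)
        elif contentType in ('text/plain', 'text/html'):
            # body text: decode it and print it
            data = part.get_payload(decode=True)
            print(data)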
----------------------------------------------------------------------------------
"hello world"
I am terry
please welcome me
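When the attachments do not need to land on disk, the same walk can collect everything in memory. This is a Python 3 sketch under that assumption; split_parts and the tiny RAW message are made up for illustration (the boundary string and the application/octet-stream type are not from the real mail).

```python
from email import message_from_bytes

def split_parts(msg):
    """Return (bodies, attachments) for a parsed message without writing files."""
    bodies, attachments = [], {}
    for part in msg.walk():
        if part.is_multipart():
            continue                      # containers have no payload of their own
        data = part.get_payload(decode=True)
        name = part.get_filename()
        if name:
            attachments[name] = data      # raw bytes, keyed by file name
        elif part.get_content_type() in ("text/plain", "text/html"):
            charset = part.get_content_charset() or "utf-8"
            bodies.append(data.decode(charset, errors="replace"))
    return bodies, attachments

# A tiny hand-written multipart message: one attachment, one text part.
RAW = (
    b'Content-Type: multipart/mixed; boundary="BOUND"\r\n'
    b'MIME-Version: 1.0\r\n'
    b'\r\n'
    b'--BOUND\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Disposition: attachment; filename="test_file"\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'\r\n'
    b'aGVsbG8gd29ybGQK\r\n'
    b'--BOUND\r\n'
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b'\r\n'
    b'I am terry\r\n'
    b'--BOUND--\r\n'
)

bodies, attachments = split_parts(message_from_bytes(RAW))
print(bodies)
print(attachments)
```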


A message with an attachment

From hding@hding.com  Wed Aug 17 19:21:08 2016                           first part
Return-Path: <hding@hding.com>
Received: from hding.com ([192.168.10.3])
    by ding.com (8.13.8/8.13.8) with ESMTP id u7HBL8rW015601
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for <qa@ding.com>; Wed, 17 Aug 2016 19:21:08 +0800
Received: from ding.com ([10.10.10.3])
    (authenticated bits=0)
    by hding.com (8.13.8/8.13.8) with ESMTP id u7HBL7sl008349
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
    for <qa@ding.com>; Wed, 17 Aug 2016 19:21:07 +0800
Date: Wed, 17 Aug 2016 19:21:07 +0800
Message-Id: <201608171121.u7HBL7sl008349@hding.com>
Content-Type: multipart/mixed; boundary="===============4712348666551551578=="
MIME-Version: 1.0
From: hding@hding.com
To: qa@ding.com
Subject: test_mail
X-UID: 73
Status: RO

--===============4712348666551551578==                                   second part
Content-Type: doc/test_file
MIME-Version: 1.0
Content-Disposition: attachment; filename="test_file"
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQKCkkgYW0gYSB0ZXN0IGZpbGUsIGNhbiB5b3UgcmVhZCBpdCAKCmFuZCBnaXZl
IG1lIGEgcmVzcG9uc2UK

--===============4712348666551551578==                                  third part
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

aGVsbG8gd29ybGQKCkkgYW0gYSB0ZXN0IGZpbGUsIGNhbiB5b3UgcmVhZCBpdCAKCmFuZCBnaXZl
IG1lIGEgcmVzcG9uc2UKPGltYWdlIHNyYz0nY2lkOjEnPg==

--===============4712348666551551578==--
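A message with this three-part layout (headers, a base64 attachment, a base64 utf-8 text part) can be produced with the email.mime classes. The Python 3 sketch below is one way to do it, not necessarily how the mail above was generated: build_test_mail is a hypothetical helper, and it uses application/octet-stream where the original shows the non-standard type doc/test_file.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_test_mail(file_bytes, body_text):
    """Build a multipart/mixed message: one attachment, then one text part."""
    msg = MIMEMultipart("mixed")
    msg["From"] = "hding@hding.com"
    msg["To"] = "qa@ding.com"
    msg["Subject"] = "test_mail"

    # Attachment part (assumed type; the original used doc/test_file).
    attachment = MIMEBase("application", "octet-stream")
    attachment.set_payload(file_bytes)
    encoders.encode_base64(attachment)    # sets Content-Transfer-Encoding: base64
    attachment.add_header("Content-Disposition", "attachment", filename="test_file")
    msg.attach(attachment)

    # Text part; a utf-8 charset makes MIMEText use base64 as well.
    msg.attach(MIMEText(body_text, "plain", "utf-8"))
    return msg

mail = build_test_mail(b"hello world\n", "hello world\n")
parsed = message_from_bytes(mail.as_bytes())
print([part.get_content_type() for part in parsed.walk()])
```

Parsing the bytes back, as the last two lines do, is a quick check that the structure is what a POP3 client would see after retr().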

At work I only need to fetch the message over POP3; it does not have to be saved to disk, so the retr function alone is enough:

#!/usr/bin/env python
# coding: utf-8

from poplib import POP3_SSL

class DPI_SSL_POP3(object):

    def __init__(self, username='username', password='password', host='192.168.10.3'):
        # connect over SSL and log in
        self.pop = POP3_SSL(host)
        self.pop.user(username)
        self.pop.pass_(password)

    def get_message_from_pop3s(self):
        # retrieve the newest message (the highest message number)
        return self.pop.retr(self.pop.stat()[0])


if __name__ == '__main__':

    pops = DPI_SSL_POP3()
    pops.get_message_from_pop3s()
    pops.pop.quit()                     # end the POP3 session cleanly

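The POP3 server sits on the WAN side and I run the script on the LAN side. It retrieves a message whose attachment carries a virus, so the virus passes through the firewall and gets inspected.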
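One edge case in that flow: on an empty mailbox stat() reports 0 messages, and asking the server for message 0 fails. The sketch below guards against that and keeps the retrieval logic separate from the connection so it can be tried without a server. fetch_latest and FakePOP3 are hypothetical names; with a real server, a logged-in poplib.POP3_SSL object would be passed instead of the fake one.

```python
def fetch_latest(conn):
    """Return the raw bytes of the newest message, or None if the mailbox is empty."""
    count, _size = conn.stat()
    if count == 0:
        return None
    _response, lines, _octets = conn.retr(count)
    return b"\r\n".join(lines)

class FakePOP3:
    """Stand-in for a logged-in poplib.POP3_SSL, so the sketch runs offline."""
    def stat(self):
        return (1, 21)

    def retr(self, which):
        return (b'+OK', [b'From: hding@hding.com', b'', b'I am terry'], 21)

raw = fetch_latest(FakePOP3())
print(raw)
```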
