A Python tool for extracting web page content

Original
2011/03/07 11:28

#!/usr/bin/python
#coding=utf8

__doc__ = """
    This module provides TextUtil, a class that extracts the text found
    between a start marker and an end marker in a string. It is mostly
    useful for pulling specific pieces out of a downloaded HTML page.
"""

__author__="""jemygraw@gmail.com"""


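class TextUtil(object):
    """Extract the text between a start marker and an end marker."""

    def __init__(self, content):
        self.content = content
        # Position from which the next selectText() call starts searching.
        self.start_index = 0
        # Initialise the selection state so extractText() is safe to call
        # before any successful selectText(); it then returns ''.
        self.start_flag = ''
        self.end_flag = ''
        self.from_index = 0
        self.end_index = 0

    def selectText(self, start, end):
        """Select the next start...end span; return True if one was found."""
        from_index = self.content.find(start, self.start_index)
        if from_index != -1:
            end_index = self.content.find(end, from_index + len(start))
            if end_index != -1:
                # Record the selection only on success, then move the
                # search position past the end marker.
                self.start_flag = start
                self.end_flag = end
                self.from_index = from_index
                self.end_index = end_index
                self.start_index = end_index + len(end)
                return True
        return False

    def extractText(self):
        """Return the selected text, excluding both markers."""
        return self.content[self.from_index + len(self.start_flag):self.end_index]

    def deselectText(self):
        """Clear the selection and restart searching from the beginning."""
        self.start_flag = ''
        self.end_flag = ''
        self.from_index = 0
        self.end_index = 0
        self.start_index = 0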
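
The class is meant to be driven in a loop: call selectText(start, end) until it returns False, and call extractText() after each successful selection. As a rough illustration of the same find-and-advance technique, here is a minimal standalone sketch written as a generator. The function name iter_between and the sample HTML string are assumptions made for this example and are not part of the original code.

```python
def iter_between(content, start, end):
    """Yield each substring of content found between start and end.

    Hypothetical helper for illustration; it mirrors the repeated
    selectText()/extractText() pattern using str.find.
    """
    if not start or not end:
        # Empty markers would match at every position and never advance.
        raise ValueError("start and end markers must be non-empty")
    pos = 0
    while True:
        begin = content.find(start, pos)
        if begin == -1:
            return
        begin += len(start)              # skip past the start marker
        stop = content.find(end, begin)
        if stop == -1:
            return
        yield content[begin:stop]
        pos = stop + len(end)            # resume after the end marker


# Made-up sample input standing in for a downloaded page.
html = "<ul><li>first</li><li>second</li></ul>"
items = list(iter_between(html, "<li>", "</li>"))
print(items)  # ['first', 'second']
```

Plain string searching like this is only suitable for pages whose markers are fixed and predictable. For nested or irregular markup, a real HTML parser is usually the safer choice.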