liwan123 2016/04/04 23:02 answered the question: Python crawler: problem inserting scraped data into a MySQL database

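Experts, I changed the code. After running it, 10 rows were inserted in total. One of them triggered the warning below, and the other rows were inserted without any message. I changed the type of ttime in MySQL to timestamp, and I don't understand why this warning appears.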

D:/python-work/demo.py:62: Warning: Data truncated for column 'ttime' at row 1
  cursor.execute('insert into user(title,content,tags,ttime,yezhu,leixing) values (%s,%s,%s,%s,%s,%s)', (title,content,','.join(tags),ttime,','.join(yezhu),','.join(leixing)))
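The output format above (`file:line: Warning: message`, followed by the source line) is what Python's `warnings` module prints, so MySQLdb is reporting the server-side warning through `warnings`. One possible reason only one of the ten rows showed it: under the default filter, a warning with the same text from the same source line is printed only once per run, and every single-row INSERT produces the identical text "... at row 1". The sketch below needs no database; `insert_rows` and `count_reported` are hypothetical stand-ins for the insert loop, and it assumes Python 3 semantics (changing the filter resets the "already shown" registry).

```python
import warnings

MESSAGE = "Data truncated for column 'ttime' at row 1"


def insert_rows(n):
    """Hypothetical stand-in for the insert loop: one identical warning per row."""
    for _ in range(n):
        warnings.warn(MESSAGE)  # always issued from this same source line


def count_reported(n, action):
    """Run the fake loop under a given warnings action and count what was reported."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        insert_rows(n)
    return len(caught)


print(count_reported(10, "default"))  # only the first occurrence is reported
print(count_reported(10, "always"))   # every row is reported
```

Calling `warnings.simplefilter('always')` before the real loop should print the warning for every affected row; `'error'` would instead turn it into an exception at the offending `cursor.execute` call.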

Here is the code:

# encoding: utf-8
import requests
from lxml import etree
import MySQLdb
import sys
import re
import HTMLParser
default_encoding = 'utf-8'
if sys.getdefaultencoding() != default_encoding:
    reload(sys)
    sys.setdefaultencoding(default_encoding)
download_url = "http://coalchem.anychem.com/category/industry/page/4"
html = requests.get(download_url).content
selector = etree.HTML(html)
urllist = selector.xpath('//h4/a[@href]')
def filterHtmlTag(origHtml):
    """Strip <br>, <a>, <b> and <strong> tags from an HTML string, keeping their inner text."""
    filteredHtml = origHtml
    # Method 1: remove the tags with regular expressions
    # remove <br>, <br >, <br/> and <br />
    filteredHtml = re.sub(r"<br\s*/?>", "", filteredHtml, flags=re.I)
    #logging.info("remove br, filteredHtml=%s", filteredHtml)
    # remove <a ...>text</a>, keeping the link text
    filteredHtml = re.sub(r"<a\s+[^<>]+>(?P<aContent>[^<>]+?)</a>", r"\g<aContent>", filteredHtml, flags=re.I)
    #logging.info("remove a, filteredHtml=%s", filteredHtml)
    # remove <b> and <strong>, keeping their text
    # (flags must be passed by keyword: a bare re.I in 4th position is taken as count)
    filteredHtml = re.sub(r"<b>(?P<bContent>[^<>]+?)</b>", r"\g<bContent>", filteredHtml, flags=re.I)
    filteredHtml = re.sub(r"<strong>(?P<strongContent>[^<>]+?)</strong>", r"\g<strongContent>", filteredHtml, flags=re.I)
    #logging.info("remove b,strong, filteredHtml=%s", filteredHtml)
    return filteredHtml
# Open one connection for the whole run instead of reconnecting for every article
# (the loop used to open a new connection each time but close only the last one).
db = MySQLdb.connect("localhost", "root", "root", "test", charset='utf8')
cursor = db.cursor()
for url in urllist:
    url = url.get('href')
    linkhtml = requests.get(url).content
    # Don't lowercase the page: .lower() also turned the 'T' in the datetime
    # attribute into 't', so split('T') below never matched and the whole
    # timestamp string went to MySQL (a likely cause of the
    # "Data truncated for column 'ttime'" warning).
    linkselector = etree.HTML(linkhtml.decode('utf-8'))
    title = linkselector.xpath('//*[@id]/header/h1/text()')[0]
    ttime = linkselector.xpath('//*[@id]/header/div/time[@datetime]')[0].get('datetime').split('T')[0]
    content = linkselector.xpath('//div[@class="td-post-text-content"]')[0]
    content = etree.tostring(content, encoding='utf-8')  # UTF-8 encoded byte string
    #content = filterHtmlTag(content)
    imglist = linkselector.xpath('//img')
    for img in imglist:
        print img.get('src')

    # Wrap every <img> in a link pointing at its own src
    p = re.compile(r'''(<img\b[^<>]*?\bsrc[\s\t\r\n]*=[\s\t\r\n]*["']?[\s\t\r\n]*([^\s\t\r\n"'<>]*)[^<>]*?/?[\s\t\r\n]*>)''', re.IGNORECASE)
    content = p.sub(r'''<span class="openIcon"><em></em><a href="\2">\1</a></span>''', content)
    urln = url.split('.')[-2].split('-')[-1]
    yezhu = linkselector.xpath('''//*[@id="post-''' + urln + '''"]/ul[1]/li[1]/a/text()''')
    tags = linkselector.xpath('''//*[@id="post-''' + urln + '''"]/footer/div/div/ul/li/a/text()''')
    leixing = linkselector.xpath('''//*[@id="post-''' + urln + '''"]/header/div/ul/li/a/text()''')
    #print ','.join(tags)   # join the list with commas
    for yezhut in yezhu:
        print yezhut
    for tag in tags:
        print tag
    print url, urln, title, ttime, content, leixing
    # tags, yezhu and leixing are lists: join each into one string so it binds to a single column
    cursor.execute('insert into user(title,content,tags,ttime,yezhu,leixing) values (%s,%s,%s,%s,%s,%s)',
                   (title, content, ','.join(tags), ttime, ','.join(yezhu), ','.join(leixing)))
    db.commit()
cursor.close()
db.close()
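A more defensive way to build `ttime`, as a sketch: parse the attribute into a `datetime.date` before binding it, so an unexpected value fails loudly in Python instead of being silently truncated by MySQL. It assumes the `datetime` attribute begins with an ISO 8601 date such as `2016-03-30T10:15:00+00:00` (an illustrative value, not taken from the site); `parse_post_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_post_date(datetime_attr):
    """Return the calendar date from a <time datetime="..."> attribute value.

    Upper-casing first means a page that was lower-cased before parsing
    ('T' turned into 't') still splits correctly. strptime raises ValueError
    for anything that is not YYYY-MM-DD, instead of letting MySQL coerce it.
    """
    date_part = datetime_attr.strip().upper().split('T')[0]
    return datetime.strptime(date_part, '%Y-%m-%d').date()


print(parse_post_date('2016-03-30T10:15:00+00:00'))  # 2016-03-30
print(parse_post_date('2016-03-30t10:15:00+00:00'))  # 2016-03-30
```

The returned `date` object can be passed to `cursor.execute` in place of the string; MySQLdb converts `datetime.date` values to quoted `'YYYY-MM-DD'` literals.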


@liwan123
# encoding: utf-8 import requests from lxml import etree imp...
liwan123 2016/03/31 15:52 answered the question: Python crawler: problem inserting scraped data into a MySQL database

I uncommented all of the database-insertion code. When I ran it, only two rows were inserted before it stopped with an error:

Traceback (most recent call last):
  File "D:/python-work/demo.py", line 33, in <module>
    cursor.execute('insert into user(title,content,tags,ttime) values (%s,%s,%s,%s)', (title,content,tags,ttime))
  File "build\bdist.win32\egg\MySQLdb\cursors.py", line 205, in execute
  File "build\bdist.win32\egg\MySQLdb\connections.py", line 36, in defaulterrorhandler
_mysql_exceptions.OperationalError: (1241, 'Operand should contain 1 column(s)')
That's the error. Strange.
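The error most likely comes from passing `tags`, the Python list returned by `xpath()`, as a query parameter. MySQLdb renders a list or tuple parameter as a parenthesised SQL list such as `('a','b')`, and MySQL rejects that multi-value row where a single column value is expected with error 1241. A one-element list renders as `('a')`, which MySQL accepts as a plain value, which may be why the first rows went in. The sketch below flattens lists before binding; `to_column_value` and `build_params` are hypothetical helper names, and the values in the usage line are illustrative.

```python
def to_column_value(value, sep=','):
    """Collapse a list of XPath text results into one string for a single column.

    Hypothetical helper: scalars (strings, dates, ...) are returned unchanged.
    """
    if isinstance(value, (list, tuple)):
        return sep.join(value)
    return value


def build_params(*values):
    """Build the parameter tuple for cursor.execute: one scalar per %s placeholder."""
    return tuple(to_column_value(v) for v in values)


params = build_params('some title', '<div>...</div>', ['tag-a', 'tag-b'], '2016-03-30')
print(params)
```

In the script this would read `cursor.execute('insert into user(title,content,tags,ttime) values (%s,%s,%s,%s)', build_params(title, content, tags, ttime))`, which has the same effect as writing `','.join(tags)` by hand.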

@liwan123
# encoding: utf-8 import requests from lxml import etree imp...
liwan123 2016/03/31 15:42 answered the question: Python crawler: problem inserting scraped data into a MySQL database

Thanks, guru. But when I uncomment the line below, the printed content is garbled. Chinese only displays correctly if I change utf-8 to gb2312 in content = etree.tostring(content,encoding='utf-8'). Strange.

print url,urln,title,ttime,content
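A likely explanation, assuming a Chinese-locale Windows console (code page 936, i.e. GBK): `etree.tostring(content, encoding='utf-8')` returns UTF-8 bytes, `print` writes those bytes unchanged, and the console decodes them as GBK. GB2312 is a subset of GBK, so serialising to gb2312 made the console output readable, but those bytes no longer match the `charset='utf8'` MySQL connection. This stdlib-only sketch reproduces the mismatch; the two characters are just a sample.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'               # sample characters: "中文"

utf8_bytes = text.encode('utf-8')    # what tostring(..., encoding='utf-8') returns
gbk_bytes = text.encode('gbk')       # what a code page 936 console expects

# The console interprets whatever bytes it receives as GBK:
shown_for_utf8 = utf8_bytes.decode('gbk', 'replace')  # mojibake
shown_for_gbk = gbk_bytes.decode('gbk')               # readable

print(shown_for_utf8 == text)  # False
print(shown_for_gbk == text)   # True
```

For printing, lxml can serialise to a text string with `etree.tostring(content, encoding='unicode')`, which `print` then encodes for the console; the `encoding='utf-8'` bytes can stay as they are for the INSERT.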
@liwan123
# encoding: utf-8 import requests from lxml import etree imp...
liwan123 2016/03/31 13:19 answered the question: Python crawler: problem inserting scraped data into a MySQL database
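Only two rows were inserted into the MySQL database. The date column did not change and stayed at the default 0000-00-00, and content only contains the English markup that comes before the Chinese text.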
@liwan123
# encoding: utf-8 import requests from lxml import etree imp...
I don't know where the problem is.
@liwan123
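I want a single button that shows and hides a div, but I don't know how to do it. That is, if the first click on the button shows the div, the second click should hide the d...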

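<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<script type="text/javascript">
$(function () {
  // showOrHide was never declared, so every click threw a ReferenceError.
  // Keep the flag in this closure and flip it on each click
  // (assumes #testD starts out hidden).
  var showOrHide = false;
  $("#DH").click(function () {
    showOrHide = !showOrHide;
    if (showOrHide) {
      $('#testD').show();
    } else {
      $('#testD').hide();
    }
    // jQuery's $('#testD').toggle() with no arguments does the same in one call.
  });
});
</script>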

Experts, why can't I get this to work? Strange.

@liwan123
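I want a single button that shows and hides a div, but I don't know how to do it. That is, if the first click on the button shows the div, the second click should hide the d...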
Could you spell out the actual code? I don't know jQuery.
@liwan123
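I want a single button that shows and hides a div, but I don't know how to do it. That is, if the first click on the button shows the div, the second click should hide the d...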
I'm also using it for myself.
@liwan123
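I'm a PHP beginner and want to use PHPMailer to send emails to a few hundred members' mailboxes, but I don't know where to start. Does anyone have an example to share...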

Could you provide some example code? I'm completely lost right now. My QQ is 31558614.

@liwan123
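I'm a PHP beginner and want to use PHPMailer to send emails to a few hundred members' mailboxes, but I don't know where to start. Does anyone have an example to share...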
liwan123 2013/02/26 21:55 posted the question:
