Cache problem when scraping with selenium and phantomjs?

0 votes

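I ran into a problem while scraping with selenium and phantomjs. (I was running the Lantern proxy software while scraping, because the site cannot be scraped directly.)

The code is as follows: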

from selenium import webdriver
import time 
driver = webdriver.PhantomJS()
driver.get('http://chuansong.me')
alla = driver.find_elements_by_class_name('question_link')
for a in alla:
    a = a.get_attribute('href')
    print(a)
    driver.get(a)
    title = driver.find_element_by_id('activity-name').text
    writer = driver.find_element_by_id('post-user').text
    content = driver.find_element_by_id('js_content').text
    print(writer,title,content)
    #time.sleep(8)
driver.close()
driver.quit()

It scrapes the content of one link and then reports this error:

Traceback (most recent call last):
  File "D:/python-work/test.py", line 10, in <module>
    a = a.get_attribute('href')
  File "D:\Program Files\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 141, in get_attribute
    resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
  File "D:\Program Files\Python35-32\lib\site-packages\selenium\webdriver\remote\webelement.py", line 494, in _execute
    return self._parent.execute(command, params)
  File "D:\Program Files\Python35-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "D:\Program Files\Python35-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: {"errorMessage":"Element does not exist in cache","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:60284","User-Agent":"Python-urllib/3.5"},"httpVersion":"1.1","method":"GET","url":"/attribute/href","urlParsed":{"anchor":"","query":"","file":"href","directory":"/attribute/","path":"/attribute/href","relative":"/attribute/href","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/attribute/href","queryKey":{},"chunks":["attribute","href"]},"urlOriginal":"/session/bcbced70-c66a-11e6-a824-4b87531d9c78/element/:wdc:1482207278197/attribute/href"}}
Screenshot: available via screen
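
The `StaleElementReferenceException` ("Element does not exist in cache") is raised on the second pass through the loop: `driver.get(a)` navigates away from the list page, so the remaining elements in `alla` no longer exist in the browser. A minimal sketch of one common workaround follows, assuming the Selenium 3 style `find_elements_by_*` methods used in the code above: read every `href` into a list of plain strings before navigating anywhere. The helper names `collect_links` and `scrape_articles` are hypothetical; the class name and element ids are the ones from the question.

```python
def collect_links(driver, class_name="question_link"):
    """Return the href of every matching link as a plain string.

    Strings stay valid after navigation; WebElement handles do not.
    """
    links = []
    for element in driver.find_elements_by_class_name(class_name):
        href = element.get_attribute("href")
        # Skip anchors without an href and avoid visiting a URL twice.
        if href and href not in links:
            links.append(href)
    return links


def scrape_articles(driver, start_url):
    """Open start_url, then each linked article, and return the scraped fields."""
    driver.get(start_url)
    links = collect_links(driver)  # done once, before any further driver.get()
    articles = []
    for url in links:
        driver.get(url)
        articles.append({
            "url": url,
            "title": driver.find_element_by_id("activity-name").text,
            "writer": driver.find_element_by_id("post-user").text,
            "content": driver.find_element_by_id("js_content").text,
        })
    return articles
```

With the `driver` created in the question, this would be called as `articles = scrape_articles(driver, 'http://chuansong.me')` before `driver.quit()`. Because the loop only holds strings, there is no element reference left to go stale.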
Asked June 19, 2017 by Lancer, Private First Class (538 reputation)

1 answer

0 votes

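Experts, I changed the code, but it runs very slowly even with image loading disabled, and the same error still appears sometimes. Could you take a look and tell me what else can be changed or optimized? The code is as follows: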

__author__ = 'Administrator'

from selenium import webdriver
import time

cap = webdriver.DesiredCapabilities.PHANTOMJS
cap["phantomjs.page.settings.resourceTimeout"] = 1000
cap["phantomjs.page.settings.loadImages"] = False
#cap["phantomjs.page.settings.javascriptEnabled"] = False
cap["phantomjs.page.settings.localToRemoteUrlAccessEnabled"] = False
driver = webdriver.PhantomJS(desired_capabilities=cap)

#driver = webdriver.PhantomJS()
driver.get('http://chuansong.me')
length = len(driver.find_elements_by_class_name('question_link'))
for i in range(0,length):
    alla = driver.find_elements_by_class_name('question_link')
    a = alla[i]
    print(a)
    if 'question_link' in a.get_attribute('class') or 'n' in a.get_attribute('href'):
        a.click()
        driver.get(a.get_attribute('href'))
        title = driver.find_element_by_id('activity-name').text
        writer = driver.find_element_by_id('post-user').text
        content = driver.find_element_by_id('js_content').get_attribute('outerHTML')
        print(writer,title,content)
        driver.back()
        time.sleep(8)
driver.close()
driver.quit()
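
Two things in the revised loop probably explain the symptoms. If `a.click()` navigates to the article, then `a` is already stale when `a.get_attribute('href')` runs on the next line, so the same exception can come back. And the unconditional `time.sleep(8)` adds eight seconds per link regardless of how fast the page loads. A minimal sketch of a polling wait that could replace the fixed sleep is shown here; `wait_until` is a hypothetical helper written in plain Python, and Selenium's own `WebDriverWait` (in `selenium.webdriver.support.ui`) serves the same purpose.

```python
import time


def wait_until(condition, timeout=8.0, poll=0.2, ignored=(Exception,)):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have
    passed. Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            # For example, the element is not in the page yet; try again.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

Inside the loop this might be used as `title = wait_until(lambda: driver.find_element_by_id('activity-name').text)`, which returns as soon as the title is present instead of always waiting the full eight seconds. Passing a narrower `ignored` tuple, such as Selenium's `NoSuchElementException`, would keep unrelated errors from being swallowed.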
Answered June 19, 2017 by Nautilus, Corporal (908 reputation)