Scraping Douban Movies with a Python Crawler and Writing Them to Excel

Webmaster Resources 2025/1/8 Anonymous


DDR爱好者之家 Design By 杰米

The Douban Top 250 movie chart is split into 10 pages of 25 movies each. The first page appears as https://movie.douban.com/top250, but each page is actually addressed by a starting offset appended to that URL, so the offsets can be generated with a simple loop:

for i in range(0, 250, 25):
    print(i)
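The offsets produced by the loop can be turned into one URL per page. This is a minimal sketch, assuming the offset is passed as a query parameter named `start`; that parameter name and the helper `page_urls` are assumptions made for illustration, not something stated above.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"


def page_urls(total=250, per_page=25, param="start"):
    """Return one URL per page, each carrying its starting offset."""
    # `param` is an assumed name for the offset query parameter.
    return [
        "{}?{}".format(BASE_URL, urlencode({param: offset}))
        for offset in range(0, total, per_page)
    ]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

Keeping the page size and total as arguments means the same helper still works if the chart length or page size differs.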

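With the page layout worked out, the next step is fetching the page. A plain requests.get() returned nothing, so print the response status code to see what happened: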

url = 'https://movie.douban.com/top250'
response = requests.get(url)
print(response.status_code)
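An empty result from a bare request is often the server refusing a client it does not recognise as a browser. The sketch below assumes that is the case here and that sending a browser-like `User-Agent` header is enough. `fetch_html`, `DEFAULT_HEADERS` and the `get` parameter are hypothetical names; `get` exists only so the function can be exercised without network access.

```python
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}  # assumed header value


def fetch_html(url, get=None, headers=DEFAULT_HEADERS, timeout=10):
    """Fetch a page and return its text, failing loudly on a non-200 status."""
    if get is None:
        import requests  # third-party; imported lazily so a stub can be injected
        get = requests.get
    response = get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        # Surface the status code instead of silently parsing an empty body.
        raise RuntimeError(
            "unexpected status {} for {}".format(response.status_code, url)
        )
    return response.text
```

A call such as `fetch_html('https://movie.douban.com/top250')` would then either return the page text or raise with the status code, which makes a blocked request visible immediately.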




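import requests
import lxml.etree as etree
url = 'https://movie.douban.com/top250'
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed: browser-like header, since the bare request returned nothing
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
name = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[1]/div/div[2]/div[1]/a/span[1]")
print(name)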


Done this way, however, the parse returns a result like this:
[<Element span at 0x20b2f0cc488>]

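For what this object actually is, this article explains it well: https://www.jb51.net/article/132145.htm
I will go straight to the fix here: when parsing with XPath, append /text() to the expression.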
name = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[1]/div/div[2]/div[1]/a/span[1]/text()")
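The difference between selecting an element and selecting its text can be seen on a small inline document. This is only a sketch: the HTML in `SAMPLE` is a made-up snippet for illustration, not Douban's real markup, and the class names in it are assumptions.

```python
import lxml.etree as etree

# Hypothetical markup: two list items, the second one without a slogan.
SAMPLE = """
<html><body>
  <ol>
    <li><div class="hd"><a><span class="title">Movie A</span></a></div>
        <p class="quote"><span>A slogan</span></p></li>
    <li><div class="hd"><a><span class="title">Movie B</span></a></div></li>
  </ol>
</body></html>
"""

html = etree.HTML(SAMPLE)

# Without /text() the result is a list of Element objects.
elements = html.xpath("//ol/li[1]//span[@class='title']")
# With /text() the result is a list of strings.
texts = html.xpath("//ol/li[1]//span[@class='title']/text()")
# A path that matches nothing yields an empty list, not an error.
missing = html.xpath("//ol/li[2]//p[@class='quote']/span/text()")

print(elements)
print(texts)
print(missing)
```

The empty list in the last case is worth noting, because indexing it with `[0]` is what fails later when a movie has no slogan.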
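With that solved, use the XPath Finder browser extension to work out, one field at a time, the paths for the rest of each movie's data.
Finally, wrap this in a function, put the loop described at the start around it, and it is done.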


# -*- coding: utf-8 -*-

import requests
import lxml.etree as etree

# Assumed: a browser-like User-Agent header, since the bare request returned nothing
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_source(page):
    # Assumed: the page offset is passed as the "start" query parameter
    url = 'https://movie.douban.com/top250?start={}'.format(page)
    html = etree.HTML(requests.get(url, headers=HEADERS).text)
    for i in range(1, 26):  # 25 movies per page
        name = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[1]/a/span[1]/text()".format(i))
        info = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/p[1]/text()".format(i))
        score = html.xpath(
            "/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/div/span[2]/text()".format(i))
        slogan = html.xpath(
            "/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/p[2]/span/text()".format(i))
        print(name[0])
        print(info[0].replace(' ', ''))
        print(info[1].replace(' ', ''))
        print(score[0])
        print(slogan[0])  # raises IndexError for a movie with no slogan


n = 1
for i in range(0, 250, 25):
    print('Page %d' % n)
    n += 1
    get_source(i)
    print('==========================================')


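While locating the fields I found that 4 movies have no slogan. For those, XPath returns an empty list, so taking its first element before calling list.append() raises an error. I added some error handling for this; the approach may be a bit clumsy, so if there is a better solution I am all ears.
The code is at the end of the article.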
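One possible alternative to wrapping every field in its own try/except is a small helper that returns a placeholder when the XPath result is too short. This is a sketch only; `item_or_default` is a hypothetical name, and the `'----'` placeholder follows the one used in the article's code.

```python
def item_or_default(values, index=0, default="----"):
    """Return values[index], or `default` if the list is too short."""
    return values[index] if len(values) > index else default


# An empty list is what XPath returns when a movie has no slogan.
print(item_or_default([]))
# A normal result passes through unchanged.
print(item_or_default(["a slogan"]))
# The second text node of a two-part field can be requested by index.
print(item_or_default(["line one"], index=1))
```

Each field then becomes a single line such as `row.append(item_or_default(slogan))`, and the placeholder is defined in one place.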

Saving to Excel
I used xlwt for this.

book = xlwt.Workbook()
sheet = book.add_sheet(u'sheetname', cell_overwrite_ok=True)

This creates a worksheet.
The data is kept in one large list of nested lists,
and a loop then writes it into the worksheet:


r = 1
for i in LIST:  # 10 pages
    for j in i:  # 25 movies per page
        c = 2
        for x in j:  # 5 fields per movie
            print(x)
            sheet.write(r, c, x)
            c += 1
        r += 1


Finally, save the workbook:
book.save(r'douban.xls')
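Note that the file extension must be .xls. xlwt writes the legacy .xls format, so a file saved with an .xlsx extension will not open.
And with that, the job is done.
Open the file and add the ranking and a few other details by hand (these could also be done in the program; I found that tedious and did it manually, which was quicker).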
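Adding the ranking in the program instead of by hand takes only a few lines. The sketch below flattens the nested page lists into rows with a header and a running rank. `HEADER`, `build_rows` and `write_rows` are hypothetical names, the column titles are placeholders, and the sample data is invented for the demonstration.

```python
# Placeholder column titles; adjust to the fields actually collected.
HEADER = ["Rank", "Title", "Info 1", "Info 2", "Score", "Slogan"]


def build_rows(pages):
    """Flatten a list of pages (each a list of movie field lists) into rows."""
    rows = [HEADER]
    rank = 1
    for page in pages:
        for movie in page:
            rows.append([rank] + list(movie))  # prepend the running rank
            rank += 1
    return rows


def write_rows(rows, path):
    """Write rows to an .xls file with xlwt, one cell per value."""
    import xlwt  # third-party; imported here so build_rows works without it
    book = xlwt.Workbook()
    sheet = book.add_sheet("sheetname", cell_overwrite_ok=True)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
    book.save(path)


# Invented sample: two pages with one movie each.
sample = [
    [["Movie A", "info a1", "info a2", "9.0", "a slogan"]],
    [["Movie B", "info b1", "info b2", "8.9", "----"]],
]
rows = build_rows(sample)
for row in rows:
    print(row)
```

Calling `write_rows(rows, 'douban.xls')` would then save the sheet with the header in the first row and the rank in the first column.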

The complete code:

# -*- coding: utf-8 -*-

import requests
import lxml.etree as etree
import xlwt

# Assumed: a browser-like User-Agent header, since the bare request returned nothing
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_source(page):
    List = []
    # Assumed: the page offset is passed as the "start" query parameter
    url = 'https://movie.douban.com/top250?start={}'.format(page)
    html = etree.HTML(requests.get(url, headers=HEADERS).text)
    for i in range(1, 26):  # 25 movies per page
        row = []  # fields of one movie
        name = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[1]/a/span[1]/text()".format(i))
        info = html.xpath("/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/p[1]/text()".format(i))
        score = html.xpath(
            "/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/div/span[2]/text()".format(i))
        slogan = html.xpath(
            "/html/body/div[3]/div[1]/div/div[1]/ol/li[{}]/div/div[2]/div[2]/p[2]/span/text()".format(i))
        # A missing field yields an empty list, so fall back to a placeholder.
        try:
            row.append(name[0])
        except IndexError:
            row.append('----')
        try:
            row.append(info[0].replace(' ', '').replace('\n', ''))
        except IndexError:
            row.append('----')
        try:
            row.append(info[1].replace(' ', '').replace('\n', ''))
        except IndexError:
            row.append('----')
        try:
            row.append(score[0])
        except IndexError:
            row.append('----')
        try:
            row.append(slogan[0])
        except IndexError:
            row.append('----')

        List.append(row)

    return List


n = 1
LIST = []
for i in range(0, 250, 25):
    print('Page {}'.format(n))
    n += 1
    List = get_source(i)
    LIST.append(List)


def excel_write(LIST):
    book = xlwt.Workbook()
    sheet = book.add_sheet(u'sheetname', cell_overwrite_ok=True)
    r = 1
    for i in LIST:  # 10 pages
        for j in i:  # 25 movies per page
            c = 2
            for x in j:  # 5 fields per movie
                print(x)
                sheet.write(r, c, x)
                c += 1
            r += 1

    book.save(r'douban1.xls')  # save the workbook


excel_write(LIST)


That is all for this article. I hope it helps with your learning, and thank you for your support.


                                