python - Scraping data from a dynamic e-commerce website

Tags: python beautifulsoup python-requests web-mining

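I am trying to scrape the titles of all products listed on a page of an e-commerce website (Flipkart in this case). Which products get scraped depends on a keyword entered by the user. If I enter the product "XYZXYZ", a typical generated URL is: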

http://www.flipkart.com/search?q=XYXXYZ&as=off&as-show=on&otracker=start 

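Using this link as a template, I wrote the following script to scrape the titles of all products listed on any given page, based on the keyword entered: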

import requests
from bs4 import BeautifulSoup

def flipp(k):
    url = "http://www.flipkart.com/search?q=" + str(k) + "&as=off&as-show=on&otracker=start"
    ss = requests.get(url)
    src = ss.text
    obj = BeautifulSoup(src)
    for e in obj.findAll("a", {'class' : 'lu-title'}):
        title = e.string
        print unicode(title)

h = raw_input("Enter a keyword:")
print flipp(h)

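However, the script above prints None as its output. When I debugged it step by step, I found that the requests module does not seem to fetch the page source. What is going on here?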
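One part of the symptom does not depend on the network at all: `flipp` has no `return` statement, so `print flipp(h)` always prints `None` after the loop finishes. The sketch below illustrates this in Python 3 syntax with two hypothetical helpers, `show_titles` and `collect_titles`, which are not part of the original script. It does not explain why no titles were printed; that depends on the markup the site actually returns.

```python
def show_titles(titles):
    # Prints each title but has no return statement,
    # so the call itself evaluates to None.
    for title in titles:
        print(title)


def collect_titles(titles):
    # Returns the cleaned titles instead of printing them,
    # leaving it to the caller to decide what to do with them.
    return [title.strip() for title in titles]


result = show_titles(["Think Python", "Learning Python"])
print(result)  # prints None, like the trailing None in the question

print(collect_titles(["  Think Python ", "Learning Python"]))
```

Calling `flipp(h)` on its own line, without `print`, would avoid the stray `None`.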

Best answer

This should work:

import requests
from bs4 import BeautifulSoup
import re

def flipp(k):
    url = "http://www.flipkart.com/search?q=" + str(k) + "&as=off&as-show=on&otracker=start"
    ss = requests.get(url)
    src = ss.text
    obj = BeautifulSoup(src)
    for e in obj.findAll("a",class_=re.compile("-title")):
        title = e.text
        print title.strip()

h = raw_input("Enter a keyword:") # I used 'Python' here
print flipp(h)

Out[1]:
Think Python (English) (Paperback)
Learning Python (English) 5th  Edition (Hardcover)
Python in Easy Steps : Makes Programming Fun ! (English) 1st Edition (Paperback)
Python : The Complete Reference (English) (Paperback)
Natural Language Processing with Python (English) 1st Edition (Paperback)
Head First Programming: A learner's guide to programming using the Python language (English) 1st  Edition (Paperback)
Beginning Python (English) (Paperback)
Programming Python (English) 4Th Edition (Hardcover)
Computer Science with Python Language Made Simple - (Class XI) (English) (Paperback)
HEAD FIRST PYTHON (English) (Paperback)
Raspberry Pi User Guide (English) (Paperback)
Core Python Applications Programming (English) 3rd  Edition (Paperback)
Write Your First Program (English) (Paperback)
Programming Computer Vision with Python (English) 1st Edition (Paperback)
An Introduction to Python (English) (Paperback)
Fundamentals of Python: Data Structures (English) (Paperback)
Think Complexity (English) (Paperback)
Foundations of Python Network Programming: The comprehensive guide to building network applications with Python (English) 2nd Edition (Soft Cover)
Python Programming for the Absolute Beginner (English) (Paperback)
EXPERT PYTHON PROGRAMMING BEST PRACTICES FOR DESIGNING,CODING & DISTRIBUTING YOUR PYTHON 1st Edition (Paperback)
None
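The answer's code is Python 2 and mixes fetching, parsing, and printing. As a rough Python 3 sketch of the parsing step only, the function below applies the same `class_=re.compile("-title")` filter to an HTML string and returns the titles. `SAMPLE_HTML` and `extract_titles` are made up for illustration; the markup is an assumption and not what Flipkart serves.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup invented for this sketch; the real page differs.
SAMPLE_HTML = """
<div>
  <a class="lu-title" href="/p1"> Think Python </a>
  <a class="fk-title big" href="/p2">Learning Python</a>
  <a class="nav-link" href="/home">Home</a>
</div>
"""


def extract_titles(html):
    """Return the stripped text of every <a> whose class contains '-title'."""
    soup = BeautifulSoup(html, "html.parser")
    # The compiled pattern is searched within each CSS class of the tag,
    # so both "lu-title" and "fk-title" match while "nav-link" does not.
    links = soup.find_all("a", class_=re.compile("-title"))
    return [link.get_text(strip=True) for link in links]


print(extract_titles(SAMPLE_HTML))
```

With the page source from the answer (`src = requests.get(url).text`), the call would be `extract_titles(src)`. Matching on a pattern rather than the exact class `lu-title` is presumably what lets the answer pick up title links whose class prefix differs.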

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/26080032/
