python - Sorting URLs alphabetically to download images

Tags: python sorting

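I'm having trouble sorting URLs. The .jpg file names end in "xxxx-xxxx.jpg", and the URLs need to be sorted alphabetically by the second group of characters. So far I have only managed to sort alphabetically by the first group, which is not what is required.

For example: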

http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babf-bbac.jpg

comes before

http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babh-bajc.jpg

#!/usr/bin/python
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

import os
import re
import sys
import requests

"""Logpuzzle exercise
Given an apache logfile, find the puzzle urls and download the images.

Here's what a puzzle url looks like:
10.254.254.28 - - [06/Aug/2007:00:13:48 -0700] "GET /~foo/puzzle-bar-aaab.jpg HTTP/1.0" 302 528 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
"""

def url_sort_key(url):
    print url [-8:]
#Extract the puzzle urls from inside a logfile
def read_urls(filename):
    """Returns a list of the puzzle urls from the given log file,
    extracting the hostname from the filename itself.
    Screens out duplicate urls and returns the urls sorted into
    increasing order."""
    # +++your code here+++



# Use open function to search for the urls containing "puzzle/p"
# Use a line split to pick out the 6th section of the filename
# Sort out all repeated urls, and return sorted list
    with open(filename) as f:
        out = set()
        for line in f:
            if re.search("puzzle/p", line):
                url = "http://code.google.com" + line.split(" ")[6]
                print line.split(" ")
                out.add(url)
    return sorted(list(out))



# Complete the download_images function, which takes a sorted
# list of urls and a directory
def download_images(img_urls, dest_dir):
    """Given the urls already in the correct order, downloads
    each image into the given directory.
    Gives the images local filenames img0, img1, and so on.
    Creates an index.html in the directory
    with an img tag to show each local image file.
    Creates the directory if necessary.
    """
    # ++your code here++
    if not os.path.exists(dest_dir):
        os.makedirs(dest_dir)

    # Create an index
    index = file(os.path.join(dest_dir, 'index.html'), 'w')
    index.write('<html><body>\n')

    i = 0
    for img_url in img_urls:
        i += 1
        local_name = 'img%d' %i
        print "Retrieving...", local_name
        print local_name 
        print dest_dir
        print img_url

        response = requests.get(img_url)
        if response.status_code == 200:
            f = open(os.path.join(dest_dir,local_name + ".jpg"), 'wb')
            f.write(response.content)
            f.close()

        index.write ('<img src="%s">' % (local_name + ".jpg"))


    index.write('\n</body></html>\n')
    index.close()

def main():
    args = sys.argv[1:]

    print args
    if not args:
        print ('usage: [--todir dir] logfile ')
        sys.exit(1)

    todir = None
    if args[0] == '--todir':
        todir = args[1]
        del args[0:2]


    img_urls = read_urls(args[0])

    if todir:
        download_images(img_urls, todir)
    else:
        print ('\n'.join(img_urls))

if __name__ == '__main__':
    main()

I think the error is in what the read_urls function returns, but I'm not sure.

Best answer

Given that the URLs end in the format xxxx-yyyy.jpg

and that you want to sort the URLs by the second key, i.e. yyyy:

def read_urls(filename):
    with open(filename) as f:
        s = {el.rstrip() for el in f if 'puzzle' in el}
    return sorted(s, key=lambda u: u[-8:-4]) # u[-13:-9] if need to sort on the first key

For example, given an input file containing

http://localhost/p-xxxx-yyyy.jpg
http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babf-bbac.jpg
http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babh-bajc.jpg
http://localhost/p-xxxx-yyyy.jpg

it produces the list

['http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babh-bajc.jpg',
 'http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babf-bbac.jpg']

bajc comes before bbac.
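
To see why the plain sorted() call in the question gives the other order, here is a small sketch using the two URLs from the question. sorted() with no key compares whole strings, so the first group (babf vs. babh) decides the order before the second group is ever looked at. The helper name second_key is only illustrative and is not part of the original code.

```python
# The two URLs from the question.
urls = [
    "http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babf-bbac.jpg",
    "http://code.google.com/edu/languages/google-python-class/images/puzzle/p-babh-bajc.jpg",
]

def second_key(url):
    # Hypothetical helper: the last 8 characters are "yyyy.jpg";
    # the slice drops the 4-character ".jpg" suffix and keeps "yyyy".
    return url[-8:-4]

plain = sorted(urls)                      # compares the whole URL string
by_second = sorted(urls, key=second_key)  # compares only the second group

print(plain[0][-13:])      # babf-bbac.jpg
print(by_second[0][-13:])  # babh-bajc.jpg
```

Both slices assume that each group is exactly four characters long, as in the examples above.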

See the comment in the code if you want to sort by the first key (xxxx) instead.
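
The answer's read_urls expects a file with one URL per line, while the question reads an Apache log. The following is a minimal sketch, not a definitive implementation, of how the same idea might be applied to log lines. It reuses the question's own assumptions: the "puzzle/p" filter, the hard-coded "http://code.google.com" prefix, and the path being field 6 of the space-split line. The function name read_urls_from_lines, the host parameter, and the regex-based key are illustrative additions; the regex replaces the fixed-width slice so that groups of other lengths would also work.

```python
import re

def url_sort_key(url):
    """Return the second hyphen-separated group before ".jpg".

    Falls back to the whole URL when the pattern is not found,
    so such URLs are still sortable.
    """
    match = re.search(r"-(\w+)-(\w+)\.jpg$", url)
    return match.group(2) if match else url

def read_urls_from_lines(lines, host="http://code.google.com"):
    """Hypothetical variant of read_urls that takes an iterable of log lines."""
    urls = set()  # a set screens out duplicate URLs
    for line in lines:
        if "puzzle/p" in line:
            # Field 6 of the space-split log line is the request path.
            urls.add(host + line.split(" ")[6])
    return sorted(urls, key=url_sort_key)
```

With a real log file this could be called as read_urls_from_lines(open(filename)), ideally inside a with block so that the file is closed afterwards.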

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/33530725/
