python - How can I write the following code in a more efficient and pythonic way?

I have a list of URLs, file_url_list, which prints as:

www.latimes.com, www.facebook.com, affinitweet.com, ...

I also have a second list holding the Top 1M URLs, top_url_list, which prints as:

[1, google.com], [2, www.google.com], [3, microsoft.com], ...

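I want to find out how many of the URLs in file_url_list also appear in top_url_list. I have written the following code, which works, but I know it is neither the fastest nor the most pythonic way to do it.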

# Find the common occurrences
found = []
for file_item in file_url_list:
    for top_item in top_url_list:
        if file_item == top_item[1]:
            # When you find an occurrence, put it in a list
            found.append(top_item)

How can I write this in a more efficient and pythonic way?

Best answer

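A set intersection should help: set membership checks take constant time on average, so you no longer compare every file URL against every one of the 1M top URLs. You can also use a generator expression to extract just the URL from each entry of top_url_list.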

file_url_list = ['www.latimes.com', 'www.facebook.com', 'affinitweet.com']
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']]

common_urls = set(file_url_list) & set(url for (index, url) in top_url_list)
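Since the question asks *how many* URLs match, the count is simply the size of the intersection. The sketch below uses made-up sample lists that do overlap (the sample lists above share no URL, so their intersection is empty). Note that the set counts each distinct URL once; if file_url_list may contain repeats and every occurrence should count, test membership against a set of top URLs instead.

```python
# Hypothetical sample data, chosen so the lists overlap and 'google.com' repeats.
file_url_list = ['www.latimes.com', 'google.com', 'affinitweet.com',
                 'microsoft.com', 'google.com']
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']]

# Distinct URLs present in both lists (set intersection).
common_urls = set(file_url_list) & {url for (index, url) in top_url_list}
match_count = len(common_urls)

# Every occurrence in file_url_list that is a top URL (repeats counted).
top_urls = {url for (index, url) in top_url_list}
occurrence_count = sum(1 for url in file_url_list if url in top_urls)

print(match_count)       # 2
print(occurrence_count)  # 3
```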

Or, equivalently, using a set comprehension (thanks to Jean-François Fabre):

common_urls = set(file_url_list) & {url for (index, url) in top_url_list}
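The intersection returns only the URL strings, whereas the loop in the question collected whole `[rank, url]` entries into `found`. If those entries are needed, one option (a sketch, not part of the original answer) is to index top_url_list by URL in a dict once, then make a single pass over file_url_list. This reproduces the nested loop's output, including its order and duplicates, while replacing the inner scan with a dictionary lookup. The sample lists here are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data.
file_url_list = ['www.latimes.com', 'microsoft.com', 'google.com']
top_url_list = [[1, 'google.com'], [2, 'www.google.com'], [3, 'microsoft.com']]

# One pass over the top list: map each URL to every [rank, url] entry carrying it.
entries_by_url = defaultdict(list)
for top_item in top_url_list:
    entries_by_url[top_item[1]].append(top_item)

# One pass over the file list; .get() avoids inserting empty keys for misses.
found = [top_item
         for file_item in file_url_list
         for top_item in entries_by_url.get(file_item, [])]

print(found)  # [[3, 'microsoft.com'], [1, 'google.com']]
```

Building the dict costs one pass over the 1M-entry list, and each lookup then takes constant time on average, so the total work grows with the sum of the two list lengths rather than their product.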

A similar question, "python - How can I write the following code in a more efficient and pythonic way?", is on Stack Overflow: https://stackoverflow.com/questions/43652633/