python - How can I efficiently match a specific number against a set of numbers?

I have a set of numbers stored in a txt file, containing 2375013 unique numbers. The data structure looks like this:

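I want to match a number from each line of another data set against this set of numbers, so that I can extract the data I need. So I wrote the following code: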

def get_US_users_IDs(filepath, mode):
    IDs = []
    with open(filepath, mode) as f:
        for line in f:
            sp = line.strip()
            for id in sp:
                IDs.append(id.lower())
        return IDs


IDs = "|".join(get_US_users_IDs('/nas/USAuserlist.txt', 'r'))
matcher = re.compile(IDs)
if matcher.match(user_id):
    number_of_US_user += 1
    text = tweet.split('\t')[3]

But it takes a long time to run. Is there any way to reduce the running time?

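Best answer

As I understand it, you have a large number of IDs in a file, and you want to know whether a particular user_id is present in that file.

You can use a Python set.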

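# Build the set once, before processing any tweets.
with open(filepath, mode) as fd:
    IDs = set(int(line) for line in fd)
...
# The set holds ints, so user_id must also be an int
# (convert it with int(user_id) if it is a string).
if user_id in IDs:
    number_of_US_user += 1
    ...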
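
For illustration, here is a minimal self-contained sketch of the set-based approach. It assumes the file holds one ID per line, which the question does not confirm, and it keeps the IDs as lowercase strings rather than converting them to int, since the question's code passes user_id to a regex and so appears to treat it as a string. The names load_ids and count_matches and the sample IDs are hypothetical.

```python
import os
import tempfile


def load_ids(filepath):
    """Read one ID per line into a set of lowercase strings (assumed file layout)."""
    with open(filepath, "r") as f:
        # Strip the trailing newline from each ID and skip blank lines.
        return {line.strip().lower() for line in f if line.strip()}


def count_matches(user_ids, known_ids):
    """Count how many of user_ids appear in the known_ids set."""
    return sum(1 for uid in user_ids if uid in known_ids)


# Hypothetical sample data standing in for the real user list file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1001\n1002\n1003\n")
    sample_path = tmp.name

# Load the set once, then reuse it for every lookup.
known_ids = load_ids(sample_path)
os.remove(sample_path)

matches = count_matches(["1002", "9999", "1003"], known_ids)
print(matches)  # prints 2
```

A set lookup is a hash lookup that takes constant time on average, so the cost per user_id does not grow with the 2375013 entries. A set also tests for exact equality, whereas re.match with an alternation of IDs succeeds as soon as any ID is a prefix of user_id.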

Regarding "python - How can I efficiently match a specific number against a set of numbers?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7696739/
