python - Find a specific line in a log file based on a substring - Python

Tags: python regex hadoop

I have the following block of log output from a Hadoop cluster:

==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /data/1/dfs/nn has been successfully formatted.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorage: Storage directory /nfsmount/dfs/nn has been successfully formatted.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /nfsmount/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Saving image file /data/1/dfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.FSImage: Image file of size 115 saved in 0 seconds.
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
==> namenode_32: 14/11/02 02:19:32 INFO util.ExitUtil: Exiting with status 0
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NameNode: SHUTDOWN_MSG: 
==> namenode_32: /************************************************************
==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at ip-10-45-129-157.ec2.internal/10.45.129.157
==> namenode_32: ************************************************************/
==> namenode_32:  * Starting Hadoop namenode: 
==> namenode_32: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-ip-10-45-129-157.out
==> namenode_32:  * Starting Hadoop secondarynamenode: 
==> namenode_32: starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-ip-10-45-129-157.out
==> namenode_32:  * Starting Hadoop jobtracker: 
==> namenode_32: starting jobtracker, logging to /var/log/hadoop-0.20-mapreduce/hadoop-hadoop-jobtracker-ip-10-45-129-157.out

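I'm trying to find the IP address of such a cluster. I know the information is on the SHUTDOWN_MSG: Shutting down NameNode ... line. What I'm looking for is a tuple of the private DNS name and the private IP. For that specific example, I want to get: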

(ip-10-45-129-157.ec2.internal, 10.45.129.157)

So I tried:

>>> import re
>>> expr = "SHUTDOWN_MSG: Shutting down NameNode at"
>>> s = re.search(expr, log)
>>> print(s.group())
SHUTDOWN_MSG: Shutting down NameNode at

That's not what I want... How can I use a regular expression to produce such a tuple?
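The attempt prints only the search phrase because the pattern contains no parentheses, so it defines no capture groups. Calling group() with no argument is the same as group(0), which returns the whole matched text. Here that is just the literal phrase. groups() returns an empty tuple. The check below runs on one line of the log. The variable name line is only for illustration:

```python
import re

# One line copied from the log above (the variable name is illustrative).
line = ("==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at "
        "ip-10-45-129-157.ec2.internal/10.45.129.157")

m = re.search("SHUTDOWN_MSG: Shutting down NameNode at", line)
print(m.group())   # SHUTDOWN_MSG: Shutting down NameNode at  (group 0 = whole match)
print(m.groups())  # ()  (no parentheses, so no capture groups)
```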

Best answer

Use multiple capture groups after that search string:

>>> expr = 'SHUTDOWN_MSG:.+at (.+)/(.+)'
>>> re.search(expr, log).groups()
('ip-10-45-129-157.ec2.internal', '10.45.129.157')
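Below is a slightly more defensive variant. It is a sketch, not the answer's method. It anchors the pattern on the full "Shutting down NameNode at" phrase and narrows each group to non-whitespace characters, so later text on the line cannot leak into the result. It also returns None when nothing matches. The helper name find_namenode_address, the group names dns and ip, and the shortened sample log are illustrative assumptions.

```python
import re

# dns: [^/\s]+ stops at the slash (or whitespace), so it captures the private DNS name
# ip:  \S+ stops at whitespace / end of line, so it captures the private IP
SHUTDOWN_RE = re.compile(
    r"SHUTDOWN_MSG: Shutting down NameNode at (?P<dns>[^/\s]+)/(?P<ip>\S+)"
)


def find_namenode_address(log):
    """Return (private_dns, private_ip) from the first shutdown line, or None."""
    m = SHUTDOWN_RE.search(log)
    if m is None:
        return None
    # group() with several names returns a tuple in that order
    return m.group("dns", "ip")


# Shortened excerpt of the log from the question
log = """\
==> namenode_32: 14/11/02 02:19:32 INFO namenode.NameNode: SHUTDOWN_MSG:
==> namenode_32: /************************************************************
==> namenode_32: SHUTDOWN_MSG: Shutting down NameNode at ip-10-45-129-157.ec2.internal/10.45.129.157
==> namenode_32: ************************************************************/
"""

print(find_namenode_address(log))
# ('ip-10-45-129-157.ec2.internal', '10.45.129.157')

# If one log holds several clusters, finditer collects every (dns, ip) pair.
pairs = [m.group("dns", "ip") for m in SHUTDOWN_RE.finditer(log)]
print(pairs)
```

With [^/\s]+, the split at the slash no longer depends on greedy backtracking. With \S+, the IP group cannot run past the end of the line.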

Regarding python - Find a specific line in a log file based on a substring - Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26704504/
