shell - Shell script to find the first and last occurrence of a string

Tags: shell hadoop

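I have written a shell script that does the following on a 50-node Hadoop cluster:

  • Lists all log files related to my application on each server
  • Prints the last-modified timestamp, hostname, and file name
  • Sorts the log files from all 50 nodes by modification timestamp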

The current output format is:

    2016-07-11-01:06 server1 MY_APPLICATION-worker-6701.log.6.gz
    2016-07-12-05:23 server1 MY_APPLICATION-worker-6701.log.7.gz
    2016-07-13-08:38 server2 MY_APPLICATION-worker-6701.log
    2016-07-13-10:38 server3 MY_APPLICATION-worker-6701.log.out
    2016-07-13-10:38 server2 MY_APPLICATION-worker-6701.log.err
    2016-07-13-10:38 server5 MY_APPLICATION-worker-6701.log
    2016-07-15-10:22 server4 MY_APPLICATION-worker-6703.log.out
    2016-07-15-10:22 server3 MY_APPLICATION-worker-6703.log.err
    2016-07-15-10:22 server2 MY_APPLICATION-worker-6703.log
    


    # Collect "timestamp host filename" lines from every server, then sort them.
    while read -r server; do
        # -n stops ssh from consuming the server list on stdin.
        # The single quotes make $HOSTNAME expand on the remote host.
        ssh -n "user_id@$server" 'ls -ltr --time-style="+%Y-%m-%d-%H:%M" /var/log/hadoop/storm/ | grep MY_APPLICATION | awk -v host="$HOSTNAME" "{print \$6, host, \$7}"'
    done < all-hadoop-cluster-servers.txt | sort
    

How can I find the first and the last occurrence of "unique-ID" in each log file and print them alongside the output above?

The expected output format is:

    time_stamp hostname filename first-unique-ID last-unique-id
    2016-07-11-01:06 server1 MY_APPLICATION-worker-6701.log.6.gz    1467005065878   1467105065877
    2016-07-12-05:23 server1 MY_APPLICATION-worker-6701.log.7.gz    1467105065878   1467205065860
    2016-07-13-08:38 server2 MY_APPLICATION-worker-6701.log         1467205065861   1467305065852
    2016-07-13-10:38 server3 MY_APPLICATION-worker-6701.log.out     
    2016-07-13-10:38 server2 MY_APPLICATION-worker-6701.log.err     
    2016-07-13-10:38 server5 MY_APPLICATION-worker-6701.log         1467305065853   1467405065844
    2016-07-15-10:22 server4 MY_APPLICATION-worker-6703.log.out     
    2016-07-15-10:22 server3 MY_APPLICATION-worker-6703.log.err     
    2016-07-15-10:22 server2 MY_APPLICATION-worker-6703.log         1467405065845   1467505065853
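
One way the extra two columns might be produced is sketched below. It is a sketch under assumptions, not the asker's script: the function name `annotate_listing` is hypothetical, it expects the "timestamp host filename" lines on stdin, and it has to run on the node that actually holds the log files (for example inside the remote `ssh` command), because the file contents are not available on the machine that collects the listing. `gzip -cdf` is used so that rotated `.gz` files and plain files go through the same pipeline.

```shell
#!/usr/bin/env bash
# annotate_listing DIR   (hypothetical helper, not part of the original script)
# Reads "timestamp host filename" lines on stdin and appends the first and
# last unique-ID found in DIR/filename. Files without any unique-ID line
# (or unreadable files) get empty ID columns.
annotate_listing() {
    local dir=$1 ts host file ids
    while read -r ts host file; do
        # gzip -cdf decompresses .gz files and copies plain files unchanged.
        ids=$(gzip -cdf -- "$dir/$file" 2>/dev/null | awk '
            /unique-ID >>>>>> / { if (first == "") first = $NF; last = $NF }
            END { if (first != "") print first "\t" last }')
        printf '%s %s %s\t%s\n' "$ts" "$host" "$file" "$ids"
    done
}
```

With the directory from the question, the listing for one node could be piped through `annotate_listing /var/log/hadoop/storm` before the lines from all nodes are sorted.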
    

Sample log file:

    DEBUG | 2008-09-06 10:51:44,848 | unique-ID >>>>>> 1467205065861
    DEBUG | 2008-09-06 10:51:44,817 | DefaultBeanDefinitionDocumentReader.java | 86 | Loading bean definitions
    DEBUG | 2008-09-06 10:51:44,848 | AbstractBeanDefinitionReader.java | 185 | Loaded 5 bean definitions from location pattern [samContext.xml]
    INFO | 2008-09-06 10:51:44,848 | XmlBeanDefinitionReader.java | 323 | Loading XML bean definitions from class path resource [tmfContext.xml]
    DEBUG | 2008-09-06 10:51:44,848 | DefaultDocumentLoader.java | 72 | Using JAXP provider [com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl]
    DEBUG | 2008-09-06 10:51:44,848 | BeansDtdResolver.java | 72 | Found beans DTD [http://www.springframework.org/dtd/spring-beans.dtd] in classpath: spring-beans.dtd
    DEBUG | 2008-09-06 10:51:44,848 | unique-ID >>>>>> 1467205065862
    DEBUG | 2008-09-06 10:51:44,864 | DefaultBeanDefinitionDocumentReader.java | 86 | Loading bean definitions
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 411 | Finished creating instance of bean 'MS-SQL'
    DEBUG | 2008-09-06 10:51:45,458 | DefaultSingletonBeanRegistry.java | 213 | Creating shared instance of singleton bean 'MySQL'
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 383 | Creating instance of bean 'MySQL'
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 459 | Eagerly caching bean 'MySQL' to allow for resolving potential circular references
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 411 | Finished creating instance of bean 'MySQL'
    DEBUG | 2008-09-06 10:51:45,458 | DefaultSingletonBeanRegistry.java | 213 | Creating shared instance of singleton bean 'Oracle'
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 383 | Creating instance of bean 'Oracle'
    DEBUG | 2008-09-06 10:51:45,458 | AbstractAutowireCapableBeanFactory.java | 459 | Eagerly caching bean 'Oracle' to allow for resolving potential circular references
    DEBUG | 2008-09-06 10:51:45,473 | AbstractAutowireCapableBeanFactory.java | 411 | Finished creating instance of bean 'Oracle'
    DEBUG | 2008-09-06 10:51:45,473 | DefaultSingletonBeanRegistry.java | 213 | Creating shared instance of singleton bean 'PostgreSQL'
    DEBUG | 2008-09-06 10:51:45,473 | AbstractAutowireCapableBeanFactory.java | 383 | Creating instance of bean 'PostgreSQL'
    DEBUG | 2008-09-06 10:51:45,473 | AbstractAutowireCapableBeanFactory.java | 459 | Eagerly caching bean 'PostgreSQL' to allow for resolving potential circular references
    DEBUG | 2008-09-06 10:51:45,473 | AbstractAutowireCapableBeanFactory.java | 411 | Finished creating instance of bean 'PostgreSQL'
    INFO | 2008-09-06 10:51:45,473 | SQLErrorCodesFactory.java | 128 | SQLErrorCodes loaded: [DB2, Derby, H2, HSQL, Informix, MS-SQL, MySQL, Oracle, PostgreSQL, Sybase]
    DEBUG | 2008-09-06 10:52:44,817 | DefaultBeanDefinitionDocumentReader.java | 86 | Loading bean definitions
    DEBUG | 2008-09-06 10:52:44,848 | unique-ID >>>>>> 1467205065864
    

Best answer

    grep 'unique-ID' sample_log_file | sed -n '1p;$p'
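
The one-liner above prints the whole matching lines. If only the two ID values are wanted, and the rotated `.gz` files should be readable as well, a single awk pass is one option. The following is a minimal sketch under assumptions: the function name `first_last_id` is hypothetical, and the ID is assumed to be the last whitespace-separated field on the line, as in the sample log.

```shell
#!/usr/bin/env bash
# first_last_id FILE   (hypothetical helper)
# Prints "<first-id> <last-id>" for FILE, or nothing when FILE contains no
# unique-ID line. gzip -cdf decompresses .gz files and copies plain files
# through unchanged, so both kinds of log can be read the same way.
first_last_id() {
    gzip -cdf -- "$1" | awk '
        /unique-ID >>>>>> / {
            id = $NF                      # the ID is the last field on the line
            if (first == "") first = id   # remember only the first match
            last = id                     # overwritten on every match
        }
        END { if (first != "") print first, last }
    '
}
```

Run against the sample log file from the question, `first_last_id sample_log_file` would print `1467205065861 1467205065864`.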
    

Regarding "shell - Shell script to find the first and last occurrence of a string", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38440566/
