linux - Shell program code involving regexp and file handling

Tags: linux shell

I am writing this small program in shell:

#!/bin/bash

#***************************************************************
# Synopsis:
# Read from an inputfile each line, which has the following format:
#
# llnnn nnnnnnnnnnnnllll STRING lnnnlll n nnnn nnnnnnnnn nnnnnnnnnnnnnnnnnnnn ll ll   
#
# where:
# n is a <positive int>
# l is a <char> (no special chars)
# the last set of ll ll  could be:
#   - NV 
#   - PV 
#
# Ex:
# AVO01  000060229651AVON FOOD OF ARKHAM C A  S060GER   0  1110  000000022  00031433680006534689  NV  PV
#
# The program should check, for each line of the file, the following:
# I) If the nnn of character llnnn (beginning the line) is numeric,
#    that is, <int>
# II) If the character ll ll is NV (just one set of ll) then
#    copy that line in an outputfile, and add one to a counter. 
# III) If the character ll ll is PV (just one set of ll) then
#     copy that line in an outputfile, and add one to a counter.
# 
# NOTICE: could be just one ll. Ex: [...] NV [...]
#                                   [...] PV [...] 
#         or both Ex: [...] NV PV [...] 
#
#
# Execution (after generating the executable):
# ./ inputfile outputfileNOM outputfilePGP
#***************************************************************


# Check the number of arguments that could be passed.
if [[ ${#@} != 3 ]]; then
    # Plain echo does not interpret "\n"; report usage errors on stderr and exit non-zero.
    echo "Error...must be: myShellprogram <inputfile> <outputfileNOM> <outputfilePGP>" >&2
    exit 1
fi  

#Inputfile: is in position 1 on the ARGS
inputfile=$1 
#OutputfileNOM: is in position 2 on the ARGS
outputfileNOM=$2
#OutputfilePGP: is in position 3 on the ARGS
outputfilePGP=$3

#Main variables. Change if needed. 
# Flags that could appear in the <inputfile>
#
# ATTENTION!!!: notice that there is a white space
# before the characters, this is important when using
# the regular expression in the conditional:
# if [[  $line =~ $NOM ]]; then [...] 
#
# If the white space is NOT there it would match things like:
# ABCNV ... which is wrong!!
NOM=" NV"
PGP=" PV"
#Counters of occurrences
countNOM=0;
countPGP=0;


#Check if the files exists and have the write/read permissions
if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]; then
    #Read all the lines of the file.
    while read -r line  
        do
            code=${line:3:2} #Store the code: two characters from offset 3 ("01" in "AVO01")

            #Check if the code is numeric
            if [[ $code =~ ^[0-9]+$ ]] ; then

                #Check if the actual line has the NOM flag
                if [[  $line =~ $NOM ]]; then
                    echo "$line" >> "$outputfileNOM"
                    (( ++countNOM ))
                fi  

                #Check if the actual line has the PGP flag
                if [[  $line =~ $PGP ]]; then
                    echo "$line" >> "$outputfilePGP"
                    (( ++countPGP ))
                fi

            else
              echo "$code is not numeric" >&2
              exit 1

            fi      

        done < "$inputfile"

    echo "COUNT NOM $countNOM"
    echo "COUNT PGP $countPGP"
else
    echo "FILE: $inputfile does not exist or does not have read permissions"
    echo "FILE: $outputfileNOM does not exist or does not have write permissions"
    echo "FILE: $outputfilePGP does not exist or does not have write permissions"
fi  

I have some questions:

I) When I do this:

 if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]; then
 [...]
 else
     echo "FILE: $inputfile does not exist or does not have read permissions"
     echo "FILE: $outputfileNOM does not exist or does not have write permissions"
     echo "FILE: $outputfilePGP does not exist or does not have write permissions"
 fi

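In the else branch I would like to print something more precise, that is, only the message that actually applies. For example: if "$outputfileNOM" does not have write permission, print just that error. However, I don't want to write a lot of if/else blocks, for example: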

if [[ -r $inputfile ]]; then
[...]
if [[ -w $outputfileNOM ]]; then
[...]
else
  For the READ permission, and the other else for the WRITE

Is there a way to do this without a nested approach that still keeps the code readable?
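One possible way to avoid the nesting is to check each file separately and print only the message that applies. The sketch below is an illustration, not part of the original program: check_file and check_all are hypothetical helper names, and the messages are copied from the script above.

```bash
#!/bin/bash
# Hypothetical helper: test one file for one permission and print a
# specific message on stderr when the test fails.
check_file() {
    local mode=$1 file=$2
    case $mode in
        r) [[ -r $file ]] || { echo "FILE: $file does not exist or does not have read permissions" >&2; return 1; } ;;
        w) [[ -w $file ]] || { echo "FILE: $file does not exist or does not have write permissions" >&2; return 1; } ;;
        *) echo "check_file: unknown mode '$mode'" >&2; return 2 ;;
    esac
}

# Hypothetical helper: run all three checks so that every problem is
# reported, then return a single overall status.
check_all() {
    local status=0
    check_file r "$1" || status=1
    check_file w "$2" || status=1
    check_file w "$3" || status=1
    return "$status"
}
```

In the script this could be called once, before the read loop, as check_all "$inputfile" "$outputfileNOM" "$outputfilePGP" || exit 1, which leaves the main body flat.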

II) About:

 if [[ -r $inputfile && -w $outputfileNOM && -w $outputfilePGP ]]

Is it OK if I use the "-x" flag instead of -r or -w? I don't have a clear definition of what this means:

-x FILE
          FILE exists and execute (or search) permission is granted
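For reference, -r, -w and -x each test a different permission, so -x does not stand in for the other two. A minimal sketch that shows the difference on a temporary file (describe_perms is a hypothetical helper name used only here):

```bash
#!/bin/bash
# Report which of the three permission tests pass for a given path.
describe_perms() {
    local file=$1 result=""
    [[ -r $file ]] && result+="r"
    [[ -w $file ]] && result+="w"
    [[ -x $file ]] && result+="x"
    echo "${result:-none}"
}

demo=$(mktemp)
chmod 600 "$demo"       # owner may read and write, nobody may execute
describe_perms "$demo"  # prints "rw"
chmod u+x "$demo"       # add execute permission for the owner
describe_perms "$demo"  # prints "rwx"
rm -f "$demo"
```

A data file that is only read or appended to normally has no execute bit, so testing it with -x would reject a perfectly usable file.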

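III) Notice the "ATTENTION" label in my code. I noticed that there are several possibilities, for example: white space before the flag, after it, or both before and after. I trust the consistency of the input files, but if they change, the program will blow up. What can I do in this case? Is there an elegant way to manage it? (Exceptions?)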

Thank you very much!

Best answer

I have been bitten by the =~ operator before.

In principle I would tell you to quote the argument (i.e. ... =~ "$NOM"), but starting with bash 3.2 there is a special behavior for =~ "". The link says, rather verbosely:

o Quoting the string argument to the [[ command's =~ (regexp) operator now forces string matching, as with the other pattern-matching operators.

E14) Why does quoting the pattern argument to the regular expression matching conditional operator (=~) cause regexp matching to stop working?

In versions of bash prior to bash-3.2, the effect of quoting the regular expression argument to the [[ command's =~ operator was not specified. The practical effect was that double-quoting the pattern argument required backslashes to quote special pattern characters, which interfered with the backslash processing performed by double-quoted word expansion and was inconsistent with how the == shell pattern matching operator treated quoted characters.

In bash-3.2, the shell was changed to internally quote characters in single- and double-quoted string arguments to the =~ operator, which suppresses the special meaning of the characters special to regular expression processing (`.', `[', `\', `(', `)', `*', `+', `?', `{', `|', `^', and `$') and forces them to be matched literally. This is consistent with how the `==' pattern matching operator treats quoted portions of its pattern argument.

Since the treatment of quoted string arguments was changed, several issues have arisen, chief among them the problem of white space in pattern arguments and the differing treatment of quoted strings between bash-3.1 and bash-3.2. Both problems may be solved by using a shell variable to hold the pattern. Since word splitting is not performed when expanding shell variables in all operands of the [[ command, this allows users to quote patterns as they wish when assigning the variable, then expand the values to a single string that may contain whitespace. The first problem may be solved by using backslashes or any other quoting mechanism to escape the white space in the patterns.
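The difference described in the FAQ can be seen with a small experiment. This is only a sketch, assuming bash 3.2 or later with default compatibility settings; is_regex_match and is_literal_match are hypothetical names.

```bash
#!/bin/bash
# The pattern lives in a variable; the single quotes only protect the assignment.
pattern='^[0-9]+$'

# Unquoted expansion: the value is used as a regular expression.
is_regex_match()   { [[ $1 =~ $pattern ]]; }

# Quoted expansion: the value is matched as a literal string.
is_literal_match() { [[ $1 =~ "$pattern" ]]; }

if is_regex_match "01"; then
    echo "unquoted: 01 matches the regex"
fi

if is_literal_match "01"; then
    echo "quoted: 01 matches"
else
    echo "quoted: 01 does not match, the pattern was taken literally"
fi
```

This is why the script in the question keeps the pattern in NOM and PGP and expands them without quotes inside [[ ]].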

You might consider something along the lines of NOM="[ ]NV". (Note that I have not tested this.)
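Going one step further than the suggestion above, the pattern could require the flag to be delimited by white space or by the start or end of the line, which covers the "before, after, or both" cases from the question. The patterns and the has_flag helper below are assumptions made for illustration, not something taken from the original script.

```bash
#!/bin/bash
# Hypothetical patterns: the flag must be surrounded by white space or sit
# at the start/end of the line, so "ABCNV" and "NVX" are rejected.
NOM='(^|[[:space:]])NV([[:space:]]|$)'
PGP='(^|[[:space:]])PV([[:space:]]|$)'

# usage: has_flag "$line" "$pattern"
# The pattern is expanded unquoted so that bash treats it as a regex.
has_flag() {
    [[ $1 =~ $2 ]]
}

line='AVO01  000060229651AVON FOOD OF ARKHAM C A  S060GER   0  1110  000000022  00031433680006534689  NV  PV'
has_flag "$line" "$NOM" && echo "NOM flag found"
has_flag "$line" "$PGP" && echo "PGP flag found"
has_flag "ABCNV 123" "$NOM" || echo "ABCNV is not treated as a flag"
```

A pattern like this still matches an NV or PV token that happens to appear inside the free-text STRING field, so it reduces the fragility rather than removing it.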

Regarding "linux - Shell program code involving regexp and file handling", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7700749/
