unix - Filtering a file by a column value greater than a number (awk not working)

Tags: unix awk

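I am trying to filter a file, keeping only the rows whose value in column 8 is >= 10. I am using awk, but for some reason it is not working. Am I doing something wrong? What am I missing?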

head df_TPM.csv
LQNS02136402.1_14821_3p,12680.71611,11346.42368,11686.28693,9067.797819,7429.467928,5551.660333,3246.956281
LQNS02000137.1_325_3p,8342.540984,5905.726173,4503.363041,3616.191278,3142.965662,3678.829299,6288.621969
LQNS02278148.1_40791_3p,4921.502758,2461.882836,429.824973,261.273116,132.0239748,68.6191655,70.8815385
LQNS02278089.1_34112_3p,4246.71324,4584.529009,8687.922574,7570.83746,5801.384953,2870.020801,734.3131465
LQNS02278075.1_32377_5p,4143.547577,4093.91803,10804.12323,10062.99269,7925.240969,4712.484455,1080.915573
LQNS02138569.1_14892_3p,2668.27957,2160.173542,837.2584183,233.2310273,84.62362925,64.6037895,23.456714
LQNS02278075.1_32324_5p,2331.608924,491.8868983,1527.312199,881.8683105,747.1474225,347.397634,74.07259175
LQNS02278075.1_32382_3p,2140.686095,2439.122353,10837.38169,12569.95295,9385.530878,6022.323737,1705.900969
LQNS02000138.1_777_5p,1819.275149,1762.009649,8565.396754,33280.90019,32176.07604,15849.37306,11872.99383
LQNS02278186.1_47223_3p,1687.843418,728.4288968,1328.048172,1306.424238,2102.27342,14.78892225,9.92647375

#Extract column 1 and 8 and print if $8>=10
cat df_TPM.csv |awk -F"," '{print $1, $8}' | grep -E "^LQN" | awk -F " " '$2>= 10'

LQNS02276925.1_23356_5p 5.352369
LQNS02277221.1_25158_5p 2.82778125
LQNS02277812.1_29775_3p 11.1090745
LQNS02278074.1_32154_3p 6.124789
LQNS02278139.1_39525_5p 22.6656355

#As you can see lots of numbers shouldn't be there (ex: 2.82778125 < 10)

Best answer

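Going by the OP's comments: if you do not want lines that start with the text LQN and want to check whether column 8 is greater than or equal to 10, try the following (to keep only the lines that do start with LQN instead, remove the ! from the code below).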

awk -F"," '$8+0 >= 10 && !/^LQN/{print $1, $8}' df_TPM.csv
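The `$8+0` in the command above forces awk to treat the 8th field as a number. Here is a minimal sketch of that effect, assuming the input has Windows-style (CRLF) line endings so the last field carries a trailing carriage return. The file name `sample_crlf.csv` and the row IDs are hypothetical, and the LQN test is left out to keep the focus on the numeric comparison:

```shell
# Hypothetical two-row sample; each line ends with \r\n, so $8 is "2.5\r" or "22.5\r".
printf 'idA,1,2,3,4,5,6,2.5\r\nidB,1,2,3,4,5,6,22.5\r\n' > sample_crlf.csv

# Adding 0 converts the field to a number (the trailing \r is ignored by the
# conversion), so the comparison is numeric rather than string-based.
numeric=$(awk -F',' '$8+0 >= 10 {print $1}' sample_crlf.csv)
echo "$numeric"   # prints: idB
```

Without the `+0`, some awk implementations may fall back to a string comparison when the field does not look like a plain number, which would explain values such as 2.82778125 slipping through.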

Or, to get the total number of matching lines, try the following; the counting can be done inside the same awk command.

awk -F"," '$8+0 >= 10 && !/^LQN/{count++} END{print count}' df_TPM.csv

Explanation: a detailed breakdown of the above.

awk -F"," '               ##Starting awk program from here.
$8+0 >= 10 && !/^LQN/{    ##Checking condition if 8th field is greater than or equal to 10 and line does NOT start with LQN.
  count++                 ##Increasing count with 1 here.
}
END{                      ##Starting END block of this awk program from here.
  print count             ##Printing count value here.
}
' df_TPM.csv              ##Mentioning Input_file name here.
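One detail of the counting version: if no line matches, `count` is never set and `print count` emits an empty line. A small variation, offered as a sketch and not part of the original answer, prints `count+0` so the result is always a number. The file `sample_nomatch.csv` and its rows are made up for illustration, and the LQN test is omitted for brevity:

```shell
# Hypothetical sample in which no 8th-column value reaches 10.
printf 'idA,1,2,3,4,5,6,2.5\nidB,1,2,3,4,5,6,3.5\n' > sample_nomatch.csv

# count is never incremented here; count+0 turns the unset variable into 0.
total=$(awk -F',' '$8+0 >= 10 {count++} END {print count+0}' sample_nomatch.csv)
echo "$total"   # prints: 0
```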

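To handle Control-M characters (carriage returns) within the awk code itself, try the following, assuming you do not want Control-M characters in your Input_file.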

awk -F"," '{gsub(/\r/,"")} $8 >= 10 && !/^LQN/{count++} END{print count}' df_TPM.csv
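Before stripping carriage returns, it can help to confirm that the file actually contains them. The following sketch counts the lines that end in a carriage return and then filters after removing it. The file name `sample_cr.csv` and its contents are hypothetical, and `sub(/\r$/, "")` is used here as a narrower alternative to the `gsub` above, since it only touches a carriage return at the end of the line:

```shell
# Hypothetical two-row sample with CRLF line endings.
printf 'idA,1,2,3,4,5,6,2.5\r\nidB,1,2,3,4,5,6,22.5\r\n' > sample_cr.csv

# Count the lines whose last character is a carriage return.
cr_lines=$(awk '/\r$/ {n++} END {print n+0}' sample_cr.csv)
echo "$cr_lines"   # prints: 2

# Strip the trailing carriage return from the record (fields are re-split),
# then compare the 8th field numerically.
kept=$(awk -F',' '{sub(/\r$/, "")} $8 >= 10 {print $1}' sample_cr.csv)
echo "$kept"       # prints: idB
```

A non-zero count from the first command suggests the file was saved with Windows line endings, which matches the situation the Control-M handling is meant for.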

Regarding "unix - Filtering a file by a column value greater than a number (awk not working)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65522666/
