I have a text file (a Bowtie alignment file) that looks like this:
read_1 + 345995|PACid:16033981 599 AGTAGTAATCAGTCACCCGCAAGGTAGACAAGG qqqqqqqqqqqqqqqqqqqqq!!qqqqqqqqqq 0
read_2 + 949205|PACid:16054220 338 TACCAGCACTAATGCACCGGATCCCATCAGATC qqqqqqqqqqqqqqqqqqqqqqqqqqqqqq!!q 0 31:A>T
read_3 + 932004|PACid:16034380 1226 GGCACCTTATGAGAAATCAAAGTTTTTGGGTTC qqqqqqqqqqqqqqq!!qqqqqqqqqqqqq!!q 3
I want to subtract one from Column #4 (the position), and print each line with the updated value.
I can read the file, split the fields on tabs, and identify Column #4 as data[3], but I am stuck on subtracting one from each value in Column #4 and printing all the fields of each line with the updated value in Column #4.
How can I do this using Python?
I tried something like this:
import sys

in_file = open(sys.argv[1], 'r')
out_file = open(sys.argv[2], 'w')
for line in in_file:
    # strip only the newline so trailing empty (tab-separated) fields survive
    data = line.rstrip('\n').split('\t')
    position = int(data[3]) - 1
But I'm not sure how to continue and print each line with the updated position.
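One way to finish the attempt above without the csv module is to write the new value back into data[3], re-join the fields with tabs, and write the line to the output file. This is a minimal sketch, not the accepted answer's method: `shift_line` and `main` are hypothetical names introduced here, and it assumes every non-blank line has an integer in Column #4. Stripping only '\n' (instead of a bare rstrip()) keeps trailing empty fields, because a bare rstrip() also removes trailing tabs.

```python
import sys


def shift_line(line, column=3, delta=-1):
    """Hypothetical helper: return `line` with `delta` added to field `column`.

    `column` is a 0-based index, so the default 3 means Column #4.
    Lines with too few fields (e.g. blank lines) come back unchanged.
    """
    # Strip only the newline so trailing empty fields (tabs) are kept.
    data = line.rstrip('\n').split('\t')
    if len(data) > column:
        data[column] = str(int(data[column]) + delta)
    return '\t'.join(data) + '\n'


def main(in_path, out_path):
    # Hypothetical driver: copy in_path to out_path, shifting Column #4.
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for line in in_file:
            out_file.write(shift_line(line))


# Run only when both file names are given on the command line.
if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as, say, shift.py (a hypothetical file name), it would be run as `python shift.py input.txt output.txt`, matching the sys.argv interface of the original attempt.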
Best answer
Use the csv module and tell it that your field delimiter is a tab:
from io import StringIO
indata = StringIO(
    "read_1\t+\t345995|PACid:16033981\t599\tAGTAGTAATCAGTCACCCGCAAGGTAGACAAGG\tqqqqqqqqqqqqqqqqqqqqq!!qqqqqqqqqq\t0\n"
    "read_2\t+\t949205|PACid:16054220\t338\tTACCAGCACTAATGCACCGGATCCCATCAGATC\tqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq!!q\t0\t31:A>T\n"
    "read_3\t+\t932004|PACid:16034380\t1226\tGGCACCTTATGAGAAATCAAAGTTTTTGGGTTC\tqqqqqqqqqqqqqqq!!qqqqqqqqqqqqq!!q\t3\n"
)
# That StringIO stuff is just for testing; with a real file, do
#     with open('your_file_name', 'r', newline='') as indata:
# before the 'for' loop, and then indent the rest one level.
from csv import reader
for line in reader(indata, delimiter='\t'):
    if len(line) > 3:                       # skip blank or short lines
        line[3] = str(int(line[3]) - 1)     # Column #4 is index 3
    print('\t'.join(line))
That is: convert the position to a number, subtract one, convert it back to a string, and print the line.
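A variant of the answer that keeps the csv module but turns off quote handling may be safer for alignment files. This is a hedged sketch: by default csv.reader treats a `"` at the start of a field as a quote character, and with Phred+33 quality encoding a quality string could in principle start with `"`, so `quoting=csv.QUOTE_NONE` is passed to read it as ordinary text. `decrement_column` and `main` are hypothetical names; output is written with '\t'.join as in the answer, rather than with csv.writer.

```python
import csv
import sys


def decrement_column(rows, column=3):
    """Hypothetical helper: yield each row with 1 subtracted from `column`.

    `rows` is any iterable of field lists (e.g. a csv.reader);
    `column` is 0-based, so 3 is Column #4. Short rows pass through.
    """
    for row in rows:
        if len(row) > column:
            row[column] = str(int(row[column]) - 1)
        yield row


def main(in_path, out_path):
    # newline='' is what the csv docs recommend for files given to csv.reader.
    with open(in_path, newline='') as in_file, open(out_path, 'w') as out_file:
        # QUOTE_NONE: treat '"' as an ordinary character, not a field quote.
        rows = csv.reader(in_file, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in decrement_column(rows):
            out_file.write('\t'.join(row) + '\n')


# Run only when both file names are given on the command line.
if __name__ == '__main__' and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Splitting the per-row change into a generator keeps the arithmetic separate from the file handling, so it can be checked on a plain list of rows.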
We found a similar question on Stack Overflow, "python: How to decrement and update the value of a field in delimited text": https://stackoverflow.com/questions/7990471/