I want to sort a file by the contents of one of its columns (column 2) and get the unique elements of that column:
Original file:
qoow_12_xx7_21 wer1 rwty3
asss_x17_211 aqe3 sda4
acyi_112_werxc xcu12 weqa1
qwer_234_ssd aqe3 wers
Output, sorted data:
asss_x17_211 aqe3 sda4
qwer_234_ssd aqe3 wers
qoow_12_xx7_21 wer1 rwty3
acyi_112_werxc xcu12 weqa1
Output, unique col2 values:
aqe3
wer1
xcu12
My attempt, which does not work:
from operator import itemgetter
import itemgetter
def get_unique(data):
    seen=""
    for e in data:
        if e not in seen:
            seen="\t".join(seen)
    return seen
col2=""
with open("myfile.txt", "r") as infile, open("out.xls","w") as outfile:
    for line in infile:
        data=line.rstrip.split("\t")
        sorted_data=sorted(data, key=lambda e: e.itemgetter)
        col2="".join(data[1])
        uniq_col2=get_unique(col2)
    outfile.write(sorted_data)# tab-delimited sorted data
    outfile.write(uniq_col2) # sorted column 2 data
Can someone help me get this code working? Thanks.
Best answer
Try this:
from operator import itemgetter
with open('test.txt') as infile, open('out.txt', 'w') as outfile:
    # sort input by 2nd column
    sorted_lines = sorted(
        (line.strip().split() for line in infile),
        key=itemgetter(1)
    )
    # output sorted input
    for line in sorted_lines:
        outfile.write('\t'.join(line))
        outfile.write('\n')
    # discard duplicates in already sorted sequence => uniq items
    prev_item = None
    for item in (line[1] for line in sorted_lines):
        if item != prev_item:
            prev_item = item
            outfile.write(item)
            outfile.write('\n')
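The same idea (sort by the second column, then drop adjacent duplicates) can also be written as a small function that works on in-memory lines, which makes it easier to try without creating files. This is only a sketch: the names sort_and_unique and sample are made up for illustration, and it assumes whitespace-separated columns as in the sample data from the question.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_unique(lines, column=1):
    """Return (rows sorted by `column`, unique values of `column` in sorted order).

    Hypothetical helper for illustration; `column` is a 0-based index.
    """
    # Split each non-empty line on whitespace.
    rows = [line.split() for line in lines if line.strip()]
    # list.sort() is stable, so rows with equal keys keep their input order.
    rows.sort(key=itemgetter(column))
    # groupby() groups consecutive equal keys; because the rows are already
    # sorted by that key, each distinct value yields exactly one group.
    unique = [key for key, _ in groupby(rows, key=itemgetter(column))]
    return rows, unique


# Sample data copied from the question.
sample = """\
qoow_12_xx7_21 wer1 rwty3
asss_x17_211 aqe3 sda4
acyi_112_werxc xcu12 weqa1
qwer_234_ssd aqe3 wers
"""

rows, unique = sort_and_unique(sample.splitlines())
for row in rows:
    print("\t".join(row))
print("\n".join(unique))
```

If the sorted order of the unique values does not matter, sorted(set(...)) over the column would give the same result here; groupby just reuses the ordering the sort already produced.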
Regarding "python - sort a file by column and get unique elements", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27197047/