I have a CSV file like this:
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97
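The first 5 columns are always present and in the same order. The remaining columns (up to 5 test scores) appear in a different order on every row. I need to keep the first 5 columns as they are and reorder the rest so they always come out in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra.
If one of those strings is missing from a row, its field should be left empty so the comma is still there.
So the end result should look like this: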
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53
If they were always in the same order, I could do something like this:
awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'
If they were at least always in the same column, I could do a
awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'
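But the order is different on every row, and the varying number at the end of each field is throwing me for a loop.
I would like to do this in AWK, but I am open to anything at this point.
Logically, I think I need to do something like: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*
and then awk {print $6j,$7i,$8k,$9l,$10m}
But my googling has returned minimal results. So even a comment saying "look here", "search for this", or "check out this answer" would be appreciated.
Note: I have done my best to make sure the input and output are correct. I have posted another question similar to this one, but that was for the case where the columns are always in the same order, so this is a different request.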
Best answer
Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with your own CSV files.
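import csv

# Desired output order for the score columns.
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open both files in text mode with newline='' as the csv module
# expects (under Python 2, use 'wb' and 'rb' instead).
with open('output.csv', 'w', newline='') as outfile, \
        open('input.csv', 'r', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)
    for row in reader:
        head = row[:5]  # the five fixed columns
        tail = []
        for label in labels:
            # First score field starting with this label, or "" if absent.
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)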
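To check the lookup above against the sample rows from the question without touching any files, the same logic can be pulled into a small function. This is only a sketch: `reorder_row`, `LABELS`, and the `fixed` parameter are hypothetical names introduced here, not part of the original answer.

```python
# Hypothetical helper wrapping the per-row logic of the answer above.
LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]


def reorder_row(row, labels=LABELS, fixed=5):
    """Keep the first `fixed` fields, then emit exactly one slot per label."""
    head, scores = row[:fixed], row[fixed:]
    # For each label, take the first score field that starts with it,
    # or an empty string when that test is missing from the row.
    tail = [
        next((s for s in scores if s.startswith(label)), "")
        for label in labels
    ]
    return head + tail


if __name__ == "__main__":
    sample = ("Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
              "Elementary Algebra 41,Sentence Skills 82")
    print(",".join(reorder_row(sample.split(","))))
```

Note that any score field that matches none of the labels is silently dropped, both here and in the answer's code, so the label list has to cover every test name that can occur in the input.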
Here is another, shorter solution that works with pipes:
#!/usr/bin/python
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()
If you save this code in a file, for example one named reorder, and make that file executable, you can reformat your CSV file like this:
$ cat input.csv | ./reorder
The reformatted CSV content is then written to standard output.
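The pipe version splits each line on commas by hand, which is fine for the sample data but would break on quoted fields that contain commas. As a hedged variant, the same filter can be built on the csv module over any pair of text streams. The function name `reorder_stream` and its parameters are assumptions made for this sketch, and `lineterminator="\n"` is chosen here so the output uses plain newlines instead of the csv module's default `\r\n`.

```python
import csv
import io

LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]


def reorder_stream(infile, outfile, labels=LABELS, fixed=5):
    """Read CSV rows from infile and write the reordered rows to outfile."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        if not row:  # skip blank lines
            continue
        scores = row[fixed:]
        # One output slot per label, empty when the test is absent.
        tail = [
            next((s for s in scores if s.startswith(label)), "")
            for label in labels
        ]
        writer.writerow(row[:fixed] + tail)


if __name__ == "__main__":
    # In-memory demo using one row from the question.
    src = io.StringIO(
        "Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
        "Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41\n"
    )
    out = io.StringIO()
    reorder_stream(src, out)
    print(out.getvalue(), end="")
```

To use it as a pipe filter like the script above, call `reorder_stream(sys.stdin, sys.stdout)` after `import sys` in place of the in-memory demo.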
Regarding python - reordering columns by a string variable, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31014848/