python - Reorder columns by string variable

Tags: python bash perl awk

I have a csv file like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97

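The first 5 columns are always the same, and the remaining columns (up to 5) come in a different order on each line. I need to keep the first 5 columns as they are and reorder the rest so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra

If one of the strings is not present, add a comma (an empty field) in its place.

So the end result should look like this: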

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53

If they were always in the same order, I could do something like this:

awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'

If they were at least always in the same column, I could do a

awk -F, -v OFS=, '{ print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10 }'

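But the order is different on every line, and the number at the end of each field varies, which is throwing me for a loop.

I would like to do this with AWK, but at this point I am open to anything.

Logically, I think I need to do something like: j = Reading*, i = Sentence*, k = Arithmetic *, l = College *, m = Elementary *

Then awk {print $6j,$7i,$8k,$9l,$10m}

But my Google searches have returned very few results. So even a comment saying "look here", "search for this", or "check this answer" would be appreciated.

Note: I have done my best to make sure the input and output are correct. I have posted another question similar to this one, but that was for the case where the columns are always in the same order, so this is a different request.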

Best answer

Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with your own CSV files.

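import csv

# Target order for the five score columns.
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open in text mode with newline='' as the csv module recommends
# (the Python 2 form of this script used 'wb' and 'rb' instead).
with open('output.csv', 'w', newline='') as outfile, \
     open('input.csv', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)

    for row in reader:
        head = row[:5]  # the five fixed columns
        tail = []
        for label in labels:
            # First field starting with this label, or "" if it is absent.
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)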
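
The key step is the next(...) call: it returns the first remaining field that starts with the given label, or an empty string when no field matches, which is what produces the empty column. As a minimal sketch of that step on one row from the question (the helper name pick is hypothetical and only used here for illustration):

```python
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

def pick(fields, label):
    """Return the first field starting with label, or '' if none matches.

    Hypothetical helper, not part of the answer's script.
    """
    return next((f for f in fields if f.startswith(label)), "")

# One row from the question, already split on commas.
row = ["Last", "First", "A00XXXXXX", "1492-12-03", "2015-06-23",
       "Elementary Algebra 41", "Sentence Skills 82"]

# Keep the five fixed columns, then fill one slot per label in order.
reordered = row[:5] + [pick(row[5:], label) for label in labels]
print(",".join(reordered))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
```

Because the match is by prefix, the score at the end of each field does not matter.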

Here is another, shorter solution that works with pipes:

#!/usr/bin/python    
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    # Write the five fixed columns unchanged.
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        # First field starting with this label, or '' if it is absent.
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()

If you save this code in a file, for example one named reorder, and make that file executable, you can reformat your CSV file like this:

$ cat input.csv | ./reorder

The reformatted CSV content is then written to standard output.
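
To try the reordering without touching any files, the same idea can be run in memory. The following is only a sketch of a one-pass variant, not the code from the answer: it assumes Python 3, the function name reorder and the LABELS constant are made up for this example, and it collects the matching fields in a dict before writing each row.

```python
import csv
import io

LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

def reorder(infile, outfile):
    """Copy CSV rows, placing the score fields in LABELS order."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # Map each known label to the first field that starts with it.
        found = {}
        for field in row[5:]:
            for label in LABELS:
                if field.startswith(label):
                    found.setdefault(label, field)
                    break
        # Missing labels become empty fields.
        writer.writerow(row[:5] + [found.get(label, "") for label in LABELS])

# Two rows taken from the question.
sample = (
    "Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,"
    "Elementary Algebra 23,Arithmetic 42,Sentence Skills 75\n"
    "Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51\n"
)
out = io.StringIO()
reorder(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

For these two rows the printed lines match the corresponding lines of the expected output in the question.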

Regarding "python - Reorder columns by string variable", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31014848/
