python - Rearranging misplaced columns

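I have a data-cleaning question that I think Python may handle more efficiently. The data has many misplaced columns, and I have to move them to the correct positions based on characteristics of certain columns. Here is an example in Stata code: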

forvalues i = 20(-1)2{
local j = `i' + 25
local k = `j' - 2
replace v`j' = v`k' if substr(v23, 1, 4) == "1980"
}

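That is, if the observation in column v23 starts with "1980", I move the contents of columns v25 - v43 two columns to the right (v25 goes to v27, and so on up to v43 going to v45). Otherwise, the columns are correct.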

Any help is appreciated.

Best answer

Here is a simplified example that shows how it works:

In [65]:
# create some dummy data
import pandas as pd
import io
pd.set_option('display.notebook_repr_html', False)
temp = """v21 v22 v23  v24  v25  v28
1 1 19801923 1 5 8
1 1 20003 1 5 8
1 1 9129389 1 5 8
1 1 1980 1 5 8
1 1 1923 2 5 8
1 1 9128983 1 5 8"""
df = pd.read_csv(io.StringIO(temp), sep=r'\s+')

df
Out[65]:
   v21  v22       v23  v24  v25  v28
0    1    1  19801923    1    5    8
1    1    1     20003    1    5    8
2    1    1   9129389    1    5    8
3    1    1      1980    1    5    8
4    1    1      1923    2    5    8
5    1    1   9128983    1    5    8

In [68]:
# I have to convert my data to a string in order for this to work, it may not be necessary for you in which case the following commented out line would work for you:
#df.v23.str.startswith('1980')
df.v23.astype(str).str.startswith('1980')
Out[68]:
0     True
1    False
2    False
3     True
4    False
5    False
Name: v23, dtype: bool

In [70]:
# now we can call shift by 2 along the column axis to assign the values back

df.loc[df.v23.astype(str).str.startswith('1980'),['v25','v28']] = df.shift(2,axis=1)
df
Out[70]:
   v21  v22       v23  v24       v25  v28
0    1    1  19801923    1  19801923    1
1    1    1     20003    1         5    8
2    1    1   9129389    1         5    8
3    1    1      1980    1      1980    1
4    1    1      1923    2         5    8
5    1    1   9128983    1         5    8
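
A note on the step above: `shift(2, axis=1)` moves values by column position, not by the number in the column name, which is why v25 and v28 receive the values of v23 and v24 in this small frame. A minimal sketch on a made-up four-column frame (the names `demo`, `shifted`, and the columns a to d are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical toy frame; column names are for illustration only.
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Shift every value two positions to the right along the column axis.
shifted = demo.shift(2, axis=1)

# "c" now holds what was in "a" and "d" holds what was in "b";
# "a" and "b" have no source column, so they become missing values.
print(shifted)
```

So the approach relies on the columns being stored in the expected left-to-right order with no gaps between them.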

So what you need to do is define the list of columns up front:

In [72]:

target_cols = ['v' + str(x) for x in range(25,44)]
print(target_cols)
['v25', 'v26', 'v27', 'v28', 'v29', 'v30', 'v31', 'v32', 'v33', 'v34', 'v35', 'v36', 'v37', 'v38', 'v39', 'v40', 'v41', 'v42', 'v43']

Now substitute it back into my method, and I believe it should work:

df.loc[df.v23.astype(str).str.startswith('1980'),target_cols] = df.shift(2,axis=1)
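
One thing worth checking before running this on real data: the Stata loop in the question writes to v27 through v45 (j runs from 45 down to 27) and reads from v25 through v43, whereas the list above targets v25 through v43. If the goal is to reproduce the Stata loop exactly, a variant that names source and target columns explicitly may be safer. The following is only a sketch under assumptions: the function name `shift_right_by_two` and its parameters are hypothetical, and it assumes every target column (up to v45 with the defaults) already exists in the frame.

```python
import pandas as pd

def shift_right_by_two(df, flag_col="v23", prefix="1980", first=25, last=43, step=2):
    """Hypothetical helper: on flagged rows, copy v{first}..v{last}
    into v{first+step}..v{last+step}, like the Stata loop."""
    # Rows whose flag column starts with the prefix (cast to str first).
    mask = df[flag_col].astype(str).str.startswith(prefix)
    sources = [f"v{n}" for n in range(first, last + 1)]
    targets = [f"v{n + step}" for n in range(first, last + 1)]
    out = df.copy()
    # Read from the untouched original so assignment order does not matter;
    # to_numpy() bypasses label alignment so values land by position.
    out.loc[mask, targets] = df.loc[mask, sources].to_numpy()
    return out

# Scaled-down demo with made-up data: only v25 and v26 are moved.
demo = pd.DataFrame({
    "v23": [19801923, 20003],
    "v24": [1, 1],
    "v25": [5, 5],
    "v26": [6, 6],
    "v27": [0, 7],
    "v28": [0, 8],
})
fixed = shift_right_by_two(demo, first=25, last=26)
print(fixed)
```

As with Stata's `replace`, the source columns keep their old values on the flagged rows; only the targets are overwritten.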

See the documentation for shift for its parameters.

Regarding python - rearranging misplaced columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25226323/
