python - Reshape a DataFrame and modify one column based on 24 other columns

I have an Excel file with 26 columns:

Date, a unique ID (UID), and H01, H02, H03 ... H24

Here H{n} stands for the hour: for UID some_code, the value at 19/7/2017 01.00.00 is 199, the value at 19/7/2017 02.00.00 is 7, and so on.

+--------------------+---------------+----------+---------------+
|       Date         | UID           | H01      | H02           |
+--------------------+---------------+----------+---------------+
| 19/7/2017 00.00.00 | some_code     |      199 |             7 |
| 19/7/2017 00.00.00 | another_code  |      164 |            18 |
| 19/7/2017 00.00.00 | new_code      |      209 |             1 |
| 19/7/2017 00.00.00 | code_5        |       85 |             4 |
| 19/7/2017 00.00.00 | what          |       45 |             6 |

I am reading the Excel file and creating a DataFrame like the one above.

I want to reshape this DataFrame so that I get the result below.

+--------------------+---------------+----------+
|       Date         | UID           | Value    |
+--------------------+---------------+----------+
| 19/7/2017 01.00.00 | some_code     |      199 |
| 19/7/2017 02.00.00 | some_code     |        7 |
| 19/7/2017 03.00.00 | some_code     |      ... |
.................................................
.................................................
| 19/7/2017 00.00.00 | some_code     |      ... |
| 19/7/2017 01.00.00 | another_code  |      164 |
| 19/7/2017 02.00.00 | another_code  |       18 |
| 19/7/2017 03.00.00 | another_code  |       ...|
.................................................
.................................................
| 19/7/2017 00.00.00 | another_code  |       ...|

I am new to Python and pandas and can't get my head around stack/unstack/pivot.

Best answer

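You can use:

First convert the dates with to_datetime
Create a MultiIndex with set_index, so that all remaining columns are the H columns
extract the numbers from the column names and convert them with to_timedelta
Reshape with stack
Add the column of timedeltas to Date, then remove it with drop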
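
The extract and to_timedelta step is the least obvious one, so here is a minimal sketch of it in isolation. The three column labels are hypothetical stand-ins following the H01..H24 naming from the question; the real DataFrame would supply its own df.columns.

```python
import pandas as pd

# Hypothetical column labels, following the H01..H24 naming from the question.
cols = pd.Index(['H01', 'H02', 'H24'])

# Pull out the digits, cast them to int, and interpret them as hours.
hours = cols.str.extract(r'(\d+)', expand=False).astype(int)
offsets = pd.to_timedelta(hours, unit='h')

print(offsets)
```

Each label becomes an hour offset that can later be added to the Date column.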


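import pandas as pd

# Parse the day-first date strings into real datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y %H.%M.%S')
# Move Date and UID into the index so that only the H columns remain
df = df.set_index(['Date', 'UID'])
# 'H01' -> 1 -> Timedelta of 1 hour (raw-string regex, lowercase 'h' unit)
df.columns = pd.to_timedelta(df.columns.str.extract(r'(\d+)', expand=False).astype(int), unit='h')
# Wide to long; the stacked column labels end up in 'level_2'
df = df.stack().reset_index(name='Value')
df['Date'] = df['Date'] + df['level_2']
df = df.drop('level_2', axis=1)
print(df)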
                 Date           UID  Value
0 2017-07-19 01:00:00     some_code    199
1 2017-07-19 02:00:00     some_code      7
2 2017-07-19 01:00:00  another_code    164
3 2017-07-19 02:00:00  another_code     18
4 2017-07-19 01:00:00      new_code    209
5 2017-07-19 02:00:00      new_code      1
6 2017-07-19 01:00:00        code_5     85
7 2017-07-19 02:00:00        code_5      4
8 2017-07-19 01:00:00          what     45
9 2017-07-19 02:00:00          what      6
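
As an alternative that is not part of the answer above, the same reshape can be sketched with melt instead of set_index/stack. This is a minimal sketch under a few assumptions: the sample rows are copied from the question (only H01 and H02 are shown there), and the names long_df, Hour and offset are illustrative.

```python
import pandas as pd

# Sample rows copied from the question (only H01 and H02 are shown there).
df = pd.DataFrame({
    'Date': ['19/7/2017 00.00.00'] * 2,
    'UID': ['some_code', 'another_code'],
    'H01': [199, 164],
    'H02': [7, 18],
})
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y %H.%M.%S')

# Wide to long: one row per (Date, UID, hour column).
long_df = df.melt(id_vars=['Date', 'UID'], var_name='Hour', value_name='Value')

# Turn 'H01' into a 1-hour offset and shift the date by it.
offset = pd.to_timedelta(
    long_df['Hour'].str.extract(r'(\d+)', expand=False).astype(int), unit='h'
)
long_df['Date'] = long_df['Date'] + offset

# melt emits rows column by column, so sort to group each UID's hours together.
# Note that this orders UIDs alphabetically, not in their original row order.
long_df = (long_df.drop(columns='Hour')
                  .sort_values(['UID', 'Date'])
                  .reset_index(drop=True))
print(long_df)
```

The stack version keeps the original row order without sorting, which may be preferable when that order matters.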

To keep the dates in the same format as the input, add dt.strftime:

...
df['Date'] = (df['Date'] + df['level_2']).dt.strftime('%d/%m/%Y %H.%M.%S')
df = df.drop('level_2', axis=1)
print (df)
                  Date           UID  Value
0  19/07/2017 01.00.00     some_code    199
1  19/07/2017 02.00.00     some_code      7
2  19/07/2017 01.00.00  another_code    164
3  19/07/2017 02.00.00  another_code     18
4  19/07/2017 01.00.00      new_code    209
5  19/07/2017 02.00.00      new_code      1
6  19/07/2017 01.00.00        code_5     85
7  19/07/2017 02.00.00        code_5      4
8  19/07/2017 01.00.00          what     45
9  19/07/2017 02.00.00          what      6
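
One thing worth checking with this approach is the H24 column. Adding a 24-hour offset lands on midnight of the following day, whereas the desired output in the question lists its last row under 19/7/2017 00.00.00. A minimal sketch of what the offset arithmetic produces (the names day and h24 are illustrative):

```python
import pandas as pd

day = pd.Timestamp('2017-07-19')           # the date from the question's sample
h24 = day + pd.to_timedelta(24, unit='h')  # the offset an H24 column would get

print(h24.strftime('%d/%m/%Y %H.%M.%S'))   # 20/07/2017 00.00.00
```

Whether H24 should roll over to the next day depends on how the source data defines it, so this may need adjusting.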

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/45184824/
