python - Create and populate Pandas dataframe columns using two existing columns

Tags: python pandas

My dataframe has 4 columns, as shown below.

What I have:

ID  start_date  end_date    active
1,111   6/30/2015   8/6/1904    1 to 10
1,111   6/28/2016   3/30/1905   1 to 10
1,111   7/31/2017   6/6/1905    1 to 10
1,111   7/31/2018   6/6/1905    1 to 9
1,111   5/31/2019   12/4/1904   1 to 9
3,033   3/31/2015   5/18/1908   3 to 7
3,033   3/31/2016   11/24/1905  3 to 7
3,033   3/31/2017   1/20/1906   3 to 7
3,033   3/31/2018   1/8/1906    2 to 7
3,033   4/4/2019    2200,0  2 to 8
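To make the examples below reproducible, the sample data can be entered as a DataFrame. This is a minimal sketch that assumes every column is stored as a string exactly as displayed (including the "1,111" IDs and the "2200,0" value); the question does not state the real dtypes.

```python
import pandas as pd

# Sample data from the question. Assumption: all columns are plain strings,
# copied exactly as displayed.
df = pd.DataFrame({
    'ID': ['1,111'] * 5 + ['3,033'] * 5,
    'start_date': ['6/30/2015', '6/28/2016', '7/31/2017', '7/31/2018', '5/31/2019',
                   '3/31/2015', '3/31/2016', '3/31/2017', '3/31/2018', '4/4/2019'],
    'end_date': ['8/6/1904', '3/30/1905', '6/6/1905', '6/6/1905', '12/4/1904',
                 '5/18/1908', '11/24/1905', '1/20/1906', '1/8/1906', '2200,0'],
    'active': ['1 to 10'] * 3 + ['1 to 9'] * 2 + ['3 to 7'] * 3 + ['2 to 7', '2 to 8'],
})
print(df)
```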

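I want to generate 10 more columns based on the values of the "active" column, as shown below. Is there an efficient way to populate them?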

What I want to achieve:

ID  start_date  end_date    active  Type 1  Type 2  Type 3  Type 4  Type 5  Type 6  Type 7  Type 8  Type 9  Type 10
1,111   6/30/2015   8/6/1904    1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   6/28/2016   3/30/1905   1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   7/31/2017   6/6/1905    1 to 10 1   1   1   1   1   1   1   1   1   1
1,111   7/31/2018   6/6/1905    1 to 9  1   1   1   1   1   1   1   1   1   
1,111   5/31/2019   12/4/1904   1 to 9  1   1   1   1   1   1   1   1   1   
3,033   3/31/2015   5/18/1908   3 to 7          1   1   1   1   1           
3,033   3/31/2016   11/24/1905  3 to 7          1   1   1   1   1           
3,033   3/31/2017   1/20/1906   3 to 7          1   1   1   1   1           
3,033   3/31/2018   1/8/1906    2 to 7      1   1   1   1   1   1           
3,033   4/4/2019    2200,0  2 to 8      1   1   1   1   1   1   1       

Best answer

Use a custom function with np.arange:

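import numpy as np
import pandas as pd

def f(x):
    # Parse "start to end" into two integers
    a = list(map(int, x.split(' to ')))
    # A Series of 1s indexed by every integer in the inclusive range
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

# Each returned Series becomes one row; numbers outside a row's range become NaN
df = df.join(df['active'].apply(f).add_prefix('Type '))
print(df)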
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10     1.0     1.0     1.0     1.0   
1  1,111  6/28/2016   3/30/1905  1 to 10     1.0     1.0     1.0     1.0   
2  1,111  7/31/2017    6/6/1905  1 to 10     1.0     1.0     1.0     1.0   
3  1,111  7/31/2018    6/6/1905   1 to 9     1.0     1.0     1.0     1.0   
4  1,111  5/31/2019   12/4/1904   1 to 9     1.0     1.0     1.0     1.0   
5  3,033  3/31/2015   5/18/1908   3 to 7     NaN     NaN     1.0     1.0   
6  3,033  3/31/2016  11/24/1905   3 to 7     NaN     NaN     1.0     1.0   
7  3,033  3/31/2017   1/20/1906   3 to 7     NaN     NaN     1.0     1.0   
8  3,033  3/31/2018    1/8/1906   2 to 7     NaN     1.0     1.0     1.0   
9  3,033   4/4/2019      2200,0   2 to 8     NaN     1.0     1.0     1.0   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0     1.0     1.0     1.0     1.0     1.0      1.0  
1     1.0     1.0     1.0     1.0     1.0      1.0  
2     1.0     1.0     1.0     1.0     1.0      1.0  
3     1.0     1.0     1.0     1.0     1.0      NaN  
4     1.0     1.0     1.0     1.0     1.0      NaN  
5     1.0     1.0     1.0     NaN     NaN      NaN  
6     1.0     1.0     1.0     NaN     NaN      NaN  
7     1.0     1.0     1.0     NaN     NaN      NaN  
8     1.0     1.0     1.0     NaN     NaN      NaN  
9     1.0     1.0     1.0     1.0     NaN      NaN   
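To see what the helper produces for a single value, here is f applied to "3 to 7" on its own (the function is copied from the answer above). When apply() receives such Series, pandas aligns them on their indexes, so numbers a row's range does not cover are filled with NaN; that is why the output above is float rather than int.

```python
import numpy as np
import pandas as pd

def f(x):
    # "3 to 7" -> [3, 7]
    a = list(map(int, x.split(' to ')))
    # One value of 1 for every integer from start to end, inclusive
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

s = f('3 to 7')
print(s)
```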

Similar, but with missing values replaced by 0 and the flags cast to integers:

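import numpy as np
import pandas as pd

def f(x):
    a = list(map(int, x.split(' to ')))
    return pd.Series(1, index=np.arange(a[0], a[1] + 1))

# Starting from the original df: replace NaN with 0 and cast the flags to int
df = df.join(df['active'].apply(f).add_prefix('Type ').fillna(0).astype(int))
print(df)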
      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  
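A fully vectorised alternative, not part of the original answer (a swapped-in technique), is NumPy broadcasting: split "active" into start and end integers, then compare them against a row of candidate type numbers to get an (n, k) boolean mask. This sketch assumes the types always start at 1 and run up to the largest end value, and uses only the "active" column for brevity.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'active': ['1 to 10', '1 to 9', '3 to 7', '2 to 7', '2 to 8']})

# Split "start to end" into two integer columns (0 = start, 1 = end)
bounds = df['active'].str.split(' to ', expand=True).astype(int)
start = bounds[0].to_numpy()[:, None]   # shape (n, 1)
end = bounds[1].to_numpy()[:, None]     # shape (n, 1)

# Assumption: type numbers run from 1 to the largest end value, shape (1, k)
types = np.arange(1, bounds[1].max() + 1)[None, :]

# Broadcasting yields an (n, k) mask: True where start <= type <= end
mask = (types >= start) & (types <= end)

out = df.join(pd.DataFrame(mask.astype(int),
                           index=df.index,
                           columns=[f'Type {t}' for t in types[0]]))
print(out)
```

Because the comparison runs once over the whole array instead of once per row, this avoids building a Series per row with apply().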

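Another non-loop solution: the idea is to drop duplicates, create indicator columns for the start and end values with str.get_dummies, use reindex to add the missing columns, and finally set the 1s by multiplying the left-to-right and right-to-left cumsums: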

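import numpy as np
import pandas as pd

# One row per distinct "start to end" value, indexed by that value
df1 = (df.set_index('active', drop=False)
        .pop('active')
        .drop_duplicates()
        .str.get_dummies(' to '))

df1.columns = df1.columns.astype(int)
# Add the missing in-between numbers as all-zero columns
df1 = df1.reindex(columns=np.arange(df1.columns.min(), df1.columns.max() + 1), fill_value=0)
# The product is 2 at both endpoints and 1 strictly inside the range; cap it at 1
df1 = (df1.cumsum(axis=1) * df1.iloc[:, ::-1].cumsum(axis=1)).clip(upper=1)
print(df1)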
         1   2   3   4   5   6   7   8   9   10
active                                         
1 to 10   1   1   1   1   1   1   1   1   1   1
1 to 9    1   1   1   1   1   1   1   1   1   0
3 to 7    0   0   1   1   1   1   1   0   0   0
2 to 7    0   1   1   1   1   1   1   0   0   0
2 to 8    0   1   1   1   1   1   1   1   0   0
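To see why the final clip is needed, here is the same pipeline on the single value "3 to 7". For illustration only, the column range is fixed to 1..10 instead of being derived from the data. The forward cumsum is 0 before the start, 1 inside and 2 from the end onward; the backward cumsum mirrors it, so their product is 2 at both endpoints and 1 strictly between them.

```python
import numpy as np
import pandas as pd

s = pd.Series(['3 to 7'], index=['3 to 7'])

# Indicator columns exist only for the two endpoints: '3' and '7'
d = s.str.get_dummies(' to ')
d.columns = d.columns.astype(int)
# Assumption for this demo: columns 1..10
d = d.reindex(columns=np.arange(1, 11), fill_value=0)

forward = d.cumsum(axis=1)                  # 0 before start, 1 inside, 2 from end on
backward = d.iloc[:, ::-1].cumsum(axis=1)   # 2 up to start, 1 inside, 0 after end
product = forward * backward                # aligned on column labels
print(product)
print(product.clip(upper=1))
```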

df = df.join(df1.add_prefix('Type '), on='active')
print(df)

      ID start_date    end_date   active  Type 1  Type 2  Type 3  Type 4  \
0  1,111  6/30/2015    8/6/1904  1 to 10       1       1       1       1   
1  1,111  6/28/2016   3/30/1905  1 to 10       1       1       1       1   
2  1,111  7/31/2017    6/6/1905  1 to 10       1       1       1       1   
3  1,111  7/31/2018    6/6/1905   1 to 9       1       1       1       1   
4  1,111  5/31/2019   12/4/1904   1 to 9       1       1       1       1   
5  3,033  3/31/2015   5/18/1908   3 to 7       0       0       1       1   
6  3,033  3/31/2016  11/24/1905   3 to 7       0       0       1       1   
7  3,033  3/31/2017   1/20/1906   3 to 7       0       0       1       1   
8  3,033  3/31/2018    1/8/1906   2 to 7       0       1       1       1   
9  3,033   4/4/2019      2200,0   2 to 8       0       1       1       1   

   Type 5  Type 6  Type 7  Type 8  Type 9  Type 10  
0       1       1       1       1       1        1  
1       1       1       1       1       1        1  
2       1       1       1       1       1        1  
3       1       1       1       1       1        0  
4       1       1       1       1       1        0  
5       1       1       1       0       0        0  
6       1       1       1       0       0        0  
7       1       1       1       0       0        0  
8       1       1       1       0       0        0  
9       1       1       1       1       0        0  

A similar question about creating and populating Pandas dataframe columns using two existing columns can be found on Stack Overflow: https://stackoverflow.com/questions/52089554/
