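python - Crossing DataFrame columns that contain lists with a dictionary

Tags: python pandas dataframe lambda pivot-table

I am trying to create a DataFrame by combining an existing DataFrame with a dictionary, as shown below.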


The DataFrame contains many list columns (x, y, z, a, b, c, and so on; more than 100 in total).

import pandas as pd

# The real DataFrame has many more list columns (x, y, z, a, b, c, ...; more than 100).
# Each list is stored as its string representation.
df = pd.DataFrame({
    'ID': ['EF407412', 'KM043272'],
    'x': ['[2788, 3140, 4836]', '[539, 906, 1494, 1932, 2029,7001]'],
    'y': ['[1408, 1572, 2277]', '[]'],
})
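Because the lists are stored as strings, they have to be parsed before any comparison can happen. A minimal sketch of that step, assuming every column other than ID holds a string-encoded list (the names `list_cols` and `parsed` are illustrative, not from the question):

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "ID": ["EF407412", "KM043272"],
    "x": ["[2788, 3140, 4836]", "[539, 906, 1494, 1932, 2029,7001]"],
    "y": ["[1408, 1572, 2277]", "[]"],
})

# Assumption: every column except ID holds a string-encoded list.
list_cols = df.columns.drop("ID")

# Work on a copy so the original string columns stay untouched.
parsed = df.copy()
for col in list_cols:
    # ast.literal_eval safely turns "[1, 2]" into the list [1, 2].
    parsed[col] = parsed[col].apply(ast.literal_eval)

print(parsed.loc[1, "x"])
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for this kind of input.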

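The dictionary is named scale. Its items (key to value) are customizable, and the rules for transforming the input DataFrame into the output DataFrame are given in the comments below.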

scale = ("500-10000", {
    # key = scale, value = weight; both are customizable
    500: 7000,    # key 500:   list items >= 7000
    2500: 3000,   # key 2500:  7000 > list items >= 3000
    5000: 1000,   # key 5000:  3000 > list items >= 1000
    7500: 400,    # key 7500:  1000 > list items >= 400
    10000: 250,   # key 10000: 400 > list items >= 250
    # any other list items (< 250) are ignored
})
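The rules in those comments amount to one half-open interval per key: the key's own value is the lower bound and the next larger value is the upper bound. A small sketch that derives these intervals from the dictionary, so they follow any customization of scale (the helper name `bucket_bounds` is hypothetical):

```python
import math

scale = ("500-10000", {500: 7000, 2500: 3000, 5000: 1000, 7500: 400, 10000: 250})


def bucket_bounds(scale_dict):
    """Map each scale key to a half-open interval (lower, upper)."""
    bounds = {}
    upper = math.inf  # the largest threshold has no upper limit
    # Walk the thresholds from largest to smallest.
    for key, lower in sorted(scale_dict.items(), key=lambda kv: kv[1], reverse=True):
        bounds[key] = (lower, upper)
        upper = lower  # this threshold caps the next, smaller bucket
    return bounds


bounds = bucket_bounds(scale[1])
for key, (lower, upper) in bounds.items():
    print(f"{key}: {lower} <= item < {upper}")
```

Sorting by value means the result does not depend on the order in which the dictionary was written.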

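Important P.S.: if an input list contains repeated items, each one is treated as a separate value in the output. For example, if column x contains the list [4836, 4836, 4836], the output in column x_2500 will be [4836, 4836, 4836].

Best answer

Using your df and scale objects...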
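To illustrate the duplicate requirement for a single bucket: a plain list comprehension over the range keeps repeated items and their order, whereas converting to a set would collapse them. This is only a sketch; the variable names are illustrative and the bounds are the ones stated for key 2500.

```python
items = [4836, 4836, 4836, 250, 100]

# Range for key 2500 according to the scale rules: 7000 > item >= 3000.
lower, upper = 3000, 7000

# A list comprehension keeps duplicates and the original order.
x_2500 = [i for i in items if lower <= i < upper]

# A set-based approach would merge the repeated values into one.
as_set = sorted(set(x_2500))

print(x_2500)
print(as_set)
```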

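import ast

import pandas as pd


def make_new_columns(series: pd.Series) -> pd.DataFrame:
    """Given column, make new columns using `scale`."""
    # convert str representation of list to literal list
    series = series.apply(ast.literal_eval)

    scale_dict = scale[1]

    # one cumulative column per key: every item >= that key's threshold
    # (relies on the dict listing thresholds from largest to smallest)
    frames = []
    for k, v in scale_dict.items():
        k_frame = pd.DataFrame({f"{series.name}_{k}": series.apply(lambda x: [i for i in x if i >= v])})
        frames.append(k_frame)

    frame = pd.concat(frames, axis="columns")

    cols = frame.columns[frame.columns.str.startswith(f"{series.name}_")]

    # subtract the previous cumulative column to keep only the items in each range;
    # results go to temporary columns with a trailing underscore.
    # `applymap` is deprecated in favor of `DataFrame.map` in recent pandas versions.
    # note: converting to sets discards duplicate items and their order
    for col0, col1 in zip(cols, cols[1:]):
        frame[f"{col1}_"] = frame[[col0, col1]].applymap(set).apply(lambda x: x[col1].difference(x[col0]), axis=1)

    # the first `x_...` col is `x_500` and will not change -- remove others
    frame = frame.drop(columns=cols[1:])

    # rename the temporary columns back to their original names
    frame.columns = frame.columns.str.strip("_")

    # empty results become [0]; sets are converted back to lists
    frame[cols] = frame[cols].applymap(lambda x: [0] if not len(x) else list(x))

    return frame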

# apply `make_new_columns` to x, y, z, a, b, c, ...
cols_to_apply = df.loc[:, "x":].columns

to_join = []
for col in cols_to_apply:
    new = make_new_columns(df[col])
    to_join.append(new)

df = df[["ID"]].join(to_join)

df
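The answer's set difference merges repeated items, which conflicts with the question's note that duplicates must be kept. As a hedged alternative, the sketch below filters each list by range directly. It is not the answerer's method: the helper `split_column` is hypothetical, the `[0]` placeholder for empty results mirrors the answer's code, and it assumes every column other than ID holds a string-encoded list.

```python
import ast

import pandas as pd

scale = ("500-10000", {500: 7000, 2500: 3000, 5000: 1000, 7500: 400, 10000: 250})

df = pd.DataFrame({
    "ID": ["EF407412", "KM043272"],
    "x": ["[2788, 3140, 4836]", "[539, 906, 1494, 1932, 2029,7001]"],
    "y": ["[1408, 1572, 2277]", "[]"],
})


def split_column(series, scale_dict):
    """Hypothetical helper: one output column per scale key, duplicates kept."""
    lists = series.apply(ast.literal_eval)
    out = {}
    upper = float("inf")
    # Walk thresholds from largest to smallest so each bucket is [lower, upper).
    for key, lower in sorted(scale_dict.items(), key=lambda kv: kv[1], reverse=True):
        out[f"{series.name}_{key}"] = lists.apply(
            # Bind lo/hi as defaults so each lambda keeps its own bounds;
            # an empty result falls back to [0], as in the answer above.
            lambda xs, lo=lower, hi=upper: [i for i in xs if lo <= i < hi] or [0]
        )
        upper = lower
    return pd.DataFrame(out)


parts = [split_column(df[col], scale[1]) for col in df.columns.drop("ID")]
result = pd.concat([df[["ID"]], *parts], axis="columns")
print(result)
```

With more than 100 list columns, the same comprehension over `df.columns.drop("ID")` covers all of them without naming each one.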


Source: this question, python - Crossing DataFrame columns that contain lists with a dictionary, comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/73407424/
