python - Pandas: Create new columns from a list stored as a column value

Tags: python pandas dataframe

I have a DataFrame with a column whose values look like this:

[
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:48 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "A"
    },
    {
      "Event" : "B",
      "TimeStamp" : "2019-09-24 10:17:38 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:35 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "D"
    },
    {
      "Event" : "V",
      "TimeStamp" : "2019-09-24 10:17:33 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:32 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "C"
    }
  ]

I want to turn every key into its own column. The original DataFrame looks like this:


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  |  a   |  b   |  c   |  d   |  e   |  f  |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         15  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+

I am looking for something like this:


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  | OrderID |  TimeStamp                 |Screen        | StateVars  |Event |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | A    |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | B    |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | C    |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | D    |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0       | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | E    |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+---------+----------------------------+--------------+------------+------+

The existing columns do not necessarily have to be removed, even though the output above drops some of them.

Best answer

First, create a DataFrame from the column with the constructor:

# Assumes pandas is imported as pd and each EventLog cell holds one dict
df1 = pd.DataFrame(df['EventLog'].tolist())
print(df1)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C

Then join it to the original DataFrame:

# join aligns on the index; df1 has a default 0..n-1 index, so df should too
df = df.join(df1)
print(df)
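
The two steps above can be sketched as one self-contained script. This is only an illustration: the sample frame and its CustomerID values are made up, and it assumes every EventLog cell holds a single dict. Passing `index=df.index` is an addition not in the original answer; it keeps the join aligned even when `df` does not have the default index.

```python
import pandas as pd

# Hypothetical sample data: one event dict per row
df = pd.DataFrame({
    "CustomerID": [1, 1],
    "EventLog": [
        {"OrderID": "0", "TimeStamp": "2019-09-24 10:17:48 +0000",
         "Screen": "Home_Screen", "StateVars": "", "Event": "A"},
        {"Event": "B", "TimeStamp": "2019-09-24 10:17:38 +0000",
         "Screen": "Home_Screen", "StateVars": "", "OrderID": "0"},
    ],
})

# Expand the dicts into columns, reusing df's index so the join lines up
df1 = pd.DataFrame(df["EventLog"].tolist(), index=df.index)

# Attach the new columns next to the original ones
result = df.join(df1)
print(result.columns.tolist())
```

The key order inside each dict does not matter: values are matched to columns by key name.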

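Edit: It looks like the column contains some missing values. One fix is to replace them with empty dictionaries, so those rows end up as missing values (NaN) in the result: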

print(df)
                                            EventLog
0  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
1  {'Event': 'B', 'TimeStamp': '2019-09-24 10:17:...
2  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
3  {'Event': 'V', 'TimeStamp': '2019-09-24 10:17:...
4  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
5                                                NaN

# NaN != NaN, so x == x is False for missing (NaN) cells, which become {}
df = pd.DataFrame([x if x == x else {} for x in df['EventLog']])
print(df)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C
5     NaN                        NaN          NaN       NaN   NaN
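
The snippet above overwrites `df` with the expanded columns only. If the original columns should be kept as well, a possible variant looks like the sketch below. The sample data is invented for illustration, and `isinstance(x, dict)` is used here in place of the `x == x` test as a slightly more explicit check; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: the second EventLog cell is missing
df = pd.DataFrame({
    "CustomerID": [1, 15],
    "EventLog": [
        {"OrderID": "0", "Event": "A"},
        float("nan"),
    ],
})

# Replace anything that is not a dict (such as NaN) with an empty dict
cleaned = [x if isinstance(x, dict) else {} for x in df["EventLog"]]

# An empty dict produces a row of NaN in the expanded frame
expanded = pd.DataFrame(cleaned, index=df.index)

# Keep the original columns and add the expanded ones
result = df.join(expanded)
print(result)
```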

Another solution, for cells that hold a list of dicts:

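# Each EventLog cell is assumed to hold a list of dicts here.
# DataFrame.append was removed in pandas 2.0, so collect all records
# first and build the frame once instead of appending row by row.
logs = [x for x in df['EventLog'].tolist() if x == x]  # drop NaN cells
records = [event for log in logs for event in log]
events_df = pd.DataFrame(records)
# Note: join aligns on the index, so event i is attached to row i of df
df = df.join(events_df)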
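
Because the join above is purely positional, an event may end up next to a row it did not come from when a cell holds more than one event. A possible alternative, sketched below with invented sample data, uses `DataFrame.explode` to produce one row per event first, so each event stays with the values of its source row. This is a swapped-in technique and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: each EventLog cell is a list of event dicts
df = pd.DataFrame({
    "CustomerID": [1, 15],
    "EventLog": [
        [{"OrderID": "0", "Event": "A"}, {"OrderID": "0", "Event": "B"}],
        [{"OrderID": "0", "Event": "C"}],
    ],
})

# One row per event; the other columns are repeated for each event
exploded = df.explode("EventLog").reset_index(drop=True)

# Missing or empty cells explode to NaN, so map non-dicts to {}
records = [e if isinstance(e, dict) else {} for e in exploded["EventLog"]]
events = pd.DataFrame(records)

# Both frames share the fresh 0..n-1 index, so the join lines up
result = exploded.join(events)
print(result[["CustomerID", "OrderID", "Event"]])
```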

This question and answer, "python - Pandas: Create new columns from a list stored as a column value", come from a similar question on Stack Overflow: https://stackoverflow.com/questions/58431864/
