r - How to convert a list of lists containing NULL values to a data frame

I have a list object created from JSON returned by an e-commerce API; a minimal example is below. I'm trying to convert it to a data frame, but without much luck.

my_ls <- list(list(id = 406962L, user_id = 132786L, user_name = "Visitor Account", 
      organization_id = NULL, checkout_at = NULL, currency = "USD", 
      bulk_discount = NULL, coupon_codes = NULL, items = list(list(
        id = 505296L, quantity = 1L, unit_cost = 1295, used = 0L, 
        item_id = 6165L, item_type = "Path", item_name = "Product_2", 
        discount_type = "Percent", discount = NULL, coupon_id = NULL), 
        list(id = 505297L, quantity = 1L, unit_cost = 1295, used = 0L, 
             item_id = 6163L, item_type = "Path", item_name = "Product_1", 
             discount_type = "Percent", discount = NULL, coupon_id = NULL))), 
 list(id = 407178L, user_id = 132786L, user_name = "Visitor Account", 
      organization_id = "00001", checkout_at = NULL, currency = "USD", 
      bulk_discount = NULL, coupon_codes = NULL, items = list(
        list(id = 505744L, quantity = 1L, unit_cost = 1295, 
             used = 0L, item_id = 6163L, item_type = "Path", 
             item_name = "Product_1", 
             discount_type = "Percent", discount = NULL, coupon_id = NULL))))

I've tried a number of short solutions, such as:
Converting a list of lists to a dataframe in R: The Tidyverse-way

... and combinations of flatten, map and map_dfr from purrr.

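I keep running into two problems, and whenever I solve one, I hit the other:

Some entries in the data have NULL values. If I try to convert a sublist to a tibble, I get an error: Error: All columns in a tibble must be vectors. x Column `organization_id` is NULL

The items sublist contains a named element called id, and the higher-level list already has a named element called id. The former is the product ID, the latter the order ID. I can't seem to reliably rename one of them: one method I used to convert to a df dropped the lower-level id.

Each element of the items sublist is a cart item, so in the final df each one should get its own row that also carries the columns from the higher-level list. If an order has two sub-items, the values inherited from the higher-level list, e.g. organization_id and user_name, are repeated. I want to keep the columns that contain NULL values, because some of them, e.g. checkout_at, do have values in the larger dataset.

Thanks.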

Best answer

One option involving dplyr, tidyr and purrr could be:

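library(dplyr)
library(tidyr)
library(purrr)
# Replace NULL with NA at both levels; rename the order-level id before unnesting
map_depth(.x = my_ls, 2, ~ replace(.x, is.null(.x), NA), .ragged = TRUE) %>%
 bind_rows() %>%
 mutate(items = map_depth(items, 2, ~ replace(.x, is.null(.x), NA))) %>%
 rename(`original_id` = id) %>%
 unnest_wider(items) 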

 original_id user_id user_name organization_id checkout_at currency bulk_discount
        <int>   <int> <chr>     <chr>           <lgl>       <chr>    <lgl>        
1      406962  132786 Visitor … <NA>            NA          USD      NA           
2      406962  132786 Visitor … <NA>            NA          USD      NA           
3      407178  132786 Visitor … 00001           NA          USD      NA           
# … with 11 more variables: coupon_codes <lgl>, id <int>, quantity <int>, unit_cost <dbl>,
#   used <int>, item_id <int>, item_type <chr>, item_name <chr>, discount_type <chr>,
#   discount <lgl>, coupon_id <lgl>
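
The `replace(.x, is.null(.x), NA)` call is what lets the NULL fields survive as columns: a NULL has length 0 and cannot fill a cell, while NA has length 1. A minimal base R sketch of that idiom on its own (the helper name `null_to_na` is hypothetical and not part of the answer):

```r
# Hypothetical helper wrapping the idiom used in the pipeline above.
null_to_na <- function(x) replace(x, is.null(x), NA)

na_value  <- null_to_na(NULL)      # NULL is turned into a length-1 NA
unchanged <- null_to_na("00001")   # non-NULL values pass through as-is

length(NULL)      # 0: nothing to put in a data-frame cell
length(na_value)  # 1: a valid cell value
```

Because `is.null()` returns a single TRUE or FALSE, non-NULL inputs (including the nested `items` list) are returned untouched.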

Or an option using rrapply, dplyr and tidyr:

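library(rrapply)
library(dplyr)
library(tidyr)
# Recursively replace every NULL leaf with NA, then stack and unnest the items
rrapply(my_ls, f = function(x) if(is.null(x)) NA else x, how = "replace") %>%
 bind_rows() %>%
 rename(`original_id` = id) %>%
 unnest_wider(items)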
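
For comparison, the same reshaping can be sketched in base R only. This is not from the answer: it is a rough illustration under the assumption that every order has an `items` list and that only the order-level `id` clashes with an item-level name. The `orders` data is a trimmed stand-in for `my_ls`, and `null_to_na` and `order_to_rows` are hypothetical helper names.

```r
# Trimmed stand-in for my_ls (assumption: same shape, fewer fields).
orders <- list(
  list(id = 1L, organization_id = NULL,
       items = list(list(id = 10L, discount = NULL),
                    list(id = 11L, discount = NULL))),
  list(id = 2L, organization_id = "00001",
       items = list(list(id = 12L, discount = NULL)))
)

# Hypothetical helper: same NULL-to-NA rule as in the rrapply call above.
null_to_na <- function(x) if (is.null(x)) NA else x

# Hypothetical helper: one data-frame row per cart item of a single order.
order_to_rows <- function(order) {
  # Order-level fields are everything except the nested items list.
  parent <- lapply(order[names(order) != "items"], null_to_na)
  # Rename the order id so it does not clash with the item id.
  names(parent)[names(parent) == "id"] <- "original_id"
  rows <- lapply(order$items, function(item) {
    # Order-level values are repeated on every item row.
    as.data.frame(c(parent, lapply(item, null_to_na)),
                  stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

result <- do.call(rbind, lapply(orders, order_to_rows))
```

Here `result` has one row per cart item, with `original_id` and `organization_id` repeated for items of the same order. The tidyverse and rrapply versions above are shorter and are likely the better choice when those packages are available.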

For "r - How to convert a list of lists containing NULL values to a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62378058/
