json - Splitting a large JSON file using jq and awk

Tags: json powershell awk jq

I have a large file called

Metadata_01.json

It consists of blocks that follow this structure:

[
 {
  "Participant_id": "P04_00001",
  "no_of_people": "Multiple",
  "apparent_gender": "F",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Side View",
  "indoor_outdoor_env": "Indoors",
  "lighting_condition": "Bright",
  "Occluded": 1,
  "category": "Two Person",
  "camera_movement": "Still",
  "action": "No action",
  "indoor_outdoor_in_moving_car_or_train": "Indoor",
  "daytime_nighttime": "Nighttime"
 },
 {
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
 },

And so on... there are thousands of them.

I am using the following command:

jq -cr '.[]' Metadata_01.json | awk '{print > (NR ".json")}'

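It does the job as expected.

From the large file structured as shown above, I get a large number of files named by line number (1.json, 2.json, and so on), each containing one object on a single line.

Instead, I need each JSON file to be named after its "Participant_id" (for example, P04_00002.json). I also want to preserve the JSON structure, so that each file looks like this: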

{
  "Participant_id": "P04_00002",
  "no_of_people": "Single",
  "apparent_gender": "M",
  "geographic_location": "AUS",
  "ethnicity": "Caucasian",
  "capture_device_used": "iOS 14",
  "camera_orientation": "Portrait",
  "camera_position": "Frontal View",
  "indoor_outdoor_env": "Outdoors",
  "lighting_condition": "Bright",
  "Occluded": "None",
  "category": "Animals",
  "camera_movement": "Still",
  "action": "Small action",
  "indoor_outdoor_in_moving_car_or_train": "Outdoor",
  "daytime_nighttime": "Daytime"
 }

What adjustments should I make to the command above to achieve this? Or is there perhaps a simpler way to do it? Thanks!

Best answer

What adjustments should I make ...?

I'd go with:

jq -cr '.[] | (.Participant_id, .)' Metadata_01.json | awk '
  NR%2==1 {fn="id." $0 ".json"; next} {print >> fn; close(fn); }
'

and then run something like jq . "$FILE" | sponge "$FILE" on each file to pretty-print it.
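The pretty-printing pass could be sketched as follows for systems where sponge (part of moreutils) is not installed. This is only an illustration: the function name pretty_print_in_place is made up here, and it assumes the files were written with the id.*.json naming used above.

```shell
# Hypothetical helper (not from the original answer): re-indent every
# id.*.json file in a directory, using a temporary file instead of sponge.
pretty_print_in_place() {
  dir=${1:-.}
  for f in "$dir"/id.*.json; do
    [ -e "$f" ] || continue            # the glob matched nothing
    # Write to a temporary file first so a jq failure cannot truncate $f.
    if jq . "$f" > "$f.tmp"; then
      mv "$f.tmp" "$f"
    else
      rm -f "$f.tmp"
      echo "skipped (invalid JSON?): $f" >&2
    fi
  done
}

# Usage: pretty_print_in_place /path/to/output/dir
```

Redirecting jq's output straight back to "$f" would truncate the file before jq reads it, which is why either sponge or a temporary file is needed.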

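Alternatively, if you can take care of any issues that might arise with escaping quotes, you could have awk call jq. Note that awk's system() runs the command through sh, so the <<< here-string only works where that sh supports it (as bash does):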

jq -cr '.[] | (.Participant_id, .)' Metadata_01.json | awk -v q=$'\'' '
  NR%2==1 {fn = "id." $0 ".json"; next}
  {  system( ("jq . <<< " q $0 q " >> \"" fn "\"") );
     close(fn);
  }
'

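"Big data"

Of course, if the input file is too large for jq empty, or processing it is too slow, then you will want to consider alternatives such as jq's --stream option, jstream, or my own jm. For example, if you want the JSON to be pretty-printed in each file: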

while read -r json
do
   fn=$(jq -r .Participant_id <<< "$json")
   <<< "$json" jq . > "id.$fn.json"
done < <(jm Metadata_01.json)
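The same loop can be sketched with jq's --stream option mentioned above, for cases where jm is not available. This is a minimal sketch, not the answerer's code: the function name split_stream is hypothetical, and it assumes the top level of the input is an array and that no Participant_id contains a slash or a newline.

```shell
# Hypothetical helper: stream the top-level array one element at a time
# and write each element, pretty-printed, to <outdir>/id.<Participant_id>.json.
split_stream() {
  input=$1
  outdir=${2:-.}
  # fromstream(1 | truncate_stream(inputs)) rebuilds each array element
  # from the event stream without loading the whole array into memory.
  jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' "$input" |
  while IFS= read -r json; do
    fn=$(printf '%s\n' "$json" | jq -r '.Participant_id')
    printf '%s\n' "$json" | jq . > "$outdir/id.$fn.json"
  done
}

# Usage: split_stream Metadata_01.json .
```

This starts two jq processes per record, so it trades speed for low memory use; for inputs that fit in memory, the awk-based split above is likely to be faster.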

Regarding json - Splitting a large JSON file using jq and awk, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/74551827/
