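linux - Need a shell/perl script to mask sensitive information such as first name, date of birth, SSN, etc. in log files on Linux

Tags: linux shell

I need to mask sensitive information in log files, such as first name, last name, date of birth, SSN, etc., but these fields do not appear in any fixed pattern. The fields listed below need to be found throughout the log and their values masked with xxxxx. Please help.

Log block with sample data: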

ghix.log.2014-07-25: INFO 07/25/2014 17:13:14 (PlanDisplayRestClient.java:272) - Fetching  IndividualPlanList : {"inputData":{"household":{"CSR":"CS4","APTC":"187.0"},"issuerVerifiedFlag":true,"totalContribution":null,"enrollmentType":"I","exchangeType":"ON","showCatastrophicPlan":"false","issuerId":null,"groupId":3098,"eligLeadBenefits":",Nutritional counseling,Weight loss programs","subscriberData":null,"planIdStr":"","planType":"Both","tenant":"","providers":[{"id":"18900405","name":"Dr. Rakshit Kumar","networkId":["33602-TXN001","87006-TXN001","91986-TXN001","32600-TXN001","32600-TXN002","355678-FLN001"],"networkTier":"","spciality":"Counseling/Social Work","city":"Austin","state":"TX","providerType":"DOCTOR","networkTierList":null,"networkIdList":["33602-TXN001","87226-TXN001","91716-TXN001","3278-TXN001","32698-TXN002","3545-FLN001"]}],"planLevel":"","isSpecialEnrollment":"NO","pgrmType":"INDIVIDUAL","coverageStartDate":"01/01/2001","insuranceType":"HEALTH","preferences":{"highDrugUseVal":0.0,"lowDrugUseVal":0.0,"moderateMedicalVal":0.0,"highMedicalVal":0.0,"vHighDrugUseVal":0.0,"moderateDrugUseVal":0.0,"vHighMedicalVal":0.0,"lowMedicalVal":0.0}},"groupDataList":[{"groupId":3098,"aptc":187.0,"remainingAptc":0.0,"csr":"CS4","zipcode":"44444","countycode":"45555","personDataList":[{"personId":"1","externalPersonId":null,"existingMedicalEnrollmentID":null,"existingSADPEnrollmentID":null,"firstname":"Primary","lastname":"Tax Filer","dob":"1/1/2001","smoker":"N","dentalEligible":"NO","relationship":"Self","employerContribution":null,"gender":null},{"personId":"2","externalPersonId":null,"existingMedicalEnrollmentID":null,"existingSADPEnrollmentID":null,"firstname":"Primary","lastname":"Tax Filer","dob":"1/1/2001","smoker":"N","dentalEligible":"NO","relationship":"Child","employerContribution":null,"gender":null}]}],"pldHouseholdPersonList":null,"providersList":[{"id":"1000405","name":"Dr. 
Rakshit Kumar","networkId":["33002-TXN001","87000-TXN001","91000-TXN001","30003-TXN001","32003-TXN002","35000-FLN001"],"networkTier":"","spciality":"Counseling/Social Work","city":"Austin","state":"TX","providerType":"DOCTOR","networkTierList":null,"networkIdList":["33000-TXN001","87200-TXN001","90006-TXN001","30003-TXN001","30003-TXN002","35000-FLN001"]}],"eligLeadId":null,"ssapApplicationId":null,"consumerData":null}

Data to mask:

"dob":"1/1/2001"
"name":"Dr. Rakshit Kumar"
"smoker":"N"
"dentalEligible":"NO"

It should look like:

"dob":"xxxxx"
"name":"xxxxx"
"smoker":"x"
"dentalEligible":"x"

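Best answer

Your log contains valid JSON strings, so you need to:

  1. Read each line
  2. Extract the JSON from the line
  3. Read the JSON and parse it into some internal data structure
  4. Change the required fields
  5. Export the changed data structure as JSON
  6. Write the masked log to a new file
  7. Done
  8. Profit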
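
Step 2 can be sketched in plain bash, assuming the JSON payload always starts at the first "{" on the line (true for the sample above, but an assumption in general). The function name split_log_line and the variables prefix and json are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a log line into the text before the first "{"
# and the JSON payload that starts at that brace.
# Assumption: the log prefix itself never contains a "{".
split_log_line() {
    local line=$1
    prefix=${line%%\{*}       # strip everything from the first "{" onward
    json=${line#"$prefix"}    # what is left after the prefix; empty if no "{"
}

split_log_line 'INFO 07/25/2014 17:13:14 (Demo.java:1) - Fetching  IndividualPlanList : {"dob":"1/1/2001"}'
printf 'prefix=[%s]\n' "$prefix"
printf 'json=[%s]\n' "$json"
```

For a line without any "{", prefix holds the whole line and json is empty, so such lines can be written out unchanged.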

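Edit

It can probably be done in bash with some tools, but I use Perl for this kind of task. Trying to teach Perl from scratch here would really be off-topic.

Alternatively, try searching Google for something like manipulating JSON from bash.

To get the JSON part, you can use something like the following: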

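while IFS= read -r line
do
    # Pass lines without the marker through unchanged
    if [[ $line != *"IndividualPlanList : "* ]]; then
        printf '%s\n' "$line"
        continue
    fi

    # part1: everything up to and including the marker; json: the rest
    part1=$(sed 's/\(.*IndividualPlanList : \).*/\1/' <<< "$line")
    json=$(sed 's/.*IndividualPlanList : //' <<< "$line")

    # do something with the JSON (placeholder: copied unchanged)
    newjson=$json

    # write out the new line
    printf '%s%s\n' "$part1" "$newjson"

done < logfile.txt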
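
As one possible way to fill in the "do something with the JSON" step, here is a minimal sketch that masks the four fields from the question with sed. Note that this is a regex shortcut, not the JSON parsing recommended above: it assumes the values are plain strings with no escaped quotes and that there is no whitespace around the colon, as in the sample log. The function name mask_json is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mask selected string fields with sed (no JSON parser).
# Assumptions: values contain no escaped quotes (\") and keys are written
# exactly as "key":"value" with no spaces around the colon.
mask_json() {
    sed -E \
        -e 's/"(dob|name)":"[^"]*"/"\1":"xxxxx"/g' \
        -e 's/"(smoker|dentalEligible)":"[^"]*"/"\1":"x"/g'
}

# "firstname" is left alone because the pattern requires a quote directly before "name".
echo '{"name":"Dr. Rakshit Kumar","firstname":"Primary","dob":"1/1/2001","smoker":"N","dentalEligible":"NO"}' | mask_json
# prints {"name":"xxxxx","firstname":"Primary","dob":"xxxxx","smoker":"x","dentalEligible":"x"}
```

In the loop, the placeholder assignment could then become newjson=$(mask_json <<< "$json"). Further keys such as firstname or lastname can be added to the alternation if they also need masking. A value that is broken across two physical lines in the log will not be matched by this line-by-line approach.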

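A similar question on this topic (linux - Need a shell/perl script to mask sensitive information such as first name, date of birth, SSN, etc. in log files on Linux) can be found on Stack Overflow: https://stackoverflow.com/questions/25003481/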
