I'm parsing call transcripts. The transcript comes back as a string in this format:
"Operator: Hi, please welcome Bob Smith to the call. Bob Smith: Hello there, thank you for inviting me...Now I will turn the call over to Stacy. Stacy White: Thanks Bob. As he was saying...."
There is no line break when a new speaker starts talking.
I'd like to turn the string above into an array of hashes, something like this:
[ { speaker: "Operator",
    content: "Hi, please welcome Bob Smith to the call" },
  { speaker: "Bob Smith",
    content: "Hello there, thank you for inviting me...Now I will turn the call over to Stacy." },
  { speaker: "Stacy White",
    content: "Thanks Bob. As he was saying...." }
]
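I think I need some kind of regular expression to parse this, but even after spending a whole morning reading up on regexes, I can't work out how. Any help would be greatly appreciated.
Thanks
Update:
For anyone else who may find this useful, here is what I ended up with, based on the solution suggested below: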
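def display_transcript
  transcript_pretty = []
  transcript = self.content
  # Split on "Speaker Name:" labels; the capture group keeps each name in the result.
  # drop(1) removes the empty string before the first label and, unlike [1..-1],
  # returns [] rather than nil when the transcript is empty.
  transcript_split = transcript.split(/\W*([A-Z]\w*\W*\w+):\W*/).drop(1)
  # Pair each speaker with the text that follows it: [[speaker, text], ...]
  transcript_split_2d = transcript_split.each_slice(2).to_a
  transcript_split_2d.each do |row|
    blurb = { speaker: row[0], content: row[1] }
    transcript_pretty << blurb
  end
  return transcript_pretty
end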
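To try this outside Rails, here is a minimal sketch that puts the same splitting logic in a plain Ruby Struct. `Call` is a hypothetical stand-in for the model that owns `content`, and the method is written as an `each_slice(2).map` chain instead of filling an accumulator array; for the sample transcript it should produce the same three hashes.

```ruby
# Hypothetical stand-in for the Rails model: just a `content` attribute.
Call = Struct.new(:content) do
  # Same splitting logic as display_transcript, written as a method chain.
  def display_transcript
    content
      .split(/\W*([A-Z]\w*\W*\w+):\W*/) # ["", speaker, text, speaker, text, ...]
      .drop(1)                          # drop the empty string before the first label
      .each_slice(2)                    # [[speaker, text], [speaker, text], ...]
      .map { |speaker, text| { speaker: speaker, content: text } }
  end
end

call = Call.new("Operator: Hi, please welcome Bob Smith to the call. " \
                "Bob Smith: Hello there, thank you for inviting me...Now I will " \
                "turn the call over to Stacy. Stacy White: Thanks Bob. As he was saying....")

# Print one line per speaker turn.
call.display_transcript.each do |turn|
  puts "#{turn[:speaker]}: #{turn[:content]}"
end
```

Note that the period ending each turn (except the last) is consumed by the leading `\W*` of the pattern, so "to the call." comes back as "to the call".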
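Best answer
I can give you an expression you can use to split up the string. From there you can take it on yourself; I'm sure you wouldn't want me to take away the fun of reaching your goal :>)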
string = "Operator: Hi, please welcome Bob Smith to the call. Bob Smith: Hello there, thank you for inviting me...Now I will turn the call over to Stacy. Stacy White: Thanks Bob. As he was saying...."
split_up = string.split(/\W*(\w*\W*\w+):\W*/)[1..-1]
Hash[*split_up]
# {"Operator"=>"Hi, please welcome Bob Smith to the call", "Bob Smith"=>"Hello there, thank you for inviting me...Now I will turn the call over to Stacy", "Stacy White"=>"Thanks Bob. As he was saying...."}
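Some explanation: the regular expression looks for one or two words (\w*\W*\w+), optionally preceded by non-word characters such as a period and a space (\W*), followed by a colon and any whitespace after it (:\W*).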
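The "one or two words" part works for the sample because every name after the first is two words long. The sketch below uses a made-up transcript in which the second speaker has a one-word name: the answer's pattern then takes the last word of the previous sentence as the first of the two words. The capital-letter variant from the update avoids that in this case, although it is still a heuristic (a capitalized word just before a one-word name would be captured the same way).

```ruby
# Made-up transcript in which the second speaker has a one-word name.
text = "Operator: Welcome to the call. Stacy: Thanks for having me."

answer_pattern  = /\W*(\w*\W*\w+):\W*/       # pattern from the answer
updated_pattern = /\W*([A-Z]\w*\W*\w+):\W*/  # pattern from the update (name must start with a capital)

# scan returns one array per match containing the captured group, so flatten it.
p text.scan(answer_pattern).flatten   # => ["Operator", "call. Stacy"]
p text.scan(updated_pattern).flatten  # => ["Operator", "Stacy"]
```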
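String#split cuts the string into an array at every match of this expression. Because the speaker name sits inside a capture group, split keeps it in the result, so names and spoken text alternate.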
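The capture-group behavior of String#split can be seen in isolation with a tiny string:

```ruby
# Without a group the separators are discarded...
p "a1b2c".split(/\d/)    # => ["a", "b", "c"]
# ...with a group, each captured separator is kept between the pieces.
p "a1b2c".split(/(\d)/)  # => ["a", "1", "b", "2", "c"]
```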
Because the string starts with a speaker label, the result starts with an empty string; you can remove it with [1..-1].
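A short sketch of that leading empty string, using a made-up two-speaker transcript. It also shows an edge case: an empty transcript splits into [], and [1..-1] on an empty array returns nil rather than [], while drop(1) returns []. If a transcript ever had text before its first label, the first element would not be empty and [1..-1] would silently discard that text.

```ruby
pattern = /\W*(\w*\W*\w+):\W*/  # the answer's pattern

# Made-up transcript that starts with a speaker label.
parts = "Ann Lee: Hi. Bob Ray: Bye.".split(pattern)
p parts          # => ["", "Ann Lee", "Hi", "Bob Ray", "Bye."]
p parts[1..-1]   # => ["Ann Lee", "Hi", "Bob Ray", "Bye."]

# Edge case: an empty transcript splits into an empty array.
empty = "".split(pattern)
p empty[1..-1]   # => nil (start index is past the end)
p empty.drop(1)  # => []
```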
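Finally, Hash[*split_up] converts that array into a hash: the first element becomes a key, the second its value, and so on to the end of the array.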
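One limitation of the hash form: keys are unique, so if a speaker talks more than once, later turns overwrite earlier ones. The sketch below uses a made-up split result in which "Operator" speaks twice, and compares Hash[*pairs], the equivalent each_slice(2).to_h, and an array of hashes as requested in the question.

```ruby
# Made-up split result in which "Operator" speaks twice.
pairs = ["Operator", "Welcome", "Bob Smith", "Hi", "Operator", "Next question"]

# Hash[*pairs] reads the elements as key, value, key, value, ...
as_hash = Hash[*pairs]
p as_hash.size         # => 2 (the second "Operator" turn replaced the first)
p as_hash["Operator"]  # => "Next question"

# each_slice(2).to_h builds the same hash without the splat.
p pairs.each_slice(2).to_h == as_hash  # => true

# An array of hashes keeps every turn, in order.
turns = pairs.each_slice(2).map { |speaker, content| { speaker: speaker, content: content } }
p turns.size           # => 3
```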
Regarding "ruby-on-rails - Parse call transcript into array of hashes - Ruby", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/69078794/