ruby - Reconstruct the original sentence from smaller phrases?

Tags: ruby string nlp

I have an original sentence:

sent = "For 15 years photographer Kari Greer has been documenting wildfires and the men and women who battle them."

and a list of phrases:

phrases = [
  "For 15 years",
  "wildfires and the men and women who battle them",
  "has been documenting wildfires",
  "been documenting wildfires and the men and women who battle them",
  "documenting wildfires and the men and women who battle them",
  "them",
  "and the men and women who battle them",
  "battle them",
  "wildfires",
  "the men and women",
  "the men and women who battle them",
  "15 years",
  "photographer Kari Greer"
]

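I want to reconstruct the original sentence from the phrases (without losing any words) and store the selected phrases in a new array, preserving their order, so that I get: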

 result = [
   "For 15 years",
   "photographer Kari Greer",
   "has been documenting wildfires",
   "and the men and women who battle them"
]
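The requirement above can be stated as a simple check: the selected phrases, joined by single spaces, should equal the sentence once the trailing period is removed. A minimal sketch, where `reconstructs?` is a hypothetical helper name and not part of the original question:

```ruby
# Hypothetical helper: true when the chosen phrases, joined by single spaces,
# reproduce the sentence (ignoring one trailing period).
def reconstructs?(sentence, parts)
  parts.join(" ") == sentence.sub(/\.\z/, "")
end

sent = "For 15 years photographer Kari Greer has been documenting wildfires and the men and women who battle them."
result = [
  "For 15 years",
  "photographer Kari Greer",
  "has been documenting wildfires",
  "and the men and women who battle them"
]

puts reconstructs?(sent, result) # => true
```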

Edit: It is important that result contains the minimum number of elements.
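One possible way to meet the minimum-elements requirement is dynamic programming over character offsets, sketched below. This is an alternative illustration, not the approach used in the answer; `fewest_phrases` is a hypothetical name, and the sketch assumes phrases are separated by at most one space and that punctuation has already been removed.

```ruby
# Sketch: best[i] holds the shortest list of phrases that exactly covers text[0...i].
def fewest_phrases(text, phrases)
  best = Array.new(text.length + 1)
  best[0] = []
  (0...text.length).each do |i|
    next if best[i].nil?
    # Skip a single separating space, if there is one, before the next phrase.
    start = (i > 0 && text[i] == " ") ? i + 1 : i
    phrases.each do |phrase|
      next if phrase.empty?
      next unless text[start, phrase.length] == phrase
      stop = start + phrase.length
      candidate = best[i] + [phrase]
      best[stop] = candidate if best[stop].nil? || candidate.size < best[stop].size
    end
  end
  # An empty array means no sequence of phrases covers the whole text.
  best[text.length] || []
end

sent = "For 15 years photographer Kari Greer has been documenting wildfires and the men and women who battle them."
phrases = [
  "For 15 years",
  "wildfires and the men and women who battle them",
  "has been documenting wildfires",
  "been documenting wildfires and the men and women who battle them",
  "documenting wildfires and the men and women who battle them",
  "them",
  "and the men and women who battle them",
  "battle them",
  "wildfires",
  "the men and women",
  "the men and women who battle them",
  "15 years",
  "photographer Kari Greer"
]

p fewest_phrases(sent.delete("."), phrases)
```

Because every phrase is tried at most once per offset, the work grows with the sentence length times the number of phrases, rather than with the number of possible phrase sequences.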

Edit: Here is a version of the answer's code that works for more complex cases:

sent = "Shes got six teeth Pink says of her 13-month-old daughter but shes not a biter"
phrases = ["her 13-month-old daughter", "she", "says of her 13-month-old daughter", "a biter", "got six teeth", "Pink", "of her 13-month-old daughter", "s not a biter", "She", "six teeth", "s got six teeth", "Shes got six"]

def shortest(string, phrases)
  # Strip punctuation and newlines before matching.
  string = string.gsub(/[.\n',?!:;"`]/, '')
  best_result = nil
  phrases.each do |phrase|
    # The match is not anchored, so a phrase may be removed from anywhere in the string.
    pattern = /#{Regexp.escape(phrase)}/
    if string.match(pattern)
      result = [phrase] + shortest(string.sub(pattern, "").strip, phrases)
      best_result = result if best_result.nil? || result.size < best_result.size # && string == result.join(" ")
    end
  end
  best_result || []
end
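When a phrase is interpolated into a regular expression, any characters such as `.`, `$` or `(` in it are read as pattern syntax unless the phrase is escaped. A small sketch of the difference, using a made-up phrase that does not come from the question:

```ruby
# Made-up phrase containing regex metacharacters ($, parentheses, period).
phrase = "cost $5 (approx.)"
text   = "cost $5 (approx.) today"

# Unescaped: "$" acts as an end-of-line anchor, so the literal text does not match.
raw_match = text.match?(/\A#{phrase}/)

# Escaped: Regexp.escape turns the phrase into a literal pattern.
escaped_match = text.match?(/\A#{Regexp.escape(phrase)}/)

puts raw_match     # => false
puts escaped_match # => true
```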

Best answer

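def shortest(string, phrases)
  best_result = nil
  phrases.each do |phrase|
    # Only consider phrases that match at the start of the remaining string.
    prefix = /\A#{Regexp.escape(phrase)}/
    if string.match(prefix)
      result = [phrase] + shortest(string.sub(prefix, "").strip, phrases)
      # Accept a candidate only if it rebuilds the whole remaining string,
      # allowing an optional space between consecutive phrases.
      joined = result.map { |part| Regexp.escape(part) }.join("\\s?")
      best_result = result if (best_result.nil? || result.size < best_result.size) && string.match(/\A#{joined}\Z/)
    end
  end
  best_result || []
end
result = shortest(sent.gsub(/\./, ""), phrases)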

Edit: Updated the algorithm to allow for no space between some phrases.
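The optional-space behavior comes from joining the candidate phrases with `\s?` before checking them against the string. A minimal sketch of that check in isolation; the `Regexp.escape` call and the `\z` anchor are assumptions added here for safety, and the two phrases are taken from the question's second example:

```ruby
parts = ["She", "s got six teeth"]

# Join the phrases with an optional whitespace character between them.
joined  = parts.map { |part| Regexp.escape(part) }.join("\\s?")
pattern = /\A#{joined}\z/

no_space   = pattern.match?("Shes got six teeth")   # phrases directly adjacent
one_space  = pattern.match?("She s got six teeth")  # phrases separated by one space
two_spaces = pattern.match?("She  s got six teeth") # more than one space is rejected

puts no_space   # => true
puts one_space  # => true
puts two_spaces # => false
```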

Regarding "ruby - Reconstruct the original sentence from smaller phrases?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11300921/
