Ruby:字符串重建算法,只能部分工作

标签 ruby string algorithm

我正在研究 Ruby 中的字符串重构算法(动态编程示例中的经典算法,将空格较少的文本转换为正常空格的文本)。 下面的代码是纯 ruby​​,你可以复制粘贴并立即开始测试,它 80% 的时间都在工作,而且往往会崩溃,字典越大。我用超过 80k 词的词典测试了它,但它的效果不太好,大约 70% 的时间。

如果词典中存在该词,如果有办法让它 100% 工作,请告诉我。

这是代码:(它的间距很好,应该非常易读)

# Partially working string reconstruction algo in pure Ruby

# the dictionary
def dict(someWord)
  myArray = [" ", "best", "domain", "my", "successes", "image", "resizer", "high", "tech", "crime", "unit", "name", "edge", "times", "find", "a", "bargain", "free", "spirited", "style", "i", "command", "go", "direct", "to", "harness", "the", "force"]
  return !!(myArray.index(someWord))
end


# inspired by http://cseweb.ucsd.edu/classes/wi12/cse202-a/lecture6-final.pdf

## Please uncomment the one you wanna use
#
# (all the words used are present in the dictionary above)
#
# working sentences
  x = ' ' + "harnesstheforce"
# x = ' ' + "hightechcrimeunit"
#
# non working sentences
# x = ' ' + "findabargain"
# x = ' ' + "icommand"

puts "Trying to reconstruct #{x}"

# useful variables we're going to use in our algo
n = x.length
k = Array.new(n)
s = Array.new(n)
breakpoints = Hash.new
validBreakpoints = Hash.new

begin

  # let's fill k
  for i in 0..n-1
    k[i] = i
  end

  # the core algo starts here
  s[0] = true
  for k in 1..n-1

    s[k] = false

    for j in 1..k
      if s[j-1] && dict(x[j..k])
        s[k] = true

        # using a hash is just a trick to not have duplicates
        breakpoints.store(k, true)
      end
    end

  end

  # debug
  puts "breakpoints: #{breakpoints.inspect} for #{x}"

  # let's create a valid break point vector

  i=1
  while i <= n-1 do

    # we choose the longest valid word
    breakpoints.keys.sort.each do |k|

      if i >= k
        next
      end

      # debug: when the algo breaks, it does so here and goes into an infinite loop
      #puts "x[#{i}..#{k}]: #{x[i..k]}"
      if dict(x[i..k])
        validBreakpoints[i] = k
      end
    end

    if validBreakpoints[i]
      i = validBreakpoints[i] + 1
    end

  end

  # debug
  puts "validBreakpoints: #{validBreakpoints.inspect} for #{x}"

  # we insert the spaces at the places defined by the valid breakpoints
  x = x.strip
  i = 0
  validBreakpoints.each_key do |key|
    validBreakpoints[key] = validBreakpoints[key] + i
    i += 1
  end

  validBreakpoints.each_value do |value|
    x.insert(value, ' ')
  end

  puts "Debug: x: #{x}"


  # we capture ctrl-c
rescue SignalException
  abort

# end of rescue
end

最佳答案

请注意,对于包含单字符单词的字符串,您的算法会失败。这是一个差一错误。您忽略了这些词之后的断点,因此您最终得到了一个不包含在您的字典中的词 ("abargain")。

改变

   if i >= k
     next
   end

   if i > k
     next
   end

或更像 Ruby

   next if i > k

另请注意,只要您的字符串包含不是单词的内容,您就会陷入无限循环:

if validBreakpoints[i]        # will be false 
  i = validBreakpoints[i] + 1 # i not incremented, so start over at the same position
end

你最好把它当作一个错误

return '<no parse>' unless validBreakpoints[i] # or throw if you are not in a function
i = validBreakpoints[i] + 1

"inotifier" 的问题是您的算法存在缺陷。总是选择最长的词是不好的。在这种情况下,检测到的第一个“有效”断点是在 “in” 之后,这给您留下了非单词 “otifier”

关于Ruby:字符串重建算法,只能部分工作,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17879340/

相关文章:

java - 字符串比较 - Android

algorithm - Batcher Merge 如何在高层工作?

ruby-on-rails - Rails 命名空间模型给出未初始化的常量错误

ruby-on-rails - 为什么在 ruby​​ on rails 的 Controller 类中的 new 和 create 函数中创建了两个模型对象,而不是只使用一个?

ruby - 比较基准时,Go 还是 Ruby 更慢?

java - 按对象属性对对象的 ArrayList 进行排序

java - Java 中的 Hangman 抛出 ArrayIndexOutOfBounds

java - Bubble Shooter 游戏中的 Flood-Fill 算法

寻找子集组合以实现给定总和同时保持成本最低的算法

ruby - 调用 View 文件时如何传递参数?