Ruby - How can I improve the performance of an array scan?

Tags: ruby arrays performance rspec mapreduce

Partly based on "ruby - how to generate the possible sequential combination of letters from an array of strings?", I now have a program that matches words, using:

class Dictionary
  attr :words

  def words
    @words.map(&:upcase).uniq
  end 

  @@MAPPINGS= {A: 2, B: 2, C: 2, D: 3, E: 3, F: 3, G: 4, H: 4, I: 4, J: 5, K: 5, L: 5,
  M: 6, N: 6, O: 6, P: 7, Q: 7, R: 7, S: 7, T: 8, U: 8, V: 8, W: 9, X: 9, Y: 9, Z: 9}
  @@PHONE_NUMBER_LENGTH=10

  def initialize
    @words=[]
  end 

  def add_word(word)
    word.length < @@PHONE_NUMBER_LENGTH ? (@words << word) : nil 
  end 

  def load_system_dictionary(words_file='/usr/share/dict/american-english')
    File.open(words_file).each {|word| add_word(word)}
    true
  rescue Errno::ENOENT
    false
  end 

  def word_combinations(letters)
    possibles=[]
    letters.each_char do |one_letter|
      possibles << letter_mappings(one_letter)
    end 
    possibles.map(&:chars).map(&:to_a).reduce(&:product).map(&:join)
  end 

  def contains_word(word)
    @words.join.include?(word.upcase)
  end 

  def word_from_word_combinations(number_string)
    returns=[]
    word_combinations(number_string).each do |word|
      returns << word if @words.include?(word)
    end 
    returns
  end 

  private

  def letter_mappings(letter)
    @@MAPPINGS.select{ |key,val| val==letter.to_i }.keys.join
  end 

end

This handles short and medium-length words well, in under a tenth of a second, e.g. for ANIMAL. However, for longer words such as MUMMIFICATION, i.e.

it "should see that the valid words for 6866434228466 is 'MUMMIFICATION'" do
  expect(dictionary.word_from_word_combinations('6866434228466')).to match_array(['MUMMIFICATION'])
end

the test takes 30 seconds.
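To see where the time goes, it helps to count how many letter strings word_combinations builds for a given number. The sketch below is only an illustration: the constant LETTERS_PER_DIGIT and the helper candidate_count are hypothetical names, with the counts derived from the MAPPINGS table above.

```ruby
# Letters per keypad digit, derived from the MAPPINGS table in the question.
LETTERS_PER_DIGIT = {
  '2' => 3, '3' => 3, '4' => 3, '5' => 3,
  '6' => 3, '7' => 4, '8' => 3, '9' => 4
}.freeze

# Hypothetical helper: how many letter strings word_combinations would build.
def candidate_count(number_string)
  number_string.each_char.reduce(1) do |total, digit|
    total * LETTERS_PER_DIGIT.fetch(digit)
  end
end

puts candidate_count('264625')         # ANIMAL: 729 candidates
puts candidate_count('6866434228466')  # MUMMIFICATION: 1594323 candidates
```

Each of those candidates is then passed to @words.include?, and Array#include? is a linear scan of the word list, so the cost grows with the number of candidates multiplied by the size of the dictionary.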

I tried adding .uniq at each stage of

possibles.map(&:chars).map(&:to_a).reduce(&:product).map(&:join)

and I also switched to Ruby 2.0 as my default, but that only made a difference of 6 seconds in the run time :(

I have since switched to sawa's approach:

first, *rest = possibles.map{|s| s.each_char.to_a}
first.product(*rest).map(&:join)

def word_combinations(letters)
  possibles=[]
  letters.each_char do |one_letter|
    possibles << letter_mappings(one_letter)
  end 
  first, *rest = possibles.map{|s| s.each_char.to_a}
  first.product(*rest).map(&:join) 
end

This helped, bringing it down to 15 seconds.

The .map(&:chars) variant from Marshal, i.e.

def word_combinations(letters)
  possibles=[]
  letters.each_char do |one_letter|
    possibles << letter_mappings(one_letter)
  end 
  first, *rest = possibles.map(&:chars)
  first.product(*rest).map(&:join) 
end

was interesting, but did not improve performance.

Is there anything else I can do?

Best answer

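# Load the dictionary once, upcasing every word.
words = File.read("/usr/share/dict/american-english")
  .split.map{|w| w.chomp.upcase}
mapping = {A: 2, B: 2, C: 2, D: 3, E: 3, F: 3, G: 4, H: 4, I: 4, J: 5, K: 5,
  L: 5, M: 6, N: 6, O: 6, P: 7, Q: 7, R: 7, S: 7, T: 8, U: 8, V: 8, W: 9, X: 9,
  Y: 9, Z: 9}
# Key the table by one-character strings so each char of a word can be looked up directly.
better_mapping = mapping.map{|k, v| [k.to_s, v]}.to_h

# Convert every word to its digit string and keep the ones that equal the number.
t = Time.now
p words.select{|w| w.chars.map{|c| better_mapping[c]}.join == "6866434228466"}
puts Time.now - t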

Result:

["MUMMIFICATION"]
0.847988125
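The run above reverses the direction of the search: instead of expanding the number into every possible letter string, each dictionary word is converted to digits once and compared with the number. As a rough illustration of that conversion, here is a sketch that swaps the hash lookup for String#tr. The constant KEYPAD_DIGITS and the helper keypad_digits are hypothetical names, and the three-word list stands in for the dictionary file. One difference from the code above: tr leaves characters outside A-Z (such as an apostrophe) in place, whereas the hash-based version drops them.

```ruby
# Keypad digits for A..Z in alphabetical order (same table as `mapping`).
KEYPAD_DIGITS = '22233344455566677778889999'.freeze

# Hypothetical helper: the digit string a word spells on a phone keypad.
def keypad_digits(word)
  word.upcase.tr('A-Z', KEYPAD_DIGITS)
end

# Stand-in for the dictionary file, so the sketch runs anywhere.
sample_words = %w[animal mummification ruby]

# One pass over the words, one comparison per word.
p sample_words.select { |w| keypad_digits(w) == '6866434228466' }
```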

mapping = {A: 2, B: 2, C: 2, D: 3, E: 3, F: 3, G: 4, H: 4, I: 4, J: 5, K: 5,
  L: 5, M: 6, N: 6, O: 6, P: 7, Q: 7, R: 7, S: 7, T: 8, U: 8, V: 8, W: 9, X: 9,
  Y: 9, Z: 9}
better_mapping = mapping.map{|k, v| [k.to_s, v]}.to_h
# Build the index once: digit string => array of words that spell it.
words = File.read("/usr/share/dict/american-english")
  .split.map{|w|
    w = w.chomp.upcase
    [w, w.chars.map{|c| better_mapping[c]}.join]
  }.group_by(&:last)
  .map{|k, a| [k, a.map(&:first)]}.to_h

# Only the hash lookup is timed; building the index happens before the timer starts.
t = Time.now
p words["6866434228466"]
puts Time.now - t

Result:

["MUMMIFICATION"]
8.5981e-05
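The second timing covers only the hash lookup, because the index is built before the timer starts, so the approach pays off when many numbers are looked up against the same dictionary. One way the idea could be folded into a class like the question's Dictionary is sketched below. This is an assumption-laden illustration, not the answer's code: the class name PhoneWordIndex and the method words_for are hypothetical, and String#tr is used for the conversion instead of the hash.

```ruby
# Sketch of a digit-keyed index; the class and method names are hypothetical.
class PhoneWordIndex
  # Keypad digits for A..Z in alphabetical order (same table as `mapping`).
  KEYPAD_DIGITS = '22233344455566677778889999'.freeze

  def initialize
    # Unknown keys get a fresh empty array, so add_word can append directly.
    @by_digits = Hash.new { |hash, key| hash[key] = [] }
  end

  # Stores the upcased word under the digit string it spells.
  def add_word(word)
    upcased = word.strip.upcase
    bucket = @by_digits[upcased.tr('A-Z', KEYPAD_DIGITS)]
    bucket << upcased unless bucket.include?(upcased)
    self
  end

  # Returns the words stored under number_string, or [] if there are none.
  def words_for(number_string)
    @by_digits.fetch(number_string, [])
  end
end

index = PhoneWordIndex.new
%w[animal mummification Mummification].each { |w| index.add_word(w) }
p index.words_for('6866434228466')
```

The duplicate check scans only one bucket, which holds just the words sharing a digit string, so it stays cheap compared with scanning the whole word list.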

Regarding "Ruby - How can I improve the performance of an array scan?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22432751/
