ruby - Finding consecutive substring indices

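Given a search string and a result string (guaranteed to contain all the letters of the search string, case-insensitively and in order), how can I most efficiently get an array of ranges representing the indices in the result string that correspond to the letters of the search string?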

Desired output:

substrings( "word", "Microsoft Office Word 2007" )
#=> [ 17..20 ]

substrings( "word", "Network Setup Wizard" )
#=> [ 3..5, 19..19 ]
#=> [ 3..4, 18..19 ]   # Alternative, acceptable, less-desirable output

substrings( "word", "Watch Network Daemon" )
#=> [ 0..0, 10..11, 14..14 ]

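This is for an autocomplete search box. Here is a screenshot from a tool similar to Quicksilver that underlines the letters exactly the way I want to. Note that, unlike my ideal output above, this tool does not favor longer single matches.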
Screenshot of Colibri underlining letters in search results
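To make the goal concrete, the ranges in the desired output are what an autocomplete UI would use to mark the matched letters. The sketch below is hypothetical: `highlight` is an illustrative name, and square brackets stand in for underlining.

```ruby
# Hypothetical helper: wrap each matched range of `target` in brackets,
# standing in for the underlining an autocomplete UI would draw.
# Assumes `ranges` is sorted and non-overlapping, as in the desired output.
def highlight(target, ranges)
  out = []
  last = 0
  ranges.each do |r|
    out << target[last...r.begin]   # unmatched text before this range
    out << "[#{target[r]}]"         # the matched letters
    last = r.end + 1
  end
  out << target[last..-1]           # whatever follows the last range
  out.join
end

puts highlight("Network Setup Wizard", [3..5, 19..19])
# prints: Net[wor]k Setup Wizar[d]
```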

Benchmark results

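Benchmarking the current working solutions shows that @tokland's regex-based answer is nearly as fast as the StringScanner-based solution I proposed (within about 10% in real time) and needs less code: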

               user     system      total        real
phrogz1    0.889000   0.062000   0.951000 (  0.944000)
phrogz2    0.920000   0.047000   0.967000 (  0.977000)
tokland    1.030000   0.000000   1.030000 (  1.035000)

Here is the benchmark:

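require 'benchmark'

# phrogz1, phrogz2 and tokland are the candidate implementations from the
# answers; each takes (search_term, master_string) and must be defined first.
a = ["Microsoft Office Word 2007", "Network Setup Wizard", "Watch Network Daemon"]
b = ["FooBar", "Foo Bar", "For the Love of Big Cars"]
# Map each group of master strings to the search terms tried against it.
test = { a => %w[ w wo wor word ], b => %w[ f fo foo foobar fb fbr ] }

Benchmark.bmbm do |x|
  %w[ phrogz1 phrogz2 tokland ].each do |method|
    x.report(method) do
      test.each do |words, terms|
        words.each do |master|
          terms.each do |term|
            2000.times { send(method, term, master) }
          end
        end
      end
    end
  end
end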
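The StringScanner code that was benchmarked is not shown here. The following is only a sketch of what a StringScanner-based approach could look like, not the phrogz1/phrogz2 code. `substrings_scan` is a hypothetical name, and the sketch assumes Ruby 2.2+ (for `Enumerable#slice_when`; `StringScanner#charpos` needs 2.0+). It takes `(term, master)` like the benchmarked methods, so it could be added to the `%w[ phrogz1 phrogz2 tokland ]` list for comparison. No timings for it are reported here.

```ruby
require 'strscan'

# Sketch only: a StringScanner-based variant, NOT the benchmarked phrogz code.
# For each search character, skip ahead to its next case-insensitive occurrence.
def substrings_scan(term, master)
  scanner = StringScanner.new(master)
  indices = []
  term.each_char do |c|
    # Advance past the next occurrence of c; give up if there is none.
    return [] unless scanner.scan_until(/#{Regexp.escape(c)}/i)
    # charpos is the character offset just after the matched character.
    indices << scanner.charpos - 1
  end
  # Group runs of consecutive indices into ranges.
  indices.slice_when { |x, y| y != x + 1 }.map { |run| run.first..run.last }
end

p substrings_scan("word", "Watch Network Daemon")
# prints: [0..0, 10..11, 14..14]
```

Greedily taking the earliest occurrence of each character yields the same indices as the lazy regex in the answer below.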

Best answer

To get something started, how about this?

>> s = "word"
>> re = /#{s.chars.map{|c| "(#{Regexp.escape(c)})" }.join(".*?")}/i # /(w).*?(o).*?(r).*?(d)/i
>> match = "Watch Network Daemon".match(re)
=> #<MatchData "Watch Network D" 1:"W" 2:"o" 3:"r" 4:"D">
>> 1.upto(s.length).map { |idx| match.begin(idx) }
=> [0, 10, 11, 14]

Now you just need to build the ranges (if you really need them; I'd guess the individual indices would work too).
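A minimal sketch of that last step combines the answer's regex with a range-building pass. It assumes Ruby 2.2+ for `Enumerable#slice_when`. The empty-result fallback is an addition for inputs that violate the question's guarantee.

```ruby
# Sketch: the answer's regex plus grouping of consecutive indices into ranges.
def substrings(search, target)
  # One capture group per search character, with lazy gaps between them.
  re = /#{search.chars.map { |c| "(#{Regexp.escape(c)})" }.join(".*?")}/i
  match = target.match(re)
  return [] unless match # not expected, given the question's guarantee
  # Character offset of each captured letter.
  indices = 1.upto(search.length).map { |i| match.begin(i) }
  # Split wherever neighboring indices are not adjacent, then build ranges.
  indices.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

p substrings("word", "Network Setup Wizard")
# prints: [3..5, 19..19]
```

Because the gaps are lazy, each letter is taken at its earliest possible position. For the three examples in the question, this produces exactly the desired (non-alternative) outputs.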

The original question, "ruby - Finding consecutive substring indices", is on Stack Overflow: https://stackoverflow.com/questions/5718761/
