我有类似的哈希表(但更大):
h={
"Wilhelm_Conrad_Röntgen": "http://www.example.com/w2xtt4w/001_1901.jpg",
"Hendrik_Lorentz": "http://www.example.com/apbfksz/004_1902.jpg",
"Pieter_Zeeman": "http://www.example.com/d2cpwj3/007_1902.jpg",
"Antoine_Henri_Becquerel": "http://www.example.com/g8sueyg/010_1903.jpg",
"Maria_Skłodowska-Curie": "http://www.example.com/gfcgur8/013_1903.jpg",
"Lord_Rayleigh": "http://www.example.com/dcjiwq8/016_1904.jpg",
"Joseph_John_Thomson": "http://www.example.com/4a66bc9/019_1906.jpg",
"Gabriel_Lippmann": "http://www.example.com/xjefgoa/022_1908.jpg",
"Guglielmo_Marconi": "http://www.example.com/x359w62/025_1909.jpg",
"Karl_Ferdinand_Braun": "http://www.example.com/1edm469/028_1909.jpg",
"Johannes_Diderik_van_der_Waals": "http://www.example.com/31hpue7/031_1910.jpg",
"Wilhelm_Wien": "http://www.example.com/yel9iff/034_1911.jpg",
"Nils_Gustaf_Dalén": "http://www.example.com/iezss59/037_1912.jpg",
"Heike_Kamerlingh-Onnes": "http://www.example.com/8zl4uj2/040_1913.jpg",
"Max_von_Laue": "http://www.example.com/ta3w6rn/043_1914.jpg",
"William_Henry_Bragg": "http://www.example.com/u43qn5h/046_1915.jpg",
"William_Lawrence_Bragg": "http://www.example.com/n7gkk6e/049_1915.jpg",
"Charles_Glover_Barkla": "http://www.example.com/2kxxroc/052_1917.jpg",
"Max_Planck": "http://www.example.com/eyw7tm6/055_1918.jpg",
"Johannes_Stark": "http://www.example.com/gcjcv2p/058_1919.jpg",
"Charles_Édouard_Guillaume": "http://www.example.com/nox3s6o/061_1920.jpg",
"Niels_Bohr": "http://www.example.com/mo9ga29/064_1922.jpg",
"Robert_Andrews_Millikan": "http://www.example.com/kxq71if/067_1923.jpg",
"Manne_Siegbahn": "http://www.example.com/2uwhw9y/070_1924.jpg",
"James_Franck": "http://www.example.com/iwdavip/073_1925.jpg",
"Gustav_Hertz": "http://www.example.com/73mbso2/076_1925.jpg",
"Jean_Baptiste_Perrin": "http://www.example.com/rgugmas/079_1926.jpg",
"Arthur_Holly_Compton": "http://www.example.com/vy7is1v/082_1927.jpg",
"Owen_Willans_Richardson": "http://www.example.com/3jz5ve8/085_1928.jpg",
"Louis_Victor_Pierre_Raymond": "http://www.example.com/srvj8d5/088_1929.jpg",
"John_M_Kosterlitz": "http://www.example.com/7op2wb1/091_1929.jpg",
"Chandrasekhara_Venkata_Raman": "http://www.example.com/1vqqwua/094_1930.jpg"
}
我想以一种有效的方式为每个键找到最短的前缀,该前缀在给定的键中是唯一的。换句话说:在哈希表中仍然可以识别每行具有较短的前缀。对于此哈希表,仍然使行可识别的最短前缀:
h={
"Wilhelm_C": "http://www.example.com/w2xtt4w/001_1901.jpg",
"Hen": "http://www.example.com/apbfksz/004_1902.jpg",
"P": "http://www.example.com/d2cpwj3/007_1902.jpg",
"An": "http://www.example.com/g8sueyg/010_1903.jpg",
"Mar": "http://www.example.com/gfcgur8/013_1903.jpg",
"Lor": "http://www.example.com/dcjiwq8/016_1904.jpg",
"Jos": "http://www.example.com/4a66bc9/019_1906.jpg",
"Ga": "http://www.example.com/xjefgoa/022_1908.jpg",
"Gug": "http://www.example.com/x359w62/025_1909.jpg",
"K": "http://www.example.com/1edm469/028_1909.jpg",
"Johannes_D": "http://www.example.com/31hpue7/031_1910.jpg",
"Wilhelm_W": "http://www.example.com/yel9iff/034_1911.jpg",
"Nil": "http://www.example.com/iezss59/037_1912.jpg",
"Hei": "http://www.example.com/8zl4uj2/040_1913.jpg",
"Max_v": "http://www.example.com/ta3w6rn/043_1914.jpg",
"William_H": "http://www.example.com/u43qn5h/046_1915.jpg",
"William_L": "http://www.example.com/n7gkk6e/049_1915.jpg",
"Charles_G": "http://www.example.com/2kxxroc/052_1917.jpg",
"Max_P": "http://www.example.com/eyw7tm6/055_1918.jpg",
"Johannes_S": "http://www.example.com/gcjcv2p/058_1919.jpg",
"Charles_É": "http://www.example.com/nox3s6o/061_1920.jpg",
"Nie": "http://www.example.com/mo9ga29/064_1922.jpg",
"R": "http://www.example.com/kxq71if/067_1923.jpg",
"Man": "http://www.example.com/2uwhw9y/070_1924.jpg",
"Ja": "http://www.example.com/iwdavip/073_1925.jpg",
"Gus": "http://www.example.com/73mbso2/076_1925.jpg",
"Je": "http://www.example.com/rgugmas/079_1926.jpg",
"Ar": "http://www.example.com/vy7is1v/082_1927.jpg",
"O": "http://www.example.com/3jz5ve8/085_1928.jpg",
"Lou": "http://www.example.com/srvj8d5/088_1929.jpg",
"John": "http://www.example.com/7op2wb1/091_1929.jpg",
"Chan": "http://www.example.com/1vqqwua/094_1930.jpg"
}
我第一次尝试为每个键或词找到最短的前缀:
ret=h.keys.map{|k|
l=1;
while h.keys.select{|z| z=~/^#{Regexp.quote(k[0..l-1])}/}.size != 1 do
l+=1
end
puts "#{k} -> #{k[0..l-1]}"
[k[0..l-1],h[k]]
};
算法和代码远非最佳,事实上,即使在快速机器上它也确实很慢,而且当哈希表大得多时,它就无法使用。
可以获取示例哈希:
require "json"
h=JSON.load(%x{curl http://pastebin.com/raw/Cs0dTJmA})
两种不同方法的快速基准:
X 轴表示输入哈希/数组的大小,Y 轴表示以秒为单位的运行时间。
最终代码,现在是最快的而且不那么占用内存:
def optimize_keys(ih)
h=ih.dup #for not messing with the original
result={}
bh={}
l=1
ks=h.keys.size
while result.size < ks do
h.keys.each{|k|
prefix=k[0..l-1]
bh[prefix]==nil ? bh[prefix]=[1,k] : bh[prefix][0]+=1
}
ones=bh.select{|k,v| v[0]==1}
ones.each{|k,v|
result[k]=h[v[1]]
h.delete(v[1])
}
bh={}
l+=1
end
return result
end
最佳答案
您可以使用 Abbrev.abbrev
生成一个明确的缩写列表。因为它会生成一个以缩写为键、单词为值的散列,我们需要先提取最短的缩写并交换键和值。
require 'abbrev'
word_to_abbr = Abbrev.abbrev(h.keys)
.group_by { |k, v| v }
.map { |k, v| [k, v.flatten.min_by(&:length)] }
.to_h
output = {}
h.each { |k, v| output[word_to_abbr[k].to_s] = v }
output
#=> {
# "Wilhelm_C" => "http://www.example.com/w2xtt4w/001_1901.jpg",
# "Hen" => "http://www.example.com/apbfksz/004_1902.jpg",
# "P" => "http://www.example.com/d2cpwj3/007_1902.jpg",
# "An" => "http://www.example.com/g8sueyg/010_1903.jpg",
# "Mar" => "http://www.example.com/gfcgur8/013_1903.jpg",
# "Lor" => "http://www.example.com/dcjiwq8/016_1904.jpg",
# "Jos" => "http://www.example.com/4a66bc9/019_1906.jpg",
# "Ga" => "http://www.example.com/xjefgoa/022_1908.jpg",
# "Gug" => "http://www.example.com/x359w62/025_1909.jpg",
# "K" => "http://www.example.com/1edm469/028_1909.jpg",
# "Johannes_D" => "http://www.example.com/31hpue7/031_1910.jpg",
# "Wilhelm_W" => "http://www.example.com/yel9iff/034_1911.jpg",
# "Nil" => "http://www.example.com/iezss59/037_1912.jpg",
# "Hei" => "http://www.example.com/8zl4uj2/040_1913.jpg",
# "Max_v" => "http://www.example.com/ta3w6rn/043_1914.jpg",
# "William_H" => "http://www.example.com/u43qn5h/046_1915.jpg",
# "William_L" => "http://www.example.com/n7gkk6e/049_1915.jpg",
# "Charles_G" => "http://www.example.com/2kxxroc/052_1917.jpg",
# "Max_P" => "http://www.example.com/eyw7tm6/055_1918.jpg",
# "Johannes_S" => "http://www.example.com/gcjcv2p/058_1919.jpg",
# "Charles_É" => "http://www.example.com/nox3s6o/061_1920.jpg",
# "Nie" => "http://www.example.com/mo9ga29/064_1922.jpg",
# "R" => "http://www.example.com/kxq71if/067_1923.jpg",
# "Man" => "http://www.example.com/2uwhw9y/070_1924.jpg",
# "Ja" => "http://www.example.com/iwdavip/073_1925.jpg",
# "Gus" => "http://www.example.com/73mbso2/076_1925.jpg",
# "Je" => "http://www.example.com/rgugmas/079_1926.jpg",
# "Ar" => "http://www.example.com/vy7is1v/082_1927.jpg",
# "O" => "http://www.example.com/3jz5ve8/085_1928.jpg",
# "Lou" => "http://www.example.com/srvj8d5/088_1929.jpg",
# "John" => "http://www.example.com/7op2wb1/091_1929.jpg",
# "Chan" => "http://www.example.com/1vqqwua/094_1930.jpg"
# }
关于ruby - 高效的 Ruby 代码,用于为单词中唯一的每个单词找到最短的前缀,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42017555/