Ruby: parsing a string

Tags: ruby string parsing

I have a string:

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"

What is the best way to parse this string in Ruby?

I mean that the script should be able to build sentences like these:

maybe this is some ugly night

maybe that was some nice night

maybe that was some strange time

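and so on, you get the idea...

Should I read the string character by character and build a state machine with a stack that stores the parenthesised values for later evaluation, or is there a better approach?

Is there perhaps a ready-made library for this purpose?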
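For comparison, the hand-written approach can stay fairly small. The sketch below is plain Ruby using only the standard `strscan` library; the class and method names are hypothetical. It is a recursive-descent parser rather than an explicit state machine: each nested parenthesis is handled by a recursive call, so Ruby's call stack replaces the manual stack, and a group's alternatives are multiplied into the sentence built so far. It assumes that runs of spaces in the output may be collapsed, so the spaces around `|` in the input do not matter.

```ruby
require "strscan"

# Hypothetical hand-written expander (not a library API). The call stack of
# the recursive methods plays the role of the explicit stack.
#   sentence := (literal | group)*
#   group    := "(" sentence ("|" sentence)* ")"
#   literal  := one or more characters other than "(", ")" and "|"
class SentenceExpander
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  # Returns every sentence the pattern can produce, with runs of spaces
  # collapsed and leading/trailing spaces removed.
  def expand
    result = parse_sentence
    unless @scanner.eos?
      raise ArgumentError, "unexpected #{@scanner.peek(1).inspect} at #{@scanner.pos}"
    end
    result.map { |s| s.squeeze(" ").strip }
  end

  private

  # The values of a sentence are the cartesian product of the values of
  # its parts, concatenated left to right.
  def parse_sentence
    values = [""]
    loop do
      if (text = @scanner.scan(/[^()|]+/))
        values = values.map { |v| v + text }
      elsif @scanner.scan(/\(/)
        values = values.product(parse_group).map(&:join)
      else
        break # "|", ")" or end of input ends this sentence
      end
    end
    values
  end

  # The values of a group are the values of all its alternatives, in order.
  def parse_group
    alternatives = parse_sentence
    alternatives += parse_sentence while @scanner.scan(/\|/)
    raise ArgumentError, "missing ')' at #{@scanner.pos}" unless @scanner.scan(/\)/)
    alternatives
  end
end

input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"
puts SentenceExpander.new(input).expand
```

For this input it prints twelve sentences, starting with `maybe this is some nice day`, and unbalanced input such as `(a|b` raises `ArgumentError`. A grammar library is easier to extend, while this version avoids a gem dependency.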

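Best answer

Try Treetop. It is a Ruby-like DSL for describing grammars. Parsing the string you provided should be easy, and because you are using a real parser, you can easily extend the grammar later.

An example grammar for the kind of string you want to parse (save it as sentences.treetop):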

grammar Sentences
  rule sentence
    # A sentence is a sequence of zero or more expressions.
    expression* <Sentence>
  end

  rule expression
    # An expression is either a literal or a parenthesised expression.
    parenthesised / literal
  end

  rule parenthesised
    # A parenthesised expression contains one or more sentences.
    "(" (multiple / sentence) ")" <Parenthesised>
  end

  rule multiple
    # Multiple sentences are delimited by a pipe.
    sentence "|" (multiple / sentence) <Multiple>
  end

  rule literal
    # A literal string consists of letters (a-z, A-Z) and/or spaces.
    # Expand the character class to allow other characters too.
    [a-zA-Z ]+ <Literal>
  end
end

The grammar above needs a companion file that defines the classes that give us access to the node values (save it as sentence_nodes.rb).

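require "treetop"

class Sentence < Treetop::Runtime::SyntaxNode
  # Cartesian product of two lists of strings: every value of a followed
  # by every value of b. An empty a means nothing has been accumulated yet.
  def combine(a, b)
    return b if a.empty?
    a.inject([]) do |values, val_a|
      values + b.collect { |val_b| val_a + val_b }
    end
  end

  # Fold the values of each child expression into the list of all
  # possible sentences.
  def values
    elements.inject([]) do |values, element|
      combine(values, element.values)
    end
  end
end

class Parenthesised < Treetop::Runtime::SyntaxNode
  # elements: "(", the inner multiple/sentence node, ")"
  def values
    elements[1].values
  end
end

class Multiple < Treetop::Runtime::SyntaxNode
  # elements: first sentence, "|", remaining alternatives.
  # Alternatives are concatenated, not combined.
  def values
    elements[0].values + elements[2].values
  end
end

class Literal < Treetop::Runtime::SyntaxNode
  # A literal contributes exactly one value: its own text.
  def values
    [text_value]
  end
end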
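`Sentence#combine` computes a cartesian product: every value accumulated so far is followed by every value of the next element. A standalone sketch of the same step with `Array#product` follows; the `parts` data is made up for illustration and stands in for what the node classes would return for `maybe (this is|that was) some (day|night)`.

```ruby
# Standalone sketch of the cartesian-product step in Sentence#combine,
# using Array#product instead of inject/collect. `combine` here is a plain
# method, not part of Treetop.
def combine(a, b)
  return b if a.empty? # nothing accumulated yet: start with b's values
  a.product(b).map { |prefix, suffix| prefix + suffix }
end

# Hypothetical per-element value lists, as Literal#values and
# Parenthesised#values might return them.
parts = [["maybe "], ["this is", "that was"], [" some "], ["day", "night"]]

# Fold the parts left to right, like Sentence#values.
sentences = parts.inject([]) { |acc, vals| combine(acc, vals) }

# Seeding the fold with [""] removes the need for the empty-list check,
# provided every part yields at least one value.
seeded = parts.inject([""]) { |acc, vals| acc.product(vals).map(&:join) }

puts sentences
```

Both folds produce `maybe this is some day`, `maybe this is some night`, `maybe that was some day` and `maybe that was some night`, in that order.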

The following example program shows that parsing your example sentence is very simple.

require "treetop"
# Ruby 1.9.2+ no longer puts "." on the load path, so load the node classes
# relative to this file.
require_relative "sentence_nodes"

str = 'maybe (this is|that was) some' +
  ' ((nice|ugly) (day|night)|(strange (weather|time)))'

Treetop.load "sentences"
if sentence = SentencesParser.new.parse(str)
  puts sentence.values
else
  puts "Parse error"
end

The output of this program is:

maybe this is some nice day
maybe this is some nice night
maybe this is some ugly day
maybe this is some ugly night
maybe this is some strange weather
maybe this is some strange time
maybe that was some nice day
maybe that was some nice night
maybe that was some ugly day
maybe that was some ugly night
maybe that was some strange weather
maybe that was some strange time
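The example program parses a version of the question's string without spaces around `|`. With the grammar as written, those spaces would become part of the literals (for example `this is ` and ` that was`) and produce doubled spaces in the output. A minimal preprocessing sketch, assuming that whitespace next to a pipe is never significant, is to strip it with `String#gsub` before parsing:

```ruby
# The question's original input, with spaces around most of the pipes.
input = "maybe (this is | that was) some ((nice | ugly) (day |night) | (strange (weather | time)))"

# Assumption: whitespace next to "|" is never meaningful, so it can be
# removed before parsing. Otherwise alternatives such as " that was"
# would carry a stray space into the generated sentences.
normalized = input.gsub(/\s*\|\s*/, "|")

puts normalized
# → maybe (this is|that was) some ((nice|ugly) (day|night)|(strange (weather|time)))
```

The normalized string is identical to `str` in the example program, so it can be passed to `SentencesParser.new.parse` unchanged.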

You can also inspect the syntax tree:

p sentence


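And there you have it: an extensible parsing solution, in about 50 lines of code, that should come very close to what you want to do. Does this help?

For a similar question about parsing strings in Ruby, see Stack Overflow: https://stackoverflow.com/questions/2380020/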
