10

I'm fairly new to regexes in Ruby (or, I suppose, regexes in general), and I was wondering whether there is a pragmatic way to match a string using an array.

Let me explain. Say I have a list of ingredients, in this case:

1 1/3 cups all-purpose flour
2 teaspoons ground cinnamon
8 ounces shredded mozzarella cheese

Ultimately I need to split each ingredient into its respective "quantity and measurement" and "ingredient name", so 8 ounces shredded mozzarella cheese, for example, would be split into "8 ounces" and "shredded mozzarella cheese".

So instead of having a hugely long regex like (cup\w*|teaspoon\w*|ounce\w* ....... ), how can I use an array to hold those values outside the regex?


update

I did this (thanks cwninja):

# I think all the units should be singular; then
# use a Ruby function (pluralize, from ActiveSupport) to pluralize them.

units = [
  'tablespoon',
  'teaspoon',
  'cup',
  'can',
  'quart',
  'gallon',
  'pinch',
  'pound',
  'pint',
  'fluid ounce',
  'ounce'
  # ... shortened for brevity
]

joined_units = (units.collect{|u| u.pluralize} + units).join('|')

# There are actually many ingredients, so this is really an iterator,
# but for the sake of example we just show one.
ingredient = "1 (10 ounce) can diced tomatoes and green chilies, undrained"

arr = ingredient.split(/([\d\/\.\s]+(\([^)]+\))?)\s(#{joined_units})?\s?(.*)/i)

This gives me close to what I want, so I think this is the direction I want to go.

puts "measurement: #{arr[1]}"
puts "unit: #{arr[-2] if arr.size > 3}"
puts "title: #{arr[-1].strip}"
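
A variation on the same idea, as a rough sketch rather than a finished parser: `match` with named captures avoids counting positions in the array that `split` returns. The names `UNITS`, `INGREDIENT_RE` and `parse_ingredient` are made up for illustration, the unit list is a shortened stand-in, and the `(?:es|s)?` suffix is a crude substitute for `pluralize` so that the snippet runs without ActiveSupport.

```ruby
# Hypothetical, shortened unit list. Longer names come first so that
# "fluid ounce" is tried before "ounce".
UNITS = ['fluid ounce', 'tablespoon', 'teaspoon', 'ounce', 'pinch', 'cup', 'can'].freeze

# Regexp.union escapes each string; .source gives the bare pattern text
# so it can be embedded under the outer /i flag.
UNIT_SOURCE = Regexp.union(UNITS).source

# quantity: digits, slashes, dots and spaces, optionally followed by "(...)"
# unit:     one of UNITS with an optional plural ending (assumption: s or es)
# name:     everything that is left
INGREDIENT_RE = /\A(?<quantity>[\d\/.\s]+(?:\([^)]+\))?)\s+(?:(?<unit>(?:#{UNIT_SOURCE})(?:es|s)?)\s+)?(?<name>.*)\z/i

# Returns a hash of the three parts, or nil when the line has no leading quantity.
def parse_ingredient(line)
  m = INGREDIENT_RE.match(line.strip)
  return nil unless m

  { quantity: m[:quantity].strip, unit: m[:unit], name: m[:name].strip }
end

p parse_ingredient("1 (10 ounce) can diced tomatoes and green chilies, undrained")
p parse_ingredient("1 1/3 cups all-purpose flour")
```

With this pattern the unit is `nil` when no known unit follows the quantity, which replaces the `arr.size > 3` check.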

2 Answers

41

Personally, I'd just build the regexp programmatically. You can do:

ingredients = [...]
recipe = Regexp.new(ingredients.join("|"), Regexp::IGNORECASE)

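Or use the union method:

regex = Regexp.union(ingredients)
# Interpolate .source: a Regexp interpolated directly keeps its own flags and ignores the outer /i.
recipe = /#{regex.source}/i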

… then use the recipe regexp.

As long as you save it and don't keep recreating it, it should be fairly efficient.
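
One way to "save it" is to build the regexp once and keep it in a constant. This is only a sketch: `MEASUREMENTS` and `MEASUREMENT_RE` are illustrative names and the list is a shortened stand-in for the real array.

```ruby
# Hypothetical list; in practice this would be the full array of units.
MEASUREMENTS = ['cup', 'teaspoon', 'fluid ounce', 'ounce'].freeze

# Built once, when the file is loaded. Regexp.union escapes each string,
# and taking .source lets IGNORECASE apply to the whole pattern.
MEASUREMENT_RE = Regexp.new(Regexp.union(MEASUREMENTS).source, Regexp::IGNORECASE)

lines = [
  '2 Teaspoons ground cinnamon',
  '8 ounces shredded mozzarella cheese',
  'a handful of basil'
]

# String#[] with a regexp returns the matched text, or nil when nothing matches.
found = lines.map { |line| line[MEASUREMENT_RE] }
p found  # ["Teaspoon", "ounce", nil]
```

Every call reuses the same compiled object, so nothing is rebuilt inside the loop.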


2 Comments

i also use this approach, with a little tweak: Regexp.union(measurements) instead of Regexp.new(measurements.join("|")), same result, much cleaner
This looks good but I believe you mean: recipe = Regexp.new(ingredients.join("|"), true)
3

For an array a, something like this should work:

a.each do |line|
    parts = /^([\d\s\.\/]+)\s+(\w+)\s+(.*)$/.match(line)
    # Do something with parts[1 .. 3]
end

For example:

a = [
    '1 1/3 cups all-purpose flour',
    '2 teaspoons ground cinnamon',
    '8 ounces shredded mozzarella cheese',
    '1.5 liters brandy',
]
puts "amount\tunits\tingredient"
a.each do |line|
    parts = /^([\d\s\.\/]+)\s+(\w+)\s+(.*)$/.match(line)
    puts parts[1 .. 3].join("\t")
end
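
Note that `match` returns `nil` for a line that does not start with an amount, so `parts[1 .. 3]` would raise on such a line. A small guard, sketched here with the hypothetical names `LINE_RE` and `split_line`, covers that case:

```ruby
# Same pattern as above, stored once in a constant.
LINE_RE = /^([\d\s.\/]+)\s+(\w+)\s+(.*)$/

# Returns [amount, units, ingredient], or nil when the line does not match.
def split_line(line)
  m = LINE_RE.match(line)
  m && m.captures
end

p split_line('1.5 liters brandy')  # ["1.5", "liters", "brandy"]
p split_line('salt to taste')      # nil
```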

1 Comment

+ 1 Thanks for your answer, oddly enough your answer is like right on for the dumb way I described my problem, I don't think I was very clear, but your solution is actually really good for the way I described it.
