
How do I parse this string with a regex in Python?

I need to extract the string "Miracle Workers", which sits between "From" and the date-time stamp, as efficiently as possible.

    s = """
      business hours. Keyword Search: Sales, Operations, Director, Medical, Medical Devices, DME, Respiratory Equipment, Sales Rep, Account Executive, Exec, Business... <br />
             From Miracle Workers - 26 Apr 2012 08:45:15 GMT
          -  View all <a href="http://www.indeed.com/l-Houston,-TX-jobs.html">Houston    jobs</a>
    """

This is the regex I am using. I would like a more efficient one.

    import re

    regex1 = re.findall(r'From ([A-Za-z ]+)-', s)
    # ['Miracle Workers ']

I also need to extract another string from a URL.

    s2 = "http://www.indeed.com/job/Region-Manager-Field-Sales-at-Covidien-in-Atlanta,-GA-a1a421aabb4d54a7"
    regex2 = re.findall(r'-in-([A-Za-z-]+),-([A-Z]{2})', s2)[0]

Here I get a tuple like ('Atlanta', 'GA'), but I need the string "Atlanta,GA" instead.

What is the right way to get these results reliably in all circumstances?

1 Answer

Parentheses create capturing groups, so when the pattern contains more than one group, findall returns a tuple of groups for each match. Try this regexp without grouping:

    regexp = r'-in-[A-Za-z-]+,-[A-Z]{2}'
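
As a rough sketch of the two options, using the URL from the question: without groups, findall returns the whole matched text (including the leading "-in-"), while keeping the groups and joining them yields "Atlanta,GA" directly. The variable names `url`, `whole`, and `location` are only illustrative.

```python
import re

# URL taken from the question.
url = "http://www.indeed.com/job/Region-Manager-Field-Sales-at-Covidien-in-Atlanta,-GA-a1a421aabb4d54a7"

# Without capturing groups, findall returns the full matched text.
whole = re.findall(r'-in-[A-Za-z-]+,-[A-Z]{2}', url)

# With capturing groups, search returns a match object (or None);
# joining the groups builds the "City,ST" string.
m = re.search(r'-in-([A-Za-z-]+),-([A-Z]{2})', url)
location = ",".join(m.groups()) if m else None

print(whole)
print(location)
```

Checking for `None` before joining avoids an exception on URLs that do not contain a "-in-City,-ST" part.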

2 Comments

The output is now '-in-Atlanta,-GA', which is fine. Is my regex for the first string okay? Will it work in all circumstances?
Will that string always be in English? If so, I think it will, but instead of [A-Za-z\ ]+, prefer [A-Za-z\s]+, or at least [A-Za-z\ \t]+, or [\w\ \t]+ (it depends on the expected input).
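
A minimal sketch for the first string, assuming the name is always followed by whitespace and a hyphen before the date. It uses the [A-Za-z\s] class suggested above, made non-greedy so the trailing space is not captured; the name `company` is illustrative, and the sample text is shortened from the question.

```python
import re

# Shortened sample of the text from the question.
s = """
  business hours. Keyword Search: Sales, Operations, Director... <br />
         From Miracle Workers - 26 Apr 2012 08:45:15 GMT
      -  View all <a href="http://www.indeed.com/l-Houston,-TX-jobs.html">Houston    jobs</a>
"""

# Lazy match: capture letters and whitespace after "From" and stop
# at the first run of whitespace that is followed by a hyphen.
m = re.search(r'From\s+([A-Za-z\s]+?)\s+-', s)
company = m.group(1) if m else None

print(company)
```

If names can contain digits or punctuation, the character class would need to be widened accordingly.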
