
I need a regex that pulls a value out of a URL's domain name while excluding the host (e.g. wordpress) and the domain type (e.g. .com). The URLs are dynamic and contain 2-3 parts each (www.example.com or example.org). I am trying to use this expression, but it only strips the first letter from each item I am attempting to exclude:

Expression

(?!wordpress|com|www)(\w+|\d+)

String

example.wordpress.com

Results

  1. example
  2. ordpress
  3. om

Desired Result

example

Any assistance would be greatly appreciated.

  • I am really having a hard time understanding your question. What is the pattern for the input, and what do you want returned as a match for each URL? Commented Jun 2, 2010 at 20:47

3 Answers


Anchor your regular expression with word boundaries (\b), so that a match cannot start in the middle of an excluded word:

\b(?!wordpress|com|www)(\w+|\d+)\b

You might also want to consider whether (\w+|\d+) is really what you mean. \w already includes digits. Also, there are other characters allowed in URLs such as -. Do you need to handle this?
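To make the difference concrete, here is a minimal JavaScript sketch comparing the question's pattern with the anchored one. The third pattern, `wholeLabel`, is not from the question or this answer; it is an assumed variant that only rejects whole labels (so a name that merely starts with "com" still matches) and allows "-". The host `community.wordpress.com` is an invented test input.

```javascript
// Host name from the question.
const host = "example.wordpress.com";

// Original pattern: once the lookahead rejects the start of "wordpress",
// the engine retries one character later and matches "ordpress".
const unanchored = /(?!wordpress|com|www)(\w+|\d+)/g;

// Anchored pattern from this answer: a match may only start at a word boundary.
const anchored = /\b(?!wordpress|com|www)(\w+|\d+)\b/g;

// Assumed variant: reject only whole labels, and allow "-" inside a label.
const wholeLabel = /\b(?!(?:wordpress|com|www)\b)[\w-]+/g;

console.log(host.match(unanchored)); // [ 'example', 'ordpress', 'om' ]
console.log(host.match(anchored));   // [ 'example' ]

// The anchored lookahead also rejects any label that merely starts with "com".
console.log("community.wordpress.com".match(anchored));   // null
console.log("community.wordpress.com".match(wholeLabel)); // [ 'community' ]
```

Whether the stricter variant is needed depends on whether the real host names can begin with one of the excluded words.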


If I were to do something like that, I would take advantage of the format of the URL: anything (dot) second-level domain (dot) top-level domain:

^(?:(?<level3>.*)[.])?(?<level2>[^.]+)[.](?<level1>[^.]+)$
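As a rough illustration of the named-group approach, here is a small JavaScript sketch. It uses a form of the pattern in which the third-level part and its dot are optional as a unit and the last two labels cannot contain a dot; otherwise a greedy `.*` would swallow most of the second-level label. The helper name `splitHost` is hypothetical and exists only for this example.

```javascript
// Third level plus its dot is optional as a unit; the last two labels
// are restricted to non-dot characters so they cannot absorb other labels.
const domainPattern = /^(?:(?<level3>.*)[.])?(?<level2>[^.]+)[.](?<level1>[^.]+)$/;

// splitHost is a hypothetical helper name used only for this illustration.
function splitHost(host) {
  const match = domainPattern.exec(host);
  if (match === null) return null; // no dot at all, e.g. "localhost"
  const { level3, level2, level1 } = match.groups;
  return { level3, level2, level1 };
}

console.log(splitHost("example.wordpress.com"));
// { level3: 'example', level2: 'wordpress', level1: 'com' }
console.log(splitHost("example.org"));
// { level3: undefined, level2: 'example', level1: 'org' }
```

Note that a two-part suffix such as co.uk is not handled by this sketch: for www.example.co.uk, level2 would be "co".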


Or are you only after what comes after the domain part?

(/\/(?!\/).*?\/(.*)/).exec("http://www.google.com/sdfsdf/fdsff")[1]
// returns sdfsdf/fdsff
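If the input is always a full URL, the same split can also be sketched without a regex by using the standard `URL` class, which is available as a global in browsers and in Node.js. This is offered as an alternative under that assumption, not as what the question asked for; the variable name `url` is arbitrary.

```javascript
// Parse the same sample URL with the built-in URL class.
const url = new URL("http://www.google.com/sdfsdf/fdsff");

// The host name, ready to be split on "." or matched with a regex.
console.log(url.hostname); // www.google.com

// The path after the domain; slice(1) drops the leading "/".
console.log(url.pathname.slice(1)); // sdfsdf/fdsff
```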
