1

Regex is not my strong suit, but I am currently using the regex (\/[\d]+) to get the ID from TikTok URLs.

https://m.tiktok.com/h5/share/usr/6641141594707361797.html
https://m.tiktok.com/v/6749869095467945218.html
https://www.tiktok.com/embed/6567659045795758085
https://www.tiktok.com/share/user/6567659045795758085
https://www.tiktok.com/trending?shareId=6744531482393545985

I get the ID from all the links except the one with shareId=xxxxxxxxxxx. I modified my original regex to the one below, but I am still not getting the ID from the link with shareId=xxxxxxxxxxx. Any help will be greatly appreciated.

(?:|shareId=)(\/[\d]+)
3
  • 1
    Your regex says "a slash, followed by one or more digits". The last example TikTok URL doesn't have a slash before the digit. Your attempt at fixing it still has this problem. Commented Sep 3, 2022 at 1:25
  • @user2974907 please post an answer to your own question, using ceejayoz's hint. stackoverflow.com/help/self-answer Be sure to mention an online fiddle, perhaps this one: regexr.com/6t871 Commented Sep 3, 2022 at 1:30
  • I used ceejayoz's hint and fixed it, but now it's catching one of the digits in one of the names. See the regex: regex101.com/r/ppjnOF/1 Commented Sep 3, 2022 at 1:51
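
To make the point in the first comment concrete, here is a minimal Python sketch (Python is an assumption; the question does not say which regex engine is used). It runs both patterns from the question against one URL that works and the shareId URL that does not. The helper name first_group is made up for illustration.

```python
import re

# Both patterns are copied from the question.
ORIGINAL = re.compile(r"(\/[\d]+)")
MODIFIED = re.compile(r"(?:|shareId=)(\/[\d]+)")

embed_url = "https://www.tiktok.com/embed/6567659045795758085"
share_url = "https://www.tiktok.com/trending?shareId=6744531482393545985"


def first_group(pattern, url):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in (embed_url, share_url):
    # Both patterns still require a literal "/" directly before the digits,
    # so neither finds anything in the shareId URL.
    print(first_group(ORIGINAL, url), first_group(MODIFIED, url))
```

Note that even where the original pattern matches, the capture group includes the leading slash, because the slash sits inside the parentheses.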

5 Answers 5

2

With your shown samples, please try the following regex. Here is the online demo for the shown regex.

\bhttps?:\/\/(?:m|www)\.tiktok\.com\/.*\b(?:(?:usr|v|embed|user)\/|\?shareId=)(\d+)\b

Explanation: Here is a detailed explanation of the regex used.

\bhttps?:\/\/         ##Word boundary to avoid partial matches, then http or https followed by a colon and 2 slashes.
(?:m|www)             ##Non-capturing group matching m OR www.
\.tiktok\.com\/       ##Followed by .tiktok.com and a literal /.
.*\b                  ##Greedy match followed by a word boundary.
(?:                   ##Start of the outer non-capturing group.
  (?:                 ##Start of the inner non-capturing group.
    usr|v|embed|user  ##Matching usr OR v OR embed OR user.
  )                   ##End of the inner non-capturing group.
  \/                  ##Followed by a literal /.
  |                   ##OR condition.
  \?shareId=          ##Matching ?shareId= literally.
)                     ##End of the outer non-capturing group.
(\d+)                 ##The one and only capturing group, which holds the digits.
\b                    ##Followed by a word boundary to avoid partial matches.
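
As a quick check of this pattern, the sketch below applies it to the five URLs from the question. Python and the function name extract_id are assumptions made for illustration; the pattern itself is copied unchanged from this answer.

```python
import re

# Pattern copied from the answer; the ID ends up in capture group 1.
TIKTOK_ID = re.compile(
    r"\bhttps?:\/\/(?:m|www)\.tiktok\.com\/.*\b"
    r"(?:(?:usr|v|embed|user)\/|\?shareId=)(\d+)\b"
)

URLS = [
    "https://m.tiktok.com/h5/share/usr/6641141594707361797.html",
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/embed/6567659045795758085",
    "https://www.tiktok.com/share/user/6567659045795758085",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
]


def extract_id(url):
    """Return the numeric ID as a string, or None if the URL does not match."""
    match = TIKTOK_ID.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(extract_id(url))
```

Because the digits are the only capturing group, group(1) holds the bare ID without the slash or the shareId= prefix.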

1 Comment

@user2974907, could you please let me know if this solution worked for you?
1

You can also make the pattern specific to the TikTok URLs in the question, and use a capture group for the digits:

https?://(?:m|www)\.tiktok\.com\b\S*?(?:/(?:use?r|v|embed)/|\bshareId=)(\d+)\b

Explanation

  • https?://(?:m|www)\.tiktok\.com\b Match the start of the URLs
  • \S*? Match optional non-whitespace characters, as few as possible
  • (?:/(?:use?r|v|embed)/|\bshareId=) Match one of /usr/ /user/ /v/ /embed/ shareId=
  • (\d+) Capture group 1, match 1+ digits
  • \b A word boundary to prevent a partial word match

See a regex demo.
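
A small Python sketch of how this pattern could be used (Python and the name get_tiktok_id are assumptions, not part of the answer). The pattern is copied as written above and the ID is read from capture group 1.

```python
import re

# Pattern copied from the answer; "/" needs no escaping in Python.
PATTERN = re.compile(
    r"https?://(?:m|www)\.tiktok\.com\b\S*?"
    r"(?:/(?:use?r|v|embed)/|\bshareId=)(\d+)\b"
)

SAMPLE_URLS = [
    "https://m.tiktok.com/h5/share/usr/6641141594707361797.html",
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/embed/6567659045795758085",
    "https://www.tiktok.com/share/user/6567659045795758085",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
]


def get_tiktok_id(url):
    """Return the digits captured in group 1, or None when there is no match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


ids = [get_tiktok_id(url) for url in SAMPLE_URLS]
print(ids)
```

The lazy \S*? stops at the first recognised segment, so path parts such as /h5/share/ are skipped without being listed in the pattern.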

Comments

1

There are working answers given already, but I want to suggest an elegant-looking solution:

([=/][\d]+)

The character class at the start matches either '=' or '/'. See https://regex101.com/r/9uNJZ1/1

1 Comment

You could write it as [=/]\d+
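
A hedged Python sketch of the shorter form, with one change that is my own and not from the answer: the parentheses are moved so that only the digits are captured. The second URL is invented purely to show the limitation mentioned in the question's comments, where a path segment that starts with a digit gets picked up first.

```python
import re

# Variant of the suggested pattern; capturing only the digits is an
# illustrative change, not part of the original answer.
LOOSE_ID = re.compile(r"[=/](\d+)")


def loose_id(url):
    """Return the first run of digits that directly follows "=" or "/"."""
    match = LOOSE_ID.search(url)
    return match.group(1) if match else None


real_url = "https://www.tiktok.com/trending?shareId=6744531482393545985"
# Hypothetical URL, made up to show a false positive on a name like "2cool".
made_up_url = "https://www.tiktok.com/2cool/video/6567659045795758085"

print(loose_id(real_url))     # the full ID
print(loose_id(made_up_url))  # only the "2" from the invented name
```

The pattern is short, but it trusts that nothing else in the URL looks like a slash or equals sign followed by a digit.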
0

You can use this regex:

/(?:\/|shareId=)\d+(?!.*\/)/g

It matches either a slash / or shareId=, followed by one or more digits.

It then adds an extra check that there are no more slashes on that line - this way slashes in names won't match.

Regex101 link
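
The effect of the negative lookahead can be sketched in Python as follows. Two things here are assumptions of mine: the capture group around \d+ (the answer's pattern has none, so its match includes the slash or shareId= prefix), and the second URL, which is invented to show a name that begins with a digit.

```python
import re

# The answer's pattern with a capture group added around the digits
# (the group is an illustrative addition).
LAST_SEGMENT_ID = re.compile(r"(?:\/|shareId=)(\d+)(?!.*\/)")


def last_segment_id(url):
    """Return digits after "/" or "shareId=" when no later "/" follows them."""
    match = LAST_SEGMENT_ID.search(url)
    return match.group(1) if match else None


sample = "https://m.tiktok.com/h5/share/usr/6641141594707361797.html"
# Hypothetical URL, made up so that an earlier path segment starts with a digit.
made_up = "https://www.tiktok.com/2cool/video/6567659045795758085"

print(last_segment_id(sample))
print(last_segment_id(made_up))  # "/2" is rejected because a "/" follows later
```

The lookahead only helps when the ID is in the last path segment; an ID followed by further slashes would be rejected as well.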

Comments

0

Okay, I used the suggestions given, made some adjustments, and arrived at the following regex. See the solution and link below. Thanks for all the help.

(?:|shareId=^)\d{19}

https://regex101.com/r/OY8B6h/1

2 Comments

You could write this as \d{19} because shareId=^ will not match
Yeah, @The fourth bird is quite right. You could replace shareId=^ with shareId=ABC, or with DEF -- each of those three is an expression that never appears in the supplied text. The only saving grace is the "match empty string" to the left of the | vbar. There is nothing in that non-capturing group that is helping you at all, so it is best to elide it. In general, for any letter X, if you write X^ it is guaranteed to never match, since the ^ caret specifically matches at start of string (or start of line).
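
The two comments above can be checked with a short Python sketch (Python is an assumption; the thread does not name an engine). It confirms that shareId=^ finds nothing, even in multiline mode, and that the full pattern returns the same matches as plain \d{19} on the question's URLs.

```python
import re

URLS = [
    "https://m.tiktok.com/h5/share/usr/6641141594707361797.html",
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/embed/6567659045795758085",
    "https://www.tiktok.com/share/user/6567659045795758085",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
]
text = "\n".join(URLS)

# "=" can never be immediately followed by a start-of-line position,
# so this alternative never matches, with or without MULTILINE.
never = re.search(r"shareId=^", text, flags=re.MULTILINE)

# The group is non-capturing, so findall returns whole matches for both.
with_group = re.findall(r"(?:|shareId=^)\d{19}", text)
plain = re.findall(r"\d{19}", text)

print(never)                # None
print(with_group == plain)  # True
```

Keep in mind that \d{19} relies on every ID in the sample being exactly 19 digits long, which holds for the five URLs shown but is not guaranteed by anything in the thread.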
