Get TikTok ID from URL: regex issue

Question

Regex is not my strong suit, but I am currently using the regex (\/[\d]+) to get the ID from TikTok URLs.

https://m.tiktok.com/h5/share/usr/6641141594707361797.html
https://m.tiktok.com/v/6749869095467945218.html
https://www.tiktok.com/embed/6567659045795758085
https://www.tiktok.com/share/user/6567659045795758085
https://www.tiktok.com/trending?shareId=6744531482393545985

I get the ID from all of the links except the one with shareId=xxxxxxxxxxx. I modified my original regex to the one below, but I am still not getting the ID from the link with shareId=xxxxxxxxxxx. Any help will be greatly appreciated.

(?:|shareId=)(\/[\d]+)

Your regex says "a slash, followed by one or more digits". The last example TikTok URL doesn't have a slash before the digits. Your attempt at fixing it still has this problem.
– ceejayoz, Commented Sep 3, 2022 at 1:25
@user2974907 please post an answer to your own question, using ceejayoz's hint. stackoverflow.com/help/self-answer Be sure to mention an online fiddle, perhaps this one: regexr.com/6t871
– J_H, Commented Sep 3, 2022 at 1:30
I used ceejayoz's hint and fixed it, but now it's catching one of the digits in one of the names. See regex: regex101.com/r/ppjnOF/1
– user2974907, Commented Sep 3, 2022 at 1:51
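A minimal Python sketch of what ceejayoz describes, run against the five sample URLs from the question. The helper name `first_group` is made up for this illustration; the two patterns are the ones quoted in the question.

```python
import re

# The five sample URLs from the question.
URLS = [
    "https://m.tiktok.com/h5/share/usr/6641141594707361797.html",
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/embed/6567659045795758085",
    "https://www.tiktok.com/share/user/6567659045795758085",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
]

ORIGINAL = re.compile(r"(\/[\d]+)")               # a slash, then one or more digits
ATTEMPT = re.compile(r"(?:|shareId=)(\/[\d]+)")   # the group still requires a slash


def first_group(pattern, url):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in URLS:
    # Both patterns behave the same: None for the shareId URL,
    # and a result that still carries the leading slash for the others.
    print(first_group(ORIGINAL, url), first_group(ATTEMPT, url), url)
```

Neither pattern can match the last URL, because the digits there follow "=" rather than "/".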

RavinderSingh13 · Accepted Answer · 2022-09-03 10:31:17Z

2

With your shown samples, please try the following regex. Here is the online demo for the shown regex.

\bhttps?:\/\/(?:m|www)\.tiktok\.com\/.*\b(?:(?:usr|v|embed|user)\/|\?shareId=)(\d+)\b

Explanation: a detailed breakdown of the regex above.

\bhttps?:\/\/         ##Word boundary to avoid partial matches, then http or https followed by :// (a colon and two slashes).
(?:m|www)             ##Non-capturing group matching m OR www.
\.tiktok\.com\/       ##Followed by the literal .tiktok.com/
.*\b                  ##Greedy match of any characters, followed by a word boundary.
(?:                   ##Start of the outer non-capturing group.
  (?:                 ##Start of an inner non-capturing group.
    usr|v|embed|user  ##Matches usr OR v OR embed OR user.
  )                   ##End of the inner non-capturing group.
  \/                  ##Followed by a literal /.
  |                   ##OR
  \?shareId=          ##Matches the literal ?shareId=
)                     ##End of the outer non-capturing group.
(\d+)                 ##The one and only capturing group: one or more digits.
\b                    ##Word boundary to avoid partial matches.
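For anyone using this from Python, here is a hedged sketch of the same pattern written with `re.VERBOSE`, so the breakdown above can live next to the regex itself. The function name `extract_id` is an assumption for this example, and the forward slashes are left unescaped because Python does not need them escaped.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
TIKTOK_ID = re.compile(
    r"""
    \bhttps?://                  # http or https, then ://
    (?:m|www)                    # subdomain: m or www
    \.tiktok\.com/               # literal .tiktok.com/
    .*\b                         # greedy match, then a word boundary
    (?:
        (?:usr|v|embed|user)/    # one of the known path segments and a slash
      |
        \?shareId=               # or the shareId query parameter
    )
    (\d+)                        # group 1: the ID digits
    \b                           # word boundary after the digits
    """,
    re.VERBOSE,
)


def extract_id(url):
    """Return the numeric ID as a string, or None if the URL does not match."""
    match = TIKTOK_ID.search(url)
    return match.group(1) if match else None


print(extract_id("https://m.tiktok.com/h5/share/usr/6641141594707361797.html"))
print(extract_id("https://www.tiktok.com/trending?shareId=6744531482393545985"))
```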

edited Sep 3, 2022 at 10:31

answered Sep 3, 2022 at 10:23

RavinderSingh13


1 Comment

RavinderSingh13 Over a year ago

@user2974907, could you please let me know whether this solution worked for you?

The fourth bird · Accepted Answer · 2022-09-03 10:17:25Z

1

You can also make the pattern specific to the TikTok URLs in the question, and use a capture group for the digits:

https?://(?:m|www)\.tiktok\.com\b\S*?(?:/(?:use?r|v|embed)/|\bshareId=)(\d+)\b

Explanation

https?://(?:m|www)\.tiktok\.com\b Match the start of the URLs
\S*? Match zero or more non-whitespace characters, as few as possible
(?:/(?:use?r|v|embed)/|\bshareId=) Match one of /usr/ /user/ /v/ /embed/ shareId=
(\d+) Capture group 1, match 1+ digits
\b A word boundary to prevent a partial word match

See a regex demo.
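Because this pattern uses `\S*?` rather than `.*`, it can also pull IDs out of a longer piece of text that contains several URLs. A small Python sketch, assuming the URLs are separated by whitespace; the name `find_ids` and the `example.com` URL are made up for the illustration.

```python
import re

# The pattern from this answer, unchanged.
TIKTOK_ID = re.compile(
    r"https?://(?:m|www)\.tiktok\.com\b\S*?(?:/(?:use?r|v|embed)/|\bshareId=)(\d+)\b"
)


def find_ids(text):
    """Return every captured ID (group 1) found in the text, in order."""
    return TIKTOK_ID.findall(text)


TEXT = """
https://m.tiktok.com/h5/share/usr/6641141594707361797.html
https://m.tiktok.com/v/6749869095467945218.html
https://www.tiktok.com/embed/6567659045795758085
https://www.tiktok.com/share/user/6567659045795758085
https://www.tiktok.com/trending?shareId=6744531482393545985
https://example.com/v/123
"""

# The last line is not a tiktok.com URL, so it contributes nothing.
for tiktok_id in find_ids(TEXT):
    print(tiktok_id)
```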

answered Sep 3, 2022 at 10:17

The fourth bird


dmikon · Accepted Answer · 2022-09-03 02:26:28Z

1

There are working answers given already, but I want to suggest an elegant-looking solution:

([=/][\d]+)

The character class [=/] matches either an '=' or a '/' immediately before the digits. Note that the captured text includes that leading character. See https://regex101.com/r/9uNJZ1/1
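One thing to watch with this pattern is where the parentheses sit. The Python sketch below compares the pattern as posted with a variant that moves the capture group onto the digits only; the variant is a suggested tweak, not part of the original answer.

```python
import re

URL = "https://www.tiktok.com/trending?shareId=6744531482393545985"

WITH_PREFIX = re.compile(r"([=/][\d]+)")   # as posted: group 1 includes '=' or '/'
DIGITS_ONLY = re.compile(r"[=/](\d+)")     # variant: group 1 holds only the digits

with_prefix = WITH_PREFIX.search(URL).group(1)
digits_only = DIGITS_ONLY.search(URL).group(1)

print(with_prefix)   # starts with '='
print(digits_only)   # digits only
```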

answered Sep 3, 2022 at 2:26

dmikon


1 Comment

The fourth bird Over a year ago

You could write it as [=/]\d+

Poul Bak · Accepted Answer · 2022-09-03 02:07:15Z

0

You can use this regex:

/(?:\/|shareId=)\d+(?!.*\/)/g

It matches either a slash / or shareId= followed by one or more digits.

It then adds an extra check that there are no more slashes on that line - this way slashes in names won't match.

Regex101 link
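A rough Python equivalent, for illustration: `re.findall` plays the role of the `g` flag, and since `.` does not match a newline by default, the lookahead only inspects the rest of the current line. The third URL is invented here to show a name that starts with digits, and the second pattern (with a capture group around the digits) is an assumed variation rather than part of the answer.

```python
import re

TEXT = "\n".join([
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
    # Hypothetical URL, not from the question: the path segment "123abc" starts with digits.
    "https://www.tiktok.com/123abc/v/6749869095467945218.html",
])

# Pattern from the answer: the whole match still includes the "/" or "shareId=" prefix.
LAST_SEGMENT = re.compile(r"(?:\/|shareId=)\d+(?!.*\/)")

# Assumed variation: capture only the digits.
LAST_SEGMENT_ID = re.compile(r"(?:\/|shareId=)(\d+)(?!.*\/)")

whole_matches = LAST_SEGMENT.findall(TEXT)
ids_only = LAST_SEGMENT_ID.findall(TEXT)

print(whole_matches)
print(ids_only)
```

In the third URL, "/123" is rejected because another slash follows it on the same line, so only the final numeric segment is returned.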

answered Sep 3, 2022 at 2:07

Poul Bak


user2974907 · Accepted Answer · 2022-09-03 02:12:37Z

0

Okay, I used the suggestions given, made the adjustments, and got the following regex. See the solution and link below. Thanks for all the help.

(?:|shareId=^)\d{19}

https://regex101.com/r/OY8B6h/1

answered Sep 3, 2022 at 2:12

user2974907


2 Comments

The fourth bird Over a year ago

You could write this as \d{19} because shareId=^ will not match

J_H Over a year ago

Yeah, @The fourth bird is quite right. You could replace shareId=^ with shareId=ABC, or with DEF: each of those three is an expression that never appears in the supplied text. The only saving grace is the "match empty string" alternative to the left of the | (vertical bar). Nothing in that non-capturing group is helping you at all, so it is best to elide it. In general, for any letter X, X^ is guaranteed never to match, since the ^ caret matches only at the start of the string (or the start of a line).
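The point in these two comments can be checked with a few lines of Python. This is only a sketch using the sample URLs from the question; the variable names are made up.

```python
import re

URLS = [
    "https://m.tiktok.com/h5/share/usr/6641141594707361797.html",
    "https://m.tiktok.com/v/6749869095467945218.html",
    "https://www.tiktok.com/embed/6567659045795758085",
    "https://www.tiktok.com/share/user/6567659045795758085",
    "https://www.tiktok.com/trending?shareId=6744531482393545985",
]

NEVER = re.compile(r"shareId=^")                # '^' cannot match right after '='
WITH_GROUP = re.compile(r"(?:|shareId=^)\d{19}")
PLAIN = re.compile(r"\d{19}")

# The dead alternative never matches, even on the shareId URL.
dead_matches = [NEVER.search(url) for url in URLS]

# Only the empty alternative is ever used, so both patterns return the same IDs.
with_group_ids = [WITH_GROUP.search(url).group(0) for url in URLS]
plain_ids = [PLAIN.search(url).group(0) for url in URLS]

print(with_group_ids == plain_ids)
```

Keep in mind that `\d{19}` relies on every ID being exactly 19 digits long, which holds for the five samples here; on a longer run of digits it would match only the first 19.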
