First, concerning the main task of checking whether a URL has a repeated pattern at the end:
I successfully tested two queries.
There seems to be no free online fiddle site that could demonstrate them, but both worked fine on my local system.
Simple variant (may return NULL)
WITH urls(url) AS (
    VALUES
        ('www.b.com/aa/aa/aa'),
        ('www.b.com/aa/'),
        ('www.xy.com'),
        ('www.c.com/aa/bb/aa'),
        ('www.xyz.com/bc/bc'),
        ('www.aaa.com/aaa/aaa')
)
SELECT
    url,
    element_at(filter(split(url, '/'), x -> x <> ''), -1)
    =
    element_at(filter(split(url, '/'), x -> x <> ''), -2) AS has_repeated_pattern
FROM urls;
Robust variant (always returns true or false)
WITH urls(url) AS (
    VALUES
        ('www.b.com/aa/aa/aa'),
        ('www.b.com/aa/'),
        ('www.xy.com'),
        ('www.c.com/aa/bb/aa'),
        ('www.xyz.com/bc/bc'),
        ('www.aaa.com/aaa/aaa')
)
SELECT
    url,
    COALESCE(
        element_at(filter(split(url, '/'), x -> x <> ''), -1)
        =
        element_at(filter(split(url, '/'), x -> x <> ''), -2),
        false
    ) AS has_repeated_pattern
FROM urls;
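When a URL has fewer than two non-empty segments, the comparison evaluates to NULL; COALESCE turns that NULL into false, so the column always contains true or false.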
Output of the second query:
| url | has_repeated_pattern |
|---|---|
| www.b.com/aa/aa/aa | true |
| www.b.com/aa/ | false |
| www.xy.com | false |
| www.c.com/aa/bb/aa | false |
| www.xyz.com/bc/bc | true |
| www.aaa.com/aaa/aaa | true |
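To sanity-check the logic outside the database, the two variants can be mirrored in a few lines of Python. This is only a sketch of the same idea, not the SQL itself: the function names `segments`, `has_repeated_pattern` and `has_repeated_pattern_robust` are made up for illustration, and `None` stands in for SQL NULL.

```python
from typing import List, Optional


def segments(url: str) -> List[str]:
    # Mirrors filter(split(url, '/'), x -> x <> ''): split on '/' and drop empty parts.
    return [s for s in url.split("/") if s != ""]


def has_repeated_pattern(url: str) -> Optional[bool]:
    # Simple variant: None plays the role of NULL when there are fewer than two segments.
    segs = segments(url)
    if len(segs) < 2:
        return None
    return segs[-1] == segs[-2]


def has_repeated_pattern_robust(url: str) -> bool:
    # Robust variant: mirrors COALESCE(..., false).
    result = has_repeated_pattern(url)
    return False if result is None else result


urls = [
    "www.b.com/aa/aa/aa",
    "www.b.com/aa/",
    "www.xy.com",
    "www.c.com/aa/bb/aa",
    "www.xyz.com/bc/bc",
    "www.aaa.com/aaa/aaa",
]

for url in urls:
    print(url, has_repeated_pattern(url), has_repeated_pattern_robust(url))
```

Only `www.xy.com` differs between the two functions here: it has a single segment, so the simple variant yields `None` and the robust one yields `False`.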
Now, concerning the additional task of counting the repetitions of repeated patterns at the end of the URLs:
I extended the previous idea to the following query:
WITH urls(url) AS (
    VALUES
        ('www.b.com/aa/aa/aa'),
        ('www.b.com/aa/'),
        ('www.xy.com'),
        ('www.c.com/aa/bb/aa'),
        ('www.xyz.com/bc/bc'),
        ('www.aaa.com/aaa/aaa')
)
SELECT
    url,
    cardinality(
        filter(
            CASE
                WHEN cardinality(segs) >= 2 THEN sequence(2, cardinality(segs))
                ELSE array[]
            END,
            i -> segs[i] = segs[i-1]
        )
    ) AS repeated_pattern_repetitions
FROM (
    SELECT
        url,
        filter(split(url, '/'), x -> x <> '') AS segs
    FROM urls
) t;
How it works:
split(url, '/') divides each URL into segments.
filter(..., x -> x <> '') removes empty segments (e.g., from trailing slashes).
cardinality(segs) >= 2 ensures we only compare arrays with at least two segments.
sequence(2, cardinality(segs)) generates positions starting from the second segment.
filter(..., i -> segs[i] = segs[i-1]) selects positions where a segment is equal to the previous one.
cardinality(...) counts these matches, giving the total number of repetitions of repeated segments.
For URLs with fewer than two segments, we return an empty array to avoid out-of-bounds errors, which results in a count of zero.
Output of this query:
| url | repeated_pattern_repetitions |
|---|---|
| www.b.com/aa/aa/aa | 2 |
| www.b.com/aa/ | 0 |
| www.xy.com | 0 |
| www.c.com/aa/bb/aa | 0 |
| www.xyz.com/bc/bc | 1 |
| www.aaa.com/aaa/aaa | 1 |
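The counting logic can be cross-checked with a small Python sketch. `count_adjacent_repeats` mirrors the query as described: it counts every position where a segment equals its predecessor, anywhere in the URL. `count_trailing_repeats` is a hypothetical variant that is not part of the query; it counts only the run of equal segments at the very end, which may be closer to what is wanted if repeats in the middle of a URL should be ignored. All function names are assumptions made for this illustration.

```python
from typing import List


def segments(url: str) -> List[str]:
    # Mirrors filter(split(url, '/'), x -> x <> ''): split on '/' and drop empty parts.
    return [s for s in url.split("/") if s != ""]


def count_adjacent_repeats(url: str) -> int:
    # Mirrors the query: count positions whose segment equals the previous one.
    # With fewer than two segments, zip() produces no pairs, so the count is 0.
    segs = segments(url)
    return sum(1 for prev, cur in zip(segs, segs[1:]) if cur == prev)


def count_trailing_repeats(url: str) -> int:
    # Hypothetical variant (not in the query): walk backwards from the last
    # segment and stop at the first pair of neighbours that differ.
    segs = segments(url)
    count = 0
    for i in range(len(segs) - 1, 0, -1):
        if segs[i] != segs[i - 1]:
            break
        count += 1
    return count


for url in ["www.b.com/aa/aa/aa", "www.xyz.com/bc/bc", "www.xyz.com/a/b/b/c"]:
    print(url, count_adjacent_repeats(url), count_trailing_repeats(url))
```

The two functions agree on all six sample URLs. They differ for a URL such as `www.xyz.com/a/b/b/c`, where the repeated segment is not at the end: the adjacent-pair count is 1, while the trailing-run count is 0.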