
I am trying to find the best way to extract a specific substring as a key-value pair, using re, from strings like the following:

some_string-variable_length/some_no_variable_digit/some_no1_variable_digit/some_string1/some_string2
e.g.: aba/101/11111/cde/xyz or aaa/111/1119/cde/xzx or ada/21111/5/cxe/yyz

Everything here is variable. What I am looking for is a key-value result like this:

`cde: 2`, as there are two entries for cde

`cxe: 1`, as there is only one cxe

Note: everything here is variable except the /. That is, cde, cxe, or some other string always comes right after the third / in each case.

input: aba/101/11111/cde/xyz/blabla
output: cde:xyz/blabla
input: aaa/111/1119/cde/xzx/blabla
output: cde:xzx/blabla
input: aahjdsga/11231/1119/gfts/sjhgdshg/blabla
output: gfts:sjhgdshg/blabla

In other words, my key is always the first field after the 3rd /, and the value is always the substring that follows the key.

  • It's not really clear exactly what you're trying to achieve. Could you please edit your post with specific input data and the expected output from that? Commented May 29, 2020 at 0:20
  • Updated the post with input and output examples. Commented May 29, 2020 at 0:26

3 Answers


Here are a couple of solutions based on your description that "key is always the first string after 3rd / and value is always the substring after key". The first uses str.split with a maxsplit of 4 to collect everything after the fourth / into the value. The second uses regex to extract the two parts:

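inp = ['aba/101/11111/cde/xyz/blabla',
       'aaa/111/1119/cde/xzx/blabla',
       'aahjdsga/11231/1119/gfts/sjhgdshg/blabla',
       ]

# Split on at most the first four '/' characters, so parts[4]
# holds everything after the fourth '/', slashes included.
for s in inp:
    parts = s.split('/', 4)
    if len(parts) < 5:
        continue  # skip lines with fewer than four '/' separators
    key, value = parts[3], parts[4]
    print(f'{key}:{value}')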

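import re

# Skip three "field/" groups, capture the 4th field as the key,
# then capture everything after the next '/' as the value.
pattern = re.compile(r'^(?:[^/]*/){3}([^/]*)/(.*)$')

for s in inp:
    m = pattern.match(s)
    if m is not None:
        key, value = m.groups()
        print(f'{key}:{value}')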

For both pieces of code the output is

cde:xyz/blabla
cde:xzx/blabla
gfts:sjhgdshg/blabla

4 Comments

This is exactly what I was looking for with re. Now I'd like to tweak it to produce key-value pairs such as cde:2 (two entries of cde, i.e. a count) and gfts:1 (only one entry of gfts). Can we do that?
@Jacob you could push the keys to a list and use a Counter to count them, e.g. ideone.com/7Ip5uo
That seems to count how many times cde or gfts appears. What I am looking for is how many xzx there are under cde (say count1), how many xyz there are under cde (count2), and a total (count1 + count2), so that I can see count1 coming from the first and count2 from the second, and so on. Is that even possible with the regex library?
@Jacob I think this has gone beyond the ability of comments to properly describe the problem. You should ask a new question and include more details as to exactly what output you're after.
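A minimal sketch of the Counter idea from these comments (this is not the code behind the ideone link, which is not reproduced here). It reuses the regex from the answer to pull out each key and value, then counts keys with collections.Counter. For the follow-up about per-value counts, it assumes the thing to count is the first field of the value (xyz in xyz/blabla) and keeps one nested Counter per key, so the per-value counts add up to the key's total. The regex only does the extraction; the counting is plain Python. The function name count_keys is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Same pattern as the answer above: skip three fields, capture the key,
# then capture everything after the next '/' as the value.
PATTERN = re.compile(r'^(?:[^/]*/){3}([^/]*)/(.*)$')


def count_keys(lines):
    """Return (per-key totals, per-key Counter of the value's first field)."""
    totals = Counter()
    by_value = defaultdict(Counter)
    for s in lines:
        m = PATTERN.match(s)
        if m is None:
            continue  # line does not have the expected shape
        key, value = m.groups()
        # Assumption: the sub-value to count is the first field of the
        # value, e.g. 'xyz' in 'xyz/blabla'.
        sub = value.split('/', 1)[0]
        totals[key] += 1
        by_value[key][sub] += 1
    return totals, by_value


inp = ['aba/101/11111/cde/xyz/blabla',
       'aaa/111/1119/cde/xzx/blabla',
       'aahjdsga/11231/1119/gfts/sjhgdshg/blabla']

totals, by_value = count_keys(inp)
for key, total in totals.items():
    # Show the key's total followed by its per-value breakdown.
    breakdown = ' + '.join(f'{v}:{n}' for v, n in by_value[key].items())
    print(f'{key}:{total} ({breakdown})')
```

For the sample input this prints `cde:2 (xyz:1 + xzx:1)` and `gfts:1 (sjhgdshg:1)`.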

Others have already posted various regexes, so here is a broader question: is this problem best solved with a regex at all? Depending on how the data is formatted overall, it may be better parsed using

  • the .split('/') method on the string; or
  • csv.reader(..., delimiter='/') or csv.DictReader(..., delimiter='/') in the csv module.
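A minimal sketch of the csv.reader option, assuming one record per line; io.StringIO stands in for an open file. Because the reader splits on every /, the value has to be re-joined from the remaining fields. quoting=csv.QUOTE_NONE is set so that quote characters in the data are treated as ordinary text.

```python
import csv
import io

# Stand-in for an open file; assumption: one record per line.
data = io.StringIO(
    'aba/101/11111/cde/xyz/blabla\n'
    'aaa/111/1119/cde/xzx/blabla\n'
    'aahjdsga/11231/1119/gfts/sjhgdshg/blabla\n'
)

pairs = []
for row in csv.reader(data, delimiter='/', quoting=csv.QUOTE_NONE):
    if len(row) < 5:
        continue  # not enough fields for both a key and a value
    # row[3] is the key; the reader split the value on '/', so re-join it.
    key, value = row[3], '/'.join(row[4:])
    pairs.append((key, value))
    print(f'{key}:{value}')
```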



Try (?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)



Updated per the comment, to also capture the value after the key:

(?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)(?:/(\S*))?

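A minimal sketch of applying the updated pattern in Python with re.finditer, assuming the records arrive as one block of text separated by whitespace or newlines. The (?<!\S) lookbehind means a match can only start at the beginning of the text or right after whitespace, so each record is matched from its first field. Group 2 is None when nothing follows the key.

```python
import re

# The updated pattern from this answer, unchanged.
PATTERN = re.compile(r'(?<!\S)[^\s/]*(?:/[^\s/]*){2}/([^\s/]*)(?:/(\S*))?')

text = """aba/101/11111/cde/xyz/blabla
aaa/111/1119/cde/xzx/blabla
aahjdsga/11231/1119/gfts/sjhgdshg/blabla"""

pairs = []
for m in PATTERN.finditer(text):
    # Group 1 is the key; group 2 is the rest of the record (or None).
    key, value = m.group(1), m.group(2)
    pairs.append((key, value))
    print(f'{key}:{value}')
```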

1 Comment

Added an updated answer.
