Searching and returning a value using regex in Python

Question

I'm trying to write a program to scan videos, find what languages the audio and subtitles are available in, and then use those findings for input.

Currently, I'm generating the output with this:

with open('output.txt', 'wt') as output_f:
    p = subprocess.Popen(command, stdout=output_f, stderr=output_f)

Here's the bit of text from my scan that I need.

  + audio tracks:
    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)

So I need to find the number in front of Japanese, but only when it appears after "audio tracks".

Similarly, I need to find the number in front of English, but only when it appears after "subtitle tracks".

I'm pretty sure I need to use Regular Expressions to do this, but I'm lost on where to begin.

You need to do this in 2 steps: pick out the part of the text that shows the audio/video tracks with regex, then do a second pass on that smaller part of the text to extract the information.
– nhahtdh, Commented Apr 24, 2013 at 6:43
Japanese and English are just examples, right? You actually want to find the number in front of the language, but after "audio tracks:" and "subtitle tracks:". This shouldn't be a problem: you simply have to do a lookbehind for "audio tracks" or "subtitle tracks", or use some groups.
– Bakuriu, Commented Apr 24, 2013 at 6:51
Subprocess is called because of the way I'm executing the command. No, I need the Japanese language for audio (or Undefined, as is sometimes the case) and I need the English subtitles. The problem stems from some videos having dual audio and multiple subtitles.
– godofgrunts, Commented Apr 24, 2013 at 7:06
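
The two-step approach suggested in the first comment could look roughly like the sketch below. It is only an illustration: `find_tracks` is a hypothetical helper name, the second audio track in the sample is an assumption added to mimic a dual-audio video, and the patterns assume every track line has the form `+ N, Language ...` as in the question.

```python
import re

# Sample scan output from the question, plus a second audio track
# (an assumption, added to mimic a dual-audio video).
SCAN_TEXT = """\
  + audio tracks:
    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
    + 2, English (aac) (2.0 ch) (iso639-2: eng)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)
"""


def find_tracks(text, header):
    """Return [(number, language), ...] for the tracks listed under `header`."""
    # Step 1: isolate the run of "+ N, ..." lines that follows the header.
    block = re.search(
        r'\+ ' + re.escape(header) + r':\n((?:[ \t]*\+ \d+,.*\n?)*)', text)
    if block is None:
        return []
    # Step 2: extract (number, language) pairs from that smaller block only.
    pairs = re.findall(r'\+ (\d+), (\w+)', block.group(1))
    return [(int(number), lang) for number, lang in pairs]


print(find_tracks(SCAN_TEXT, 'audio tracks'))
print(find_tracks(SCAN_TEXT, 'subtitle tracks'))
```

Because the second pass only sees the lines under one header, a language that appears in both sections cannot be attributed to the wrong one.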

vlad-ardelean · Accepted Answer · 2013-04-24 06:58:55Z

You don't really need a regex here. A regex seems overly complicated for this to me, too.

Here's some plain string parsing:

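# Read the scan output back in. The file must be opened for reading:
# mode 'wt' truncates it and makes tuple(output_f) fail with "not readable".
with open('output.txt', 'rt') as output_f:
    parseTracks = False
    lines = tuple(output_f)
    for line in lines:
        if 'audio tracks' in line:
            parseTracks = True
        elif 'subtitle tracks' in line:
            parseTracks = False  # left the audio section
        if parseTracks:
            if 'Japanese' in line:
                # Use only the text before the first comma ("    + 1"), so digits
                # from "2.0 ch" or "iso639-2" are not picked up.
                theNumber = int(''.join([char for char in line.split(',')[0] if char.isdigit()]))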

The same approach works for the subtitles.
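
To handle audio and subtitles in one pass, the same line-by-line idea can be wrapped in a small function. This is a sketch under assumptions, not part of the original answer: `parse_tracks` is a hypothetical name, and it assumes every section header ends with a colon and every track line looks like `+ N, Language ...`.

```python
def parse_tracks(lines):
    """Collect (number, language) pairs per section from an iterable of lines."""
    headers = {'+ audio tracks:': 'audio', '+ subtitle tracks:': 'subtitle'}
    tracks = {'audio': [], 'subtitle': []}
    section = None
    for line in lines:
        entry = line.strip()
        if entry.endswith(':'):
            # A header line: switch section (None for sections we ignore).
            section = headers.get(entry)
        elif section and entry.startswith('+ ') and ',' in entry:
            # Split "+ 1, Japanese (aac) ..." into the number and the rest.
            number, rest = entry[2:].split(',', 1)
            words = rest.split()
            if number.strip().isdigit() and words:
                tracks[section].append((int(number), words[0]))
    return tracks


# Sample lines copied from the question.
sample = [
    '  + audio tracks:',
    '    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)',
    '  + subtitle tracks:',
    '    + 1, English (iso639-2: eng) (Text)(SSA)',
]
print(parse_tracks(sample))
```

Since the function accepts any iterable of lines, a file object opened for reading can be passed in directly.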

2 Comments

Bakuriu Over a year ago

Replace char in '123456789' with char.isdigit(). Also, you will take too many digits, so it's still wrong.

godofgrunts Over a year ago

So when I run this code I get the following error: lines = tuple(output_f) io.UnsupportedOperation: not readable

Bakuriu · Accepted Answer · 2013-04-24 06:56:33Z

You could do something like this:

>>> import re
>>> audio_regex = re.compile(r'\+ audio tracks:\n\s*\+ (?P<number>\d+), (?P<lang>\w+)')
>>> subtitle_regex = re.compile(r'\+ subtitle tracks:\n\s*\+ (?P<number>\d+), (?P<lang>\w+)')
>>> text = '''
...   + audio tracks:
...     + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
...   + subtitle tracks:
...     + 1, English (iso639-2: eng) (Text)(SSA)
... '''
>>> match = audio_regex.search(text)  #find the first match
>>> match.group('number')
'1'
>>> match.group('lang')
'Japanese'
>>> audio_regex.findall(text)   #find all matches
[('1', 'Japanese')]
>>> subtitle_regex.findall(text)
[('1', 'English')]

Tweak the regexes above to be more or less flexible depending on the format of your file (e.g. if there could be several spaces where the pattern has a single space, replace that space with \s+ to match one or more whitespace characters).
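
As one possible tweak along those lines, the sketch below looks up the number of a specific language even when it is not the first track in its section, which matters for files with dual audio or multiple subtitles. The function name `track_number` and the extra tracks in the sample are assumptions for illustration. It uses `[ \t]+` rather than `\s+` so that a run of spaces cannot swallow a newline.

```python
import re


def track_number(text, section, language):
    """Return the track number of `language` under `section`, or None."""
    pattern = (
        r'\+[ \t]+' + re.escape(section) + r' tracks:\n'
        r'(?:[ \t]*\+[ \t]+\d+,.*\n)*?'  # lazily skip earlier tracks in this section
        r'[ \t]*\+[ \t]+(?P<number>\d+),[ \t]*' + re.escape(language) + r'\b'
    )
    match = re.search(pattern, text)
    return int(match.group('number')) if match else None


# Hypothetical scan output with two tracks per section.
text = """\
  + audio tracks:
    + 1, English (aac) (2.0 ch) (iso639-2: eng)
    + 2, Japanese (aac) (2.0 ch) (iso639-2: jpn)
  + subtitle tracks:
    + 1, Japanese (iso639-2: jpn) (Text)(SSA)
    + 2, English (iso639-2: eng) (Text)(SSA)
"""

print(track_number(text, 'audio', 'Japanese'))
print(track_number(text, 'subtitle', 'English'))
print(track_number(text, 'audio', 'French'))
```

The skipped lines must themselves look like track lines, so the search cannot run past the next section header and pick up a language from the wrong section.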

HennyH · Accepted Answer · 2013-04-24 07:04:20Z

This will work (use with .findall()):

(?<=subtitle tracks:\n)\s+\+\s(\d+)
(?<=audio tracks:\n)\s+\+\s(\d+)

Each pattern checks for a fixed prefix (including the newline), then consumes the whitespace and captures the number after the '+'.
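
For reference, here is how these lookbehind patterns behave with `re.findall()`. The second audio track in the sample is an assumption, added to show what happens with more than one track.

```python
import re

# Scan output from the question, plus a hypothetical second audio track.
text = """\
  + audio tracks:
    + 1, Japanese (aac) (2.0 ch) (iso639-2: jpn)
    + 2, English (aac) (2.0 ch) (iso639-2: eng)
  + subtitle tracks:
    + 1, English (iso639-2: eng) (Text)(SSA)
"""

# The lookbehind only succeeds directly after the header line, so each
# pattern can match at a single position in the text.
audio_numbers = re.findall(r'(?<=audio tracks:\n)\s+\+\s(\d+)', text)
subtitle_numbers = re.findall(r'(?<=subtitle tracks:\n)\s+\+\s(\d+)', text)

print(audio_numbers)
print(subtitle_numbers)
```

Note that only the track immediately after each header is captured, so in this sample the second audio track is not reported. Files with several tracks per section need a pattern that also walks the following lines.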

edited Apr 24, 2013 at 7:04

answered Apr 24, 2013 at 6:55
