Python 3 replace part of string with data from dict

Question

I need to turn phone numbers into international format. I have a list of phone number examples:

rows = [
  (datetime.time(20, 35, 30), '0707262078',),
  (datetime.time(20, 38, 18), '+46706332602',),
  (datetime.time(20, 56, 35), '065017063'),
  (datetime.time(21, 45, 1), '+46730522807',),
  (datetime.time(22, 13, 47), '0046733165812')
]

I need to replace the prefix of all numbers starting with, for example, 07 with +467, 06 with +466, and 00 with +. For the example above, 0707262078 should become +46707262078, 065017063 should become +4665017063, and 0046733165812 should become +46733165812.
I don't know if it's possible to do this with a regex alone or if I need other code as well.

I've been trying re.sub combined with a lambda. My thought is to make a dictionary with the matching replacements, like this:

repl_dict = {
  '01': '+461',
  '02': '+462',
  '03': '+463',
  '04': '+464',
  '05': '+465',
  '06': '+466',
  '07': '+467',
  '08': '+468',
  '09': '+469',
  '00': '+'
}

My try so far:

import re
rows = [
  (datetime.time(20, 35, 30), '0707262078',),
  (datetime.time(20, 38, 18), '+46706332602Ring via Mitel ',),
  (datetime.time(20, 56, 35), '065017063'),
  (datetime.time(21, 45, 1), '+46730522807Ring via Mitel ',),
  (datetime.time(22, 13, 47), '0046733165812')
]

repl_dict = {
  '01': '+461',
  '02': '+462',
  '03': '+463',
  '04': '+464',
  '05': '+465',
  '06': '+466',
  '07': '+467',
  '08': '+468',
  '09': '+469',
  '00': '+'
}

for row in rows:
    regex = re.compile(r'^\d{1}[0-9](\d*)'), re.S
    DialedNumber = regex.sub(lambda match: repl_dict.get(match.group(0), row[1]), row[1], row[1])

Can I note that your for loop is not indented properly? Fix that first (it may just be a copy-and-paste error).
– elijahfhopp, Commented Feb 13, 2018 at 15:07

tobias_k · Accepted Answer · 2018-02-13 19:41:38Z

3

Your regex, ending in \d*, will match the entire number, and hence no entry is found in the dict. Also, there seems to be an unmatched parenthesis and one too many row[1] in the call to sub.
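To see why the lookup fails, here is a minimal sketch. It assumes the question's pattern with the compile call repaired (the original line builds a tuple instead of a pattern) and uses one sample number from the question:

```python
import re

# The question's pattern, compiled on its own (assumed fix of the original line).
pattern = re.compile(r'^\d{1}[0-9](\d*)')

match = pattern.match('0707262078')
whole = match.group(0)   # the full match, including everything \d* consumed
prefix = whole[:2]       # the two leading digits the dict is actually keyed on

print(whole)
print(prefix)
```

Since group(0) is the whole number rather than the two-digit prefix, repl_dict.get never finds a key and falls back to its default.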

You can simplify your regex to ^00? and your replacements dict to {'00': '+', '0': '+46'}. This checks whether the number starts with one or two zeros, making the replacement dict much simpler and less repetitive.

import datetime
import re

rows = [(datetime.time(20, 35, 30), '0707262078',), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel ',), (datetime.time(20, 56, 35), '065017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel ',), (datetime.time(22, 13, 47), '0046733165812')]
repl_dict = {'00': '+', '0': '+46'}
regex = re.compile(r'^00?')  # one leading zero, optionally followed by a second
for date, number in rows:
    # group(0) is either '0' or '00', so the dict lookup always succeeds
    print(regex.sub(lambda match: repl_dict.get(match.group(0)), number))

Output:

+46707262078
+46706332602Ring via Mitel 
+4665017063
+46730522807Ring via Mitel 
+46733165812

If you only want the numeric part, you can pre- or postprocess the numbers with a second regex like [0-9+]*.
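As a rough sketch of that extra step, assuming the number always comes first in the string, the leading run of digits and + can be picked out with re.match before the prefix is rewritten. The helper name normalize and the constant names are illustrative only, not part of the original answer:

```python
import re

REPL = {'00': '+', '0': '+46'}      # same mapping as in the answer
PREFIX = re.compile(r'^00?')        # one or two leading zeros
NUMERIC = re.compile(r'[0-9+]*')    # leading run of digits and '+'

def normalize(raw):
    """Hypothetical helper: drop trailing text, then rewrite the prefix."""
    number = NUMERIC.match(raw).group(0)
    return PREFIX.sub(lambda m: REPL[m.group(0)], number)

cleaned = [normalize(s) for s in ('0707262078',
                                  '+46706332602Ring via Mitel ',
                                  '0046733165812')]
print(cleaned)
```

Using match rather than search anchors the extraction at the start of the string, so digits appearing later in the trailing text would be ignored.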

edited Feb 13, 2018 at 19:41

answered Feb 13, 2018 at 15:17

tobias_k


1 Comment

Steven Rumbalski Over a year ago

[(tm, re.sub('^(00|0)', lambda m: {'00': '+', '0': '+46'}[m.group(0)], ph)) for (tm, ph) in rows] (Admittedly making it a one-liner does make it less nice, but you shortened both the pattern and the replacement dictionary so much that I wanted to show it.)

Steven Rumbalski · Accepted Answer · 2018-02-13 16:00:20Z

1

This is the naive approach based on repl_dict as given in your question.

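import re

def repl(match):
    # Look up the matched prefix (e.g. '07') in repl_dict from the question.
    return repl_dict[match.group(0)]

# Alternation of all dict keys, anchored at the start: ^(01|02|...|00)
pat = '^(' + '|'.join(repl_dict) + ')'
new_rows = [(tm, re.sub(pat, repl, ph)) for (tm, ph) in rows]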

tobias_k's answer gives a better approach by improving your repl_dict and pattern.

edited Feb 13, 2018 at 16:00

answered Feb 13, 2018 at 15:41

Steven Rumbalski


Srdjan M. · Accepted Answer · 2018-02-13 16:04:10Z

1

Regex: ^[0-9]{2}

Details:

^ Asserts position at start of a line
[] Match a single character present in the list
{n} Matches exactly n times

Python code:

As suggested by @tobias_k, you can use repl_dict.get(m.group(), m.group()) instead of repl_dict.get(m.group()) or m.group().

regex = re.compile(r'^[0-9]{2}')

for i in range(len(rows)):
    rows[i] = (rows[i][0], regex.sub(lambda m: repl_dict.get(m.group()) or m.group(), rows[i][1]))

Output:

[(datetime.time(20, 35, 30), '+46707262078'), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel '), (datetime.time(20, 56, 35), '+4665017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel '), (datetime.time(22, 13, 47), '+46733165812')]

edited Feb 13, 2018 at 16:04

answered Feb 13, 2018 at 15:50

Srdjan M.

3 Comments

tobias_k Over a year ago

This might fail if, for whatever reason, there are numbers starting with a digit other than 0. For example, if the number is "12345", then get returns None and you end up with "345".

Srdjan M. Over a year ago

@tobias_k Fixed it!

tobias_k Over a year ago

You could also use repl_dict.get(m.group(), m.group()), i.e. using the matched group itself as a default.
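The difference discussed in these comments can be sketched as follows. The shortened mapping here is for illustration only, and the example relies on re.sub treating a None return value from the callback as an empty replacement, as described in the first comment:

```python
import re

repl_dict = {'00': '+', '07': '+467'}   # shortened mapping, for illustration only
regex = re.compile(r'^[0-9]{2}')

# No default: get() returns None for '12', and the matched prefix is dropped.
lossy = regex.sub(lambda m: repl_dict.get(m.group()), '12345')

# Matched text as default: unknown prefixes are left untouched.
safe = regex.sub(lambda m: repl_dict.get(m.group(), m.group()), '12345')

print(repr(lossy), repr(safe))
```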

thayne · Accepted Answer · 2018-02-13 15:14:24Z

0

You can do it without a regex:

for row in rows:
  for repl in repl_dict:
    if row[1].startswith(repl):
      print(repl_dict[repl] + row[1][len(repl):])
      break
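A variant of the same idea, sketched as a function that also returns numbers with no matching prefix unchanged. The name to_international and the shortened prefix table (borrowed from the simplification in tobias_k's answer) are assumptions made for illustration:

```python
def to_international(number, prefixes=(('00', '+'), ('0', '+46'))):
    """Hypothetical helper: rewrite the first matching prefix, else return as-is."""
    for old, new in prefixes:   # '00' is checked before '0' on purpose
        if number.startswith(old):
            return new + number[len(old):]
    return number

converted = [to_international(n) for n in ('0707262078', '065017063',
                                          '0046733165812', '+46730522807')]
print(converted)
```

The order of the prefix table matters: if '0' were tested first, a number starting with 00 would be rewritten to +460... instead of +....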

answered Feb 13, 2018 at 15:14

thayne
