I have read in pricing strings for some products. As you will see below, not every pricing string is formatted the same way. What I am trying to do is strip out the sub-strings I do not want.

Below is the code I have which works, but there has to be a more efficient way to do this.

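import re

# Strip the unwanted labels, then drop any "(... per item)" group
tmp1 = p_pricing.replace("from ", "")
tmp1 = tmp1.replace("Options Available on Open Box", "")
tmp1 = tmp1.replace("Open Box Price: From ", "")
tmp1 = re.sub(r'\([^)]*\)', '', tmp1)
# Split on "$" to get the individual prices
tmp1 = re.split("[$]", tmp1)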

Below is a small sample of my pricing string:

$11.99($6.00 per item)$14.99
from $13.99$18.25
$9.89($4.94 per item)$14.99
from $9.83($3.28 per item) 
from $15.99$29.99
from $84.99$104.95
from $9.83($3.28 per item) 
$3.47
$94.99$129.99
from $14.34$19.90
from $25.01$65.00Options Available on Open Box

2 Answers

It seems you just want to get the numeric values of all prices in each string.

You can use

re.findall(r'\$(\d+(?:\.\d+)?)', text)

See the regex demo.

Details

  • \$ - a $ char
  • (\d+(?:\.\d+)?) - Capturing group 1: one or more digits, and then an optional occurrence of a . and one or more digits.

See the Python demo:

import re
pattern = r"\$(\d+(?:\.\d+)?)"
text = "$11.99($6.00 per item)$14.99\nfrom $13.99$18.25\n$9.89($4.94 per item)$14.99\nfrom $9.83($3.28 per item) \nfrom $15.99$29.99\nfrom $84.99$104.95\nfrom $9.83($3.28 per item) \n$3.47\n$94.99$129.99\nfrom $14.34$19.90\nfrom $25.01$65.00Options Available on Open Box"
print( re.findall(pattern, text) )

Output:

['11.99', '6.00', '14.99', '13.99', '18.25', '9.89', '4.94', '14.99', '9.83', '3.28', '15.99', '29.99', '84.99', '104.95', '9.83', '3.28', '3.47', '94.99', '129.99', '14.34', '19.90', '25.01', '65.00']
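If you need the prices grouped per product rather than as one flat list, the same pattern can be applied line by line. The following is only a sketch: the names `PRICE_RE` and `prices_per_line` are made up for illustration, and it assumes each product's pricing string sits on its own line and that you want the values as floats.

```python
import re

# Same pattern as above: a "$" followed by digits and an optional decimal part.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")

def prices_per_line(text):
    """Return one list of float prices for each non-empty line of text."""
    return [
        [float(p) for p in PRICE_RE.findall(line)]
        for line in text.splitlines()
        if line.strip()
    ]

sample = "$11.99($6.00 per item)$14.99\nfrom $13.99$18.25\n$3.47"
result = prices_per_line(sample)
print(result)  # [[11.99, 6.0, 14.99], [13.99, 18.25], [3.47]]
```

Note that this still includes the per-item prices inside parentheses, just like the flat version above.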

Your code removes everything from an opening parenthesis to its closing parenthesis by replacing \([^)]*\) with an empty string, so the prices inside parentheses are not wanted. You can get only the prices outside the parentheses by first matching each parenthesized part without capturing it.

Then use an alternation | and capture what you want to keep.

The digits are in capture group 1.

\([^()]*\)|\$(\d+(?:\.\d+)?)

  • \([^()]*\) Match from an opening parenthesis to the closing parenthesis (nothing is captured)
  • | Or
  • \$ Match a dollar sign
  • (\d+(?:\.\d+)?) Capture group 1: match 1+ digits and an optional decimal part

See a regex demo or a Python demo

Example code

import re

pattern = r"\([^()]*\)|\$(\d+(?:\.\d+)?)"

s = "$11.99($6.00 per item)$14.99 from $13.99$18.25 $9.89($4.94 per item)$14.99 from $9.83($3.28 per item) from $15.99$29.99 from $84.99$104.95 from $9.83($3.28 per item) $3.47 $94.99$129.99 from $14.34$19.90 from $25.01$65.00Options Available on Open Box"

# re.findall returns group 1 for every match; it is an empty string
# where the parenthesized alternative matched, so filter those out.
print([price for price in re.findall(pattern, s) if price])

Output

['11.99', '14.99', '13.99', '18.25', '9.89', '14.99', '9.83', '15.99', '29.99', '84.99', '104.95', '9.83', '3.47', '94.99', '129.99', '14.34', '19.90', '25.01', '65.00']
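The same idea can be wrapped in a small helper that handles one pricing string at a time, which is closer to how `p_pricing` is used in the question. This is a sketch under assumptions: `PATTERN` and `extract_prices` are illustrative names, and the decimal part is made optional with a trailing `?` so that a whole-dollar price such as `$5` would also match.

```python
import re

# First alternative consumes "(...)" groups so prices inside them are skipped;
# second alternative captures a price after "$" (decimal part optional).
PATTERN = re.compile(r"\([^()]*\)|\$(\d+(?:\.\d+)?)")

def extract_prices(pricing):
    """Return prices outside parentheses as strings, in order of appearance."""
    # group(1) is None when the parenthesized alternative matched.
    return [m.group(1) for m in PATTERN.finditer(pricing) if m.group(1)]

samples = [
    "$11.99($6.00 per item)$14.99",
    "from $9.83($3.28 per item) ",
    "from $25.01$65.00Options Available on Open Box",
    "$3.47",
]
extracted = [extract_prices(s) for s in samples]
for s, prices in zip(samples, extracted):
    print(repr(s), "->", prices)
```

Because only the text right after a `$` is captured, labels such as "from " or "Options Available on Open Box" do not need to be removed first.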
