Converting an individual formatted number to an int or float is fairly easy. Converting a series of formatted numbers is just as easy when they all use the same locale or number formatting.
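For that easy case, a minimal sketch might look like the following. The function name `parse_known_format` and its parameters are made up for illustration; it assumes you already know which thousands and decimal separators the whole list uses.

```python
def parse_known_format(text, thousands_sep=",", decimal_sep="."):
    """Convert a formatted number whose separators are known in advance."""
    if thousands_sep:
        # Drop the grouping character entirely.
        text = text.replace(thousands_sep, "")
    if decimal_sep != ".":
        # float() only understands "." as the decimal separator.
        text = text.replace(decimal_sep, ".")
    return float(text)

# Every item shares one convention, so one pair of separators covers the list.
same_convention = ["1.234,56", "7.890,12", "42,5"]
print([parse_known_format(n, thousands_sep=".", decimal_sep=",") for n in same_convention])
# [1234.56, 7890.12, 42.5]
```

Because the separators are passed in rather than guessed, nothing about the input is ambiguous here.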
It becomes challenging when the list mixes different conventions.
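The convert_digits and get_separators functions further down work if the formatted numbers are string-formatted floats. They will fail if some of the numbers are integers: an integer contains either no separator at all, or a single separator that cannot be told apart from a decimal separator.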
Note that locale data varies between libc implementations. For instance, many macOS locales do not define a thousands separator and will format numbers without one unless you process the number as currency. I've tried to accommodate macOS-formatted numbers in the code.
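If you want to see what your own platform reports, something like this sketch can help. The helper name `locale_separators` is hypothetical; it only uses the standard `locale` module and temporarily switches the numeric and monetary categories.

```python
import locale

def locale_separators(name):
    """Return the separators that libc reports for the named locale."""
    categories = (locale.LC_NUMERIC, locale.LC_MONETARY)
    saved = [locale.setlocale(cat) for cat in categories]  # remember current settings
    try:
        for cat in categories:
            # Raises locale.Error if the locale is not installed.
            locale.setlocale(cat, name)
        conv = locale.localeconv()
        return {
            "decimal_point": conv["decimal_point"],
            "thousands_sep": conv["thousands_sep"],
            "mon_thousands_sep": conv["mon_thousands_sep"],
        }
    finally:
        # Restore the previous settings even if something went wrong.
        for cat, old in zip(categories, saved):
            locale.setlocale(cat, old)

print(locale_separators("C"))
# {'decimal_point': '.', 'thousands_sep': '', 'mon_thousands_sep': ''}
```

Calling it with a name such as "en_US.UTF-8" depends on which locales are installed and on the libc in use, so I have not shown output for those. Comparing `thousands_sep` with `mon_thousands_sep` on your system is one way to check the numeric versus currency difference described above.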
The following code uses a list comprehension to pass each formatted number and its separators to a function that converts it to a float:
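import unicodedata as ud
import regex  # third-party module: pip install regex

# Optional leading sign or "(", a digit from any decimal number system, then
# digits and separators, then an optional trailing sign or ")".
ND = regex.compile(r'^[+(-]?\p{Nd}[,.\u066B\u066C\u0020\u2009\u202F\p{Nd}]*[+)-]?$')

def convert_digits(text, sep=(",", ".")):
    """Convert a formatted number to a float, given (thousands, decimal) separators."""
    tsep, dsep = sep
    if not ND.match(text):
        return None
    if tsep:
        text = text.replace(tsep, "")
    # Map digits from any decimal number system to ASCII; keep other characters.
    text = ''.join(str(ud.decimal(c, c)) for c in text)
    if text[-1] in ("-", "+"):
        # Move a trailing sign to the front.
        text = text[-1] + text[:-1]
    if text[0] == "(" and text[-1] == ")":
        # Accounting-style parentheses mean a negative number.
        text = "-" + text[1:-1]
    return float(text.replace(dsep, "."))

def get_separators(n):
    """Infer (thousands, decimal) separators from the non-digit characters of n."""
    # Unique non-digit characters, in order of first appearance.
    t = tuple(dict.fromkeys(regex.sub(r'\d+', '', n)))
    if t[0] in ("-", "+", "("):
        t = t[1:]
    if t[-1] in ("-", "+", ")"):
        t = t[:-1]
    if len(t) == 1:
        # A single separator is assumed to be the decimal separator.
        t = ("", t[0])
    return t

numbers = ['1,234.56', '7.890,12', '123 456,789', '1234,56', '1234.56']
result = [convert_digits(n, sep=get_separators(n)) for n in numbers]
print(result)
# [1234.56, 7890.12, 123456.789, 1234.56, 1234.56]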
Ideally, though, it is better to track the locale of each segment of data and process each segment accordingly.
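As a rough sketch of that idea, assume each value arrives together with a locale tag. The `SEPARATORS` table and the `parse_tagged` function are hypothetical names, and the table holds only a few hand-written entries rather than real locale data.

```python
# Hypothetical lookup: locale tag -> (thousands separator characters, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
    # Assumption: accept a plain space, a no-break space or a narrow no-break space.
    "fr-FR": (" \u00a0\u202f", ","),
}

def parse_tagged(text, tag):
    """Convert a formatted number using the separators recorded for its locale tag."""
    tsep_chars, dsep = SEPARATORS[tag]
    for ch in tsep_chars:
        text = text.replace(ch, "")
    return float(text.replace(dsep, "."))

tagged = [
    ("1,234.56", "en-US"),
    ("7.890,12", "de-DE"),
    ("123 456,789", "fr-FR"),
    ("1.234", "de-DE"),  # an integer: unambiguous once the locale is known
]
print([parse_tagged(text, tag) for text, tag in tagged])
# [1234.56, 7890.12, 123456.789, 1234.0]
```

With the locale known, integers such as '1.234' stop being a problem, because there is no need to guess what the single separator means.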
One benefit of the above code is that it will work with other decimal number systems:
numbers2 = ['-1.234,56', '123 456,789', '๓๔.๕๕', '٣٫١٤١٥٩٢٦٥٣٥٨']
result2 = [convert_digits(n, sep=get_separators(n)) for n in numbers2]
print(result2)
# [-1234.56, 123456.789, 34.55, 3.14159265358]
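The part that makes this work is the digit mapping done with `unicodedata.decimal`. Shown in isolation, as a small sketch with a made-up helper name (`to_ascii_digits`), it looks like this:

```python
import unicodedata

def to_ascii_digits(text):
    """Replace each Unicode decimal digit with its ASCII equivalent."""
    # unicodedata.decimal(ch, ch) returns the digit's integer value, or the
    # character itself (the default) when ch is not a decimal digit.
    return "".join(str(unicodedata.decimal(ch, ch)) for ch in text)

print(to_ascii_digits("๓๔.๕๕"))  # 34.55
print(to_ascii_digits("٣٫١٤"))   # 3٫14
```

Only the digits change. The Arabic decimal separator (U+066B) is left in place, which is why the separators still have to be handled separately.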
It should also be able to handle signed numbers:
numbers3 = ['(1.234,56)', '-1.234,56', '+1.234,56', '1.234,56-']
result3 = [convert_digits(n, sep=get_separators(n)) for n in numbers3]
print(result3)
# [-1234.56, -1234.56, 1234.56, -1234.56]