
I have a column named 'market_cap_(in_us_$)' whose values look like this:

$5.41 
$18,160.50 
$9,038.20 
$8,614.30 
$368.50 
$2,603.80 
$6,701.50 
$8,942.40 

My final goal is to be able to filter based on specific numeric values (for example, > 2000.00).

Following answers to other questions on this site, I tried:

cleaned_data['market_cap_(in_us_$)'].replace( '$', '', regex = True ).astype(float)

However, I get the following error:

TypeError: replace() got an unexpected keyword argument 'regex'

If I remove "regex = True" from the replace arguments, I get:

ValueError: could not convert string to float: $5.41
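A minimal sketch of why dropping regex=True does not help, assuming a current pandas release: without it, Series.replace compares whole cell values rather than substrings, so no cell equal to exactly '$' means nothing is replaced. The two-element Series s is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data taken from the question's values
s = pd.Series(['$5.41', '$18,160.50'])

# Without regex=True, replace() only swaps cells whose entire value is '$',
# so the '$' inside each string is left untouched
result = s.replace('$', '')
print(result.tolist())  # ['$5.41', '$18,160.50']
```

Because the strings still start with '$', the later astype(float) fails with the ValueError shown above.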

So, what should I do?

  • What version of pandas are you running? (print pd.__version__) Commented Jun 13, 2014 at 3:47
  • I had 0.11.0; after your suggestion I updated to 0.14.0. Thanks. Commented Jun 17, 2014 at 17:27

2 Answers

4

The right regular expression is a character class containing both characters you want to remove, the $ and the ,:

In [7]: df['market_cap_(in_us_$)'].replace(r'[\$,]', '', regex=True).astype(float)
Out[7]:
0        5.41
1    18160.50
2     9038.20
3     8614.30
4      368.50
5     2603.80
6     6701.50
7     8942.40
Name: market_cap_(in_us_$), dtype: float64

Since you got the "unexpected keyword argument 'regex'" error, you are most likely running an old version of pandas and should update.
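For the original goal of filtering on a numeric threshold, a minimal sketch might look like the following. It assumes a current pandas release and rebuilds the column from the sample values in the question; the names df, col and large are illustrative.

```python
import pandas as pd

# Hypothetical DataFrame rebuilt from the sample values in the question
df = pd.DataFrame({'market_cap_(in_us_$)': [
    '$5.41', '$18,160.50', '$9,038.20', '$8,614.30',
    '$368.50', '$2,603.80', '$6,701.50', '$8,942.40',
]})

col = 'market_cap_(in_us_$)'
# Strip '$' and ',' with a regex character class, then convert to float
df[col] = df[col].replace(r'[\$,]', '', regex=True).astype(float)

# Boolean mask: keep only the rows whose market cap exceeds 2000.00
large = df[df[col] > 2000.00]
print(large)
```

Assigning the converted Series back to the column means every later comparison on it is numeric rather than a string comparison.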

2

The issue is that $ is a special character in regular expressions: it anchors the end of the string. The pattern '$' therefore matches only the empty position at the end, so replacing it with '' leaves every value unchanged!
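Python's standard re module treats an unescaped $ the same way, which makes the behaviour easy to check in isolation (value, unchanged and stripped are illustrative names):

```python
import re

value = '$5.41'

# Unescaped '$' is an end-of-string anchor: it matches the empty string
# at the end, so substituting '' for it changes nothing
unchanged = re.sub('$', '', value)

# Escaped '\$' matches the literal dollar sign, which is what we want removed
stripped = re.sub(r'\$', '', value)

print(unchanged, stripped)  # $5.41 5.41
```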

You have to escape the $ so it is matched literally, and remove the , as well (here s is the 'market_cap_(in_us_$)' column):

In [11]: s.replace(r'\$|,', '', regex=True)
Out[11]:
0        5.41
1    18160.50
2     9038.20
3     8614.30
4      368.50
5     2603.80
6     6701.50
7     8942.40
dtype: object

In [12]: s.replace(r'\$|,', '', regex=True).astype('float64')
Out[12]:
0        5.41
1    18160.50
2     9038.20
3     8614.30
4      368.50
5     2603.80
6     6701.50
7     8942.40
dtype: float64
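Since the patterns here are fixed characters rather than real regular expressions, another option is a literal substring replacement with Series.str.replace. This sketch assumes pandas 0.23 or later, where str.replace accepts a regex keyword; s is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data taken from the question's values
s = pd.Series(['$5.41', '$18,160.50', '$368.50'])

# regex=False treats each pattern as a plain substring, so '$' needs no escaping
cleaned = (
    s.str.replace('$', '', regex=False)
     .str.replace(',', '', regex=False)
     .astype('float64')
)
print(cleaned.tolist())  # [5.41, 18160.5, 368.5]
```

Passing regex=False explicitly also avoids depending on the default, which has changed between pandas versions.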

You may prefer to work in whole cents rather than float dollars (by also removing the literal .); note that this only works when every value has exactly two decimal places:

In [13]: s.replace(r'\$|,|\.', '', regex=True).astype('int64')
Out[13]:
0        541
1    1816050
2     903820
3     861430
4      36850
5     260380
6     670150
7     894240
dtype: int64
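If some values might not have exactly two decimals (for example '$368.5', which stripping the . would turn into 3685), a hedged alternative is to parse dollars as floats first, then scale and round before casting. This assumes a current pandas release; s is hypothetical sample data that includes a one-decimal value.

```python
import pandas as pd

# Hypothetical sample data; the last value has only one decimal place
s = pd.Series(['$5.41', '$18,160.50', '$368.5'])

# Parse to float dollars first, so the number of decimals does not matter
dollars = s.replace(r'[\$,]', '', regex=True).astype('float64')

# Round before casting: astype('int64') truncates, so a product that lands
# just below a whole number would otherwise lose a cent
cents = (dollars * 100).round().astype('int64')
print(cents.tolist())  # [541, 1816050, 36850]
```

For amounts where exact decimal arithmetic matters, Python's decimal.Decimal is another option, at the cost of an object-dtype column.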
