I am using Python 3.5 and I have NumPy, SciPy, and matplotlib installed and imported.

When I try:

# Import the random forest package
from sklearn.ensemble import RandomForestClassifier

# Create the random forest object, which holds all the parameters
# for the fit
forest = RandomForestClassifier(n_estimators=1)

# Fit the training data to the Survived labels and create the decision trees
# (column 0 holds the labels, the remaining columns hold the features)
forest = forest.fit(train_data[:, 1:], train_data[:, 0])

# Take the same decision trees and run them on the test data
output = forest.predict(test_data)

Both test_data and train_data are float arrays. I get the following output:

C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\fixes.py:64: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if 'order' in inspect.getargspec(np.copy)[0]:
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\base.py:175: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  args, varargs, kw, default = inspect.getargspec(init)
Traceback (most recent call last):
  File "C:/Users/Uri/PycharmProjects/titanic1/fdsg.py", line 54, in <module>
    output = forest.predict(test_data)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\ensemble\forest.py", line 461, in predict
    X = check_array(X, ensure_2d=False, accept_sparse="csr")
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 352, in check_array
    _assert_all_finite(array)
  File "C:\Users\Uri\AppData\Local\Programs\Python\Python35-32\lib\site-packages\sklearn\utils\validation.py", line 52, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Process finished with exit code 1
  • Does it not import RandomForestClassifier just fine though? Commented Oct 27, 2015 at 8:31
  • Seems like a warning, not an error. I run into such warnings regularly when some library uses deprecated numpy functions. Or maybe you didn't provide the full stack trace. Commented Oct 27, 2015 at 8:41
  • Exactly, if it imports, you should be fine. Don't worry too much about it. Commented Oct 27, 2015 at 8:49
  • Add your code to the question body by pressing edit and not in the comments. I added the code in your comment for now, update it if necessary. You need to specify if the error is something other than DeprecationWarning (which is a Warning). Commented Oct 27, 2015 at 8:58
  • The message "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')." is telling you that you have invalid values in your data. Commented Oct 27, 2015 at 23:47
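
To act on the last comment, it can help to see exactly which cells are invalid before calling predict. The following is only a minimal sketch using plain NumPy: report_non_finite is a hypothetical helper name, and demo is a made-up stand-in for test_data, not data from the question.

```python
import numpy as np

def report_non_finite(name, arr):
    """Print where NaN or +/-inf values sit in a 2-D array; return how many there are."""
    arr = np.asarray(arr, dtype=np.float64)
    bad = ~np.isfinite(arr)  # True for NaN, +inf and -inf
    print("%s: %d non-finite value(s)" % (name, bad.sum()))
    # List the offending rows for every column that has at least one bad cell
    for col in np.flatnonzero(bad.any(axis=0)):
        rows = np.flatnonzero(bad[:, col])
        print("  column %d, rows %s" % (col, rows.tolist()))
    return int(bad.sum())

# Hypothetical stand-in for test_data from the question
demo = np.array([[3.0, 22.0, 7.25],
                 [1.0, np.nan, 71.28],
                 [3.0, 26.0, np.inf]])
n_bad = report_non_finite("demo", demo)
```

Running the same helper on train_data and test_data should show which of the two arrays, and which columns, trigger the ValueError.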

1 Answer

from sklearn.ensemble import RandomForestClassifier
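# Note: Imputer was removed in scikit-learn 0.22; on newer versions use
# sklearn.impute.SimpleImputer instead
from sklearn.preprocessing import Imputer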
import numpy as np

X = np.random.randint(0, (2**31)-1, (500, 4)).astype(object)
y = np.random.randint(0, 2, 500)
clf = RandomForestClassifier()
print(X.max())
clf.fit(X, y) # OK
print("First fit OK")

# Case 1: your data contains null (NaN) values
X[0,0] = np.nan # replace one of the cells with a null value
#clf.fit(X, y) # gives you the same error

# To handle NaN values you can use the Imputer class:
imp = Imputer(strategy='median')
X_ok = imp.fit_transform(X)
clf.fit(X_ok, y)

# Case 2: your data contains huge integers
X[0,0] = 2**128 # the same error occurs if a value is too large
#clf.fit(X, y) # gives you the same error
# To solve this you can clip your values to some cap
X_ok = X.clip(-2**63, 2**63) # 2**63 is only an example; choose a cap that makes sense for your application
clf.fit(X_ok, y)
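
Since the traceback in the question comes from predict, the test array probably needs the same cleaning as the training array. Here is a rough sketch of that idea with plain NumPy standing in for Imputer. The clean_features helper, the 1e9 cap, and the tiny arrays are all assumptions made for illustration, so adjust them to your data.

```python
import numpy as np

def clean_features(X, fill_values, cap=1e9):
    """Return a float64 copy of X with infinities capped and NaNs filled per column."""
    X = np.array(X, dtype=np.float64)   # copy, so the caller's array is left untouched
    X = np.clip(X, -cap, cap)           # +/-inf and huge values become +/-cap; NaN stays NaN
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = fill_values[cols]   # fill each NaN with the value for its column
    return X

# Hypothetical stand-ins for the feature columns of train_data and test_data
train_X = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0]])
test_X = np.array([[np.nan, np.inf], [2.0, 15.0]])

# Learn the fill values from the training features only, ignoring NaNs
medians = np.nanmedian(train_X, axis=0)

train_ok = clean_features(train_X, medians)
test_ok = clean_features(test_X, medians)
```

Taking the medians from the training features and reusing them for the test features mirrors calling fit_transform on the training data and transform on the test data with an imputer.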
1 Comment

Nice! But I guess his error is due to ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
