I'm trying to train a custom NER model to recognize 41 entities (the training set has around 6,000 lines).
When I run the training command provided on the Stanford NLP site:
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop
This is the error I get:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.ensure(AbstractCachingDiffFunction.java:136)
at edu.stanford.nlp.optimization.AbstractCachingDiffFunction.derivativeAt(AbstractCachingDiffFunction.java:151)
at edu.stanford.nlp.optimization.QNMinimizer.evaluateFunction(QNMinimizer.java:1150)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:898)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:856)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:850)
at edu.stanford.nlp.optimization.QNMinimizer.minimize(QNMinimizer.java:93)
at edu.stanford.nlp.ie.crf.CRFClassifier.trainWeights(CRFClassifier.java:1935)
at edu.stanford.nlp.ie.crf.CRFClassifier.train(CRFClassifier.java:1742)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:785)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.train(AbstractSequenceClassifier.java:756)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3011)
I tried adding -Xmx4096m to my java command to set the maximum heap size to 4 GB (that is the maximum available memory on my machine), but still no luck.
I also tried adding -Xms1024m to set the minimum heap size, with no change in the result.
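For reference, the full command with both memory flags looks roughly like this (the JVM options are placed before the class name, using the same values I mentioned above):

java -Xms1024m -Xmx4096m -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop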
The same command worked flawlessly, without any heap space errors, when I used it to train a model for 20 entities (1,500 lines).
Is this heap space related to RAM or to disk space?
Should I try training on a machine with more RAM, or with more storage?