I have a 60 MB CSV with 700,000 rows, which IMO is not a huge amount. My machine has 32 GB of memory, and usage doesn't even reach 20% when I watch the performance monitor. I also tried a 64-bit release build and still ran into the out-of-memory exception. Could someone please advise me on what I am doing wrong?

Do I need to transform the data and persist it before training so that the conversions don't run during training? Eventually I want to train on much larger data sets. Surely ML.NET should be able to do that? Perhaps I should just switch over to Python.

I'm using .NET 6.0 and Microsoft.ML 3.0.1.

MLContext _mlContext;
PredictionEngine<MlProduct, MlProductPrediction> _predictionEngine;
ITransformer _trainedModel;
IDataView _trainingDataView;


_mlContext = new MLContext()
{
    GpuDeviceId = 0,
    FallbackToCpu = false,
};

_trainingDataView = LoadDataFromCSV();

TrainTestData dataSplit = _mlContext.Data.TrainTestSplit(_trainingDataView, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;

var pipeline = _mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "CategoryName", outputColumnName: "Label")
    .Append(_mlContext.Transforms.Text.FeaturizeText("Features", "ProductName"));

var trainingPipeline = pipeline.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

_trainedModel = trainingPipeline.Fit(trainData);

IDataView transformTest = _trainedModel.Transform(testData);

The code throws the out-of-memory exception below after a few seconds, on the line trainingPipeline.Fit(trainData):

System.OutOfMemoryException
  HResult=0x8007000E
  Message=Exception of type 'System.OutOfMemoryException' was thrown.
  Source=Microsoft.ML.Core
  StackTrace:
   at Microsoft.ML.Internal.Utilities.VBufferUtils.CreateDense[T](Int32 length)
   at Microsoft.ML.Trainers.SdcaTrainerBase`3.TrainCore(IChannel ch, RoleMappedData data, LinearModelParameters predictor, Int32 weightSetCount)
   at Microsoft.ML.Trainers.StochasticTrainerBase`2.TrainModelCore(TrainContext context)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Program.<Main>$(String[] args) in C:\Ml.Product.2\Ml.Product.2\Program.cs:line 29

My model classes are simple:

public class MlProduct
{

    [LoadColumn(0)]
    [ColumnName("ProductName")]
    public string ProductName { get; set; }
    [LoadColumn(1)]
    [ColumnName("CategoryName")]
    public string CategoryName { get; set; }
}

public class MlProductPrediction
{
    [ColumnName("PredictedLabel")]
    public string CategoryName;

    [ColumnName("PredictionScore")]
    public float Score { get; set; }
}
  • Take a dump when the OOM exception is thrown and open it with VS. Check the memory usage there. I think you may have a problem with the string handling. Details: stackoverflow.com/questions/32342392/… Commented Mar 1, 2024 at 10:09
  • This is you? Just asking so I don't need to point you at all the duplicates there that have pretty much the same problem. As a side note, an OutOfMemoryException does not necessarily mean you are out of memory; it means the system couldn't get the memory-related resources you asked for. In the source it just looks like an array creation to me, so the problem might simply be that a contiguous memory range wasn't available, even if enough memory is available in total. Commented Mar 1, 2024 at 11:21
  • Thanks Ralf, yes, that's correct, you don't need to. Let me know if you figure it out. Commented Mar 3, 2024 at 16:39
