
I'm getting an OutOfMemoryException at this line:

if (Regex.IsMatch(output, @"^\d"))

But I'm unsure what's causing it. My program had been running for about 4 minutes, reading a lot of text files and bulk-inserting their contents into SQL. At that moment the output string contained nothing special, just a short line read from a .txt file.

I'm assuming this happens because of the number of times it has to run the regex check: after 4 minutes it was in the millions. Is there a way to prevent the memory problem? Should I dispose or clear something before I start looping? If so, how do I do that?

EDIT: I'm not reading one big file, I'm reading a lot of files. When it failed, it had already read about 6,666 files (5 folders), but it needs to read 60 folders in total, i.e. 80,361 .txt files.

EDIT: Added the source code, hoping it clarifies things.

UPDATE:

I added this method:

static void DisposeAll(IEnumerable set)
{
    foreach (Object obj in set)
    {
        IDisposable disp = obj as IDisposable;
        if (disp != null) { disp.Dispose(); }
    }
}

and I call it at the end of each folder's loop iteration:

DisposeAll(ListExtraInfo);
DisposeAll(ListFouten);
ListFouten.Clear();
ListExtraInfo.Clear();

The exception has moved: it's no longer thrown at the Regex but at ListFouten. It still happens after about 6,666 .txt files have been read.

Exception of type 'System.OutOfMemoryException' was thrown.

static void Main(string[] args)
{
    string pathMMAP = @"G:\HLE13\Resultaten\MMAP";
    string[] entriesMMAP = Directory.GetDirectories(pathMMAP);
    List<string> treinNamen = new List<string>();

    foreach (string path in entriesMMAP)
    {
        string TreinNaam = new DirectoryInfo(path).Name;
        treinNamen.Add(TreinNaam);
        int IdTrein = 0;
        ListExtraInfo = new List<extraInfo>();
        ListFouten = new List<fouten>();
        readData(TreinNaam, IdTrein, path);
    }
}

static void readData(string TreinNaam, int IdTrein, string path)
{
    using (SqlConnection sourceConnection = new SqlConnection(GetConnectionString()))
    {
        sourceConnection.Open();

        try
        {
            SqlCommand commandRowCount = new SqlCommand(
                "SELECT TreinId FROM TestDatabase.dbo.Treinen where Name = " + TreinNaam,
                sourceConnection);
            IdTrein = Convert.ToInt16(commandRowCount.ExecuteScalar());
        }
        catch (Exception ex)
        {

        }
    }

    string[] entriesTreinen = Directory.GetDirectories(path);
    foreach (string rapport in entriesTreinen)
    {
        string RapportNaam = new DirectoryInfo(rapport).Name;
        FileInfo fileData = new System.IO.FileInfo(rapport);

        leesTxt(rapport, TreinNaam, GetConnectionString(), IdTrein);
    }
}

public static string datum;
public static string tijd;
public static string foutcode;
public static string absentOfPresent;
public static string teller;
public static string omschrijving;
public static List<fouten> ListFouten;
public static List<extraInfo> ListExtraInfo;
public static string textname;
public static int referentieGetal = 0;

static void leesTxt(string rapport, string TreinNaam, string myConnection, int TreinId)
{
    foreach (string textFilePath in Directory.EnumerateFiles(rapport, "*.txt"))
    {
        textname = Path.GetFileName(textFilePath);
        textname = textname.Substring(0, textname.Length - 4);

        using (StreamReader r = new StreamReader(textFilePath))
        {
            for (int x = 0; x <= 10; x++)
                r.ReadLine();

            string output;

            Regex compiledRegex = new Regex(@"^\d", RegexOptions.Compiled);
            string[] info = new string[] { };
            string[] datumTijdelijk = new string[] { };

            while (true)
            {
                output = r.ReadLine();
                if (output == null)
                    break;

                if (compiledRegex.IsMatch(output))
                {
                    info = output.Split(' ');
                    int kolom = 6;
                    datum = info[0];
                    datumTijdelijk = datum.Split(new[] { '/' });

                    try
                    {
                        datum = string.Format("{2}/{1}/{0}", datumTijdelijk);
                        tijd = info[1];
                        foutcode = info[2];
                        absentOfPresent = info[4];
                        teller = info[5];
                        omschrijving = info[6];
                    }
                    catch (Exception ex)
                    {

                    }

                    while (kolom < info.Count() - 1)
                    {
                        kolom++;
                        omschrijving = omschrijving + " " + info[kolom];
                    }
                    referentieGetal++;

                    ListFouten.Add(new fouten
                    {
                        Date = datum,
                        Time = tijd,
                        Description = omschrijving,
                        ErrorCode = foutcode,
                        Module = textname,
                        Name = TreinNaam,
                        TreinId = TreinId,
                        FoutId = referentieGetal
                    });
                }

                if (output == string.Empty)
                {
                    output = " ";
                }
                if (Char.IsLetter(output[0]))
                {
                    ListExtraInfo.Add(new extraInfo { Values = output, FoutId = referentieGetal });
                }
            }
        }
    }
}
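For reference, a minimal sketch of what the DisposeAll + Clear() combination from the update actually does. It assumes, as the posted code suggests, that fouten is a plain data class that does not implement IDisposable; Fout below is a hypothetical stand-in for it, and this DisposeAll variant returns a count only so the effect can be observed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the question's fouten class:
// a plain data holder that does NOT implement IDisposable (assumption).
class Fout
{
    public string Description { get; set; }
}

static class DisposeDemo
{
    // Same logic as the question's DisposeAll, but returns how many
    // items were actually disposed (non-IDisposable items are skipped).
    public static int DisposeAll(IEnumerable set)
    {
        int disposed = 0;
        foreach (object obj in set)
        {
            if (obj is IDisposable disp)
            {
                disp.Dispose();
                disposed++;
            }
        }
        return disposed;
    }

    static void Main()
    {
        var list = new List<Fout>();
        for (int i = 0; i < 1000; i++)
            list.Add(new Fout { Description = "line " + i });

        // Fout has no Dispose method, so nothing is disposed here.
        Console.WriteLine($"Disposed: {DisposeAll(list)}");

        // Clear() drops the references (Count becomes 0) but keeps the backing array.
        list.Clear();
        Console.WriteLine($"After Clear: Count={list.Count}, Capacity={list.Capacity}");

        // TrimExcess() shrinks the backing array to fit the (now empty) list.
        list.TrimExcess();
        Console.WriteLine($"After TrimExcess: Count={list.Count}, Capacity={list.Capacity}");
    }
}
```

Under that assumption, DisposeAll is a no-op for these lists: the items only become collectable once the list stops referencing them, which is what Clear() (or assigning a new list) does.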
  • Empty catch blocks are evil. Commented Apr 13, 2016 at 12:26
  • It's probably the number of text files; how many is it reading? I know that ^\d checks for a digit at the beginning of a string. Is that what you are trying to match? Commented Apr 13, 2016 at 12:28
  • BTW, you can use textname = Path.GetFileNameWithoutExtension(textFilePath); Commented Apr 13, 2016 at 12:28
  • My bet is on ListFouten getting huge. Try a LinkedList, compile for x64, or use the real solution: avoid loading that much data at once. Commented Apr 13, 2016 at 12:38
  • @Goostrabha when you exceed the capacity of the array a list uses internally, a new array (twice the size) is allocated and all elements are copied into it. At some point, a contiguous memory block big enough can no longer be allocated. A linked list allocates small chunks of memory; it consumes more memory overall but doesn't need that memory to be contiguous. Commented Apr 13, 2016 at 12:51
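The growth pattern described in the last comment can be observed directly. A minimal sketch, assuming the current .NET List<T> implementation, which doubles the backing array whenever it is full (the starting capacity of 4 is an implementation detail, not a documented guarantee):

```csharp
using System;
using System.Collections.Generic;

static class ListGrowthDemo
{
    // Adds `count` items and records every new Capacity value,
    // i.e. every time a larger contiguous backing array was allocated.
    public static List<int> CapacityHistory(int count)
    {
        var list = new List<int>();
        var history = new List<int>();
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                lastCapacity = list.Capacity;
                history.Add(lastCapacity);
            }
        }
        return history;
    }

    static void Main()
    {
        // Each value is a freshly allocated array; the previous one is copied and discarded.
        foreach (int capacity in CapacityHistory(100_000))
            Console.WriteLine(capacity);
    }
}
```

Because each step needs one contiguous block twice the size of the last, a long-running list of millions of entries can fail to grow even when the total free memory would be enough, which is especially likely in a 32-bit process.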

2 Answers

This issue is not the fault of the regex operations; the real fault lies in the data that is being accumulated around the regex processing.

The analogy is driving a car and saying "It ran out of gas while I had the radio on". It is not the radio's fault...

I recommend that you identify why such copious amounts of data are being stored and resolve that.


There are better ways of processing and analyzing information than throwing everything into memory. I believe you will need to rewrite the logic to achieve the end goal.

Why are you collecting, and more importantly keeping, information about every line of 6,000+ files? That might be the real issue here.
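One way to avoid keeping every line in memory is to stream each file and hand the parsed rows off in fixed-size batches. A minimal sketch under stated assumptions: rows are kept as raw lines rather than fouten objects, and the flush callback is a hypothetical hook where the question's program would do its bulk insert (for example with SqlBulkCopy, not shown here because it needs a live database). The batch sizes used are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class BatchedImport
{
    // Streams every file line by line and passes matching rows to `flush`
    // in batches, so at most `batchSize` rows are held in memory at once.
    // `flush` must not keep a reference to the list: it is cleared and reused.
    public static int Import(IEnumerable<string> filePaths, int batchSize,
                             Action<List<string>> flush)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<string>(batchSize);
        int total = 0;

        foreach (string path in filePaths)
        {
            // File.ReadLines is lazy: lines are read one at a time.
            foreach (string line in File.ReadLines(path))
            {
                // Same condition as the question's ^\d check.
                if (line.Length == 0 || !char.IsDigit(line[0]))
                    continue;

                batch.Add(line);
                total++;

                if (batch.Count == batchSize)
                {
                    flush(batch);   // hypothetical bulk insert goes here
                    batch.Clear();
                }
            }
        }

        if (batch.Count > 0)
            flush(batch);           // remaining rows
        return total;
    }

    static void Main()
    {
        // Sample data in a temp folder, for illustration only.
        string dir = Path.Combine(Path.GetTempPath(), "batch-demo");
        Directory.CreateDirectory(dir);
        for (int f = 0; f < 3; f++)
            File.WriteAllLines(Path.Combine(dir, $"file{f}.txt"),
                Enumerable.Range(0, 10).Select(i => i % 2 == 0 ? $"{i} data" : "Header text"));

        int rows = Import(Directory.EnumerateFiles(dir, "*.txt"), 4,
                          batch => Console.WriteLine($"Flushing {batch.Count} rows"));
        Console.WriteLine($"Total rows: {rows}");
    }
}
```

With this shape, memory use stays roughly flat no matter how many files are processed; the batch size mainly trades memory against the number of round trips to the database.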


Otherwise be proactive with these steps


It could be because your code is recompiling the regular expression every time it is used. Try using a compiled Regex instead: outside your foreach loop, store a compiled Regex variable:

Regex compiledRegex = new Regex(@"^\d", RegexOptions.Compiled);

Then, when checking for the match, use:

if (compiledRegex.IsMatch(output))

Edit: this answer is not valid. Although the Regex documentation states that patterns used in instance methods are recompiled, this is not the case: they are cached.
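Even if recompilation is not the cause, creating the pattern once is still the idiomatic form of this suggestion; note that the question's code builds a new RegexOptions.Compiled instance inside the per-file loop. For a pattern as simple as ^\d, a plain character test is equivalent. A minimal sketch; LineFilter and its method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

static class LineFilter
{
    // Created once and shared by every call, instead of once per file.
    private static readonly Regex StartsWithDigitRegex =
        new Regex(@"^\d", RegexOptions.Compiled);

    public static bool StartsWithDigitViaRegex(string line) =>
        StartsWithDigitRegex.IsMatch(line);

    // Equivalent check without a regex: both \d and char.IsDigit
    // match Unicode decimal digits (category Nd), and ^ without
    // RegexOptions.Multiline only anchors at the start of the string.
    public static bool StartsWithDigit(string line) =>
        line.Length > 0 && char.IsDigit(line[0]);

    static void Main()
    {
        string[] samples = { "13/04/2016 12:26:00 E01", "Header line", "", " 1 leading space" };
        foreach (string s in samples)
            Console.WriteLine($"{StartsWithDigitViaRegex(s),-5} {StartsWithDigit(s),-5} \"{s}\"");
    }
}
```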

5 Comments

Nope, the Regex class uses a cache to avoid recompilation.
Hmm. The documentation says "The cache stores regular expression patterns that are used only in static method calls. (Regular expression patterns supplied to instance methods are not cached.)". If this is an instance method, it could be not using the cache.
Well, he is using a static method. But the doc is sadly wrong here, see the source code, and also this comment
I ran into an out of memory error with a compiled regex, attempting to match a complex expression against an 18 megabyte string. (Don't ask.)
Maybe we're missing some context here. The link to the source code shows that the Regex cache has 15 entries. If the full code has more regexes, the cache will be useless and much memory will be consumed.
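According to the documentation, the cache discussed in these comments serves the static Regex methods, and its size is exposed through the Regex.CacheSize property (default 15). A minimal sketch of inspecting and raising it, assuming a program that uses more distinct patterns than that through static calls; the value 50 is arbitrary:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexCacheDemo
{
    static void Main()
    {
        // Maximum number of patterns kept by the static Regex methods.
        Console.WriteLine($"Default Regex.CacheSize: {Regex.CacheSize}");

        // Raise it if more distinct patterns are used through static calls
        // (50 is an arbitrary example value).
        Regex.CacheSize = 50;

        // Static call: the pattern is parsed once, then reused from the cache.
        for (int i = 0; i < 3; i++)
            Console.WriteLine(Regex.IsMatch("12 foo", @"^\d"));
    }
}
```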
