CLR sql server performance

Question

We are using CLR-functions in our ETL-processes to have specific data-conversion and data-checking logic centralized. These functions are rather basic and require no data-access and are deterministic therefor allowing parallellism.

For instance:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static bool check_smallint(string input)
{
    string teststring;
    try
    {
        teststring = input.Trim(' ').Replace('-', '0');
        if (teststring.Length == 0)
        {
            teststring = "0";
        }
        Convert.ToInt16(teststring);
    }
    catch (NullReferenceException)
    {
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
    catch (OverflowException)
    {
        return false;
    }
    return true;
}

This works fine except for performance. Query's have slowed down considerably, wihich is causing trouble in processing large datasets (millions of rows and more).

Until now we have found no one who really understands the SQL CLR-architecture, but one suggestion we received is that it might be caused by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling.

Please don't suggest different solutions, we are already considering them, like inline sql, or a complete different approach. Standard sql-functions are in many cases no option because of the lack of error raising.

PS. We are using SQL 2008R2.

Thanks for your comment. We are trying it out right now.

Rudolf van der Heide
– Rudolf van der Heide

2014-10-14 14:20:58 +00:00
Commented Oct 14, 2014 at 14:20 — Rudolf van der Heide
– Rudolf van der Heide, Commented Oct 14, 2014 at 14:20

Adriano Repetti · Accepted Answer · 2014-11-24 12:29:46Z

5

by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling.

It's not something you have to worry about on C# side. You're not allocating memory (of course you're allocating strings and stuff you need inside your function, nothing you can pool/reuse). Also connection isn't something you have to worry about.

This works fine except for performance.

Your code is doing something incredibly...EXCEPTIONALLY...slow: throwing exceptions instead of performing checks. An exception is an expansive operation and should be used to handle exceptional situations (just 100/200 records with a null - or invalid - value and it'll slow down a query over 1,000,000 records). Wrong input format or null values in a database column...aren't exceptional (this programming style - exceptions instead of checks - is allowed and even encouraged in other languages like Python. I'd in general avoid it in C#. For sure it's not appropriate here where performance is an issue).

public static bool check_smallint(string input)
{
    if (String.IsNullOrWhiteSpace(input))
        return true;

    short value;
    return Int16.TryParse(input, out value);
}

Note that: String.IsNullOrWhiteSpace(input) will return true for null inputs or strings made only of spaces (replacing your Trim() and NullReferenceException stuff). Everything else (FormatException for input text that is not an integer or a too big number with OverflowException) is handled by Int16.TryParse(). Code is shorter (and slightly faster) for valid inputs but it's many times faster for invalid ones.

edited Nov 24, 2014 at 12:29

answered Oct 13, 2014 at 13:03

Adriano Repetti

67.5k20 gold badges144 silver badges215 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Dan Guzman Over a year ago

OUT needs to be specified for the TryParse method value parameter.

Solomon Rutzky Over a year ago

@AdrianoRepetti +1 for the far better approach (i.e. not relying upon exceptions). Just FYI, an additional improvement would be to use the proper SqlTypes types for the parameters instead of the .NET types. I added an answer explaining this in more detail.

Community · Accepted Answer · 2017-05-23 10:29:30Z

4

^{I am making this a separate answer instead of a comment on @Adriano's answer so that it is less likely to be missed (since not everyone reads all of the comments).}

In addition to changing the approach as suggested by @Adriano, you should really be using the appropriate datatypes, found in the System.Data.SqlTypes Namespace, for all input/output parameters and return values. There are some important differences and benefits to using them, such as them all having an .IsNull property. The full list of differences is too much info to put here, but I did document it in the following article: Stairway to SQLCLR Level 5: Development (Using .NET within SQL Server)

Adapting @Adriano's code to use the proper types would give you the following:

public static SqlBoolean check_smallint(SqlString input)
{
    if (input.IsNull)
        return true;

    if (input.Value.Trim() == String.Empty)
        return true;

    short value;
    return Int16.TryParse(input.Value, out value);
}

edited May 23, 2017 at 10:29

CommunityBot

11 silver badge

answered May 19, 2015 at 17:23

Solomon Rutzky

49.3k11 gold badges141 silver badges184 bronze badges

2 Comments

Adriano Repetti Over a year ago

My upvote: I write that code with same sense of plain C# code, I didn't know there is any benefit to use SqlXyz types. Nice to know, I have some HEAVY used functions that will have great benefit of any improvement!

Solomon Rutzky Over a year ago

@AdrianoRepetti Thanks. And I hope that info helps. Also check out the rest of the Stairway to SQLCLR series :). And of course there is always SQL# (which I am the author of).

Collectives™ on Stack Overflow

CLR sql server performance

2 Answers 2

2 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related