
We use CLR functions in our ETL processes to centralize specific data-conversion and data-checking logic. These functions are fairly basic: they require no data access and are deterministic, which allows parallelism.

For instance:

[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static bool check_smallint(string input)
{
    string teststring;
    try
    {
        teststring = input.Trim(' ').Replace('-', '0');
        if (teststring.Length == 0)
        {
            teststring = "0";
        }
        Convert.ToInt16(teststring);
    }
    catch (NullReferenceException)
    {
        return true;
    }
    catch (FormatException)
    {
        return false;
    }
    catch (OverflowException)
    {
        return false;
    }
    return true;
}

This works fine except for performance. Queries have slowed down considerably, which is causing trouble when processing large datasets (millions of rows and more).

So far we have found no one who really understands the SQL CLR architecture, but one suggestion we received is that the slowdown might be caused by the overhead of creating a new connection or allocating memory on every function call. If so, a solution could be connection or memory pooling.

Please don't suggest different solutions such as inline SQL or a completely different approach; we are already considering them. Standard SQL functions are in many cases not an option because they lack error raising.

PS. We are using SQL Server 2008 R2.

1
  • Thanks for your comment. We are trying it out right now. Commented Oct 14, 2014 at 14:20

2 Answers

5

by the overhead of creating a new connection or allocating memory for every function-call. So a solution could be connection / memory pooling.

It's not something you have to worry about on the C# side. You're not allocating memory that could be pooled (of course you allocate the strings and other objects your function needs, but none of that can be pooled or reused). The connection isn't something you have to worry about either.

This works fine except for performance.

Your code is doing something incredibly...EXCEPTIONALLY...slow: throwing exceptions instead of performing checks. Throwing an exception is an expensive operation and should be reserved for exceptional situations (just 100 to 200 records with a null or invalid value are enough to slow down a query over 1,000,000 records). A wrong input format or a null value in a database column isn't exceptional. This programming style (exceptions instead of checks) is allowed and even encouraged in other languages such as Python, but I'd generally avoid it in C#, and it's certainly not appropriate here, where performance is an issue.

public static bool check_smallint(string input)
{
    if (String.IsNullOrWhiteSpace(input))
        return true;

    short value;
    return Int16.TryParse(input, out value);
}

Note that String.IsNullOrWhiteSpace(input) returns true for null inputs and for strings made up only of whitespace (replacing your Trim() call and the NullReferenceException handler). Everything else (the FormatException for text that is not an integer and the OverflowException for a number that is too large) is handled by Int16.TryParse(), which returns false instead of throwing. The code is shorter and slightly faster for valid inputs, but many times faster for invalid ones.
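Two caveats may be worth checking before swapping the function in. First, String.IsNullOrWhiteSpace was added in .NET Framework 4, while SQL Server 2008 R2 hosts CLR 2.0, so it may not compile for that target. Second, the version above does not reproduce the question's '-' to '0' substitution, so inputs such as "-" or "1-2" are classified differently. The following is only a sketch under those assumptions: the class name SmallintChecks and the method name check_smallint_compat are hypothetical, and the SqlFunction attribute from the question is left out so the snippet compiles on its own.

```csharp
using System;

public static class SmallintChecks
{
    // Hypothetical variant: mirrors the question's logic without relying
    // on exceptions and without String.IsNullOrWhiteSpace.
    public static bool check_smallint_compat(string input)
    {
        // The original returned true for null via NullReferenceException.
        if (input == null)
            return true;

        // Same normalization as the question: trim spaces, '-' becomes '0'.
        string candidate = input.Trim(' ').Replace('-', '0');

        // The original treated an empty string as "0", which is valid.
        if (candidate.Length == 0)
            return true;

        // TryParse reports bad format or overflow by returning false.
        short value;
        return Int16.TryParse(candidate, out value);
    }
}
```

If the '-' substitution in the question was unintentional, dropping the Replace call gives the same results as the shorter version above.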


2 Comments

OUT needs to be specified for the TryParse method value parameter.
@AdrianoRepetti +1 for the far better approach (i.e. not relying upon exceptions). Just FYI, an additional improvement would be to use the proper SqlTypes types for the parameters instead of the .NET types. I added an answer explaining this in more detail.
4

I am making this a separate answer instead of a comment on @Adriano's answer so that it is less likely to be missed (since not everyone reads all of the comments).


In addition to changing the approach as suggested by @Adriano, you should really be using the appropriate datatypes, found in the System.Data.SqlTypes namespace, for all input/output parameters and return values. There are some important differences and benefits to using them; for example, every one of them has an .IsNull property. The full list of differences is too long to include here, but I documented it in the following article: Stairway to SQLCLR Level 5: Development (Using .NET within SQL Server)

Adapting @Adriano's code to use the proper types would give you the following:

public static SqlBoolean check_smallint(SqlString input)
{
    if (input.IsNull)
        return true;

    if (input.Value.Trim() == String.Empty)
        return true;

    short value;
    return Int16.TryParse(input.Value, out value);
}
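To see how the SqlTypes signature behaves for the main input cases, here is a small console sketch. It repeats the function above so that it compiles on its own; the class name SqlTypesDemo and the Main method are illustrative only and are not part of a SQLCLR deployment.

```csharp
using System;
using System.Data.SqlTypes;

public static class SqlTypesDemo
{
    // Same logic as the function above, repeated to keep the sketch self-contained.
    public static SqlBoolean check_smallint(SqlString input)
    {
        if (input.IsNull)
            return true;

        if (input.Value.Trim() == String.Empty)
            return true;

        short value;
        return Int16.TryParse(input.Value, out value);
    }

    public static void Main()
    {
        // SqlString.Null stands in for a SQL NULL; no exception is involved.
        Console.WriteLine(check_smallint(SqlString.Null).IsTrue);          // True
        Console.WriteLine(check_smallint(new SqlString("  ")).IsTrue);     // True
        Console.WriteLine(check_smallint(new SqlString("123")).IsTrue);    // True
        Console.WriteLine(check_smallint(new SqlString("abc")).IsTrue);    // False
        Console.WriteLine(check_smallint(new SqlString("40000")).IsTrue);  // False
    }
}
```

The bool results convert implicitly to SqlBoolean, so the method body needs no explicit wrapping.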

2 Comments

My upvote: I wrote that code with the same mindset as plain C# code; I didn't know there was any benefit to using the SqlXyz types. Nice to know, since I have some HEAVILY used functions that would benefit greatly from any improvement!
@AdrianoRepetti Thanks. And I hope that info helps. Also check out the rest of the Stairway to SQLCLR series :). And of course there is always SQL# (which I am the author of).
