We are using CLR functions in our ETL processes to centralize specific data-conversion and data-checking logic. These functions are rather basic, require no data access, and are deterministic, therefore allowing parallelism.
For instance:
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true)]
public static bool check_smallint(string input)
{
    string teststring;
    try
    {
        // Strip spaces and treat '-' as a filler character.
        teststring = input.Trim(' ').Replace('-', '0');
        if (teststring.Length == 0)
        {
            teststring = "0"; // an empty field counts as a valid 0
        }
        Convert.ToInt16(teststring);
    }
    catch (NullReferenceException)
    {
        return true; // NULL input is considered valid
    }
    catch (FormatException)
    {
        return false;
    }
    catch (OverflowException)
    {
        return false;
    }
    return true;
}
This works fine except for performance. Queries have slowed down considerably, which is causing trouble when processing large datasets (millions of rows and more).
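One thing we are also testing within the same CLR approach (not an alternative architecture): the function above uses exceptions for flow control, and throwing/catching on every dirty row is expensive per call. A sketch of an exception-free variant using Int16.TryParse, intended to match the original's behavior (the class and method names here are made up for illustration):

```csharp
using System;

public static class EtlChecks
{
    // Hypothetical TryParse-based variant of check_smallint.
    // Int16.TryParse returns false instead of throwing on bad input,
    // avoiding exception-handling overhead on rows that fail the check.
    public static bool CheckSmallIntNoThrow(string input)
    {
        if (input == null)
            return true; // mirrors the NullReferenceException branch

        // Same preprocessing as the original function.
        string s = input.Trim(' ').Replace('-', '0');
        if (s.Length == 0)
            return true; // empty field parses as 0 in the original

        short parsed;
        return short.TryParse(s, out parsed);
    }
}
```

Whether this explains the bulk of the slowdown depends on how many rows actually hit the catch blocks, but it removes one per-call cost without changing the function's results.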
Until now we have found no one who really understands the SQL CLR architecture, but one suggestion we received is that the slowdown might be caused by the overhead of creating a new connection or allocating memory for every function call. If so, connection or memory pooling could be a solution.
Please don't suggest different solutions; we are already considering them (inline SQL, or a completely different approach). Standard SQL functions are in many cases not an option because of their lack of error raising.
PS: We are using SQL Server 2008 R2.