
I have a C# CLR function that performs multiple string manipulations and returns an int. I'm calling it with:

SELECT func(col1, col2) AS Score
INTO #A
FROM table;

SELECT *
INTO #B
FROM #A
WHERE Score > -1

If I do something like

SELECT func(col1, col2) AS Score
INTO #A
FROM table
WHERE func(col1, col2) > -1;

is the CLR function called and executed twice? In terms of performance, is there a better way to get the same result?

col1 is nvarchar(max), col2 is nvarchar(800). The function contains business logic.

There are about 10 billion rows/calculations.

  • Have you considered a CTE/derived table? Commented Aug 20, 2022 at 16:53
  • Will do tomorrow... Is this better for performance than writing to a temp table? Commented Aug 20, 2022 at 17:20
  • "Is the CLR called and executed two times?" No. Generally speaking, it is "better" to not store values that can be calculated in a temp table. Without knowing the calculation, why you chose this path, and what will use the rows in this temp table, it is difficult to give useful answers, and the reference to "col1, col2" as parameters suggests there is more to this than what you posted. Commented Aug 20, 2022 at 17:28
  • @SMor there are 4 columns and the Score column. The calculation depends only on col1 and col2. Commented Aug 20, 2022 at 18:08
  • You can always inspect the execution plan to see what is happening... but as others have commented you can't be sure the plan won't change over time. Commented Aug 20, 2022 at 21:53

4 Answers

1

If you can reasonably put the IsDeterministic and IsPrecise properties on your function (I'm very sure IsDeterministic is required and pretty sure IsPrecise is too; I can't find the relevant documentation on the requirements right now), then you can add a computed column to your table that is defined as func(col1, col2) and index it. Once the column is indexed, the function is no longer called at query time but when rows are inserted or updated. My recommendation is to try adding the computed column and indexing it on a small copy of your table before doing it live. That is:

  • select top(100) * into dbo.TestTable from dbo.YourTable;
  • alter table dbo.TestTable add NewColumn as dbo.func(col1, col2) persisted;
  • create index FuncIndex on dbo.TestTable (NewColumn);

And/or, if you have a non-production environment, do it on the live table there.
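Before creating the index, it may be worth checking whether SQL Server considers the computed column indexable at all. A minimal sketch, assuming the dbo.TestTable and NewColumn names from the steps above:

```sql
-- Each property returns 1 (true), 0 (false), or NULL (invalid input,
-- for example a misspelled table or column name).
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.TestTable'), 'NewColumn', 'IsDeterministic') AS IsDeterministic,
    COLUMNPROPERTY(OBJECT_ID('dbo.TestTable'), 'NewColumn', 'IsPrecise')       AS IsPrecise,
    COLUMNPROPERTY(OBJECT_ID('dbo.TestTable'), 'NewColumn', 'IsIndexable')     AS IsIndexable;
```

If IsIndexable comes back as 0, the other two values should hint at which attribute on the CLR function needs attention.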

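If likely queries share a predicate that makes sense as a filter, you can make the index filtered. As far as I know, though, the filter expression of a filtered index cannot reference a computed column, so the filter would have to be on other columns. That's a general indexing concern and not specific to your situation.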


0

You should not assume that it will be run only once. It may be, but the behavior may be plan-dependent, and later devs will have the same question. Instead, as @Larnu suggests, push the function into a subquery/CTE.

with q as
(
  SELECT func(col1, col2) AS Score
  FROM table
)
select *
INTO #A
from q
WHERE Score > -1;
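Another way to write the same idea is with CROSS APPLY, which names the function result once so that both the SELECT list and the WHERE clause can refer to it. This is only a sketch using the question's placeholder names; it assumes the function lives in the dbo schema (dbo.func) and brackets the placeholder table name because TABLE is a reserved word. Like the CTE, it states the expression once in the query text but does not by itself guarantee a single call per row, so the execution plan is still worth checking.

```sql
-- Assumption: dbo.func is the CLR scalar function from the question.
SELECT s.Score
INTO #A
FROM [table] AS t
CROSS APPLY (SELECT dbo.func(t.col1, t.col2) AS Score) AS s
WHERE s.Score > -1;
```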

1 Comment

Tested it... Took 1.7 times longer :/
0

If your function is expensive and the number of unique pairs (col1, col2) is significantly less than the number of input records, try using this:

select B.score 
from 
(
    select func(A.col1,A.col2) as score
    from 
    (
        select distinct col1, col2 from table 
    ) A
    where func(A.col1,A.col2) > -1
) B
where B.score > -1

I think whether the function is deterministic can matter when the engine builds the query execution plan.

I know from experience that, with large tables, temporary tables can be faster than subqueries and CTEs, so you can try rewriting the proposed query using #temp tables.
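A rough sketch of what a temp-table version might look like, under the assumption that the function is dbo.func and with the placeholder table name bracketed because TABLE is a reserved word. The temp table names #Pairs and #Scores are made up for illustration.

```sql
-- Step 1: materialize the distinct input pairs once.
SELECT DISTINCT col1, col2
INTO #Pairs
FROM [table];

-- Step 2: call the function once per distinct pair and store the result.
SELECT col1, col2, dbo.func(col1, col2) AS Score
INTO #Scores
FROM #Pairs;

-- Step 3: filter on the stored value; Score is now a plain column,
-- so the function is not referenced in this statement.
SELECT Score
FROM #Scores
WHERE Score > -1;
```

Because the filter in step 3 runs against a stored column, the question of whether the function is evaluated once or twice per row does not arise there.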

1 Comment

Thank you for the answer! Unfortunately, every combination of the two columns is unique.
0

I wrote a simple function that takes two parameters and returns an integer. Each time it is called, the function writes the parameters it was called with to an external text file, so I was able to track how many times it was called.

The conclusion: the total number of calls equals the number of input records plus the number of records that meet the criteria in the WHERE clause.

The SELECT list doesn't know what the WHERE clause is doing. Marking the function as deterministic or nondeterministic doesn't change anything.
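For anyone who wants to repeat this kind of count without writing to a file, SQL Server 2016 and later expose sys.dm_exec_function_stats, which reports execution counts for scalar functions, including CLR scalar functions. A rough sketch, assuming the function is dbo.func in the current database and that you have permission to view server state; the numbers are cumulative for as long as the entry stays cached, so compare them before and after a test run.

```sql
-- Assumption: dbo.func is the CLR scalar function being measured.
SELECT
    OBJECT_NAME(fs.object_id, fs.database_id) AS function_name,
    fs.execution_count,
    fs.total_elapsed_time,      -- microseconds
    fs.last_execution_time
FROM sys.dm_exec_function_stats AS fs
WHERE fs.database_id = DB_ID()
  AND fs.object_id = OBJECT_ID('dbo.func');
```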

Of course, it's just a simple exercise, and I'm not an outstanding or certified Microsoft systems theorist.

I'm afraid that if you are not able to rewrite your CLR, you won't get much more, though maybe parallel queries could speed things up?
