In our system we are hashing passwords with the following method.

public static string HashPassword(string password, byte[] salt)
{
    if (salt == null)
    {
        throw new ArgumentNullException(nameof(salt));
    }

    // Rfc2898DeriveBytes is IDisposable, so dispose it once the key is derived.
    using (var pbkdf2 = new Rfc2898DeriveBytes(
        password,
        salt,
        Iterations,
        HashingAlgorithm
    ))
    {
        byte[] hash = pbkdf2.GetBytes(KeySize);

        // Stored format: base64(salt) + SaltEndMarker + base64(hash).
        return $"{Convert.ToBase64String(salt)}{SaltEndMarker}{Convert.ToBase64String(hash)}";
    }
}

For maintenance reasons, we want to set an initial hash in the database. Because the algorithm is complex, I decided to use a SQL CLR function instead of implementing it directly in SQL. The SQL CLR function works, but it is two to three orders of magnitude slower than the pure .NET app.

I experimented with the number of iterations: 1,000 takes less than 1 second, 10K takes 7 seconds, and 100K takes 70 seconds.

Can anyone explain the difference between the SQL CLR function and the pure .NET app? How can the function be sped up?

Edited:

  • SQL server version: Microsoft SQL Server 2019
  • I know that the number of iterations should be greater than 10K. I tried with different values to see how the execution time changes.

Comments

  • "I thought to use SQL CLR function": why? It's not faster and, worse, you've weakened security by transmitting a cleartext password to the database. You also cause problems for the database server, because any transaction that involves this function will be blocked waiting for a response, and the memory used will be taken from the database's memory. Commented Nov 18 at 12:31
  • "this is a complex algorithm": not really, you already use Rfc2898DeriveBytes. Password hashing is meant to be slow anyway; that's what protects against brute force. Nowadays the minimum iteration count is 10K, not 1000. As for why it is slower: you didn't say which SQL Server version you use, but it will be slower than the current .NET (Core) versions. SQLCLR isn't used because it's faster; it's used to create functionality that can be used in queries, such as new types, new aggregations and, if it makes sense, new functions. Commented Nov 18 at 12:36
  • Check the "Hash passwords in ASP.NET Core" docs. The example uses a different class but, more importantly, it uses HMACSHA256 with 100K iterations. The PasswordHasher class used by ASP.NET Identity goes further: PBKDF2 with HMAC-SHA512, 128-bit salt, 256-bit subkey, 100000 iterations. Such calculations don't belong in the DB. Commented Nov 18 at 12:43
  • I agree that this is not a good use case for CLR. As for why it is slower, you may be incurring SQLCLR_QUANTUM_PUNISHMENT waits: dba.stackexchange.com/questions/164891/… Commented Nov 18 at 13:05
  • SQLCLR uses .NET Framework; your app probably uses .NET Core/8+. What I'd like to know is why you are storing it as a base64 string and not varbinary, and why you are storing the salt in the same blob as the hash. Commented Nov 18 at 14:15

1 Answer

This comes down to how SQL CLR scalar functions run. When you call a CLR function for each password, SQL Server has to cross from the SQL engine into the CLR runtime for every single row. Microsoft explains this per-call overhead in its "Performance of CLR Integration Architecture" documentation. For small operations it doesn't really matter, but when you run 100K iterations per call it adds up.

Worse, SQL Server also disables parallelism when a scalar user-defined function is used. The "User-defined functions" doc explains this: scalar UDFs force non-parallel query plans. So no matter how many cores you have, your PBKDF2 calls run on one thread inside SQL Server, while your .NET app can spread the hashing of many passwords across all of its threads. That makes sense: the "CLR Integration Overview" docs even warn against using it for CPU-heavy work inside SQL, since it is not made for that (CLR code there is supposed to be lightweight).
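To illustrate the threading point on the application side, here is a minimal sketch, assuming .NET 6 or later (for the static `Rfc2898DeriveBytes.Pbkdf2` and `RandomNumberGenerator.GetBytes`). The names `BatchHasher` and `HashAll` are hypothetical, and the iteration count, key size, salt size and SHA-256 are assumed stand-ins for the question's `Iterations`, `KeySize` and `HashingAlgorithm`, whose real values are not shown. Each single derivation is still sequential; only separate passwords are hashed on separate cores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class BatchHasher
{
    // Assumed values; the question does not show its actual constants.
    private const int Iterations = 100_000;
    private const int KeySize = 32;   // bytes
    private const int SaltSize = 16;  // bytes
    private static readonly HashAlgorithmName Algorithm = HashAlgorithmName.SHA256;

    // Hashes every password with its own random salt, spreading the work across cores.
    public static (byte[] Salt, byte[] Hash)[] HashAll(IReadOnlyList<string> passwords)
    {
        return passwords
            .AsParallel()
            .AsOrdered() // keep results aligned with the input order
            .Select(password =>
            {
                byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
                byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
                    password, salt, Iterations, Algorithm, KeySize);
                return (Salt: salt, Hash: hash);
            })
            .ToArray();
    }
}
```

Calling `BatchHasher.HashAll(passwords)` returns one salt and hash pair per input password, in input order.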

Running PBKDF2 with 100K iterations definitely falls into that "too heavy" category. I think the best fix is to run the hashing outside SQL: do it in your .NET app and just store the result in the DB, since storing and querying data is what SQL Server is best at. If you do want it inside SQL, you should use a "table-valued function". Those run once per set of rows instead of once per row; the CLR performance docs have more on this. Good luck!
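The "hash in the app, store the result" approach could look roughly like the sketch below. It assumes .NET 6 or later and an already open connection; `PasswordStore`, the `dbo.Users` table and its `PasswordSalt`, `PasswordHash` and `Id` columns are hypothetical names, and the constants are assumed values rather than the question's real ones. It also stores the salt and hash as separate byte arrays (suitable for varbinary columns), which differs from the question's base64 string format.

```csharp
using System;
using System.Data.Common;
using System.Security.Cryptography;

public static class PasswordStore
{
    // Assumed values; the question does not show its actual constants.
    private const int Iterations = 100_000;
    private const int KeySize = 32;   // bytes
    private const int SaltSize = 16;  // bytes
    private static readonly HashAlgorithmName Algorithm = HashAlgorithmName.SHA256;

    // Derives a PBKDF2 hash in the application, using a fresh random salt.
    public static (byte[] Salt, byte[] Hash) Create(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(password, salt, Iterations, Algorithm, KeySize);
        return (Salt: salt, Hash: hash);
    }

    // Recomputes the hash from the stored salt and compares in constant time.
    public static bool Verify(string password, byte[] salt, byte[] expectedHash)
    {
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, Iterations, Algorithm, expectedHash.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expectedHash);
    }

    // Sends only the finished salt and hash to the database, never the password.
    // Table and column names are hypothetical. Returns the number of rows affected.
    public static int Save(DbConnection connection, int userId, byte[] salt, byte[] hash)
    {
        using DbCommand command = connection.CreateCommand();
        command.CommandText =
            "UPDATE dbo.Users SET PasswordSalt = @salt, PasswordHash = @hash WHERE Id = @id";
        AddParameter(command, "@salt", salt);
        AddParameter(command, "@hash", hash);
        AddParameter(command, "@id", userId);
        return command.ExecuteNonQuery();
    }

    private static void AddParameter(DbCommand command, string name, object value)
    {
        DbParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

With this shape the database only ever sees parameterized byte arrays, so the expensive derivation and the cleartext password both stay in the application.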
