I have a database field of Brazilian CPF numbers and want to check their validity. Each is an 11-digit string: 9 base digits followed by 2 check digits.

I have implemented the checksum in MS Excel (see below), but I'd like to figure out a way to do it in SQL.

The checksum works as follows: (Hold on tight, this is nuts.)

  • The CPF number is written in the form ABCDEFGHI / JK or directly as ABCDEFGHIJK, where the digits cannot all be the same.
  • J is called the 1st check digit of the CPF number.
  • K is called the 2nd check digit of the CPF number.

First digit (J):

  • Multiply each digit of the first 9 by a constant:
    10*A + 9*B + 8*C + 7*D + 6*E + 5*F + 4*G + 3*H + 2*I

  • Divide this sum by 11 and if the remainder is 0 or 1, J will be 0. If the remainder is >=2, J will be 11 - remainder.

Second digit (K): (Same calculation but including digit J)

  • Multiply each digit of the first 10 by a constant:
    11*A + 10*B + 9*C + 8*D + 7*E + 6*F + 5*G + 4*H + 3*I + 2*J

  • Divide this sum by 11 and if the remainder is 0 or 1, K will be 0. If the remainder is >=2, K will be 11 - remainder.
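The two steps above can be sketched outside of SQL to make the arithmetic easy to check by hand. This is only an illustrative sketch in Python; the function name cpf_check_digits is made up for this example and does not come from any library.

```python
def cpf_check_digits(first_nine: str) -> tuple[int, int]:
    """Return (J, K) for the nine base digits ABCDEFGHI (hypothetical helper)."""
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly 9 ASCII digits")
    digits = [int(ch) for ch in first_nine]

    def check_digit(values, first_weight):
        # Weights count down from first_weight to 2, one weight per digit.
        weights = range(first_weight, 1, -1)
        remainder = sum(d * w for d, w in zip(values, weights)) % 11
        # Remainder 0 or 1 gives 0; otherwise the digit is 11 - remainder.
        return 0 if remainder < 2 else 11 - remainder

    j = check_digit(digits, 10)        # weights 10..2 over A..I
    k = check_digit(digits + [j], 11)  # weights 11..2 over A..J
    return j, k
```

For the base digits 123456789 the weighted sum is 210, the remainder is 1, so J is 0; the second sum is 255, the remainder is 2, so K is 9. An 11-digit string s is then consistent when cpf_check_digits(s[:9]) equals (int(s[9]), int(s[10])).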

--Implementation in MS Excel--
Assuming the CPF is in A2.
Optimizations here are welcome but not really the point of this question.
Digit J: =IF(MOD(SUM(MID($A2,1,1)*10,MID($A2,2,1)*9,MID($A2,3,1)*8,MID($A2,4,1)*7,MID($A2,5,1)*6,MID($A2,6,1)*5,MID($A2,7,1)*4,MID($A2,8,1)*3,MID($A2,9,1)*2),11)<=1,NUMBERVALUE(LEFT(RIGHT($A2,2),1))=0,NUMBERVALUE(LEFT(RIGHT($A2,2),1))=(11-MOD(SUM(MID($A2,1,1)*10,MID($A2,2,1)*9,MID($A2,3,1)*8,MID($A2,4,1)*7,MID($A2,5,1)*6,MID($A2,6,1)*5,MID($A2,7,1)*4,MID($A2,8,1)*3,MID($A2,9,1)*2),11)))
Digit K: =IF(MOD(SUM(MID($A2,1,1)*11,MID($A2,2,1)*10,MID($A2,3,1)*9,MID($A2,4,1)*8,MID($A2,5,1)*7,MID($A2,6,1)*6,MID($A2,7,1)*5,MID($A2,8,1)*4,MID($A2,9,1)*3,MID($A2,10,1)*2),11)<=1,NUMBERVALUE(LEFT(RIGHT($A2,1),1))=0,NUMBERVALUE(LEFT(RIGHT($A2,1),1))=(11-MOD(SUM(MID($A2,1,1)*11,MID($A2,2,1)*10,MID($A2,3,1)*9,MID($A2,4,1)*8,MID($A2,5,1)*7,MID($A2,6,1)*6,MID($A2,7,1)*5,MID($A2,8,1)*4,MID($A2,9,1)*3,MID($A2,10,1)*2),11)))

3 Answers

My test table:

-- Create a table called CPF
CREATE TABLE CPF(Id integer PRIMARY KEY, No integer);

-- Insert a test record into this table
INSERT INTO CPF VALUES(1, 12345678901);

My nested query:

SELECT No, 
(CASE WHEN (J != J2) THEN 'J wrong!' ELSE 'J ok!' END) as Jchk,
(CASE WHEN (K != K2) THEN 'K wrong!' ELSE 'K ok!' END) as Kchk
FROM 
(SELECT No, J, K,
(CASE WHEN MJ < 2 THEN 0 ELSE 11 - MJ END) as J2,
(CASE WHEN MK < 2 THEN 0 ELSE 11 - MK END) as K2
FROM 
(SELECT No, J, K,
MOD(10*A + 9*B + 8*C + 7*D + 6*E + 5*F + 4*G + 3*H + 2*I, 11) as MJ,
MOD(11*A + 10*B + 9*C + 8*D + 7*E + 6*F + 5*G + 4*H + 3*I + 2*J, 11) as MK 
FROM 
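 (SELECT
  No,
  -- Left-pad to 11 characters so that CPFs with leading zeros keep their digit positions
  substr(lpad(to_char(No), 11, '0'), 1, 1) as A,
  substr(lpad(to_char(No), 11, '0'), 2, 1) as B,
  substr(lpad(to_char(No), 11, '0'), 3, 1) as C,
  substr(lpad(to_char(No), 11, '0'), 4, 1) as D,
  substr(lpad(to_char(No), 11, '0'), 5, 1) as E,
  substr(lpad(to_char(No), 11, '0'), 6, 1) as F,
  substr(lpad(to_char(No), 11, '0'), 7, 1) as G,
  substr(lpad(to_char(No), 11, '0'), 8, 1) as H,
  substr(lpad(to_char(No), 11, '0'), 9, 1) as I,
  substr(lpad(to_char(No), 11, '0'), 10, 1) as J,
  substr(lpad(to_char(No), 11, '0'), 11, 1) as K
  FROM CPF)))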
  ;

1 Comment

Added a regex to check for all digits being the same and added a check that the length doesn't exceed 11 chars:
(CASE WHEN length(CPF)>11 OR regexp_like (CPF, '^(\d)\1*$') OR (J!=J2) OR (K!=K2) THEN 'INVALID' ELSE 'VALID' END) as CPF_VALID
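To see what the two extra guards in this comment accept and reject, here is a small Python sketch of the same idea. The name passes_format_checks is hypothetical, and the sketch covers only the length and repeated-digit checks, not the J/K comparison.

```python
import re

# One digit followed only by repetitions of that same digit,
# mirroring the '^(\d)\1*$' pattern from the comment.
ALL_SAME_DIGIT = re.compile(r"(\d)\1*", re.ASCII)


def passes_format_checks(cpf: str) -> bool:
    """Hypothetical helper: True when neither extra guard rejects the string."""
    if len(cpf) > 11:
        return False  # longer than a CPF can be
    if ALL_SAME_DIGIT.fullmatch(cpf):
        return False  # e.g. 11111111111, which is not a valid CPF
    return True
```

The repeated-digit guard matters because a string such as 11111111111 satisfies both check-digit equations (J and K both come out as 1) and would otherwise be reported as valid.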

Assuming you have a table with an id primary key column and a cpf column of data type NUMBER(9,0) holding the nine base digits, something like this computes the two check digits:

WITH digits ( id, a, b, c, d, e, f, g, h, i ) AS (
  SELECT id,
         MOD( TRUNC( cpf / 1e8 ), 10 ),
         MOD( TRUNC( cpf / 1e7 ), 10 ),
         MOD( TRUNC( cpf / 1e6 ), 10 ),
         MOD( TRUNC( cpf / 1e5 ), 10 ),
         MOD( TRUNC( cpf / 1e4 ), 10 ),
         MOD( TRUNC( cpf / 1e3 ), 10 ),
         MOD( TRUNC( cpf / 1e2 ), 10 ),
         MOD( TRUNC( cpf / 1e1 ), 10 ),
         MOD( TRUNC( cpf / 1e0 ), 10 )
  FROM   your_table
),
values1 ( id, j, k ) AS (
  SELECT id,
         MOD( 10*A +  9*B +  8*C +  7*D +  6*E +  5*F +  4*G +  3*H + 2*I, 11 ),
         11*A + 10*B +  9*C +  8*D +  7*E +  6*F +  5*G +  4*H + 3*I
  FROM   digits
),
values2 ( id, j, k ) AS (
  SELECT id,
         CASE WHEN j <= 1 THEN 0 ELSE 11 - j END,
         MOD( k + 2 * CASE WHEN j <= 1 THEN 0 ELSE 11 - j END, 11 )
  FROM   values1
)
SELECT id,
       j,
       CASE WHEN k <= 1 THEN 0 ELSE 11 - k END AS k
FROM   values2
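The MOD( TRUNC( cpf / 1eN ), 10 ) pattern used in the digits subquery pulls out one decimal digit at a time without converting the number to a string. A minimal Python sketch of the same arithmetic, with a made-up helper name digits_of, shows why leading zeros come out correctly this way:

```python
def digits_of(number: int, width: int) -> list[int]:
    """Hypothetical helper: decimal digits of number, left-padded with zeros to width."""
    if number < 0 or number >= 10 ** width:
        raise ValueError("number does not fit in the requested width")
    # Integer-divide by 10**power to drop the lower digits,
    # then take the result modulo 10 to keep only the last remaining digit.
    return [(number // 10 ** power) % 10 for power in range(width - 1, -1, -1)]
```

For example, digits_of(8726210061, 11) starts with a 0 even though the stored number has only ten digits, which is the case a plain number-to-string conversion would get wrong.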

1 Comment

First off, thanks for having a go. Looking at this, it assumes I'm starting with a 9-digit string and adding the check digits. I'm starting with an 11-digit string and comparing.

@SAR622: great question and thanks for the algorithm.

Here is a T-SQL solution for SQL Server, just in case. Note that Cadastro de Pessoas Físicas (CPF) numbers have exactly 11 digits (left-padded with zeros), so as integers they cannot exceed 10^11-1. If you see 14-digit numbers in your dataset, these are likely Cadastro Nacional da Pessoa Jurídica (CNPJ) numbers issued to businesses (or typos, or something else). Fake CPF and CNPJ numbers can be generated (in bulk) and validated (individually) here. Also, this site provides more information about a business located by its CNPJ (think of it as an implicit CNPJ validation). When validating a CPF number, remember to check that the number is in the range [0, 10^11-1]. You may also need to remove punctuation and other invalid characters first (as users, we tend to make typos).
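The clean-up step mentioned above (dropping punctuation, restoring leading zeros, rejecting anything too long) could look roughly like the following Python sketch. The function name normalize_cpf is hypothetical, and it assumes the input arrives as text such as 560.006.083-14; it does no checksum validation.

```python
import re
from typing import Optional


def normalize_cpf(raw: str) -> Optional[str]:
    """Hypothetical helper: return an 11-character digit string, or None if impossible."""
    digits = re.sub(r"[^0-9]", "", raw)  # drop dots, dashes, spaces and other characters
    if not digits or len(digits) > 11:
        return None  # empty, or too long to be a CPF (14 digits may be a CNPJ)
    return digits.zfill(11)  # restore leading zeros lost in numeric storage
```

Note that this sketch also strips a leading minus sign, so whether a negative stored value should be treated as a typo or rejected outright is a separate decision to make before calling it.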

In this input table, the first 5 rows are invalid CPF numbers and the last 4 are valid ones:

IF OBJECT_ID('tempdb..#x') IS NOT NULL DROP TABLE #x;
CREATE TABLE #x  (CPF BIGINT default NULL);
INSERT INTO #x (CPF) VALUES (12345678900);
INSERT INTO #x (CPF) VALUES (11);
INSERT INTO #x (CPF) VALUES (1010101010101010);
INSERT INTO #x (CPF) VALUES (11111179011525590);
INSERT INTO #x (CPF) VALUES (-32081397641);
INSERT INTO #x (CPF) VALUES (00000008726210061);
INSERT INTO #x (CPF) VALUES (56000608314);
INSERT INTO #x (CPF) VALUES (73570630706);
INSERT INTO #x (CPF) VALUES (93957133564);

The following T-SQL function modularizes the implementation, but will likely be slower than the raw T-SQL that follows. Alternatively, you can create a T-SQL function with a TABLE input/output or a stored procedure.

-- CREATE rather than ALTER, so the script also works when the function does not exist yet
CREATE FUNCTION dbo.fnIsCPF(@n BIGINT) RETURNS INT AS
BEGIN
    DECLARE @isValid BIT = 0;
    IF (@n > 0 AND @n < 100000000000)
    BEGIN
        --Parse out numbers
        DECLARE @a TINYINT = FLOOR( @n / 10000000000)% 10;
        DECLARE @b TINYINT = FLOOR( @n / 1000000000)% 10;
        DECLARE @c TINYINT = FLOOR( @n / 100000000)% 10;
        DECLARE @d TINYINT = FLOOR( @n / 10000000)% 10;
        DECLARE @e TINYINT = FLOOR( @n / 1000000)% 10;
        DECLARE @f TINYINT = FLOOR( @n / 100000)% 10;
        DECLARE @g TINYINT = FLOOR( @n / 10000)% 10;
        DECLARE @h TINYINT = FLOOR( @n / 1000)% 10;
        DECLARE @i TINYINT = FLOOR( @n / 100)% 10;

        DECLARE @j TINYINT =  ISNULL(NULLIF(NULLIF(11-( 10*@a + 9*@b + 8*@c + 7*@d + 6*@e + 5*@f + 4*@g + 3*@h + 2*@i) % 11, 11), 10), 0);
        DECLARE @k TINYINT =  ISNULL(NULLIF(NULLIF(11 - (11*@a +10*@b + 9*@c + 8*@d + 7*@e + 6*@f + 5*@g + 4*@h + 3*@i + 2 * @j)% 11, 11), 10), 0);
        RETURN CASE WHEN @j=FLOOR(@n / 10)% 10 AND @k=FLOOR(@n)% 10 THEN 1 ELSE 0 END
    END;
    RETURN @isValid;
END;

The output is:

SELECT CPF, isValid=dbo.fnIsCPF(CPF) FROM #x

CPF                 isValid
12345678900         0
11                  0
1010101010101010    0
11111179011525590   0
-32081397641        0
8726210061          1
56000608314         1
73570630706         1
93957133564         1

T-SQL for a table:

WITH digits ( CPF, a, b, c, d, e, f, g, h, i ) AS (
  SELECT CPF,
    FLOOR( CPF / 10000000000)% 10,
    FLOOR( CPF / 1000000000)% 10,
    FLOOR( CPF / 100000000)% 10,
    FLOOR( CPF / 10000000)% 10,
    FLOOR( CPF / 1000000)% 10,
    FLOOR( CPF / 100000)% 10,
    FLOOR( CPF / 10000)% 10,
    FLOOR( CPF / 1000)% 10,
    FLOOR( CPF / 100)% 10
  FROM   #x
),
jk ( CPF, j, k ) AS (
  SELECT CPF, ISNULL(NULLIF(NULLIF(11-( 10*A + 9*B + 8*C + 7*D + 6*E + 5*F + 4*G + 3*H + 2*I) % 11, 11), 10), 0),
    11*A +10*B + 9*C + 8*D + 7*E + 6*F + 5*G + 4*H + 3*I
  FROM digits
),
jk2 ( CPF, j, k ) AS (
  SELECT CPF, j, ISNULL(NULLIF(NULLIF(11 - (k + 2 * j)% 11, 11), 10), 0)
  FROM jk
)
SELECT CPF, isValid=CASE WHEN CPF>0 AND CPF<99999999999 AND j=FLOOR( CPF / 10)% 10 AND k=FLOOR( CPF)% 10 THEN 1 ELSE 0 END
FROM jk2

yielding the same output.
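The ISNULL(NULLIF(NULLIF(11 - r, 11), 10), 0) expression used above is a compact way of writing the rule "remainder 0 or 1 gives 0, otherwise 11 - remainder". A small Python sketch, with function names invented for this illustration, spells out both forms so they can be compared for every possible remainder:

```python
def check_digit_by_rule(remainder: int) -> int:
    # The rule as stated in the question.
    return 0 if remainder < 2 else 11 - remainder


def check_digit_by_nullif(remainder: int) -> int:
    # Mirrors ISNULL(NULLIF(NULLIF(11 - r, 11), 10), 0):
    # the two NULLIF calls turn the values 11 and 10 into NULL,
    # and ISNULL then replaces NULL with 0.
    value = 11 - remainder
    return 0 if value in (10, 11) else value


# Both forms agree for every remainder a division by 11 can produce.
assert all(check_digit_by_rule(r) == check_digit_by_nullif(r) for r in range(11))
```

Remainders 0 and 1 give 11 - r equal to 11 and 10, which are exactly the two values the nested NULLIF calls null out.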
