2

I have a database of dogs. Each dog has a sire and dam parent. I am not able to use CTE for this due to an issue with phpmyadmin and MariaDB 10.0.

I updated to MariaDB 10.2.20 to use CTE. Still getting "Unrecognized Statement type. (near WITH) in phpMyAdmin

Table is animal

Columns I am working with are: id, akc_reg_num, akc_parent_sire, akc_parent_dam

I have tried to get just one generation as I would for loading subcategories, but this is only one side when there are two parents.

I have attempted this solution, but can't get my head wrapped around the two parents of each sibling. How do I select only one generation of a hierarchical tree using an table parent child relation with SQL?

Example Table

CREATE TABLE  `animal` ( 
    `id` INT(11) NOT NULL AUTO_INCREMENT ,
     `akc_reg_num` VARCHAR(20) NOT NULL ,
     `akc_parent_sire` VARCHAR(20) NOT NULL ,
     `akc_parent_dam` VARCHAR(20) NOT NULL ,
     PRIMARY KEY (`id`)
) ENGINE = MyISAM;

INSERT INTO `animal` (`id`, `akc_reg_num`, `akc_parent_sire`, `akc_parent_dam`) VALUES
(NULL, '1', '2', '3'), 
(NULL, '2', '5', '6'), 
(NULL, '3', '9', ''), 
(NULL, '5', '', ''), 
(NULL, '6', '7', '8'), 
(NULL, '7', '', ''), 
(NULL, '8', '', ''), 
(NULL, '9', '10', '11'), 
(NULL, '10', '', ''), 
(NULL, '11', '12', ''), 
(NULL, '12', '', '');

Code:

include_once("db_conx.php");   

function getPedigree($node) { 
    // look up the parent of this node  
    $sql =  'SELECT akc_parent_sire, akc_parent_dam FROM animals WHERE akc_reg_num="'.$node.'";';
    $query = $db->prepare($sql);          
    $query->execute();
    $path = array();    
    while($row=$query->fetch(PDO::FETCH_ASSOC)){    
        if ($row['akc_parent_sire']!='') { 
            $path[] = $row['akc_parent_sire']; 
            echo $row['akc_parent_sire'];
            $path = array_merge(getPedigree($row['akc_parent_sire']), $path); 
        } 
        if ($row['akc_parent_dam']!='') { 
            $path[] = $row['akc_parent_dam']; 
            echo $row['akc_parent_dam'];
            $path = array_merge(getPedigree($row['akc_parent_dam']), $path); 
        } 
    }       
    return $path; 
} 
print_r(getPedigree('vvv'));

I need to loop through each generation so I can return a json array and then use javascript to tie the results to DOM elements. I only need to query for 4 generations however my concern is the expense of cpu cycles. Once I have a few hundred thousand animals in the database how efficient will this same query be?

8
  • 1
    Possible duplicate of How to create a MySQL hierarchical recursive query Commented Mar 24, 2019 at 19:07
  • Thanks Raymond. I did in fact attempt this last night but it only has one parent in the row. I have two parents in the row. Commented Mar 24, 2019 at 19:43
  • Graph Databases exist for a reason. Commented Mar 24, 2019 at 22:14
  • @user1956040 You have not specified your EXACT desired output for your sample data. Commented Mar 24, 2019 at 23:33
  • The way I am approaching this is as I move up the generations I can add onto the json array with where I am with other needed demographics for the DOM. For example I start off with offspring but move to the sire and dam. I will use 's' for sire and 'd' for dam . 2nd generation I will identify the location/breadcrumbs with an 's' and 'd'. 3rd generation will amend their sex to the breadcrumb and 'ss' and 'sd' for those related to the sire and 'ds' and 'dd' for those relations to the dam. 5th generation goes from 'ssss' ... 'dddd', 16 possible combinations in that generation. Commented Mar 24, 2019 at 23:57

1 Answer 1

2

To prevent abusing your database with iterated calls, SELECT the whole table just once and let php do all of the recursive work on the result set.

AMENDMENT: Since collecting the ~100,000 rows is too much heavy lifting, here is alternative advice... Rather than making up to 31 individual trips to the database within a recursive process, I'll recommend that you build a filtered array based on up to 5 trips to the database.

The following snippet is NOT tested:

$generation = 1;
$needles = [1];
$animals = [];
while ($needles && $generation < 6) {
    $sth = $db->prepare("SELECT * FROM animals WHERE akc_reg_num IN (" . implode(',', array_fill(0, count($needles), '?')) . ")");
    $sth->execute($needles);
    if ($results = $sth->fetchAll(\PDO::FETCH_ASSOC)) {
        $needles = array_filter(array_merge(array_column($results, 'akc_parent_sire'), array_column($results, 'akc_parent_dam')));
        $animals[] = array_merge($animal, $results);
    } else {
        $needles = null;
    }
    ++$generation;
}
// $animals is ready to pass to the php recursion

From an $animals result set like this:

$animals = [
    ['id' => 1, 'akc_reg_num' => 1, 'akc_parent_sire' => 2, 'akc_parent_dam' => 3],
    ['id' => 2, 'akc_reg_num' => 2, 'akc_parent_sire' => 5, 'akc_parent_dam' => 6],
    ['id' => 3, 'akc_reg_num' => 3, 'akc_parent_sire' => 9, 'akc_parent_dam' => 0],
    ['id' => 4, 'akc_reg_num' => 5, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 5, 'akc_reg_num' => 6, 'akc_parent_sire' => 7, 'akc_parent_dam' => 8],
    ['id' => 6, 'akc_reg_num' => 7, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 7, 'akc_reg_num' => 8, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 8, 'akc_reg_num' => 9, 'akc_parent_sire' => 10, 'akc_parent_dam' => 11],
    ['id' => 9, 'akc_reg_num' => 10, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0],
    ['id' => 10, 'akc_reg_num' => 11, 'akc_parent_sire' => 12, 'akc_parent_dam' => 0],
    ['id' => 11, 'akc_reg_num' => 12, 'akc_parent_sire' => 0, 'akc_parent_dam' => 0]
];

Itemized tasks:

  • Search your array for the row of the targeted akc_reg_num, then remove that row from the "haystack" to prevent the possibility of infinite recursion, then break the search loop for best efficiency
  • If there is no matching akc_reg_num in the haystack, return an empty array
  • If there is a matching akc_reg_num, loop the haystack and recurse all found parents. I am filtering out "deadends" to keep the result array small and clean.
  • If both parents are found in a given generation, break the loop to prevent needless iterations.
  • The recursion should continue until the generation count exceeds 4 or there are no more parents to collect.

Code: (Demo)

function buildPedigree($haystack, $akc_reg_num, $generation = 0) {
    ++$generation;
    foreach ($haystack as $index => $row) {
        if ($row['akc_reg_num'] == $akc_reg_num) {
            $result = ['sire' => $row['akc_parent_sire'], 'dam' => $row['akc_parent_dam']];
            unset($haystack[$index]);             // reduce the haystack to improve efficiency and avoid infinite loop
            break;                                // stop searching
        }
    }
    if (!isset($result)) {
        return [];  // $akc_reg_num not found
    }

    foreach ($haystack as $row) {
        if ($row['akc_reg_num'] == $result['sire']) {
            $result['sire_parents'] = array_filter(buildPedigree($haystack, $row['akc_reg_num'], $generation));  // recurse and purge empty parent arrays
            if (array_key_exists('dam_parents', $result)) {
                break;  // both parents found in generation, stop this loop
            }
        } elseif ($row['akc_reg_num'] == $result['dam']) {
            $result['dam_parents'] = array_filter(buildPedigree($haystack, $row['akc_reg_num'], $generation));  // recurse and purge empty parent arrays
            if (array_key_exists('sire_parents', $result)) {
                break;  // both parents found in generation, stop this loop
            }
        }
    }
    return $generation <= 4 ? $result : [];
} 

var_export(buildPedigree($animals, 1));

Output:

array (
    'sire' => 2,
    'dam' => 3,
    'sire_parents' => array (
        'sire' => 5,
        'dam' => 6,
        'dam_parents' => array (
            'sire' => 7,
            'dam' => 8,
        ),
    ),
    'dam_parents' => array (
        'sire' => 9,
        'sire_parents' => array (
            'sire' => 10,
            'dam' => 11,
            'dam_parents' => array (
                'sire' => 12,
            ),
        ),
    ),
)
Sign up to request clarification or add additional context in comments.

17 Comments

That is the output I was looking for. I am not able to select * from animal to create a result set due to the fact I will soon have over 100k animals. When does it make sense to just hit the sql server with multiple calls rather then create a large result set? I have about 8 demographics with each of these dogs. Also I am not able to change from varchar to int due to real registration numbers having alphanumeric characters. Just wanted to keep it simple.
A full pedigree will have 30 dogs.
I think if you expect to be generating this 4-gen data every time a user loads a page, you are going to be better served by creating a separate table called pedigree containing only id (which is animal.id) and pedigree (which is the json string containing the generated 4-gen array). This can be a procedure that is run as soon as a new dog hits your system and never again because a dog can't change its ancestory. I don't currently have time to refactor my answer to use recursive queries, let me know if you get stuck trying to refactor it.
If you get a new dog that belongs to a pre-existing pedigree "chain", you'll just need to repair the existing affiliated pedigree chains.
Done and done. Brilliant on the five queries. It actually helps with all of the back end work I need to do and easier on the CPU.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.