My database has information about documents, where each document has a category, e.g.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX: <http://example.com>
:doc1 :hasCategory :category1 .
:category1 rdfs:label "Law" .
There are about 100k statements like this.
Running a simple query to get counts of documents per category:
SELECT ?category (count(distinct ?doc) as ?count) WHERE {
?doc :hasCategory ?category .
} GROUP BY ?category
takes about 0.1s to run.
But to return the category labels as well:
SELECT ?category ?label (count(distinct ?doc) as ?count) WHERE {
?doc :hasCategory ?category .
?category rdfs:label ?label .
} GROUP BY ?category ?label
this query takes more than 7s to run.
Why would the difference be so large, and is there a more optimised query I can use to get the labels?
SELECT ?category (sample(?label) as ?l) (count(distinct ?doc) as ?count) WHERE { ...} GROUP BY ?categorysamplereduced the time taken by a second, but it's still about 6s?doc :hasCategory ?category .is only 133 . I would have imagined that it only has to find labels for 133 categories