rdf - How to write a SPARQL query that efficiently matches string literals while ignoring case


I am using Jena ARQ to write a SPARQL query against a large ontology that is read from Jena TDB, in order to find the types associated with concepts based on their rdfs:label:

select distinct ?type {  ?x <http://www.w3.org/2000/01/rdf-schema#label> "aspirin" .  ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type . } 

This works pretty well and is quite speedy (<1 second). Unfortunately, for some terms I need to perform this query in a case-insensitive way. For instance, because the label "Tylenol" is in the ontology, but not "tylenol", the following query comes up empty:

select distinct ?type {  ?x <http://www.w3.org/2000/01/rdf-schema#label> "tylenol" .  ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type . } 

I can write a case-insensitive version of this query using FILTER syntax like so:

select distinct ?type {  ?x <http://www.w3.org/2000/01/rdf-schema#label> ?term .  ?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type .  filter ( regex (str(?term), "tylenol", "i") ) } 

But now the query takes over a minute to complete! Is there any way to write the case-insensitive query in a more efficient manner?

The reason the query with the FILTER runs slower is that ?term is unbound, so it requires scanning the PSO or POS index to find all statements with the rdfs:label predicate and then filtering them against the regex. When the object was bound to a concrete value (as in the first example), the engine could use an OPS or POS index to scan over only the statements with the rdfs:label predicate and the specified object, which have a much lower cardinality.
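The difference between the two access patterns can be sketched with a toy in-memory model. This is only an illustration of the idea, not how TDB stores or scans its indexes; the sample triples, the `ex:` identifiers, and the helper names `exact_match` and `regex_match` are all hypothetical.

```python
import re
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("ex:drug1", RDFS_LABEL, "aspirin"),
    ("ex:drug2", RDFS_LABEL, "Tylenol"),
    ("ex:drug3", RDFS_LABEL, "ibuprofen"),
]

# Toy POS-style index: (predicate, object) -> set of subjects.
pos_index = defaultdict(set)
for s, p, o in triples:
    pos_index[(p, o)].add(s)


def exact_match(label):
    """Bound object: one keyed lookup, regardless of how many labels exist."""
    return set(pos_index.get((RDFS_LABEL, label), set()))


def regex_match(pattern):
    """Unbound object: every rdfs:label statement is visited and tested."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = set()
    visited = 0
    for s, p, o in triples:
        if p == RDFS_LABEL:
            visited += 1
            if rx.search(o):
                hits.add(s)
    return hits, visited
```

In this sketch `exact_match("tylenol")` finds nothing because the stored label is "Tylenol", while `regex_match("tylenol")` finds it only after testing every label, which is the cost that grows with the size of the ontology.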

The common solution to this type of text-searching problem is to use an external text index. In this case, Jena provides a free-text index called LARQ, which uses Lucene to perform the search and joins the results with the rest of the query.
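The idea behind a text index can be sketched in a few lines: labels are normalized (here, lowercased and split on whitespace) once at index time, so a case-insensitive search becomes a keyed lookup whose results are then joined with the type information. This is a toy model under stated assumptions, not the LARQ or Lucene API; the sample data and the names `build_text_index` and `types_for_term` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: subject -> label, and subject -> set of types.
labels = {
    "ex:drug1": "aspirin",
    "ex:drug2": "Tylenol",
}
types = {
    "ex:drug1": {"ex:Drug"},
    "ex:drug2": {"ex:Drug", "ex:Analgesic"},
}


def build_text_index(labels):
    """Map each lowercased label token to the subjects that carry it."""
    index = defaultdict(set)
    for subject, label in labels.items():
        for token in label.lower().split():
            index[token].add(subject)
    return index


def types_for_term(index, types, term):
    """Look up the lowercased term, then join the hits with their types."""
    result = set()
    for subject in index.get(term.lower(), set()):
        result |= types.get(subject, set())
    return result


text_index = build_text_index(labels)
```

With this index, `types_for_term(text_index, types, "tylenol")` and `types_for_term(text_index, types, "TYLENOL")` return the same types without scanning every label. Note that this sketch matches whole tokens rather than arbitrary substrings, so it is not a drop-in equivalent of the unanchored regex filter.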

