Using Apache Lucene to search text

12/22/2023

In this tutorial we explain how to perform a full text search in SPARQL using Apache Lucene and Apache Jena-text. Lucene is a fast text search engine and can be used to index the full text stored in RDF triples.

We recommend using Maven to resolve the JAR dependencies automatically. Otherwise you have to download and integrate many JARs manually. Create a Maven project and add the Apache Jena and Jena-text dependencies to your pom.xml. Using Jena 2.13.0 results in an unknown class exception with the following code; we will resolve this in another post.

We use a small Turtle file called data.ttl to create a Lucene index and perform a simple text search. We store it in the root folder of the project.

    ta:subject1 ta:hasLongText "The Tutorial Academy is a wonderful place for tutorials!" .
    ta:subject2 ta:hasLongText "The Tutorial Academy offers wonderful tips and tricks for programming!" .
    ta:subject3 ta:hasLongText "The Tutorial Academy is great!" .

We start by creating an indexed dataset:

    public static Dataset createIndexedDataset( String tdbPath, String lucenePath, String indexedProperty )
    {
        // ...
        System.out.println( "Construct a persistent TDB based dataset to: " + tdbPath );
        graphDS = TDBFactory.createDataset( tdbPath );

        // Map the indexed property to the Lucene fields "uri" and "text"
        EntityDefinition entDef = new EntityDefinition( "uri", "text", ResourceFactory.createProperty( URI, indexedProperty ) );

        // Check for in-memory or file based (persistent) index
        // ...
        System.out.println( "Construct an in-memory lucene index" );
        // ...
        System.out.println( "Construct a persistent lucene index to: " + lucenePath );
        luceneDir = new SimpleFSDirectory( new File( lucenePath ) );

        // Create new indexed dataset: insert operations are automatically indexed with Lucene
        Dataset ds = TextDatasetFactory.createLucene( graphDS, luceneDir, entDef );
        // ...
    }

If you set the parameters tdbPath or lucenePath to null, the dataset will be non-persistent and kept only in memory. The indexedProperty is the property pointing to the full text you want to index and query. Remember to use the full URI and do not abbreviate it with a prefix as in ta:hasLongText. This is a different syntax used e.g. [...]

Now we load data.ttl into the created dataset:

    public static void loadData( Dataset dataset, String file )
    {
        long startTime = System.currentTimeMillis();
        // ...
        long finishTime = System.currentTimeMillis();
        long time = finishTime - startTime;
        System.out.println( "Loading finished after " + time + "ms" );
    }

If you used persistent storage, you should see the specified folders being created and filled with data.

Finally we can query the loaded data with the following code:

    public static void queryData( Dataset dataset )
    {
        // ...
    }

After running this example you should see the following console output:

    Construct a persistent TDB based dataset to: tdb
    Construct a persistent lucene index to: luceneIndex
    | ta:subject1 | "The Tutorial Academy is a wonderful place for tutorials!"               |
    | ta:subject2 | "The Tutorial Academy offers wonderful tips and tricks for programming!" |

The result consists only of triples containing the searched term “wonderful”.

You can limit the number of returned results with the last argument of text:query (here 10):

    ?s text:query (ta:hasLongText 'wonderful' 10).

You can use wildcards in the search term: “?” matches exactly one character and “*” matches zero or more characters. Note that “+” is not a wildcard in the Lucene query syntax; it marks a term as required.

    ?s text:query (ta:hasLongText 'won?erful' 10).

Why not just use REGEX filters?

Lucene is a really fast search engine. A REGEX filter has to be applied to every triple, whereas Lucene looks the search term up in a prebuilt index, which is a lot faster.

You can analyze the Lucene index (if stored persistently) using the luke-with-depth.jar. This can give you additional information if you have problems with your index. Download the JAR file and start it from the command line.
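To make the wildcard characters “?” and “*” used in the text:query search term more concrete, here is a minimal sketch in plain Java. It only illustrates the matching rules by translating a wildcard term into a regular expression; it is not how Lucene evaluates wildcards internally, and the class name WildcardSketch and its methods are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

/** Toy illustration of "?" and "*" wildcard semantics; not Lucene's implementation. */
final class WildcardSketch {

    /** Translates a wildcard term into an equivalent regular expression. */
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');      // exactly one character
            } else if (c == '*') {
                regex.append(".*");     // zero or more characters
            } else {
                // every other character is matched literally
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns true if the whole term matches the wildcard expression. */
    static boolean matches(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("won?erful", "wonderful")); // true
        System.out.println(matches("won?erful", "wonerful"));  // false
        System.out.println(matches("wonder*", "wonderful"));   // true
        System.out.println(matches("wonder*", "wonder"));      // true
    }
}
```

The second call returns false because “?” must consume exactly one character, while the last call returns true because “*” may also match nothing.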
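The reason an index lookup beats a REGEX filter can be sketched with a toy inverted index in plain Java. This is a heavily simplified illustration under our own assumptions, not Lucene's actual data structures: the class ToyTextIndex and its methods are hypothetical, and tokenization is reduced to lowercasing and splitting on non-alphanumeric characters. The lookup touches a single map entry, while the regex scan has to visit every stored literal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Toy inverted index: maps each lowercased token to the subjects whose text contains it. */
final class ToyTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, String> literals = new LinkedHashMap<>();

    /** Stores the literal and indexes each of its tokens. */
    void add(String subject, String text) {
        literals.put(subject, text);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
            }
        }
    }

    /** Index lookup: a single map access, independent of the number of literals. */
    Set<String> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Regex scan: evaluates the pattern against every stored literal. */
    Set<String> regexScan(String regex) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, String> entry : literals.entrySet()) {
            if (pattern.matcher(entry.getValue()).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        index.add("ta:subject1", "The Tutorial Academy is a wonderful place for tutorials!");
        index.add("ta:subject2", "The Tutorial Academy offers wonderful tips and tricks for programming!");
        index.add("ta:subject3", "The Tutorial Academy is great!");

        System.out.println(index.lookup("wonderful"));    // [ta:subject1, ta:subject2]
        System.out.println(index.regexScan("wonderful")); // [ta:subject1, ta:subject2]
    }
}
```

Both calls return the same subjects for the three sample triples, but the cost of regexScan grows with the number of literals, whereas lookup stays a single hash access.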
