An integrated architecture for semantic search
Thesis posted on 2023-08-30, 14:38, authored by Arooj Fatima
meaningful manner such that software agents can search, reason with and manipulate this data based on an understanding of its semantics. Accessing structured data from Linked Open Datasets currently requires the use of formal query languages (such as SPARQL), which poses significant difficulties for end users. One way to solve this problem is to provide a Natural Language Interface (NLI) for querying semantic data. The author undertook a comprehensive literature survey of existing semantic search tools and performed a critical analysis to identify their strengths and weaknesses. Although some of the existing tools support natural language, they are limited in their techniques for query processing, result ranking, result readability and ease of integration with other search tools. Based upon this analysis, this research proposes a new architecture framework for semantic search, called SIRF (Semantic Information Retrieval Framework), to address these shortcomings. This thesis provides a complete overview of the proposed framework, including: the research challenges it addresses; its architecture; the techniques to map user queries to SPARQL queries and to rank domains based on ontology concepts; and the evaluation of the proposed system through a prototype. Evaluation of the prototype demonstrated the validity of the approach. However, the quality of the resulting queries (and consequently of the retrieved answers) depended upon the accuracy of the NLP parsers invoked by the prototype. Syntactically well-structured NL queries were correctly parsed, yielding better-formed SPARQL queries. Less structured NL queries performed poorly. As the framework is not tied to any particular parser, result quality can be improved by utilising better parsers as they become available. The author believes that this work can be employed by a variety of end-user applications that wish to utilise structured data.
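To illustrate the general idea of mapping a natural language question to a SPARQL query, the following is a minimal sketch and not the technique used in SIRF, which the abstract does not detail. It assumes a single hard-coded question pattern, a hypothetical `PROPERTY_MAP` lookup table, and DBpedia-style `dbo:`/`dbr:` prefixes. It also mirrors the observation above that well-structured questions translate cleanly while less structured ones do not.

```python
import re
from typing import Optional

# DBpedia-style namespace declarations (an assumption for this sketch).
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/> "
    "PREFIX dbr: <http://dbpedia.org/resource/> "
)

# Hypothetical lookup from English nouns to ontology properties.
PROPERTY_MAP = {
    "capital": "dbo:capital",
    "author": "dbo:author",
}

# One fixed pattern: "What/Who is the <property> of <entity>?"
PATTERN = re.compile(r"^(?:who|what) is the (\w+) of (.+?)\??$", re.IGNORECASE)


def to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query string, or None if the question is not understood."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None  # question does not fit the single supported pattern
    prop = PROPERTY_MAP.get(match.group(1).lower())
    if prop is None:
        return None  # no known ontology property for this word
    # Naive entity linking: spaces become underscores in the resource name.
    resource = "dbr:" + match.group(2).strip().replace(" ", "_")
    return f"{PREFIXES}SELECT ?answer WHERE {{ {resource} {prop} ?answer . }}"


if __name__ == "__main__":
    print(to_sparql("What is the capital of France?"))
    print(to_sparql("capital France"))  # less structured input is not parsed
```

A real system would replace the regular expression with an NLP parser and the lookup table with ontology-driven matching; the sketch only shows the shape of the input and output.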
Institution: Anglia Ruskin University
- Accepted version