Proceedings of the 8th International Conference on Semantic Systems - I-SEMANTICS '12
Existing content management systems (CMSes) usually do not offer flexible, customizable means of creating semantic, domain-specific indexing and search mechanisms. As a result, they either provide no semantic retrieval, search, or browsing functionality on the managed content at all, or the semantic search they do provide is limited because it depends on manual annotation of content by users. In this study we describe a semantic content management flow that extracts implicit knowledge from both the structure of the CMSes and the actual content within them. Gathering additional semantic knowledge and providing semantic operations on the content is challenging and requires adopting several recent advances in information extraction (IE), information retrieval (IR), and the Semantic Web. We propose a new approach that automatically annotates content managed in CMSes with information retrieved from the Linked Open Data (LOD) cloud and provides several semantic operations on that content for storage and search. We use a simple RDF path language to create custom indexes and retrieve semantic knowledge from the LOD cloud suited to specific use cases. All additional knowledge is materialized, along with the actual content of each document, in dedicated indexes. This semantic indexing infrastructure enables semantically meaningful search facilities on top of it. We realize our approach within the Apache Stanbol project, a subproject developed in the scope of the IKS project, focusing on document storage and retrieval. We evaluate our approach in the healthcare domain, using different domain ontologies (SNOMED/CT, ART, RXNORM) in addition to DBpedia as parts of the LOD cloud to annotate documents and content obtained from different health portals.
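The indexing idea summarized above (follow RDF paths from annotated entities, then materialize the retrieved values next to the document content in a dedicated index) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the triples, the `ex:` identifiers, the '/'-separated path syntax, and the helper names `follow_path`, `index_document`, and `search` are hypothetical and are not the path language or API used in the paper or in Apache Stanbol. Entity annotation is assumed to have already happened, so the entities are passed in as a list.

```python
from collections import defaultdict

# Toy triples standing in for knowledge retrieved from the LOD cloud.
# All identifiers and values here are illustrative, not real dataset content.
TRIPLES = [
    ("ex:Aspirin", "rdfs:label", "Aspirin"),
    ("ex:Aspirin", "ex:treats", "ex:Headache"),
    ("ex:Headache", "rdfs:label", "Headache"),
    ("ex:Aspirin", "ex:drugClass", "ex:NSAID"),
    ("ex:NSAID", "rdfs:label", "Nonsteroidal anti-inflammatory drug"),
]


def build_graph(triples):
    """Group triples as subject -> predicate -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph


def follow_path(graph, start, path):
    """Evaluate a '/'-separated property path (hypothetical syntax) from start."""
    nodes = [start]
    for prop in path.split("/"):
        nodes = [o for n in nodes for o in graph.get(n, {}).get(prop, [])]
    return nodes


def index_document(graph, doc_id, text, entities, field_paths):
    """Materialize path results for the annotated entities next to the text."""
    doc = {"id": doc_id, "text": text}
    for field, path in field_paths.items():
        values = []
        for entity in entities:
            for value in follow_path(graph, entity, path):
                if value not in values:  # keep order, drop duplicates
                    values.append(value)
        doc[field] = values
    return doc


def search(index, field, term):
    """Return ids of documents whose materialized field contains the term."""
    term = term.lower()
    return [d["id"] for d in index
            if any(term in value.lower() for value in d.get(field, []))]


# Index configuration: field name -> path to follow from each entity.
FIELD_PATHS = {
    "treats": "ex:treats/rdfs:label",
    "drug_class": "ex:drugClass/rdfs:label",
}

graph = build_graph(TRIPLES)
doc = index_document(graph, "doc1", "Patient was given aspirin.",
                     ["ex:Aspirin"], FIELD_PATHS)
index = [doc]
hits = search(index, "treats", "headache")
```

In this sketch the document never mentions "headache", yet a search on the `treats` field finds it, because the label reached through the path was stored alongside the text at indexing time. Choosing a different set of paths yields a different, use-case-specific index over the same content.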