OntoExtractor : a tool for semi-automatic generation and maintenance of taxonomies from semi-structured documents / M. Leida - In: Semantic knowledge management : an ontology-based framework / [edited by] Antonio Zilli ... [et al.]. - Hershey : Information Science Reference, 2009. - ISBN 9781605660349.
OntoExtractor : a tool for semi-automatic generation and maintenance of taxonomies from semi-structured documents
M. Leida
2009
Abstract
This chapter introduces OntoExtractor, a tool for the semi-automatic generation of taxonomies from a set of documents or data sources. The tool builds the taxonomy bottom-up: starting from a structural analysis of the documents, it generates a set of clusters, which can then be refined by a further grouping based on content analysis. Metadata describing the content of each cluster is generated automatically and analysed by the tool to produce the final taxonomy. A simulation of a tool for maintaining the taxonomy, based on implicit and explicit voting mechanisms, is also described. The system can generate a taxonomy from heterogeneous sources of information, using wrappers to convert each document from its original format to a structured one. In this way, OntoExtractor can generate a taxonomy from virtually any source of information simply by adding the appropriate wrapper. Moreover, the trust mechanism provides a reliable method for maintaining the taxonomy and for correcting the wrong classes that are unavoidably generated.