XML Documents searching combining structure and keywords similarities

Apichaya Auvattanasombat; Yousuke Watanabe; Haruo Yokota

Publication Information

Title

Japanese:
English:	XML Documents searching combining structure and keywords similarities

Authors

Japanese:	Apichaya Auvattanasombat, 渡辺陽介, 横田治夫.
English:	Apichaya Auvattanasombat, Yousuke Watanabe, Haruo Yokota.

Language

English

Journal/Book Title

Japanese:	情報処理学会研究報告
English:	IPSJ SIG Technical Reports

Volume, Number, Pages

Vol. 2013-DBS-157 No. 14 pp. 1-6

Publication Date

July 15, 2013

Publisher

Japanese:
English:

Conference Name

Japanese:	情報処理学会第156回データベースシステム研究会
English:	The joint workshop of IPSJ SIG-DBS/IPSJ SIG-IFAT/IEICE DE

Venue

Japanese:	北海道札幌市
English:	Sapporo, Hokkaido

File

Official Link

http://id.nii.ac.jp/1001/00094290/

Abstract

In recent years, XML has increasingly become a standard and is widely used in many applications. For example, office documents, which are more and more popular, are stored as archives made up of multiple XML parts. It is known that the structure and content of XML files play different roles depending on the kind of document. Therefore, similarity search over XML files should be based on both structure and content. In previous work, LAX+ was introduced as an algorithm for computing a similarity value from the structure and contents of the XML files in office documents. However, because LAX+ uses exact matching between corresponding leaves, similar words in leaf nodes are treated as different. To solve this problem, we propose combining LAX+ with keyword similarity in leaf nodes. We use the docx, xlsx, and pptx file formats as the experimental data set. The evaluation shows that our approach can improve precision and recall.
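
The difference between exact leaf matching and keyword similarity can be illustrated with a minimal sketch. This is not the LAX+ algorithm or the authors' method: the function names (`leaf_texts`, `exact_match`, `keyword_similarity`, `average_best_match`), the use of Jaccard word overlap as the keyword measure, the best-match averaging, and the two sample documents are all assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET


def leaf_texts(xml_string):
    """Return the stripped text of every leaf element (an element with no children)."""
    root = ET.fromstring(xml_string)
    return [
        (elem.text or "").strip()
        for elem in root.iter()
        if len(elem) == 0 and (elem.text or "").strip()
    ]


def exact_match(a, b):
    """Exact matching: 1.0 only when the two leaf strings are identical."""
    return 1.0 if a == b else 0.0


def keyword_similarity(a, b):
    """Jaccard overlap of lowercase word sets (an assumed stand-in for keyword similarity)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def average_best_match(leaves_a, leaves_b, score):
    """For each leaf in A, take its best score against any leaf in B, then average."""
    if not leaves_a or not leaves_b:
        return 0.0
    return sum(max(score(a, b) for b in leaves_b) for a in leaves_a) / len(leaves_a)


# Hypothetical documents whose first leaves differ by a single word.
doc_a = "<doc><p>quarterly sales report</p><p>budget summary</p></doc>"
doc_b = "<doc><p>annual sales report</p><p>budget summary</p></doc>"

leaves_a, leaves_b = leaf_texts(doc_a), leaf_texts(doc_b)
exact_score = average_best_match(leaves_a, leaves_b, exact_match)
keyword_score = average_best_match(leaves_a, leaves_b, keyword_similarity)
print(exact_score, keyword_score)
```

In this toy case the exact-match score gives no credit to the pair of leaves that share two of three words, while the word-overlap score gives partial credit, which is the kind of gap the abstract describes.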
