

Publication Information


Title
Japanese:
English: Style-based Similarity Search for Office XML Documents
Authors
Japanese: 渡辺 陽介, 上垣外 英剛, 横田 治夫.
English: Yousuke Watanabe, Hidetaka Kamigaito, Haruo Yokota.
Language English
Journal/Book Title
Japanese:
English: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services (iiWAS2012)
Volume, Issue, Pages pp. 138-146
Publication Date December 5, 2012
Publisher
Japanese:
English:
Conference Name
Japanese:
English: International Conference on Information Integration and Web-based Applications & Services (iiWAS2012)
Venue
Japanese:
English: Bali
DOI https://doi.org/10.1145/2428736.2428761
Abstract Recent office documents use an XML-based archive format, so each document consists of multiple XML files. These XML files contain information about page structure and about styles such as font, color, and position. However, existing text-based search engines do not take the structure and style of documents into account. By exploiting this information, we can achieve similarity search for office documents based on their structures and styles. We propose SOS, a similarity search method based on the structures and styles of office documents. To compute a similarity value between two office documents, we must compute similarity values between multiple pairs of XML files in those documents. We also propose LAX+, an algorithm that calculates a similarity value for a pair of XML files by extending an existing XML leaf-node clustering algorithm. In our experiments, we use docx, xlsx, and pptx files and evaluate SOS and LAX+ in terms of precision and recall.
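The abstract's key structural point is that a document-level similarity is assembled from similarities between pairs of XML files inside two office archives. The minimal Python sketch below illustrates only that decomposition; it is not SOS or LAX+. Several choices are assumptions made for illustration: the per-file score is a plain Jaccard coefficient over leaf-node features (root-to-leaf tag path plus attributes, with text ignored), standing in for LAX+'s leaf-node clustering. Files are paired by identical archive path, and the document score averages over all paths, with unmatched files scoring 0. All function names, including the `make_package` test helper, are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xml_parts(data: bytes) -> dict[str, ET.Element]:
    """Parse every .xml member of an office archive (a ZIP package)."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith(".xml"):
                parts[name] = ET.fromstring(zf.read(name))
    return parts


def leaf_features(root: ET.Element) -> set[tuple]:
    """Collect (tag path, sorted attributes) for each leaf element.

    Text content is ignored on purpose: this toy score looks only at
    structure (the path) and style-like attributes (e.g. font size).
    """
    feats = set()

    def walk(elem: ET.Element, path: tuple) -> None:
        path = path + (elem.tag,)
        children = list(elem)
        if not children:
            feats.add((path, tuple(sorted(elem.attrib.items()))))
        for child in children:
            walk(child, path)

    walk(root, ())
    return feats


def part_similarity(a: ET.Element, b: ET.Element) -> float:
    """Hypothetical stand-in for a pairwise XML score: Jaccard on leaf features."""
    fa, fb = leaf_features(a), leaf_features(b)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


def document_similarity(doc_a: bytes, doc_b: bytes) -> float:
    """Average pairwise scores over all XML paths; unmatched paths count as 0."""
    pa, pb = xml_parts(doc_a), xml_parts(doc_b)
    names = set(pa) | set(pb)
    if not names:
        return 0.0
    total = sum(part_similarity(pa[n], pb[n]) for n in names if n in pa and n in pb)
    return total / len(names)


def make_package(parts: dict[str, str]) -> bytes:
    """Build an in-memory ZIP archive from {path: xml_text} (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


if __name__ == "__main__":
    doc1 = make_package({"word/document.xml": '<doc><p><b val="1"/><sz val="24"/></p></doc>'})
    doc2 = make_package({"word/document.xml": '<doc><p><b val="1"/><sz val="32"/></p></doc>'})
    print(document_similarity(doc1, doc2))
```

In the usage example, the two toy documents differ only in one font-size attribute. They share one of three distinct leaf features, so the score is one third. Pairing files by path is the simplest choice. A real system would likely need a smarter matching between parts, for example when slides or sheets are added or reordered.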
Award Best Paper Award
