Noise Elimination from the Web Documents by Using URL paths and Information Redundancy
Kang, Byeong Ho and Kim, Yang sok (2006) Noise Elimination from the Web Documents by Using URL paths and Information Redundancy. In: The 2006 International Conference on Information & Knowledge Engineering, 26-29 Jun, Las Vegas, US.
Abstract

Noise data in Web documents significantly affect the performance of Web information management systems. Many researchers have proposed noise elimination methods based on document structure. In this paper, we propose a different approach that eliminates redundant information across Web documents that share the same URL path. We propose a redundant word/phrase filtering method for single or multiple tokenizations. We conducted two experiments to examine the efficiency and effectiveness of our filtering approaches. Experimental results show that our approach performs well on both criteria.
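The abstract does not spell out the filtering algorithm, so the following is only a rough Python sketch of the general idea under stated assumptions, not the authors' method. It assumes that "the same URL path" means the host plus directory part of a URL, that a word or phrase counts as redundant when it appears in at least a fixed fraction of the documents under that path, and that "single or multiple tokenizations" corresponds to phrases of n = 1 or n >= 2 tokens. The names `url_path`, `filter_redundant`, and `min_ratio`, the 0.8 threshold, and the sample pages are all hypothetical.

```python
from __future__ import annotations

import posixpath
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def url_path(url: str) -> str:
    """Hypothetical grouping key: host plus directory, e.g. 'example.com/news/'."""
    parts = urlparse(url)
    directory = posixpath.dirname(parts.path).rstrip("/")
    return parts.netloc + directory + "/"


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation such as '|' is dropped."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_redundant(docs: dict[str, str], n: int = 1,
                     min_ratio: float = 0.8) -> dict[str, list[str]]:
    """Drop n-grams that occur in >= min_ratio of the documents sharing a URL path.

    Assumption: redundancy is measured by document frequency within a group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url in docs:
        groups[url_path(url)].append(url)

    cleaned: dict[str, list[str]] = {}
    for urls in groups.values():
        token_lists = {u: tokenize(docs[u]) for u in urls}
        if len(urls) < 2:
            # A lone document has nothing to compare against; keep it whole.
            cleaned.update(token_lists)
            continue
        # Document frequency: each n-gram is counted at most once per document.
        df: Counter = Counter()
        for toks in token_lists.values():
            df.update(set(ngrams(toks, n)))
        redundant = {g for g, c in df.items() if c / len(urls) >= min_ratio}
        for u, toks in token_lists.items():
            keep = [True] * len(toks)
            for i in range(len(toks) - n + 1):
                if tuple(toks[i:i + n]) in redundant:
                    for j in range(i, i + n):
                        keep[j] = False
            cleaned[u] = [t for t, k in zip(toks, keep) if k]
    return cleaned


# Hypothetical pages under one URL path that share a navigation bar and footer.
docs = {
    "http://example.com/news/a.html": "Home | Login Stocks rally on strong earnings Copyright 2006",
    "http://example.com/news/b.html": "Home | Login Storm hits the coast Copyright 2006",
    "http://example.com/news/c.html": "Home | Login Election results announced Copyright 2006",
}
cleaned = filter_redundant(docs, n=1)
print(cleaned["http://example.com/news/a.html"])
```

In this sketch, raising `n` targets multi-word boilerplate such as navigation bars and footers. Lowering `min_ratio` removes more noise but raises the risk of discarding genuine content that happens to recur across a section.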