Document Information

DATA CLEANING USING FD FROM DATA MINING PROCESS 
Author(s): Kollayut Kaewbuadee, Yaowadee Temtanapat, Ratchata Peachavanish
Paper abstract: A Functional Dependency (FD) is an important feature for describing the relationship between attributes and candidate keys in tuples. It also shows the relationship between entities in a data model (Calvanese et al. 2001). In data cleaning research (Arenas et al. 1999; Bohannon et al. 2005), FDs are used to improve data quality. In data mining research, FD discovery techniques have been studied (Savnik and Flach 1993; Huhtala et al. 1999). However, FD discovery can find too many FDs and, if they are used directly in a cleaning process, can cause it to run in NP time (Bohannon et al. 2005). In this research, we developed a cleaning engine that combines an FD discovery technique with a data cleaning technique and uses a feature from query optimization called “Selectivity Value” to decrease the number of discovered FDs. Test results showed that this work can identify duplicates and anomalies with high recall and a low false-positive rate.
Keywords: Functional Dependency, Data Cleaning, Functional Dependency Discovery
Type: Journal Paper  
First Page: 117 
Last Page: 131 
Year: 2006  
Editors: Pedro Isaías and Marcin Paprzycki  
ISSN: 1646-3692
Language: English  
Conference Name: IADIS International Journal on Computer Science and Information Systems
Volume: V I, 2  
