CiiT International Journal of Data Mining and Knowledge Engineering
Print: ISSN 0974-9683 & Online: ISSN 0974-9578



Issue: November 2012
DOI: DMKE112012001
Title: Efficient XML Keyword Search
Authors: Swati Thawari Tonge and Rashmi Phalnikar
Keywords: Keyword Search, XML, Natural Language Processing, Information Retrieval, Clustering, Relevance Ranking
Abstract:
    
eXtensible Markup Language (XML) is a semi-structured text format designed to describe data using custom tags. Custom tags make an XML document self-describing, so that it is easily understandable by both humans and machines. XML is now a standard format for data exchange between applications and is used in the configuration files of enterprise applications. The increasing preference to store and transmit data in the XML format has led to a need to search XML documents to retrieve useful information. XPath and XQuery are powerful structured languages used to retrieve information from XML documents, but they are too complex for non-expert users to learn, and the complexity of these query languages restricts the usage of XML databases. Keyword search allows such users to retrieve information without understanding the syntax of a complex query language or the schema of the database. Along with this ease of retrieval, keyword search raises challenges such as returning meaningful results, capturing the intention of the search, keyword ambiguity, and an enormous number of results. This paper presents an efficient keyword search method based on clustering and relevance ranking. Experiments have been conducted to show the effectiveness of the proposed method.
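
As a toy illustration of the contrast the abstract draws between structured queries and keyword search (not the authors' algorithm), the Python sketch below runs an XPath query against a tiny hypothetical document and then performs a naive keyword search that returns the smallest elements containing all keywords; the sample document and the keyword_search helper are our own assumptions.

```python
# Minimal sketch contrasting structured XPath retrieval with naive
# keyword search over XML. Illustrative only; not the paper's method.
import xml.etree.ElementTree as ET

XML = """<library>
  <book><title>Data Mining</title><author>Han</author></book>
  <book><title>XML Retrieval</title><author>Smith</author></book>
</library>"""

root = ET.fromstring(XML)

# Structured query: the user must know both the schema and XPath syntax.
titles = [t.text for t in root.findall("./book/title")]
print(titles)

# Keyword search: return elements whose subtree text contains all
# keywords, preferring the deepest (most specific) matches.
def keyword_search(elem, keywords):
    text = " ".join(elem.itertext()).lower()
    if not all(k.lower() in text for k in keywords):
        return []
    hits = [h for child in elem for h in keyword_search(child, keywords)]
    return hits or [elem]  # keep the smallest matching subtree

for match in keyword_search(root, ["xml", "smith"]):
    print(match.tag, "->", " ".join(match.itertext()))
```

Returning the smallest matching subtree reflects the same "meaningful results" intuition the abstract raises, akin to smallest lowest common ancestor (SLCA) semantics.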

Full PDF


Issue: November 2012
DOI: DMKE112012002
Title: Double and Triple Adjacent Errors Detection through Enhanced Hamming Codes
Authors: Subodh Bhoite and S.S. Pawar
Keywords: Radiation-Induced Soft Errors, Geometric Effect of Multiple-Bit Soft Errors Induced by Cosmic Rays, Error Correction Codes (ECCs), Hamming Codes, Memory, Multiple Cell Upsets (MCUs)
Abstract:
Hamming codes that can correct one error per word are widely used to protect memories and registers from soft errors. The once-ephemeral radiation-induced soft error has become a key threat to advanced commercial electronic components and systems; left unchallenged, soft errors have the potential to induce the highest failure rate of all reliability mechanisms. As technology scales, the radiation particles that create soft errors are more likely to affect more than one bit when they impact a memory or electronic circuit. This effect is known as a multiple cell upset (MCU), and the registers or memory cells affected by an MCU are physically close. To prevent an MCU from causing more than one error in a given word, interleaving is commonly used in memories. With interleaving, cells that belong to the same logical word are placed apart, such that an MCU affects multiple bits but on different words. However, interleaving increases the complexity of the memory device and is not suitable for small memories or content-addressable memories. When interleaving is not used, MCUs can cause multiple errors in a word that may not even be detected by a Hamming code. In this paper, a technique to increase the probability of detecting double and triple adjacent errors when Hamming codes are used is presented. The enhanced detection is achieved by placing the bits of the word such that adjacent errors result in a syndrome that does not match that of any single error. Double and triple adjacent errors are precisely the types of errors that an MCU would likely cause, and therefore the proposed scheme will be useful for providing error detection for MCUs in memory designs.
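
The placement idea can be illustrated with a small brute-force sketch, under our own assumptions rather than the paper's construction: a Hamming code with 4 check bits shortened to 11 columns uses only 11 of the 15 nonzero syndromes, so a bit order can be sought in which adjacent double errors XOR to an unused syndrome and are flagged rather than miscorrected.

```python
# Sketch of the bit-placement idea behind enhanced adjacent-error
# detection (illustrative; not the paper's exact construction).
# A shortened Hamming code with R = 4 check bits leaves 4 of the 15
# nonzero syndromes unused; if the bits of a word are placed so that
# adjacent column pairs XOR to unused syndromes, an adjacent double
# error is *detected* instead of being miscorrected as a single error.
import random

R = 4                                 # check bits -> 4-bit syndromes
N = 11                                # shortened length (out of 2**R - 1)
COLS = list(range(1, 2 ** R))[:N]     # columns of H = single-error syndromes
SINGLE = set(COLS)                    # syndromes claimed by single errors

def detected_adjacent_doubles(order):
    """Count adjacent double errors whose syndrome matches no single error."""
    return sum((order[i] ^ order[i + 1]) not in SINGLE for i in range(N - 1))

base = tuple(COLS)                    # naive placement: columns in order
best_score = detected_adjacent_doubles(base)
random.seed(0)
for _ in range(20000):                # cheap random search over placements
    perm = tuple(random.sample(COLS, N))
    best_score = max(best_score, detected_adjacent_doubles(perm))

print("naive placement detects", detected_adjacent_doubles(base), "of", N - 1)
print("best found placement detects", best_score, "of", N - 1)
```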

Full PDF


Issue: November 2012
DOI: DMKE112012003
Title: Two Pass Spam Filter using Origin and Bayesian Approach
Authors: Rashmi Gupta and Nitin Rola
Keywords: Bayesian, Origin of Mail, Text Classification, Spam Filter
Abstract:
In recent years, email has grown into a cheap and reliable communication medium whose use has spread enormously, but this growth has created a major problem: spam (junk) email. The solution is to construct an automatic filtering system that eliminates unwanted mail. The Bayesian approach is a common and efficient way to perform this task: it casts the problem of removing junk email into a decision-theoretic framework. At first glance this seems to be a simple text classification problem, but much research is still devoted to it because the cost of misclassifying a legitimate mail as junk is very high. Here we consider a Bayesian approach for filtering junk email. A Bayesian filter classifies mail by checking its content, which is a time-consuming process. So, to improve the performance of the spam filter, we filter spam by its origin as well as by its content using the Bayesian approach.
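
A minimal sketch of the two-pass idea, assuming hypothetical training counts, origin blacklist, and threshold (none of which come from the paper):

```python
# Minimal sketch of a two-pass filter: pass 1 checks the mail's origin,
# pass 2 applies a naive Bayes score to the content. All data here
# (origin list, token counts, threshold) is hypothetical.
import math

KNOWN_SPAM_ORIGINS = {"198.51.100.7"}          # assumed blacklist
spam_tokens = {"free": 40, "winner": 25, "meeting": 2}
ham_tokens  = {"free": 5,  "winner": 1,  "meeting": 30}
N_SPAM, N_HAM = 100, 100                       # assumed training counts

def p_spam_given_content(words):
    # Naive Bayes with Laplace smoothing, computed in log space.
    log_odds = math.log(N_SPAM / N_HAM)
    for w in words:
        ps = (spam_tokens.get(w, 0) + 1) / (N_SPAM + 2)
        ph = (ham_tokens.get(w, 0) + 1) / (N_HAM + 2)
        log_odds += math.log(ps / ph)
    return 1 / (1 + math.exp(-log_odds))

def classify(origin_ip, body, threshold=0.9):
    if origin_ip in KNOWN_SPAM_ORIGINS:        # pass 1: origin
        return "spam"
    words = body.lower().split()               # pass 2: content
    return "spam" if p_spam_given_content(words) > threshold else "ham"

print(classify("203.0.113.5", "You are a winner claim your free prize"))
print(classify("198.51.100.7", "Agenda for the meeting"))
print(classify("203.0.113.5", "Agenda for the meeting"))
```

Checking the origin first skips the costlier content scoring for mail from known spam sources, which is the performance gain the abstract points to.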

Full PDF


Issue: November 2012
DOI: DMKE112012004
Title: Prediction of Adverse Drug Reactions due to Drug-Drug Interactions using Probability Analysis
Authors: K.V. Uma and Dr. S. Appavu alias Balamurugan
Keywords: Adverse Drug Reaction, Leverage, Chi-Square, Association Rules, Decision Tree
Abstract:
An adverse drug reaction (ADR) is an expression that describes harm associated with the use of a given medication at a normal dosage during normal use. ADRs may occur following a single dose or prolonged administration of a drug, or may result from the combination of two or more drugs. Data on adverse drug reactions are abundant yet incomplete, and consume a huge amount of storage space. No quantitative conclusions can be drawn from the reported data regarding mortality or the underlying causes of ADRs. Hence, two drugs, namely Vioxx and warfarin, are taken to show that different reactions are caused when a drug is taken singly and in combination. Association rules are used to find the association between a drug and an adverse event; here, the rules are generated using a probability analysis method. In this method, a 2×2 contingency table is constructed for the drugs and the adverse event, and the chi-square statistic is computed based on goodness of fit. With the chi-square test it is possible to determine only the relative strength of an association, not to distinguish the interaction relationships between the drugs. For that, a probabilistic model is constructed, and based on it the adverse event is attributed either to a single drug or to drugs in combination. Then the patient's demographic information, such as gender and age, is taken, and the drug route is also considered, to find out whether the symptom is due to the drug under these additional conditions; they are analyzed using the ID3 algorithm.
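
The contingency-table step can be made concrete with a short worked example; the counts below are hypothetical, not the paper's data.

```python
# Worked sketch of the 2x2 contingency-table chi-square test the
# abstract describes, on made-up counts:
#            ADR   no ADR
# drug        30      70
# no drug     10      90
table = [[30, 70], [10, 90]]

row = [sum(r) for r in table]                  # row totals
col = [sum(c) for c in zip(*table)]            # column totals
n = sum(row)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row[i] * col[j] / n         # count expected under independence
        chi2 += (table[i][j] - expected) ** 2 / expected

print(f"chi-square = {chi2:.2f}")              # 12.50 for these counts
```

With one degree of freedom, a statistic above the 3.84 critical value (alpha = 0.05) suggests an association between drug and event, though, as the abstract notes, the chi-square alone cannot separate single-drug effects from interactions.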

Full PDF


Issue: November 2012
DOI: DMKE112012005
Title: Affinity Propagation Clustering with Background Knowledge using Pairwise Constraints
Authors: M. Yoga and R. Vadivu
Keywords: Affinity Propagation, Pairwise Constraints, Semi-Supervised Learning, Word Pair Process, MMC
Abstract:
Pairwise constraints, which specify whether a pair of samples should be grouped together or not, have been successfully incorporated into conventional clustering methods such as k-means and spectral clustering to enhance their performance. Nevertheless, the issue of pairwise constraints has not been well studied in the recently proposed Maximum Margin Clustering (MMC), which extends the maximum margin principle of supervised learning to clustering and often shows promising performance. In the clustering process, semi-supervised learning is a class of machine learning techniques that makes use of a small amount of labeled data and a large amount of unlabeled data for training. The proposed affinity propagation approach aims to make the clustering process effective by performing word-wise comparison; it overcomes the overlapping problem encountered in k-means and reduces the memory space required.
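
One common way to fold pairwise constraints into affinity propagation is to edit the similarity matrix before clustering; the sketch below shows that hack (not necessarily the paper's formulation) using scikit-learn's AffinityPropagation with a precomputed affinity, on made-up points and constraints.

```python
# Hedged sketch: pairwise constraints folded into affinity propagation
# by editing the similarity matrix before clustering. Requires scikit-learn.
import numpy as np
from sklearn.cluster import AffinityPropagation

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [2.5, 2.5]])
S = -np.square(X[:, None, :] - X[None, :, :]).sum(axis=2)  # similarity = -dist^2

MUST_LINK   = [(4, 0)]   # assumed background knowledge: sample 4 goes with 0
CANNOT_LINK = [(4, 2)]
for i, j in MUST_LINK:
    S[i, j] = S[j, i] = 0.0        # as similar as identical points
for i, j in CANNOT_LINK:
    S[i, j] = S[j, i] = -1e6       # effectively never the same cluster

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
print(ap.labels_)                   # expect point 4 pulled toward the first cluster
```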

Full PDF


Issue: November 2012
DOI: DMKE112012006
Title: Technology for Detecting Heart Failure by Applying Data Mining in Health Care
Authors: K. Pugazharasi and A. Swarnambika
Keywords: Data Mining (DM), Heart Failure (HF), Heart Rate Variability (HRV), Home Monitoring (HM)
Abstract:

Machine learning (ML) is used in tasks such as medical decision support, protein-protein interaction, extraction of medical knowledge, and overall patient management. ML is envisioned as a tool by which computer-based systems can be integrated into the healthcare field in order to deliver better, more efficient medical care. This paper describes an ML-based methodology for building an application capable of identifying and disseminating healthcare information; it identifies semantic relations that exist between diseases and treatments. The paper presents a platform that enhances the effectiveness and efficiency of home monitoring by using data mining for early detection of any worsening in a patient's condition, worsening that could require more complex and expensive care if not recognized. Disease management programs that use no advanced information and computer technology are as effective as telemedicine but more efficient because they are less costly. The platform improves home monitoring by adding data mining functionalities. This is important for improving the effectiveness and efficiency of home monitoring, especially for benchmarking telemedicine against other disease management programs, and not only against ambulatory follow-up.

Full PDF


Issue: November 2012
DOI: DMKE112012007
Title: Karnaugh Map Model for Mining Association Relationships in Web Content Data: Hypertext
Authors: Vikrant Sabnis, Neelu Khare, R.S. Thakur and K.R. Pardasani
Keywords: Karnaugh Map Model, Multilevel Association Rules, Association Relationships, Frequent Text Set
Abstract:

Web content mining refers to the description and discovery of useful information from web contents/data/documents. Hypertext is one of the most common kinds of web content data; it contains hyperlinks in addition to text and is modeled at multiple levels of detail depending on the application. In this paper, a Karnaugh map model for multilevel association rule mining is developed to investigate association relationships among the hypertexts of a web site. The Karnaugh map model needs a single scan of the data and stores the information in the form of frequencies. The model adopts a progressively deepening approach for finding large text sets, utilizing Karnaugh map logic to find frequent text sets at each level of abstraction. The frequent text sets generated by the Karnaugh map model are used to discover strong association relationships among hypertexts at different levels of abstraction. Further, the rules are grouped into three categories and their behavior is studied across the levels of abstraction.
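
One possible reading of the single-scan, frequency-based model (our own illustration, with a hypothetical vocabulary and corpus): treat each document's presence/absence pattern over the items as one cell of a Karnaugh-map-style truth table, accumulate cell frequencies in a single pass, and compute any text set's support as a sum over covering cells.

```python
# Sketch of a single-scan frequency table: each document's presence
# pattern over a small vocabulary is one cell of a Karnaugh-map-like
# truth table; any itemset's support is a sum over covering cells.
from collections import Counter

VOCAB = ["sports", "news", "video"]            # items (hypertext terms)
docs = [{"sports", "news"}, {"sports"}, {"news", "video"}, {"sports", "news"}]

cells = Counter()                               # cell bit pattern -> frequency
for d in docs:                                  # the single scan
    mask = sum(1 << i for i, t in enumerate(VOCAB) if t in d)
    cells[mask] += 1

def support(itemset):
    """Support of an itemset = sum of cells whose pattern covers it."""
    want = sum(1 << VOCAB.index(t) for t in itemset)
    return sum(c for mask, c in cells.items() if mask & want == want)

print(support({"sports"}))            # 3
print(support({"sports", "news"}))    # 2
```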

Full PDF


Issue: November 2012
DOI: DMKE112012008
Title: A New Framework for Vehicle Number Plate Recognition using Data Mining Techniques
Authors: S. Vydehi and V.B. Maduria
Keywords: Data Mining, K-Nearest Neighbour (KNN), Edge Detection, Image Processing
Abstract:

Data mining is the process of discovering patterns in large data sets; it lies at the intersection of artificial intelligence, machine learning, statistics, and database systems. One of the key steps in knowledge discovery in databases is to create a suitable target data set for the data mining tasks. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. The K-Nearest Neighbour algorithm, among the simplest of all machine learning algorithms, is proposed in this paper. In the proposed approach, data mining techniques are used for edge detection, extraction of the plate region, segmentation of the plate characters, and recognition of the characters. Edges are a basic feature of an image; they carry rich information that is significant for obtaining image characteristics in object recognition. In this paper, a modified Sobel edge detection technique is used to detect the edges of the image. With the help of the presented technique, the number on any plate can be detected simply by giving the image of the plate as input; the number is then extracted and recognized with low complexity. The image is stored in the form of a matrix, and the output is displayed in the form of the detected numbers. Experiments were carried out in MATLAB, and the results show that the data mining technique is more efficient and accurate compared with other techniques.
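
For reference, the classical Sobel operator the technique builds on can be sketched as follows; the "modified" variant is not specified in the abstract, and the toy image and threshold are our assumptions.

```python
# Minimal Sobel edge-detection sketch (the classical operator, not the
# paper's modified variant). Requires numpy and scipy.
import numpy as np
from scipy.ndimage import convolve

def sobel_edges(img, threshold=1.0):
    gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gy = gx.T                                   # vertical gradient kernel
    ex = convolve(img, gx, mode="nearest")
    ey = convolve(img, gy, mode="nearest")
    magnitude = np.hypot(ex, ey)                # gradient magnitude
    return magnitude > threshold                # binary edge map

# Toy 'plate': a bright vertical stripe on a dark background.
img = np.zeros((6, 8))
img[:, 3:5] = 1.0
print(sobel_edges(img).astype(int))             # edges at the stripe borders
```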

Full PDF


Issue: November 2012
DOI: DMKE112012009
Title: A Survey on Similarity Measures for Microarray Gene Expression Data Analysis
Authors: S.P. Vidhya priya and N.S. Nithya
Keywords: Mutual Information, Intuitionistic Fuzzy Sets, Gene Based Clustering, Similarity Measure
Abstract:

Microarray technology is a recent advancement used to concurrently monitor the expression profiles of thousands of genes under different experimental conditions. This paper first briefly introduces the concepts of microarray technology, surveys similarity measures, and discusses the basic elements of clustering on gene expression data. Finding groups of genes with similar expression is usually achieved by exploratory techniques such as cluster analysis. The detailed survey concentrates mainly on similarity measures, since the similarity measure is an important component of clustering techniques for gene expression data. Two similarity measures are used for gene expression data: a mutual information similarity measure is applied first and redundancy is then removed, after which Intuitionistic Fuzzy Sets are used to obtain more accuracy; the approach is applicable to multiple data sets.
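
A hedged sketch of mutual information as a gene-gene similarity measure, using simple equal-width binning and synthetic expression vectors (the survey's estimator and data are not specified):

```python
# Mutual information between two expression profiles via histogram
# binning. Illustrative only; bin count and data are assumptions.
import numpy as np

def mutual_information(x, y, bins=4):
    """Estimate I(X;Y) in bits from two expression profiles."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
g1 = rng.normal(size=50)
g2 = g1 + 0.1 * rng.normal(size=50)            # strongly related gene
g3 = rng.normal(size=50)                       # unrelated gene
print(mutual_information(g1, g2), ">", mutual_information(g1, g3))
```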

Full PDF


Issue: November 2012
DOI: DMKE112012010
Title: A Survey on the Classification of Dark Web using Unclassified Ontology Method
Authors: M. Sreekrishna, B. Chitra and A. Naveenkumar
Keywords: Deep Web, Ontology, Semantic Information Retrieval, Semantic Search, Wikipedia
Abstract:

The deep web is the part of the web that is not part of the surface web. Due to its large volume of data, the deep web has gained considerable attention in recent years. Traditional search engines cannot be used to retrieve deep web content: those pages do not exist until they are created dynamically as the result of a specific search. The deep web is found to be orders of magnitude larger than the surface web, and it mostly comprises online domain-specific databases, which are accessed through web query interfaces. In order to make extraction relevant to the user, it is necessary to classify the deep web databases. In this paper, an unclassified-ontology-based web classification method is used to classify the data in the deep web. This method starts from a completely unclassified set of data and uses the Wikipedia category network to analyze the meta-information of the deep web sources. The experimental results are found to give a more accurate and fine-grained classification compared to existing approaches.

Full PDF