CiiT International Journal of Data Mining Knowledge Engineering
Print: ISSN 0974 – 9683 & Online: ISSN 0974 – 9578

  Issue : April 2009
  DOI: DMKE042009001
  Title: High Dimensional Data Mining Using Clustering
  Authors: A. Bharathi, Dr. A.M. Natarajan
  Keywords: Data mining, High Dimensional Clustering, Distance Measure
  Abstract:
       Clustering is one of the major tasks in data mining. Clustering algorithms are based on a criterion that maximizes inter-cluster distance and minimizes intra-cluster distance. In higher-dimensional feature spaces, their performance and efficiency deteriorate considerably. A large number of dimensions confuses the clustering algorithms: it becomes difficult to group similar data points because the distances between points become almost the same, a problem usually called the “curse of dimensionality”. Some algorithms find a subset of dimensions on which clustering is performed by removing irrelevant and redundant dimensions. Dimensionality reduction techniques such as Principal Component Analysis (PCA) are used for feature reduction. If different subsets of the points cluster well on different subspaces of the feature space, a global dimensionality reduction will fail. To overcome these problems, recent research has proposed computing subspace clusters. The existing algorithms have two common limitations. First, they usually have problems with subspace clusters of different dimensionality. Second, they often fail to discover clusters of different shapes and dimensionalities. The goal of this project is to develop new efficient and effective methods for high dimensional clustering.
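
The distance effect described in this abstract can be illustrated with a small experiment. The following is only a sketch under stated assumptions, not the authors' method: the function name `relative_contrast`, the uniform random data, the point count and the seed are all hypothetical choices made for illustration. It measures how much the farthest and nearest neighbours of a query point differ, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random query
    point to n_points random points drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 20, 500):
        print(dim, round(relative_contrast(dim), 3))
```

With uniform data the contrast shrinks as the dimension grows, which is why nearest and farthest points become hard to tell apart for distance-based clustering.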

  Issue : April 2009
  DOI: DMKE042009002
  Title: Using Context Transformations as a Pre-Processing Step in Mining Large Datasets
  Authors: Dr. B. Kalpana
  Keywords: Data mining, Context, Formal Concepts
  Abstract:
       Data mining is being applied in several diverse areas such as market basket analysis, analysis of dependencies in biological sequences, search and extraction of information on the web, prediction of stock market trends, and many others. In such applications, one method of data analysis that is gaining recognition is Formal Concept Analysis (FCA). The use of FCA in representing large datasets is particularly promising for reducing time and storage requirements. The characteristic that distinguishes FCA from other analysis methods is the absence of information loss during the analysis of data. This paper discusses two context transformations that do not change the structure of the concept lattice, namely context clarification and reduction. The objective is to explore the possibility of using such transformations as a preprocessing step so that the dataset can be represented as a reduced context.
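
Context clarification, the first of the two transformations named in this abstract, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the dictionary representation of a formal context and the names `clarify` and `count_concepts` are assumptions, and the second transformation (reduction) is not shown. Clarification keeps one object per distinct row and one attribute per distinct column; the brute-force concept counter is included only to check, on a toy context, that the lattice size is unchanged.

```python
def clarify(context):
    """Clarify a formal context given as {object: set_of_attributes}."""
    # Step 1: keep one representative object per distinct attribute set.
    rows = {}
    for obj in sorted(context):
        rows.setdefault(frozenset(context[obj]), obj)
    reduced = {obj: set(attrs) for attrs, obj in rows.items()}

    # Step 2: keep one representative attribute per distinct object set.
    attributes = sorted(set().union(*reduced.values()))
    columns = {}
    for attr in attributes:
        extent = frozenset(o for o, attrs in reduced.items() if attr in attrs)
        columns.setdefault(extent, attr)
    kept = set(columns.values())
    return {obj: attrs & kept for obj, attrs in reduced.items()}


def count_concepts(context):
    """Count formal concepts by closing the object rows under intersection
    (brute force, suitable for small contexts only)."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {intent & frozenset(attrs) for intent in intents}
    return len(intents)


# Toy context: o1 and o2 share a row; attributes c and d share a column.
toy = {
    "o1": {"a", "b"},
    "o2": {"a", "b"},
    "o3": {"b", "c", "d"},
    "o4": {"a"},
}
clarified = clarify(toy)
```

In this toy context the clarified version drops object `o2` and attribute `d`, while both versions yield the same number of concepts.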

  Issue : April 2009
  DOI: DMKE042009003
  Title: SD-tree Based Indexing for Nested Object Query Processing
  Authors: Dr. I. Elizabeth Shanthi
  Keywords: Signature files, Indexing, OODB, Query Evaluation
  Abstract:
       Aiming at the fast retrieval of nested objects, we introduce a variation of the signature file based top-down hierarchy retrieval using an index structure called the SD (Signature Declustering) tree. Signature files, which were initially used on text data for their filtering capability, have now been applied in Object Oriented Database Systems (OODBSs). Most of the proposed methods for object-oriented query handling suffer from either long retrieval times or complex comparison procedures. This is mainly due to the poor filtering capability of the index structures used to support complex query styles in OODBSs. In this paper we focus on the object-oriented handling of nested queries in the class hierarchy using an intermediate indexing structure called the SD-tree, which represents object signatures in a compact manner and helps to retrieve all matching objects in a single access. We compare the performance of SD-tree based query processing with the recently reported signature tree based query processing. Our experimental analysis on large data sets shows that, combined with query signature hierarchies, the SD-tree retrieves the matching objects quickly and therefore improves the time complexity of query evaluation substantially.
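
The filtering role of signatures mentioned in this abstract can be sketched with plain superimposed coding. This is a generic signature-file illustration, not the SD-tree itself: the signature width `SIG_BITS`, the number of bits per value `BITS_PER_VALUE`, the use of SHA-256 for hashing, and all function names are assumptions made for the example.

```python
import hashlib

SIG_BITS = 64        # assumed signature width
BITS_PER_VALUE = 3   # assumed number of bits set per attribute value


def value_signature(value):
    """Hash one attribute value to a sparse bit pattern."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    sig = 0
    for k in range(BITS_PER_VALUE):
        sig |= 1 << (digest[k] % SIG_BITS)
    return sig


def object_signature(values):
    """Superimpose (bitwise OR) the signatures of all attribute values."""
    sig = 0
    for value in values:
        sig |= value_signature(value)
    return sig


def may_match(obj_sig, query_sig):
    """True if every query bit is set in the object signature.
    False positives are possible; false negatives are not."""
    return (obj_sig & query_sig) == query_sig


employee = object_signature(["name=Ann", "dept=CS", "city=Pune"])
query = object_signature(["dept=CS"])
```

An object that fails `may_match` can be skipped without being read; one that passes still has to be checked against the actual data, because different values can set the same bits.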

  Issue : April 2009
  DOI: DMKE042009004
  Title: An Implementation of FP-Growth Algorithm for Software Specification Mining
  Authors: R. Jeevarathinam, Dr. Antony Selvadoss Thanamani
  Keywords: Mining Specifications, Program Execution Traces, Apriori, FP_growth, Frequent Itemsets, Frequent Pattern
  Abstract:
       Specification mining is a machine learning approach for discovering formal specifications of the protocols that code must obey when interacting with an application program interface or abstract data type. Two major concerns in engineering software systems are high maintenance costs and system reliability. To reduce maintenance effort, automated tools are needed to help software developers understand their existing code base, and extracted specifications can aid program comprehension. This paper proposes FP_TraceMiner, a novel technique that efficiently mines software specifications from program execution traces. The FP-growth algorithm is currently one of the fastest approaches to frequent pattern mining. To address the limitations of Apriori-like methods, the proposed mining paradigm uses the FP-growth algorithm, which transforms a database into an FP-tree stored in main memory and then performs mining on that optimized FP-tree structure.
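
The step of transforming a database into an in-memory FP-tree can be sketched as below. This is a minimal illustration of standard FP-tree construction, not FP_TraceMiner: the class and function names and the example traces are hypothetical, and each trace is treated as an unordered set of calls, so call order and repetition are ignored. The recursive mining phase of FP-growth is not shown.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item and how many transactions pass through it."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Pass 1: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, count in support.items() if count >= min_support}

    # Pass 2: insert each transaction, keeping only frequent items and
    # ordering them by descending support (ties broken by name).
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, support


# Hypothetical execution traces of API calls.
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
    ["open", "close"],
]
root, support = build_fp_tree(traces, min_support=2)
```

Because all four traces share the frequent calls `close` and `open`, they collapse into a single path in the tree, which is the compression that makes mining on the FP-tree cheaper than rescanning the traces.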

  Issue : April 2009
  DOI: DMKE042009005
  Title: Discovery of Semantic Web Services Using Intelligent Predictions for Business Applications
  Authors: M.R. Sumalatha, P. Gowrishankar (Member, IEEE), B. Balamurali, R. Jayakandan
  Keywords: Ontology, Personalization, Web Services, RSS, Semantic Description, Mapping Ontologies, Event Prediction and Service Filtration
  Abstract:
       On the internet, web services are frequently used to perform a variety of tasks across several domains. The real problem with web services is finding the service that suits the user's needs and expectations. In traditional methods that deploy web services using WSDL, contextual information is not given much importance. In the centralised web service, the context information of the user is used to list the appropriate asset management services for business solutions. The web services have registered their service descriptions, and these descriptions are represented in OWL format. RSS feeds are used to analyze the current share market scenario and, with the help of the past RSS information available, a prediction of the profitable asset management services is structured and listed for the user. The user's contextual information helps in analyzing user behaviour, and hence a service is provided based on the user profile.

  Issue : April 2009
  DOI: DMKE042009006
  Title: Quality Depth-First Closed Itemsets (DCI_Closure) Associator
  Authors: Mr. Sakthi Ganesh.M, Dr. C. Kalairasan, R. Shalini, Dr. V.D. Mytri
  Keywords: Depth-First, DCI Closure Associator Algorithm, Lattice Structure
  Abstract:
       The objective of this thesis work is to design an efficient data mining algorithm that extracts data from a transactional database. Several algorithms are available for mining data from databases. We propose a new data mining algorithm, named DCI_CLOSURE, that uses association rules to discover closed frequent itemsets. DCI_CLOSURE is an extension of the DCI_CLOSED algorithm with association rules, efficient lattices and a hash map. The algorithm adopts several optimization techniques to save both storage space and extraction time when computing itemset closures and their support values. Unlike previous proposals, the proposed algorithm does not scan the whole data set. Single itemsets are eliminated because only combinations of two or more items are needed, so the number of itemsets is calculated through the formula 2^n - (n+1).
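
The itemset count at the end of this abstract can be checked by enumeration, reading the formula as 2^n - (n+1): of the 2^n subsets of n items, the n single itemsets and the one empty set are discarded. The sketch below is an illustration only; the function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def multi_item_itemsets(items):
    """Enumerate every itemset that contains at least two of the given items."""
    items = sorted(items)
    return [
        frozenset(combo)
        for size in range(2, len(items) + 1)
        for combo in combinations(items, size)
    ]


def closed_form_count(n):
    """2^n subsets, minus the n single itemsets and the one empty set."""
    return 2 ** n - (n + 1)


# For n = 4 items there are 6 pairs, 4 triples and 1 quadruple: 11 itemsets.
four_items = multi_item_itemsets(["a", "b", "c", "d"])
```

For four items both the enumeration and the formula give 11, since 2^4 - (4+1) = 16 - 5.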

  Issue : April 2009
  DOI: DMKE042009007
  Title: Software Tool for Agent Based Distributed Data Mining
  Authors: K. Anandakumar, Dr. M. Punithavalli
  Keywords: Data Mining, Frequent Item set, Distributed Data Mining
  Abstract:
       The main objective of this project is to illustrate the maximum utilization of available resources for data mining activities. Mining information and knowledge from huge data sources such as weather databases, financial data portals or emerging disease information systems has been recognized by industrial companies as an important area with an opportunity for major revenues from applications such as business data warehousing, process control, and personalized online customer services over the Internet and web. Distributed data mining (DDM) is expected to perform partial analysis of data at clients and then send the outcome to the server, where it is sometimes required to be aggregated into the global result. The primary issues to be considered for DDM are scalability, privacy of data and autonomy of data. These issues can be handled easily by using intelligent software agents for distributed data mining, because of their inherent features of being autonomous and capable of adaptive and deliberative reasoning.

  Issue : April 2009
  DOI: DMKE042009008
  Title: Mining Frequent Itemsets using Temporal Association Rule
  Authors: M. Krishnamurthy, A. Kannan, R. Baskaran and S. Kanmanirajan
  Keywords: Frequent Item set, Calendar Schema, Temporal Association Rule Mining, Temporal Data Mining and Temporal Database
  Abstract:
       Association rule mining finds association relationships in large data sets. Mining frequent patterns is an important aspect of association rule mining. Most popular association rule mining methods have a performance bottleneck for databases with different data characteristics, such as dense vs. sparse. In this paper, an efficient algorithm named the Temporal FP-Tree (Frequent Pattern Tree) algorithm and the FP-tree structure are presented to mine frequent patterns, conditional pattern bases and sub-conditional pattern trees recursively. This algorithm is used to mine frequent patterns from a temporal database and needs limited memory space. When the dataset becomes dense, the algorithm can be scaled up to a large database by partitioning it, and conditional temporal FP-trees can be constructed dynamically as part of mining.

  Issue : April 2009
  DOI: DMKE042009009
  Title: Image Clustering Techniques for the Exploration of Video Sequences
  Authors: Rekha B Venkatapur, Dr. V.D. Mytri, Dr. A. Damodaram
  Keywords: Information Retrieval, Image Retrieval, Clustering of Video Sequences, Video Segmentation
  Abstract:
       Digital video libraries are generating tremendous interest in the pattern recognition, computer vision, and multimedia research communities. The amount of information currently available on the internet and in proprietary databases is increasing every day. The present work is a systematic study of the exploration of video sequences. The system, GAMBAL-EVS, segments video sequences, extracts an image for each shot, clusters these images and presents them in a visualization system. The system makes it possible to find similarities between images and to traverse the video sequences to find the relevant ones.
