CiiT International Journal of Data Mining and Knowledge Engineering
Print: ISSN 0974-9683 & Online: ISSN 0974-9578



Issue: January 2012
DOI: DMKE012012001
Title: Data Mining and Knowledge Discovery in the Organization
Authors: Dr.C. Muthuvelayutham and T. Sugantha Lakshmi
Keywords: Data Mining, Customer Relationship Management (CRM), Knowledge Discovery and Data Mining (KDDM)
Abstract:
     This paper discusses the concepts of data, information and knowledge, and the role of extracted information in the welfare of the organization. The aim is to describe how top-level, middle-level and lower-level management use information for decision making. Business understanding, data understanding and preparation, modeling, evaluation, and deployment are the main methodological steps used in the analysis. Adopting data mining and warehousing techniques can improve turnover and overall accuracy in the business. The concept and process of knowledge discovery, and its role in the organization, are also covered in this paper.

Full PDF


Issue: January 2012
DOI: DMKE012012002
Title: An Enhanced Method for Efficient Information Retrieval from Resume Documents using SPARQL
Authors: P. Sheba Alice, A.M. Abirami and Dr. A. Askarunisa
Keywords: RDF, OWL, SPARQL, Document Filter, Information Retrieval
Abstract:
      It is important to retrieve information from documents of various types, such as DOC and HTML, that contain vital information to be preserved and used in the future. Information retrieval from these documents is largely a manual effort. Although search algorithms perform this retrieval, they may not be as accurate as the user expects. Moreover, some documents, such as candidates' resumes, cannot be stored in a relational database as such because the number of fields is too large, and much manual effort is spent analyzing resumes to select the candidates who satisfy specific criteria. To minimize this manual effort and obtain results faster, this paper proposes the use of Semantic Web technologies such as OWL, RDF and SPARQL to retrieve information from the documents efficiently. As a first step, an ontology is created for the required domain. Based on the fields or tags in the OWL file, the user is given a form to provide personal and academic details. These data are converted into an RDF/XML document. The RDF files are then retrieved and grouped by category. Query text is entered, and the relevant records are retrieved from the RDF documents using SPARQL, an RDF query language that enables faster and more efficient search than other XML query languages such as XPath and XQuery. A comparison between SPARQL and XPath in terms of record-retrieval time is also presented.
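
As a rough illustration of the retrieval step described above (a sketch, not the authors' implementation; the resumes.rdf file and the ex: vocabulary are hypothetical), a SPARQL query over RDF resume data might look like this using Python's rdflib:

```python
# Minimal sketch of SPARQL retrieval over RDF resume data.
# Assumes rdflib is installed; resumes.rdf and the ex: terms
# (name, degree) are made up, not the paper's actual schema.
from rdflib import Graph

g = Graph()
g.parse("resumes.rdf")  # hypothetical RDF/XML file of candidate details

query = """
PREFIX ex: <http://example.org/resume#>
SELECT ?name ?degree
WHERE {
    ?candidate ex:name ?name ;
               ex:degree ?degree .
    FILTER (?degree = "B.E.")
}
"""
for row in g.query(query):
    print(row.name, row.degree)
```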

Full PDF


Issue: January 2012
DOI: DMKE012012003
Title: An Intelligent Search Engine for Extracting Documents Relevant to Poorly Defined Criteria
Authors: Magda B. Fayek, Hatem M. El-Boghdadi and Mohamed A. Gawad
Keywords: Boolean IR Model, IR Evaluation, Relevance Feedback, Recall-Precision Measure, Vector Space Model
Abstract:
     Information retrieval (IR) deals with the representation, storage, organization of, and access to information items. Users' queries to search engines are often not well formulated and hence do not express exactly what the user is searching for. Such poorly defined criteria result in the retrieval of documents that do not meet user expectations. Many attempts have been made to refine document retrieval through interaction with the user. Mostly, these attempts provide the user with functionalities for editing queries and marking documents; to many users these functionalities are too complicated, and hence they are hardly used.

In this paper we present an intelligent search engine that targets such poorly defined queries and interactively helps users fine-tune their search. The user merely marks, among the initially retrieved documents, those most relevant to the request. The system then uses this relevance feedback to automatically update the search criteria initially submitted by the user, and the search results are updated to improve the selection of retrieved documents. The system adopts RBIR (Ranked Boolean IR), a modified Boolean model that estimates document relevance using keyword weights to rank search results. Its accuracy is comparable with the Vector Space model, while keeping processing overhead low.

Results show that a remarkable improvement in precision is achieved already at the first iteration after relevance feedback, especially for very poorly defined criteria and at low recall. As the recall rate increases the improvement in precision drops; however, an improvement remains even at a recall rate of 100%. Overall, the average performance of RBIR with relevance feedback is consistently better than both the Vector Space model and plain RBIR. At low recall rates the average improvement ranges between 12% and 60% relative to the Vector Space model and between 25% and 32% relative to RBIR. The less definitive the query, the more pronounced the enhancement.
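
The abstract does not give RBIR's exact feedback rule; as a hedged sketch of how relevance feedback can update keyword weights, the classical Rocchio formulation (with illustrative coefficients, not the paper's update rule) looks like this in Python:

```python
# Rocchio-style relevance feedback over keyword-weight vectors.
# A standard textbook formulation, used here only to illustrate the
# feedback step; it is not the paper's RBIR update rule.
from collections import defaultdict

def rocchio(query_w, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """query_w and each doc map keyword -> weight."""
    new_w = defaultdict(float)
    for term, w in query_w.items():
        new_w[term] += alpha * w
    for doc in relevant:                      # pull toward marked documents
        for term, w in doc.items():
            new_w[term] += beta * w / len(relevant)
    for doc in nonrelevant:                   # push away from the rest
        for term, w in doc.items():
            new_w[term] -= gamma * w / len(nonrelevant)
    return {t: w for t, w in new_w.items() if w > 0}

print(rocchio({"search": 1.0}, [{"search": 0.8, "engine": 0.6}], []))
```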

Full PDF


Issue: January 2012
DOI: DMKE012012004
Title: Multiscale Segmentation for Mixed Raster Content Applicable to Document Coding
Authors: S. Amutha and V. Ponraj
Keywords: Multiscale Image Analysis, Mixed Raster Content, Document Image Segmentation, MRC Compression, Markov Random Fields, Document Coding
Abstract:
     Compound document images contain graphic or textual content along with pictures; they are found in magazines, brochures, websites, etc. The goal is to compress an image containing mixed raster content (MRC) using a multi-layer approach. The proposed methodology segments the image into regions such as text, pictures and background. The key to MRC compression is the separation of the document into foreground and background layers, represented as a binary mask, and the compression quality depends on the segmentation algorithm used to compute that mask.

The proposed multi-scale segmentation algorithm models the complex aspects of both local and global contextual behavior. It first finds a block-wise segmentation of the raster image in a global cost-optimization framework. The initial segmentation is then refined by classifying feature vectors of connected components using a Markov random field (MRF) model. These procedures are incorporated into a multi-scale framework in order to improve segmentation accuracy for text of varying size. It is shown that the proposed methodology achieves greater accuracy of text detection with a lower false-detection rate for non-text features. The segmentation algorithm can improve the quality of decoded documents while simultaneously lowering the bit rate, and execution time can be greatly reduced by using features that are not computationally intensive.
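
To make the foreground/background idea concrete, here is a toy sketch of MRC layer separation with a binary mask in Python/NumPy; plain thresholding stands in for the paper's much more elaborate multiscale MRF segmentation, and the image is synthetic:

```python
# Toy MRC layer split: a binary mask assigns dark pixels (text) to the
# foreground layer and everything else to the background layer.
# Simple thresholding is a stand-in for the paper's segmentation.
import numpy as np

def split_mrc_layers(img, threshold=128):
    mask = img < threshold                   # 1 = foreground (text), 0 = background
    foreground = np.where(mask, img, 255)    # text layer, rest blanked to white
    background = np.where(mask, 255, img)    # picture/background layer
    return mask.astype(np.uint8), foreground, background

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # synthetic page
mask, fg, bg = split_mrc_layers(img)
print(int(mask.sum()), "foreground pixels")
```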

Full PDF


Issue: January 2012
DOI: DMKE012012005
Title: Efficient Query Result Navigation Using Top-Down Navigation Model
Authors: R. Saranya and B. Arunkumar
Keywords: Information Retrieval, Navigation, Search Process
Abstract:
    Search queries on databases often return a large number of results, of which only a small subset is relevant to the user. Ranking and categorization, which can also be combined, have been proposed to alleviate this information-overload problem; categorization of database query results is the focus of this work. In this paper we present a novel search interface that enables the user to navigate large result sets by organizing them with a concept hierarchy. First, the query results are organized into a navigation tree, within which an edge-cut operation partitions the results into those relevant to the user and those to be ignored. At each node-expansion step, the system reveals only a small subset of the concept nodes, selected so that the expected user navigation cost is minimized. In contrast, previous works expand the hierarchy in a predefined, static manner, without modeling navigation cost. We formalize the problem of selecting the best concepts to reveal at each node expansion and propose an efficient heuristic as well as a feasible optimal algorithm for relatively small trees.
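
As a hedged sketch of cost-based node expansion (the scoring below is an illustrative stand-in, not the paper's navigation-cost model, and the concept nodes are made up):

```python
# Greedy choice of which concept nodes to reveal at an expansion step.
# Each child is (label, probability the answer lies in its subtree,
# subtree size); we favor high relevance per unit of navigation effort.
def pick_nodes_to_reveal(children, k=3):
    scored = sorted(children, key=lambda c: -c[1] / c[2])
    return [label for label, _, _ in scored[:k]]

children = [("jobs", 0.6, 40), ("news", 0.3, 10), ("misc", 0.1, 100)]
print(pick_nodes_to_reveal(children, k=2))   # ['news', 'jobs']
```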

Full PDF


Issue: January 2012
DOI: DMKE012012006
Title: Association Rule - Spatial Data Mining Approach for Geo-Referenced in Crime to Crime Analysis
Authors: A. Thangavelu, S.R. Sathyaraj, R. Sridhar and S. Balasubramanian
Keywords: Algorithm, Association Rule, Data Mining, Crime Data, GIS
Abstract:

     Spatial data mining is a demanding field, since huge amounts of spatial data must be processed and turned into useful information. Motivated by the increasing crime rate and the enormous amount of data stored in crime databases by police personnel, records collected from various jurisdictions of Coimbatore are gathered here, and technologies that turn data into information through data fusion and data mining are applied to them. Data fusion organizes, combines and interprets information from multiple sources, overcoming the confusion caused by conflicting reports and cluttered or noisy backgrounds. Data mining is concerned with the automatic discovery of patterns and relationships (crime to crime) in large databases; technically, it is the process of finding correlations or patterns among dozens of fields in large relational databases, here using the tools of GIS. This paper provides clear findings for preventing a crime associated with another crime occurrence, based on direct observation of the correlation between one crime and another.
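
As a minimal sketch of the association-rule idea applied to crime co-occurrence (the incident records below are invented, not the Coimbatore data):

```python
# Support and confidence for a crime-to-crime association rule
# A -> B over a tiny, made-up set of incident records.
incidents = [
    {"burglary", "vandalism"},
    {"burglary", "vandalism", "assault"},
    {"burglary"},
    {"vandalism", "assault"},
]

def rule_stats(antecedent, consequent, transactions):
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0
    return support, confidence

s, c = rule_stats({"burglary"}, {"vandalism"}, incidents)
print(f"burglary -> vandalism: support={s:.2f}, confidence={c:.2f}")
# support=0.50, confidence=0.67
```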

Full PDF


Issue: January 2012
DOI: DMKE012012007
Title: Analysis of the Depth First Search Algorithms
Authors: Navneet Kaur and Deepak Garg
Keywords: RHS Algorithm, Parallel Formulation, DFS, Work Distribution Schemes, Directed Acyclic Graphs, MIMD Architecture
Abstract:

    When the traditional Depth First Search (DFS) algorithm is used to search for an element in a Directed Acyclic Graph (DAG), a lot of time is wasted in back-tracking. This paper discusses the Reverse Hierarchical Search (RHS) algorithm, which provides better performance on DAG structures by avoiding unnecessary search. The paper also presents a parallel formulation of DFS that retains the storage efficiency of sequential DFS and can be mapped onto any MIMD architecture. We improve the search for a node in DFS by combining the RHS algorithm with the parallel formulation of DFS, which maintains storage efficiency and reduces unnecessary search. In the RHS algorithm, previous-node information is used to find the next nodes to search, so that duplicate visits are prevented. The main factor affecting the parallel formulation is the dynamic work-distribution technique, which divides the work among processors; its performance is strongly affected by the work-distribution scheme and by architectural features such as the presence or absence of shared memory and the relative speed of the communication network. Combining the RHS algorithm with the parallel formulation of DFS gives good performance.
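
To illustrate the duplication-avoidance idea on a DAG (a sketch only; this is plain DFS with a visited set, not the RHS algorithm itself, and the graph is invented):

```python
# DFS over a DAG with a visited set, so that nodes reachable along
# several paths (e.g., "d" below) are expanded only once.
def dfs_search(graph, start, target):
    visited, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))
    return False

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs_search(dag, "a", "d"))  # True
```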

Full PDF


Issue: January 2012
DOI: DMKE012012008
Title: Key Word Based Word Sense Extraction in an Index for Text Files: Design Approach
Authors: Shahana Bano, Dr.K. Raja Sekhara Rao and M. Sai Sandeep
Keywords: Context, Processing, Linguistic, Unstructured, Stop Words, Text, File
Abstract:

     The data stored in most text documents are semi-structured: neither completely unstructured nor completely structured. In many cases it is difficult to identify the context of a word. This paper aims at processing the text in a file efficiently by removing stop words. The method presented is an effective way of processing a word so that all its details, such as context and linguistic information, can be found.
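
A minimal sketch of the stop-word removal step (with a tiny illustrative stop list rather than a full linguistic resource):

```python
# Strip common stop words before further word-level processing.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "and", "to"}

def remove_stop_words(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The context of a word is found in the text file"))
# ['context', 'word', 'found', 'text', 'file']
```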

Full PDF


Issue: January 2012
DOI: DMKE012012009
Title: An Analysis on Beta Thalassemia Major Patients through the Techniques of Data Clustering
Authors: P.D. Siji and Dr. Vasantha Kalyani David
Keywords: Beta Thalassemia, Clustering, Fuzzy C Means, K-Means
Abstract:

    In data mining, cluster analysis is a technique for grouping data into related components based on similarity metrics. The integration of fuzzy logic with data mining techniques has become one of the key constituents of soft computing. The k-means algorithm is well suited to clustering crisp data: in traditional clustering algorithms, each object is assigned to exactly one cluster, which is valid as long as the clusters are disjoint and well separated. When clusters touch or overlap, however, an object can belong to more than one cluster, and fuzzy clustering comes into play. In this paper the grouping of beta thalassemia major disease is taken as a case study. Thalassemia can lead to severe transfusion-dependent anaemia, and it is among the most common genetic disorders worldwide, especially in Middle Eastern countries. The fuzzy c-means algorithm is applied to cluster the database, and the results are discussed in this paper.
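
As a compact sketch of the fuzzy c-means iteration on one-dimensional data (the numbers are illustrative, not the thalassemia records; the standard FCM update with fuzzifier m is assumed):

```python
# Fuzzy c-means on 1-D data: memberships are soft, so a point can
# belong partially to both clusters, unlike hard k-means assignment.
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, iters=100, eps=1e-6):
    x = np.asarray(x, dtype=float)
    u = np.random.default_rng(0).random((c, len(x)))
    u /= u.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)    # membership-weighted centers
        d = np.abs(x[None, :] - centers[:, None]) + eps
        u_new = d ** (-2.0 / (m - 1))        # standard FCM membership update
        u_new /= u_new.sum(axis=0)
        if np.abs(u_new - u).max() < eps:
            break
        u = u_new
    return centers, u

centers, u = fuzzy_c_means([4.2, 4.5, 9.8, 10.1, 10.4])
print(centers)  # two fuzzy cluster centers, roughly 4.35 and 10.1
```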

Full PDF


Issue: January 2012
DOI: DMKE012012010
Title: A Study on the Relationships between Emotional Quotients (EI), Stress and Multimedia 3D Animations
Authors: B. Senthil Kumar
Keywords: Multimedia Learning Material, Retention & Recollection, Emotional Quotient & Intelligence, Stress Level, Cognitive Aspects
Abstract:

    This paper presents experiments on the effective use of rich 3D-animation multimedia learning material among BE Information Technology students. An experiment was conducted to study the impact of multimedia learning materials on students' retention and recollection abilities. The purpose of the research reported here was to examine the use of 3D animation in developing effective learning material. The ultimate objective is to bring out students' cognitive aspects, such as emotional intelligence, stress level, recollection level and retention level, in the presence of 3D animation in a multimedia learning material. We investigated the impact of multimedia learning materials on the students' inner state and found that relationships exist between students' emotional intelligence, stress level, and various attributes of the learning material such as colour, sound, text, images and 3D animation.

Full PDF