Issue: April 2009
DOI: AIML042009001
Title: An Improved Clustering Technique
Based On Statistical Model Preprocessing Using Gene
Expression Data
Authors: R. Mallika and G. Selvanayaki
Keywords: Clustering, Feature selection,
Gene expression
Abstract:
Microarrays have become effective, broadly used tools
in biological and medical research for addressing
a wide range of problems, including the classification
of disease subtypes and tumors. Many statistical methods
are available for analyzing and systematizing these
complex data into meaningful information, and one
of the main goals in analyzing gene expression data
is the detection of samples or genes with similar
expression patterns. In this work, the performance
of several feature selection methods based on data
preprocessing, including normalization and data reduction
strategies, is compared, and a new classical statistical
technique is proposed for preprocessing. A clustering
technique is then applied, and promising results are
achieved. The work also shows that choosing a good
preprocessing technique prior to clustering improves
performance. The results are reported to be the best
in comparison with previous work.
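As an illustration of why normalization before clustering can matter, here is a minimal sketch of per-gene z-score normalization. It is not the statistical technique proposed in the paper, which the abstract does not specify. The function name `zscore_rows` and the toy matrix `EXPRESSION` are assumptions made for this example.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Rescale each row (one gene across samples) to mean 0 and unit variance."""
    normalized = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        if sigma == 0:
            # A constant row carries no pattern; map it to zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mu) / sigma for x in row])
    return normalized


# Hypothetical expression values: rows are genes, columns are samples.
EXPRESSION = [
    [2.0, 4.0, 6.0],
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 5.0],
]
NORMALIZED = zscore_rows(EXPRESSION)
```

The first two genes follow the same pattern at different scales. After normalization their rows coincide, so a distance-based clustering method would group them together.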
Issue: April 2009
DOI: AIML042009002
Title: An Image Spam Classification Model
Based on File Features Using Neural Networks
Authors: Ms. M. Soranamageswari, Dr. C.
Meena
Keywords: Back propagation, Image Spam,
Machine Learning and Spam Filtering
Abstract:
Spam is an unauthorized intrusion into a virtual
space, which causes serious economic loss and social
problems. Recently, spammers have been spreading a new
kind of email spam called image spam, which uses simple
image processing techniques such as varied borders
or backgrounds, randomly varied spacing or margins,
and artifacts added to the images. Users and organizations
waste valuable effort, time, and money handling it.
Because of the recent upsurge in image spam, the proposed
system classifies image spam based on the file features
of an image, rather than its text content, using
back-propagation neural networks, which classify each
incoming image as spam or ham. The experimental results
show that the system correctly classifies 95% of spam
images with minimal false positives.
Issue: April 2009
DOI: AIML042009003
Title: Automatic Tamil Document Categorization
Based on the Naive Bayes Algorithm
Authors: S. Kohilavani, T. Mala and T.
V. Geetha
Keywords: Document Categorization, Naïve
Bayes, Stopwords, Preprocessing, Classifier
Abstract:
This paper deals with the automatic classification
of Tamil documents. Documents are repositories of
knowledge, but they are numerous, and searching them
effectively is time consuming. Document categorization
makes document search simpler and supports other
applications such as event detection and tracking,
document clustering, and grouping. It is a challenging
task that has recently become an active research topic
in information retrieval. The objective of document
categorization is to assign entries from a set of
prespecified categories to a document. Traditionally,
this task is performed manually by domain experts:
each incoming document is read and comprehended by
an expert and then assigned to a number of categories
chosen from the prespecified set, which inevitably
requires a large amount of manual effort. A promising
way to deal with this problem is to learn a categorization
scheme automatically from training examples. In the
training phase, a set of documents with class labels
attached is given, and a classification system is
built using a learning method. Once the categorization
scheme is learned, it can be used to classify future
documents. Various techniques can be used to determine
a document's category. In this paper, Naive Bayes (NB),
a statistical machine learning algorithm, is used to
classify Tamil documents into one of a set of predefined
categories. Experiments are used to evaluate the Naive
Bayes categorizer. The data set used in these experiments
consists of 50 documents per category. The experimental
results show that the Naive Bayes classifier performs
well, achieving 89.8% accuracy.
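The training and classification phases described above can be sketched with a multinomial Naive Bayes classifier. This is a minimal sketch, not the authors' implementation: it assumes documents are already tokenized with stopwords removed, it uses add-one (Laplace) smoothing, and the function names and the English placeholder tokens in `TRAINING` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_docs):
    """Count documents per class and words per class from (tokens, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_doc_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus; real input would be preprocessed Tamil tokens.
TRAINING = [
    (["match", "team", "score"], "sports"),
    (["team", "player", "goal"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "minister", "vote"], "politics"),
]
MODEL = train_nb(TRAINING)
```

Calling `classify_nb(MODEL, ["team", "goal"])` assigns the document to the category whose training words best match its tokens. Log probabilities are summed rather than multiplying raw probabilities, which avoids numeric underflow on long documents.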
Issue: April 2009
DOI: AIML042009004
Title: Semi-Automatic Domain Ontology
Construction for Tamil Documents
Authors: M. S. Girija, T. Mala and T.
V. Geetha
Keywords: Ontology, Semi-automatic Ontology,
Semantic Relationship Extraction, Content Bearing
Words, TF-IDF, Morphological analysis and Clustering
Abstract:
An ontology is an explicit specification of a
conceptualization; that is, a description of the concepts
and relationships that can exist for an agent or a
community of agents. Ontology construction is a challenging
task, and this paper employs a new technique for the
semi-automatic construction of an ontology. It involves
two modules: ontological word selection and semantic
relationship extraction. Ontological nodes and semantically
related words are selected from a Tamil text corpus.
The input to the system is a set of Tamil text documents.
Each document is segmented into words and then
morphologically analyzed to determine parts of speech,
because ontological words are expected to be nouns.
The noun list is narrowed using the TF-IDF technique.
Semantically related words are identified based on
the notion of serial clustering of words in text,
exploring the value of such clustering as an indicator
of whether a word bears content. This approach is
flexible in the sense that it is sensitive to context:
a term may be assessed as content bearing within one
collection but not in another. In this way, a domain
ontology is constructed semi-automatically for Tamil
text documents.
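The TF-IDF step that narrows the noun list can be sketched as follows. This is a minimal sketch under stated assumptions: it uses one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency), which the abstract does not confirm, and the names `tfidf_scores`, `top_terms`, and the placeholder noun lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per document; docs are lists of nouns."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        if not tokens:
            scores.append({})
            continue
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score_dict, k):
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(score_dict, key=lambda t: (-score_dict[t], t))[:k]


# Hypothetical noun lists, one per document, after morphological analysis.
NOUN_LISTS = [
    ["river", "water", "river", "year"],
    ["bank", "loan", "year"],
    ["river", "bank", "year"],
]
SCORES = tfidf_scores(NOUN_LISTS)
```

A noun such as "year" that occurs in every document receives a score of zero and drops out of the candidate list, while nouns concentrated in a few documents are retained as candidate ontology nodes.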
Issue: April 2009
DOI: AIML042009005
Title: Automatic Clustering and Normalised
Cut Based Image Retrieval Techniques
Authors: S. Vinodkumar, P.R. Lakshmi
Keywords: Back Propagation, CBIR, KCLUE
Abstract:
KCLUster-based rEtrieval (KCLUE) groups images based
on a similarity measure so that similarity is maximal
within a cluster and minimal between clusters, and
then retrieves the images related to the query. Cluster-based
retrieval of images tackles the semantic gap problem.
Content-Based Image Retrieval (CBIR) extracts the
features of the images and retrieves the images most
similar to the query. This paper makes use of both
concepts to retrieve images. The CBIR system using
KCLUE is called Content-Based Image Clusters Retrieval
(CBICR). Keyword-based retrieval combined with the
CBIR system retrieves relevant images more effectively
and takes less time. Keyword-based retrieval is performed,
and the nearest neighbor method is used to locate
the neighbors of the target image. The N-cut algorithm
is used to organize the clusters.
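The nearest-neighbor step can be sketched as a ranking of stored images by distance to the query's feature vector. This is a minimal sketch: the abstract does not state the features or the distance measure, so the Euclidean distance, the three-component feature vectors, and the names `nearest_neighbors` and `FEATURES` are assumptions.

```python
import math


def nearest_neighbors(query, features, k):
    """Return the names of the k images whose features are closest to the query."""
    ranked = sorted(
        features,
        key=lambda name: (math.dist(query, features[name]), name),
    )
    return ranked[:k]


# Hypothetical feature vectors, e.g. coarse color statistics per image.
FEATURES = {
    "beach.jpg": [0.9, 0.8, 0.2],
    "forest.jpg": [0.1, 0.7, 0.1],
    "sunset.jpg": [0.8, 0.7, 0.3],
    "city.jpg": [0.4, 0.4, 0.5],
}
NEIGHBORS = nearest_neighbors([0.85, 0.75, 0.25], FEATURES, 2)
```

In a cluster-based system, the neighbors found this way would then be grouped, for example by a normalized-cut algorithm, before results are shown. That grouping step is not covered by this sketch.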