Mixture Weighted Latent Dirichlet Allocation, an Optimized and Generalized Probabilistic Model for Large Corpus of Data

Muzafar Rasool Bhat, M. Arif Wani

Abstract


This study introduces a new distribution, the Mixture Weighted Dirichlet Distribution (MWDD), which generalizes several other distributions, namely the Dirichlet, Length- or Size-Biased Dirichlet, Area-Biased Dirichlet, and Volume-Biased Dirichlet distributions. The aim of this research is to introduce and implement MWDD for probabilistic topic modeling of a corpus of textual data, as an optimized and generalized probabilistic topic model. The statistical and structural properties of the proposed distribution are studied, including its first, second, and rth moments about the origin, its variance, and its standard deviation. The experiments use Cora, a dataset of 2410 scientific documents in LDA format suitable for probabilistic topic modeling experiments. The generative process of the new topic modeling technique based on MWDD is also described. To compare the efficiency of existing models and various special cases of the proposed model, AIC, AICc, BIC, and log-likelihood measures are used. The study concludes that one special case of the proposed model, the Mixture Volume-Biased Weighted Dirichlet Distribution (MVBWDD), is the most efficient for probabilistic topic modeling of this dataset: it has the lowest AIC, AICc, and BIC values as the mixture parameter p varies from 0 to 1.
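The model comparison described above rests on three standard information criteria computed from a fitted model's log-likelihood. The following is a minimal sketch of those textbook formulas, not the authors' implementation. The function name `information_criteria`, the parameter count `k`, the sample size `n`, and the numbers in the usage lines are all hypothetical. In particular, whether `n` should count documents or tokens in a topic model is an assumption the reader must settle.

```python
import math


def information_criteria(log_likelihood: float, k: int, n: int) -> dict:
    """Return AIC, AICc and BIC for a fitted model.

    log_likelihood: maximized log-likelihood of the model
    k: number of free parameters (assumed known for each candidate model)
    n: number of observations (assumption: e.g. documents in the corpus)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    # Small-sample correction added to AIC
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical log-likelihoods for two candidate models; these are
# illustrative numbers only, not results from the study.
candidates = {
    "model_a": information_criteria(-15200.0, k=12, n=2410),
    "model_b": information_criteria(-15150.0, k=13, n=2410),
}
# Lower values indicate the preferred model under each criterion.
best_by_bic = min(candidates, key=lambda name: candidates[name]["BIC"])
print(best_by_bic, candidates[best_by_bic])
```

Under all three criteria a lower value is better, which is why the abstract identifies the preferred special case as the one with the least AIC, AICc, and BIC.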


Keywords


Dirichlet Distribution; LDA; Weighted Dirichlet Distribution (WLDA); Probabilistic Topic Modelling; Mixture Weighted Dirichlet Distribution (MWDD); MWDD Topic Model.
