
Training and Model Structure of Deep Architectures

Saduf Afzal, M. Arif Wani

Abstract


Deep architectures with several layers of processing elements have become a highly successful and rapidly growing research topic. These models learn representations of the input data by combining representations learned at lower levels, with the objective of yielding more useful and more abstract representations at higher levels. Deep architectures have achieved promising results on many speech, vision, and natural language processing tasks. In this paper, we review various strategies used in the training of deep neural networks. In addition, we discuss some of the commonly used deep network models, with a detailed explanation of their training procedures and recent progress.
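
To make the layer-wise idea concrete, the sketch below is our own illustration rather than code from the paper: it pretrains a small stack of tied-weight autoencoders with plain NumPy gradient descent, training each layer on the codes produced by the layer below so that higher layers form progressively more abstract representations. All layer sizes, learning rates, and function names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(data, n_hidden, lr=0.1, epochs=50):
    # Train one tied-weight autoencoder by gradient descent on squared
    # reconstruction error (a minimal stand-in for any layer-wise criterion).
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        h = sigmoid(data @ W + b_h)           # encode
        recon = sigmoid(h @ W.T + b_v)        # decode with tied weights
        err = recon - data                    # reconstruction error
        d_recon = err * recon * (1 - recon)
        d_h = (d_recon @ W) * h * (1 - h)
        W -= lr * (data.T @ d_h + d_recon.T @ h) / len(data)
        b_h -= lr * d_h.mean(axis=0)
        b_v -= lr * d_recon.mean(axis=0)
    return W, b_h

def pretrain_stack(data, layer_sizes):
    # Greedy layer-wise pretraining: each layer is trained on the codes
    # produced by the previously trained layer.
    weights, codes = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_autoencoder(codes, n_hidden)
        weights.append((W, b_h))
        codes = sigmoid(codes @ W + b_h)      # representation fed to next layer
    return weights

# Usage on toy data: two stacked layers yielding a 16-dim, then an 8-dim code.
X = rng.random((200, 64))
stack = pretrain_stack(X, layer_sizes=[16, 8])
print([W.shape for W, _ in stack])

In a typical pipeline, such layer-wise pretraining would be followed by supervised fine-tuning of the whole stack, which is one of the strategies the paper reviews.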


Keywords


Autoencoders, deep learning, deep neural networks, restricted Boltzmann machine, deep belief networks


