An Overview of Methods for Deep Learning in Neural Networks
Author: Andrey V. Sozykin
Section: Computer Science, Computer Engineering, and Control
Published in: Vol. 6, No. 3, 2017.
Open access
Deep neural networks are currently becoming one of the most popular approaches to building artificial intelligence systems for tasks such as speech recognition, natural language processing, and computer vision. The paper surveys the history and current state of methods for training deep neural networks. It describes the artificial neural network model and the training algorithms for neural networks, including the error backpropagation algorithm used to train deep networks. The development of neural network architectures is traced: the neocognitron, autoencoders, convolutional neural networks, the restricted Boltzmann machine, deep belief networks, long short-term memory networks, gated recurrent neural networks, and residual networks. Deep neural networks with many hidden layers are hard to train because of the vanishing gradient problem. The paper reviews methods for solving this problem that make it possible to train deep networks with more than a hundred layers. It also surveys popular deep learning libraries, which have enabled the broad practical adoption of this technology. At present, convolutional neural networks are used for computer vision tasks, while recurrent neural networks, primarily long short-term memory networks and gated recurrent neural networks, are used for processing sequences, including natural language.
Deep learning, neural networks, machine learning
Short URL: https://sciup.org/147160624
IDR: 147160624 | UDC: 004.85 | DOI: 10.14529/cmse170303
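The abstract's central technical point is the vanishing gradient problem that long impeded training of deep networks. A minimal numeric sketch (not taken from the paper; the function names are ours) shows why: in backpropagation the error signal is multiplied by one local activation derivative per layer, and the sigmoid derivative never exceeds 0.25, so the product decays exponentially with depth, whereas a ReLU-style activation passes the signal through unattenuated on its active path.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, attained at x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 for active units

def backprop_factor(grad_fn, depth, preactivation=0.0):
    """Product of per-layer local derivatives, i.e. how much of the
    error signal survives after backpropagating through `depth` layers
    (an idealized chain with the same pre-activation at every layer)."""
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(preactivation)
    return factor

for depth in (5, 20, 100):
    print(depth,
          backprop_factor(sigmoid_grad, depth),          # shrinks as 0.25**depth
          backprop_factor(relu_grad, depth, 1.0))        # stays 1.0
```

For 100 sigmoid layers the surviving factor is at most 0.25**100 ≈ 10**-60, which is why gradients effectively vanish; replacing saturating activations with rectified linear units is one of the remedies the survey discusses.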