An Overview of Methods for Training Deep Neural Networks
Author: Andrei V. Sozykin
Section: Computer Science, Computer Engineering and Control
In issue: No. 3, Vol. 6, 2017.
Free access
Deep neural networks are currently one of the most popular approaches to building artificial intelligence systems such as speech recognition, natural language processing, computer vision, and the like. The article surveys the history and the current state of methods for training deep neural networks. It covers the artificial neural network model and training algorithms for neural networks, including the error backpropagation algorithm used to train deep neural networks. The evolution of neural network architectures is described: the neocognitron, autoencoders, convolutional neural networks, the restricted Boltzmann machine, deep belief networks, long short-term memory networks, gated recurrent neural networks, and residual learning networks. Deep neural networks with many hidden layers are difficult to train because of the vanishing gradient problem. The article reviews methods for solving this problem that make it possible to successfully train deep neural networks with more than a hundred layers. It also surveys popular deep learning libraries, which have enabled wide practical adoption of this technology. At present, convolutional neural networks are used for computer vision tasks, while recurrent neural networks, above all long short-term memory networks and gated recurrent networks, are used for processing sequences, including natural language.
Deep learning, neural networks, machine learning
Short URL: https://sciup.org/147160624
IDR: 147160624 | DOI: 10.14529/cmse170303
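To make the vanishing gradient problem mentioned in the abstract concrete, below is a minimal NumPy sketch (illustrative only, not taken from the article; all function names are ours). It backpropagates a unit-norm gradient through a stack of randomly initialized layers and prints its norm: the sigmoid derivative is bounded by 0.25, so the product of layer Jacobians collapses as depth grows, while ReLU activations with He-style initialization, one of the remedies the article surveys, keep the gradient magnitude roughly stable.

import numpy as np

rng = np.random.default_rng(0)

def backprop_gradient_norm(depth, act_grad, width=64, w_scale=1.0):
    # Push a unit-norm gradient backwards through `depth` random layers,
    # applying the chain rule per layer: grad <- W^T grad * f'(preactivation).
    grad = np.ones(width) / np.sqrt(width)
    for _ in range(depth):
        w = rng.normal(0.0, np.sqrt(w_scale / width), size=(width, width))
        pre = rng.normal(size=width)  # stand-in pre-activations
        grad = (w.T @ grad) * act_grad(pre)
    return np.linalg.norm(grad)

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)  # at most 0.25, so deep products vanish

def relu_grad(x):
    return (x > 0).astype(float)  # 0 or 1: no systematic shrinking

for depth in (5, 20, 50):
    print("depth %3d  sigmoid: %.2e  relu: %.2e" % (
        depth,
        backprop_gradient_norm(depth, sigmoid_grad),
        backprop_gradient_norm(depth, relu_grad, w_scale=2.0)))  # He-style init

Residual connections, covered in the article via He et al. (see the reference list), attack the same problem from another direction: the identity shortcut gives the gradient a multiplication-free path back to early layers, which is what makes networks with more than a hundred layers trainable.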
References
- LeCun Y., Bengio Y., Hinton G. Deep Learning//Nature. 2015. Vol. 521. P. 436-444 DOI: 10.1038/nature14539
- Ravì D., Wong Ch., Deligianni F., et al. Deep Learning for Health Informatics//IEEE Journal of Biomedical and Health Informatics. 2017. Vol. 21, No. 1. P. 4-21 DOI: 10.1109/JBHI.2016.2636665
- Schmidhuber J. Deep Learning in Neural Networks: an Overview//Neural Networks. 2015. Vol. 61. P. 85-117 DOI: 10.1016/j.neunet.2014.09.003
- McCulloch W.S., Pitts W. A Logical Calculus of the Ideas Immanent in Nervous Activity//The Bulletin of Mathematical Biophysics. 1943. Vol. 5, No. 4. P. 115-133 DOI: 10.1007/BF02478259
- Hinton G., Salakhutdinov R. Reducing the Dimensionality of Data with Neural Networks//Science. 2006. Vol. 313, No. 5786. P. 504-507 DOI: 10.1126/science.1127647
- Hinton G.E., Osindero S., Teh Y.-W. A Fast Learning Algorithm for Deep Belief Nets//Neural Computation. 2006. Vol. 18, No. 7. P. 1527-1554 DOI: 10.1162/neco.2006.18.7.1527
- Šíma J. Loading Deep Networks Is Hard//Neural Computation. 1994. Vol. 6, No. 5. P. 842-850 DOI: 10.1162/neco.1994.6.5.842
- Windisch D. Loading Deep Networks Is Hard: The Pyramidal Case//Neural Computation. 2005. Vol. 17, No. 2. P. 487-502 DOI: 10.1162/0899766053011519
- Gomez F.J., Schmidhuber J. Co-Evolving Recurrent Neurons Learn Deep Memory POMDPs//Proc. of the 2005 Conference on Genetic and Evolutionary Computation (GECCO) (Washington, DC, USA, June 25-29, 2005), 2005. P. 491-498 DOI: 10.1145/1068009.1068092
- Ciresan D.C., Meier U., Gambardella L.M., Schmidhuber J. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition//Neural Computation. 2010. Vol. 22, No. 12. P. 3207-3220 DOI: 10.1162/NECO_a_00052
- He K., Zhang X., Ren S., et al. Deep Residual Learning for Image Recognition//2016 IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA, June 27-30, 2016), 2016. P. 770-778 DOI: 10.1109/CVPR.2016.90
- Rumelhart D.E., Hinton G.E., McClelland J.L. A General Framework for Parallel Distributed Processing//Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1986. Vol. 1. P. 45-76 DOI: 10.1016/B978-1-4832-1446-7.50010-8
- LeCun Y., Bottou L., Orr G.B. Efficient BackProp//Neural Networks: Tricks of the Trade. 1998. P. 9-50 DOI: 10.1007/3-540-49430-8_2
- Broomhead D.S., Lowe D. Multivariable Functional Interpolation and Adaptive Networks//Complex Systems. 1988. Vol. 2. P. 321-355.
- Stone M.H. The Generalized Weierstrass Approximation Theorem//Mathematics Magazine. 1948. Vol. 21, No. 4. P. 167-184 DOI: 10.2307/3029750
- Gorban A.N., Dunin-Barkovskii V.L., Kirdin A.N., et al. Neiroinformatika [Neuroinformatics]. Novosibirsk: Nauka. 1998. 296 p. (in Russian)
- Hornik K., Stinchcombe M., White H. Multilayer Feedforward Networks are Universal Approximators//Neural Networks. 1989. Vol. 2, No. 5. P. 359-366 DOI: 10.1016/0893-6080(89)90020-8
- Mhaskar H.N., Micchelli Ch.A. Approximation by Superposition of Sigmoidal and Radial Basis Functions//Advances in Applied Mathematics. 1992. Vol. 13, No. 3. P. 350-373 DOI: 10.1016/0196-8858(92)90016-P
- Hebb D.O. The Organization of Behavior. New York: Wiley. 1949. 335 p.
- Novikoff A.B. On Convergence Proofs on Perceptrons//Symposium on the Mathematical Theory of Automata. 1962. Vol. 12. P. 615-622.
- Rosenblatt F. The Perceptron: a Probabilistic Model for Information Storage and Organization in the Brain//Psychological Review. 1958. Vol. 65, No. 6. P. 386-408 DOI: 10.1037/h0042519
- Widrow B., Hoff M. Associative Storage and Retrieval of Digital Information in Networks of Adaptive Neurons//Biological Prototypes and Synthetic Systems. 1962. Vol. 1. 160 p. DOI: 10.1007/978-1-4684-1716-6_25
- Narendra K.S., Thathachar M.A.L. Learning Automata - a Survey//IEEE Transactions on Systems, Man, and Cybernetics. 1974. Vol. SMC-4, No. 4. P. 323-334 DOI: 10.1109/tsmc.1974.5408453
- Rosenblatt F. Principles of Neurodynamics; Perceptrons and the Theory of Brain Mechanisms. 1962. Washington: Spartan Books. 616 p.
- Grossberg S. Some Networks That Can Learn, Remember, and Reproduce any Number of Complicated Space-Time Patterns//Journal of Mathematics and Mechanics. 1969. Vol. 19. P. 53-91 DOI: 10.1512/iumj.1970.19.19007
- Kohonen T. Correlation Matrix Memories//IEEE Transactions on Computers. 1972. Vol. C-21, No. 4. P. 353-359 DOI: 10.1109/tc.1972.5008975
- von der Malsburg C. Self-Organization of Orientation Sensitive Cells in the Striate Cortex//Kybernetik. 1973. Vol. 14, No. 2. P. 85-100 DOI: 10.1007/bf00288907
- Willshaw D.J., von der Malsburg C. How Patterned Neural Connections Can Be Set Up by Self-Organization//Proceedings of the Royal Society London B. 1976. Vol. 194. P. 431-445 DOI: 10.1098/rspb.1976.0087
- Ivakhnenko A.G. Heuristic Self-Organization in Problems of Engineering Cybernetics//Automatica. 1970. Vol. 6, No. 2. P. 207-219 DOI: 10.1016/0005-1098(70)90092-0
- Ivakhnenko A.G. Polynomial Theory of Complex Systems//IEEE Transactions on Systems, Man and Cybernetics. 1971. Vol. SMC-1, No. 4. P. 364-378 DOI: 10.1109/tsmc.1971.4308320
- Ikeda S., Ochiai M., Sawaragi Y. Sequential GMDH Algorithm and Its Application to River Flow Prediction//IEEE Transactions on Systems, Man and Cybernetics. 1976. Vol. SMC-6, No. 7. P. 473-479 DOI: 10.1109/tsmc.1976.4309532
- Witczak M., Korbicz J., Mrugalski M., et al. A GMDH Neural Network-Based Approach to Robust Fault Diagnosis: Application to the DAMADICS Benchmark Problem//Control Engineering Practice. 2006. Vol. 14, No. 6. P. 671-683 DOI: 10.1016/j.conengprac.2005.04.007
- Kondo T., Ueno J. Multi-Layered GMDH-type Neural Network Self-Selecting Optimum Neural Network Architecture and Its Application to 3-Dimensional Medical Image Recognition of Blood Vessels//International Journal of Innovative Computing, Information and Control. 2008. Vol. 4, No. 1. P. 175-187.
- Linnainmaa S. The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors. Master's Thesis (in Finnish). University of Helsinki. 1970.
- Linnainmaa S. Taylor Expansion of the Accumulated Rounding Error//BIT Numerical Mathematics. 1976. Vol. 16, No. 2. P. 146-160 DOI: 10.1007/bf01931367
- Werbos P.J. Applications of Advances in Nonlinear Sensitivity Analysis//Lecture Notes in Control and Information Sciences. 1981. Vol. 38. P. 762-770 DOI: 10.1007/BFb0006203
- Parker D.B. Learning Logic. Technical Report TR-47, Center for Computational Researchin Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA. 1985.
- LeCun Y. A Theoretical Framework for Back-Propagation//Proceedings of the 1988 Connectionist Models Summer School (Pittsburgh, Pennsylvania, USA, June 17-26, 1988), 1988. P. 21-28.
- Rumelhart D.E., Hinton G.E., Williams R.J. Learning Internal Representations by Error Propagation//Parallel Distributed Processing. 1986. Vol. 1. P. 318-362 DOI: 10.1016/b978-1-4832-1446-7.50035-2
- Qian N. On the Momentum Term in Gradient Descent Learning Algorithms//Neural Networks: The Official Journal of the International Neural Network Society. 1999. Vol. 12, No. 1. P. 145-151 DOI: 10.1016/s0893-6080(98)00116-6
- Duchi J., Hazan E., Singer Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization//Journal of Machine Learning Research. 2011. Vol. 12. P. 2121-2159.
- Kingma D.P., Ba J.L. Adam: a Method for Stochastic Optimization//International Conference on Learning Representations (San Diego, USA, May 7-9, 2015), 2015. P. 1-13.
- Fukushima K. Neocognitron: a Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position//Biological Cybernetics. 1980. Vol. 36, No. 4. P. 193-202 DOI: 10.1007/BF00344251
- Hubel D.H., Wiesel T.N. Receptive Fields of Single Neurones in the Cat's Striate Cortex//The Journal of Physiology. 1959. Vol. 148, No. 3. P. 574-591 DOI: 10.1113/jphysiol.1959.sp006308
- Fukushima K. Artificial Vision by Multi-Layered Neural Networks: Neocognitron and its Advances//Neural Networks. 2013. Vol. 37. P. 103-119 DOI: 10.1016/j.neunet.2012.09.016
- Fukushima K. Training Multi-Layered Neural Network Neocognitron//Neural Networks. 2013. Vol. 40. P. 18-31 DOI: 10.1016/j.neunet.2013.01.001
- Fukushima K. Increasing Robustness Against Background Noise: Visual Pattern Recognition by a Neocognitron//Neural Networks. 2011. Vol. 24, No. 7. P. 767-778 DOI: 10.1016/j.neunet.2011.03.017
- Ballard D.H. Modular Learning in Neural Networks//Proceedings of the Sixth National Conference on Artificial Intelligence (Seattle, Washington, USA, July 13-17, 1987), 1987. Vol. 1. P. 279-284.
- Hinton G.E., McClelland J.L. Learning Representations by Recirculation//Neural Information Processing Systems. 1988. American Institute of Physics. P. 358-366.
- Wolpert D.H. Stacked Generalization//Neural Networks. 1992. Vol. 5, No. 2. P. 241-259 DOI: 10.1016/s0893-6080(05)80023-1
- Ting K.M., Witten I.H. Stacked Generalization: When Does It Work?//Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (Nagoya, Japan, August 23-29, 1997), 1997. P. 866-871.
- LeCun Y., Boser B., Denker J.S., et al. Back-Propagation Applied to Handwritten Zip Code Recognition//Neural Computation. 1989. Vol. 1, No. 4. P. 541-551 DOI: 10.1162/neco.1989.1.4.541
- LeCun Y., Boser B., Denker J.S., et al. Handwritten Digit Recognition with a Back-Propagation Network//Advances in Neural Information Processing Systems 2. Morgan Kaufmann. 1990. P. 396-404.
- Baldi P., Chauvin Y. Neural Networks for Fingerprint Recognition//Neural Computation. 1993. Vol. 5, No. 3. P. 402-418.
- Elman J.L. Finding Structure in Time//Cognitive Science. 1990. Vol. 14, No. 2. P. 179-211 DOI: 10.1207/s15516709cog1402_1
- Jordan M.I. Serial Order: a Parallel Distributed Processing Approach. Institute for Cognitive Science, University of California, San Diego. ICS Report 8604. 1986. 40 p.
- Jordan M.I. Serial Order: a Parallel Distributed Processing Approach//Advances in Psychology. 1997. Vol. 121. P. 471-495 DOI: 10.1016/s0166-4115(97)80111-2
- Hochreiter S. Untersuchungen zu Dynamischen Neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer. Technische Universität München, 1991.
- Hochreiter S., Bengio Y., Frasconi P., et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies//A Field Guide to Dynamical Recurrent Neural Networks. Wiley-IEEE Press. 2001. P. 237-243 DOI: 10.1109/9780470544037.ch14
- Bengio Y., Simard P., Frasconi P. Learning Long-Term Dependencies with Gradient Descent is Difficult//IEEE Transactions on Neural Networks. 1994. Vol. 5, No. 2. P. 157-166 DOI: 10.1109/72.279181
- Tiňo P., Hammer B. Architectural Bias in Recurrent Neural Networks: Fractal Analysis//Neural Computation. 2003. Vol. 15, No. 8. P. 1931-1957 DOI: 10.1162/08997660360675099
- Hochreiter S., Schmidhuber J. Bridging Long Time Lags by Weight Guessing and "Long Short-Term Memory"//Spatiotemporal Models in Biological and Artificial Systems. 1996. Vol. 37. P. 65-72.
- Schmidhuber J., Wierstra D., Gagliolo M., et al. Training Recurrent Networks by Evolino//Neural Computation. 2007. Vol. 19, No. 3. P. 757-779 DOI: 10.1162/neco.2007.19.3.757
- Levin L.A. Universal Sequential Search Problems//Problems of Information Transmission. 1973. Vol. 9, No. 3. P. 265-266.
- Schmidhuber J. Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability//Neural Networks. 1997. Vol. 10, No. 5. P. 857-873 DOI: 10.1016/s0893-6080(96)00127-x
- Møller M.F. Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time. Computer Science Department, Aarhus University, Denmark. 1993. No. PB-432 DOI: 10.7146/dpb.v22i432.6748
- Pearlmutter B.A. Fast Exact Multiplication by the Hessian//Neural Computation. 1994.Vol. 6, No. 1. P. 147-160 DOI: 10.1162/neco.1994.6.1.147
- Schraudolph N.N. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent//Neural Computation. 2002. Vol. 14, No. 7. P. 1723-1738 DOI: 10.1162/08997660260028683
- Martens J. Deep Learning via Hessian-Free Optimization//Proceedings of the 27th International Conference on Machine Learning (ICML-10) (Haifa, Israel, June 21-24, 2010), 2010. P. 735-742.
- Martens J., Sutskever I. Learning Recurrent Neural Networks with Hessian-Free Optimization//Proceedings of the 28th International Conference on Machine Learning (ICML-11) (Bellevue, Washington, USA, June 28 - July 2, 2011), 2011. P. 1033-1040.
- Schmidhuber J. Learning Complex, Extended Sequences Using the Principle of History Compression//Neural Computation. 1992. Vol. 4, No. 2. P. 234-242 DOI: 10.1162/neco.1992.4.2.234
- Connor J., Martin D.R., Atlas L.E. Recurrent Neural Networks and Robust Time Series Prediction//IEEE Transactions on Neural Networks. 1994. Vol. 5, No. 2. P. 240-254 DOI: 10.1109/72.279188
- Dorffner G. Neural Networks for Time Series Processing//Neural Network World. 1996.Vol. 6. P. 447-468.
- Schmidhuber J., Mozer M.C., Prelinger D. Continuous History Compression//Proceedings of International Workshop on Neural Networks (Aachen, Germany, 1993), 1993. P. 87-95.
- Hochreiter S., Schmidhuber J. Long Short-Term Memory//Neural Computation. 1997. Vol. 9, No. 8. P. 1735-1780 DOI: 10.1162/neco.1997.9.8.1735
- Gers F.A., Schmidhuber J., Cummins F. Learning to Forget: Continual Prediction with LSTM//Neural Computation. 2000. Vol. 12, No. 10. P. 2451-2471 DOI: 10.1162/089976600300015015
- Pérez-Ortiz J.A., Gers F.A., Eck D., et al. Kalman Filters Improve LSTM Network Performance in Problems Unsolvable by Traditional Recurrent Nets//Neural Networks. 2003. Vol. 16, No. 2. P. 241-250 DOI: 10.1016/s0893-6080(02)00219-8
- Weng J., Ahuja N., Huang T.S. Cresceptron: a Self-Organizing Neural Network Which Grows Adaptively//International Joint Conference on Neural Networks (IJCNN) (Baltimore, MD, USA, June 7-11, 1992), 1992. Vol. 1. P. 576-581 DOI: 10.1109/ijcnn.1992.287150
- Weng J.J., Ahuja N., Huang T.S. Learning Recognition and Segmentation Using the Cresceptron//International Journal of Computer Vision. 1997. Vol. 25, No. 2. P. 109-143 DOI: 10.1023/a:1007967800668
- Ranzato M.A., Huang F.J., Boureau Y.L., et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition//IEEE Conference on Computer Vision and Pattern Recognition (Minneapolis, MN, USA, June 17-22, 2007), 2007. P. 1-8 DOI: 10.1109/cvpr.2007.383157
- Scherer D., Müller A., Behnke S. Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition//Lecture Notes in Computer Science. 2010. Vol. 6354. P. 92-101 DOI: 10.1007/978-3-642-15825-4_10
- Smolensky P. Information Processing in Dynamical Systems: Foundations of Harmony Theory//Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1986. Vol. 1. P. 194-281.
- Hinton G.E., Sejnowski T.E. Learning and Relearning in Boltzmann Machines//Parallel Distributed Processing. 1986. Vol. 1. P. 282-317.
- Memisevic R., Hinton G.E. Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines//Neural Computation. 2010. Vol. 22, No. 6. P. 1473-1492 DOI: 10.1162/neco.2010.01-09-953
- Mohamed A., Hinton G.E. Phone Recognition Using Restricted Boltzmann Machines//IEEE International Conference on Acoustics, Speech and Signal Processing (Dallas, TX, USA, 14-19 March 2010), 2010. P. 4354-4357 DOI: 10.1109/icassp.2010.5495651
- Salakhutdinov R., Hinton G. Semantic Hashing//International Journal of Approximate Reasoning. 2009. Vol. 50, No. 7. P. 969-978 DOI: 10.1016/j.ijar.2008.11.006
- Bengio Y., Lamblin P., Popovici D., et al. Greedy Layer-Wise Training of Deep Networks//Advances in Neural Information Processing Systems 19. 2007. P. 153-160.
- Vincent P., Larochelle H., Bengio Y., et al. Extracting and Composing Robust Features with Denoising Autoencoders//Proceedings of the 25th International Conference on Machine Learning (Helsinki, Finland, July 5-9, 2008), 2008. P. 1096-1103 DOI: 10.1145/1390156.1390294
- Erhan D., Bengio Y., Courville A., et al. Why Does Unsupervised Pre-Training Help Deep Learning?//Journal of Machine Learning Research. 2010. Vol. 11. P. 625-660.
- Arel I., Rose D.C., Karnowski T.P. Deep Machine Learning - a New Frontier in Artificial Intelligence Research//IEEE Computational Intelligence Magazine. 2010. Vol. 5, No. 4. P. 13-18 DOI: 10.1109/mci.2010.938364
- Jain V., Seung H.S. Natural Image Denoising with Convolutional Networks//Advances in Neural Information Processing Systems (NIPS) 21. 2009. P. 769-776.
- Razavian A.Sh., Azizpour H., Sullivan J., et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (Washington, DC, USA, June 23-28, 2014), 2014. P. 512-519 DOI: 10.1109/cvprw.2014.131
- Wang R., Xu Z. A Pedestrian and Vehicle Rapid Identification Model Based on Convolutional Neural Network//Proceedings of the 7th International Conference on Internet Multimedia Computing and Service (ICIMCS '15) (Zhangjiajie, China, August 19-21, 2015), 2015. P. 32:1-32:4 DOI: 10.1145/2808492.2808524
- Boominathan L., Kruthiventi S.S., Babu R.V. CrowdNet: A Deep Convolutional Network for Dense Crowd Counting//Proceedings of the 2016 ACM on Multimedia Conference (Amsterdam, The Netherlands, October 15-19, 2016), 2016. P. 640-644 DOI: 10.1145/2964284.2967300
- Kinnikar A., Husain M., Meena S.M. Face Recognition Using Gabor Filter and Convolutional Neural Network//Proceedings of the International Conference on Informatics and Analytics (Pondicherry, India, August 25-26, 2016), 2016. P. 113:1-113:4 DOI: 10.1145/2980258.2982104
- Hahnloser R.H.R., Sarpeshkar R., Mahowald M.A., et al. Digital Selection and Analogue Amplification Coexist in a Cortex-Inspired Silicon Circuit//Nature. 2000. Vol. 405. P. 947-951 DOI: 10.1038/35016072
- Hahnloser R.H.R., Seung H.S., Slotine J.J. Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks//Neural Computation. 2003. Vol. 15, No. 3. P. 621-638 DOI: 10.1162/089976603321192103
- Glorot X., Bordes A., Bengio Y. Deep Sparse Rectifier Neural Networks//Journal of Machine Learning Research. 2011. Vol. 15. P. 315-323.
- Glorot X., Bengio Y. Understanding the Difficulty of Training Deep Feedforward Neural Networks//Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10) (Sardinia, Italy, May 13-15, 2010). Society for Artificial Intelligence and Statistics. 2010. P. 249-256.
- He K., Zhang X., Ren Sh., et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification//Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV) (Santiago, Chile, December 7-13, 2015), 2015. P. 1026-1034 DOI: 10.1109/ICCV.2015.123
- Ioffe S., Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift//JMLR Workshop and Conference Proceedings. Proceedings of the 32nd International Conference on Machine Learning (Lille, France, July 6-11, 2015), 2015. Vol. 37. P. 448-456.
- Szegedy C., Liu W., Jia Y., et al. Going Deeper with Convolutions//IEEE Conference on Computer Vision and Pattern Recognition (Boston, MA, USA, June 7-12, 2015), 2015. P. 1-9 DOI: 10.1109/CVPR.2015.7298594
- Szegedy C., Vanhoucke V., Ioffe S., et al. Rethinking the Inception Architecture for Computer Vision//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA, June 27-30, 2016), 2016. P. 2818-2826 DOI: 10.1109/cvpr.2016.308
- Szegedy C., Ioffe S., Vanhoucke V., et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning//Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) (San Francisco, California, USA, February 4-9, 2017), 2017. P. 4278-4284.
- Cho K., van Merrienboer B., Gulcehre C., et al. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Doha, Qatar, October 25-29, 2014), 2014. P. 1724-1734 DOI: 10.3115/v1/d14-1179
- Cho K., van Merrienboer B., Bahdanau D., et al. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches//Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (Doha, Qatar, October 25, 2014), 2014. P. 103-111 DOI: 10.3115/v1/w14-4012
- Chung J., Gulcehre C., Cho K., et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling//NIPS 2014 Workshop on Deep Learning (Montreal, Canada, December 12, 2014), 2014. P. 1-9.
- He K., Sun J. Convolutional Neural Networks at Constrained Time Cost//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA, USA, June 7-12, 2015), 2015. P. 5353-5360 DOI: 10.1109/CVPR.2015.7299173
- Jia Y., Shelhamer E., Donahue J., et al. Caffe: Convolutional Architecture for Fast Feature Embedding//Proceedings of the 22nd ACM International Conference on Multimedia (Orlando, FL, USA, November 3-7, 2014), 2014. P. 675-678 DOI: 10.1145/2647868.2654889
- Kruchinin D., Dolotov E., Kornyakov K., et al. Comparison of Deep Learning Libraries on the Problem of Handwritten Digit Classification//Analysis of Images, Social Networks and Texts. Communications in Computer and Information Science. 2015. Vol. 542. P. 399-411 DOI: 10.1007/978-3-319-26123-2_38
- Bahrampour S., Ramakrishnan N., Schott L., et al. Comparative Study of Deep Learning Software Frameworks. URL: https://arxiv.org/abs/1511.06435 (accessed: 02.07.2017).
- Bergstra J., Breuleux O., Bastien F., et al. Theano: a CPU and GPU Math Expression Compiler//Proceedings of the Python for Scientific Computing Conference (SciPy) (Austin, TX, USA, June 28 - July 3, 2010), 2010. P. 3-10.
- Abadi M., Barham P., Chen J., et al. TensorFlow: A System for Large-Scale Machine Learning//Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (Savannah, GA, USA, November 2-4, 2016), 2016. P. 265-283.
- Collobert R., Kavukcuoglu K., Farabet C. Torch7: a Matlab-like Environment for Machine Learning//BigLearn, NIPS Workshop (Granada, Spain, December 12-17, 2011), 2011.
- Seide F., Agarwal A. CNTK: Microsoft's Open-Source Deep-Learning Toolkit//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16) (San Francisco, California, USA, August 13-17, 2016), 2016. P. 2135 DOI: 10.1145/2939672.2945397
- Viebke A., Pllana S. The Potential of the Intel(R) Xeon Phi for Supervised Deep Learning//IEEE 17th International Conference on High Performance Computing and Communications (HPCC) (New York, USA, August 24-26, 2015), 2015. P. 758-765 DOI: 10.1109/hpcc-css-icess.2015.45
- Chollet F., et al. Keras. 2015. URL: https://github.com/fchollet/keras (accessed: 02.07.2017).
- PaddlePaddle: PArallel Distributed Deep LEarning. URL: http://www.paddlepaddle.org/ (accessed: 02.07.2017).
- Chen T., Li M., Li Y. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. URL: https://arxiv.org/abs/1512.01274 (accessed: 02.07.2017).
- Intel Nervana Reference Deep Learning Framework Committed to Best Performance on all Hardware. URL: https://www.intelnervana.com/neon/ (accessed: 02.07.2017).
- Shi Sh., Wang Q., Xu P. Benchmarking State-of-the-Art Deep Learning Software Tools. URL: https://arxiv.org/abs/1608.07249 (accessed: 02.07.2017).
- Weiss K., Khoshgoftaar T.M., Wang D. A Survey of Transfer Learning//Journal of Big Data. 2016. Vol. 3, No. 1. P. 1-9 DOI: 10.1186/s40537-016-0043-6
- Ba J., Mnih V., Kavukcuoglu K. Multiple Object Recognition with Visual Attention//Proceedings of the International Conference on Learning Representations (ICLR) (San Diego, USA, May 7-9, 2015), 2015. P. 1-10.
- Graves A., Mohamed A.R., Hinton G. Speech Recognition with Deep Recurrent Neural Networks//IEEE International Conference on Acoustics, Speech and Signal Processing (Vancouver, Canada, May 26-31, 2013), 2013. P. 6645-6649 DOI: 10.1109/ICASSP.2013.6638947