An Overview of Methods for Deep Learning in Neural Networks
Author: Andrey V. Sozykin
Section: Computer Science, Computer Engineering, and Control
Published in: Vol. 6, No. 3, 2017.
Open access
Deep neural networks are currently becoming one of the most popular approaches to building artificial intelligence systems for tasks such as speech recognition, natural language processing, and computer vision. The paper surveys the history and current state of methods for training deep neural networks. It describes the artificial neural network model and the training algorithms for neural networks, including the error backpropagation algorithm used to train deep networks. The development of neural network architectures is traced: the neocognitron, autoencoders, convolutional neural networks, the restricted Boltzmann machine, deep belief networks, long short-term memory networks, gated recurrent neural networks, and residual networks. Deep neural networks with many hidden layers are hard to train because of the vanishing gradient problem. The paper reviews methods for solving this problem that make it possible to train deep networks with more than a hundred layers. It also surveys popular deep learning libraries, which have enabled the broad practical adoption of this technology. At present, convolutional neural networks are used for computer vision tasks, while recurrent neural networks, primarily long short-term memory networks and gated recurrent neural networks, are used for processing sequences, including natural language.
Deep learning, neural networks, machine learning
Short URL: https://sciup.org/147160624
IDR: 147160624 | UDC: 004.85 | DOI: 10.14529/cmse170303
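The abstract's central technical point is the vanishing gradient problem that long impeded training of deep networks. A minimal numeric sketch (not taken from the paper; the function names are ours) shows why: in backpropagation the error signal is multiplied by one local activation derivative per layer, and the sigmoid derivative never exceeds 0.25, so the product decays exponentially with depth, whereas a ReLU-style activation passes the signal through unattenuated on its active path.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, attained at x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 for active units

def backprop_factor(grad_fn, depth, preactivation=0.0):
    """Product of per-layer local derivatives, i.e. how much of the
    error signal survives after backpropagating through `depth` layers
    (an idealized chain with the same pre-activation at every layer)."""
    factor = 1.0
    for _ in range(depth):
        factor *= grad_fn(preactivation)
    return factor

for depth in (5, 20, 100):
    print(depth,
          backprop_factor(sigmoid_grad, depth),          # shrinks as 0.25**depth
          backprop_factor(relu_grad, depth, 1.0))        # stays 1.0
```

For 100 sigmoid layers the surviving factor is at most 0.25**100 ≈ 10**-60, which is why gradients effectively vanish; replacing saturating activations with rectified linear units is one of the remedies the survey discusses.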