
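In 2015, Jason Cong published a paper on FPGA acceleration of DNN algorithms, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks", at that year's international FPGA conference, which quickly brought FPGAs into the spotlight. Soon afterwards, in 2016, Google announced the TPU chip, designed for the TensorFlow framework, and in the same year AlphaGo, which ran on the TPU architecture, appeared and defeated the human world champion Go player Lee Sedol. Also in 2016, Cambricon developed DIANNAO, and FPGA chips came into wide use on cloud computing platforms. In 2017 alone, Google released TPU 2.0, which strengthened training performance[14]; NVIDIA released the Volta architecture, which substantially raised GPU performance; Huawei's Kirin 970 became the first AI chip for mobile phones; and Professor Wei Shaojun's team at Tsinghua University developed the Thinker prototype and subsequently released the Thinker series of AI chips, which are internationally competitive in computing power and energy efficiency.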
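In recent years, leading academic research institutions and international semiconductor companies have been actively researching and developing spike-based neuromorphic circuits[38-45]. As shown in Table 3, SNN-based neuromorphic computing hardware is more energy-efficient than hardware accelerators based on conventional DNNs. Most state-of-the-art neuromorphic computing chips[39-41,44] are ASIC implementations of SNNs in mature CMOS silicon technology: artificial synapses are emulated with memories such as SRAM, and artificial neurons are realized biomimetically with key digital or analog circuits. The most representative is TrueNorth[40], a CMOS multi-core chip developed by IBM. When simulating 1 million neurons and 250 million synapses, the chip consumes only 70 mW, and each synaptic event costs only 26 pJ, a very high energy efficiency. However, to mimic the brain-like behavior of biological synapses and neurons, electronic synapses and neurons need highly complex CMOS circuits to implement the required functions, as shown in Figure 2.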
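IBM's TrueNorth chip, for example, contains 5.4 billion transistors and occupies 4.3 cm² in a 28 nm process. Spike-based neuromorphic CMOS circuits of this kind therefore use a very large number of transistors and consume a very large chip area. In addition, in most existing neuromorphic chips[39-41,44] the computing units and memory units are still locally separate, so a local memory wall and an energy-efficiency problem remain between the CMOS logic circuits used for neurons and the SRAM circuits used for synapses; these chips are therefore not yet non-von Neumann architectures in the true sense. However, the latest non-volatile memory (NVM) technologies with three-dimensional stacking capability, or in-memory computing, are expected to solve this problem.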
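As a rough sanity check on the TrueNorth figures quoted above, the following sketch derives a few implied quantities from them. It is a back-of-the-envelope calculation, not a result from the cited paper: it assumes that the full 70 mW is spent on synaptic events, and it averages the transistor count over the whole chip (neurons, routing and memory included). The function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the TrueNorth figures quoted in the text.
# Derived values are illustrative only; all names here are hypothetical.

POWER_W = 70e-3               # chip power: 70 mW
ENERGY_PER_EVENT_J = 26e-12   # energy per synaptic event: 26 pJ
SYNAPSES = 250e6              # 250 million synapses
TRANSISTORS = 5.4e9           # 5.4 billion transistors
AREA_CM2 = 4.3                # die area: 4.3 cm^2 (28 nm process)


def implied_event_rate(power_w: float, energy_per_event_j: float) -> float:
    """Synaptic events per second, ASSUMING all power goes to synaptic events."""
    return power_w / energy_per_event_j


def transistor_density_per_mm2(transistors: float, area_cm2: float) -> float:
    """Average transistor density over the whole die (1 cm^2 = 100 mm^2)."""
    return transistors / (area_cm2 * 100.0)


events_per_s = implied_event_rate(POWER_W, ENERGY_PER_EVENT_J)
events_per_synapse_per_s = events_per_s / SYNAPSES
density_per_mm2 = transistor_density_per_mm2(TRANSISTORS, AREA_CM2)
# Chip-wide average, not the transistor cost of a single synapse circuit.
transistors_per_synapse = TRANSISTORS / SYNAPSES

print(f"implied synaptic events per second: {events_per_s:.3e}")
print(f"implied events per synapse per second: {events_per_synapse_per_s:.1f}")
print(f"average transistor density per mm^2: {density_per_mm2:.3e}")
print(f"chip-wide transistors per synapse: {transistors_per_synapse:.1f}")
```

Under these assumptions the quoted figures imply on the order of 10^9 synaptic events per second and roughly twenty transistors per synapse when averaged over the chip, which is consistent with the point that CMOS implementations spend a large transistor budget on emulating synapses and neurons.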
References
[ 1 ] LECUN Y, CORTES C. The MNIST database of handwritten digits[EB/OL]. [2019-02-26]. http://yann.lecun.com/exdb/mnist/.
[ 2 ] TAIGMAN Y, YANG M, RANZATO M A, et al. DeepFace: closing the gap to human-level performance in face verification[C]// IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2014: 1701-1708.
[ 3 ] Amazon Alexa. Ways to build with Alexa[EB/OL]. [2019-02-24]. https://developer.amazon.com/alexa.
[ 4 ] Apple Siri. Siri does more than ever, even before you ask [EB/OL]. [2019-02-24]. http://www.apple.com/ios/siri/.
[ 5 ] Microsoft Cortana Personal Assistant. Cortana. Your intelligent assistant across your life [EB/OL]. [2019-02-24]. https://www.microsoft.com/en-us/cortana.
[ 6 ] QUIGLEY M, CONLEY K, GERKEY B, et al. ROS: an open-source Robot Operating System[C]// ICRA workshop on open source software. 2009: 5.
[ 7 ] URMSON C, BAGNELL J A, BAKER C R, et al. Tartan racing: a multi-modal approach to the DARPA urban challenge[R]. Technical report, Carnegie Mellon University, 2007.
[ 8 ] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[ 9 ] EMILIO M, MOISES M, GUSTAVO R, et al. Pac-mAnt: optimization based on ant colonies applied to developing an agent for Ms. Pac-Man[C]// IEEE Symposium on Computational Intelligence and Games (CIG). IEEE, 2010: 458-464.
[10] CHEN T, DU Z, SUN N, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[C]// International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2014: 269-284.
[11] CHEN Y, LUO T, LIU S, et al. DaDianNao: a machine-learning supercomputer[C]// 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2014: 609-622.
[12] LIU D, CHEN T, LIU S, et al. PuDianNao: a polyvalent machine learning accelerator[C]// International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 2015: 369-381.
[13] DU Z, FASTHUBER R, CHEN T, et al. ShiDianNao: shifting vision processing closer to the sensor[C]// International Symposium on Computer Architecture (ISCA). 2015: 92-104.
[14] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]// International Symposium on Computer Architecture (ISCA). 2017: 1-12.
[15] KAPOOHT. Von Neumann architecture scheme[J/OL]. The Innovation in Computing Companion, 257-259. https://en.wikipedia.org/wiki/Von_Neumann_architecture.
[16] FARABET C, POULET C, HAN J Y, et al. CNP: an FPGA-based processor for convolutional networks[C]// International Conference on Field Programmable Logic and Applications (FPL). 2009: 32-37.
[17] FARABET C, MARTINI B, CORDA B, et al. NeuFlow: a runtime reconfigurable dataflow processor for vision[C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2011: 109-116.
[18] GOKHALE V, JIN J, DUNDAR A, et al. A 240 G-ops/s mobile coprocessor for deep neural networks[C]// IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2014: 682-687.
[19] NEUMANN J V. The principles of large-scale computing machines[J]. Annals of the History of Computing, 1981, 3(3): 263-273.
[20] MEAD C. Neuromorphic electronic systems[J]. Proceedings of the IEEE, 1990, 78(10): 1629-1636.
[21] STRUKOV D B. Nanotechnology: smart connections[J]. Nature, 2011, 476(7361): 403-405.
[22] HAWKINS J, BLAKESLEE S. On intelligence[M]. London: Macmillan, 2007.
[23] BENJAMIN B V, GAO P, et al. Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations[J]. Proceedings of the IEEE, 2014, 102(5): 699-716.
[24] MEROLLA P A, ARTHUR J V, ALVAREZ-ICAZA R, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface[J]. Science, 2014, 345(6197): 668-673.
[25] CASSIDY A S, ALVAREZ-ICAZA R, AKOPYAN F, et al. Real-time scalable cortical computing at 46 giga-synaptic OPS/watt with ~100× speedup in time-to-solution and ~100,000× reduction in energy-to-solution[C]// SC '14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2014.
[26] FURBER S B, GALLUPPI F, TEMPLE S, et al. The SpiNNaker project[J]. Proceedings of the IEEE, 2014, 102(5): 652-665.
[27] SCHEMMEL J, BRÜDERLE D, GRÜBL A, et al. A wafer-scale neuromorphic hardware system for large-scale neural modeling[C]// Proceedings of 2010 IEEE International Symposium on Circuits and Systems. IEEE, 2010.
[28] LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[29] KELLER J, PEREZ O. Improving MCTS and neural network communication in computer Go[R]. Worcester Polytechnic Institute, 2016.
[30] ZHANG S J, DU Z D, ZHANG L, et al. Cambricon-X: an accelerator for sparse neural networks[C]// 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016.
[31] KOWALIK J S. Parallel computation and computers for artificial intelligence[M]. Springer Science & Business Media, 2012.
[32] VERHELST M, MOONS B. Embedded deep neural network processing: algorithmic and processor techniques bring deep learning to IoT and edge devices[J]. IEEE Solid-State Circuits Magazine, 2017, 9(4): 55-65.
[33] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]// 44th International Symposium on Computer Architecture (ISCA). 2017.
[34] SZE V, CHEN Y H, YANG T J, et al. Efficient processing of deep neural networks: a tutorial and survey[J/OL]. Proceedings of the IEEE, 2017, 105(12).
[35] BENNIS M. Smartphones will get even smarter with on-device machine learning[J/OL]. IEEE Spectrum, 2018. https://spectrum.ieee.org/tech-talk/telecom/wireless/smartphones-will-get-even-smarter-with-ondevice-machine-learning.
[36] MAASS W. Networks of spiking neurons: the third generation of neural network models[J]. Neural Networks, 1997, 10(9): 1659-1671.
[37] MEAD C. Neuromorphic electronic systems[J]. Proceedings of the IEEE, 1990, 78(10): 1629-1636.
[38] PAINKRAS E, PLANA L A, GARSIDE J, et al. SpiNNaker: a 1-W 18-core system-on-chip for massively-parallel neural network simulation[J]. IEEE Journal of Solid-State Circuits, 2013, 48(8): 1943-1953.
[39] BENJAMIN B V, GAO P, MCQUINN E, et al. Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations[J]. Proceedings of the IEEE, 2014, 102(5): 699-716.
[40] MEROLLA P A, ARTHUR J V, ALVAREZ-ICAZA R, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface[J]. Science, 2014, 345(6197): 668-673.
[41] DAVIES M, SRINIVASA N, LIN T H, et al. Loihi: a neuromorphic manycore processor with on-chip learning[J]. IEEE Micro, 2018, 38(1): 82-99.
[42] KIM S. NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with on-chip neuron circuits for continuous in-situ learning[C]// IEEE International Electron Devices Meeting (IEDM). IEEE, 2015.
[43] CHU M, KIM B, PARK S, et al. Neuromorphic hardware system for visual pattern recognition with memristor array and CMOS neuron[J]. IEEE Transactions on Industrial Electronics, 2015, 62(4): 2410-2419.
[44] SHI L P, PEI J, DENG N, et al. Development of a neuromorphic computing system[C]// IEEE International Electron Devices Meeting (IEDM). IEEE, 2015.
[45] JIANG Y N, HUANG P, ZHU D B, et al. Design and hardware implementation of neuromorphic systems with RRAM synapses[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2018, 65(9): 2726-2738.
[46] YU S M, CHEN P Y. Emerging memory technologies: recent trends and prospects[J]. Proceedings of the IEEE, 2016, 8(2): 43-56.
[47] SURI M. CBRAM devices as binary synapses for low-power stochastic neuromorphic systems: auditory and visual cognitive processing applications[C]// Proceedings of IEEE International Electron Devices Meeting (IEDM), 2012: 3-10.
[48] WANG Z. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing[J]. Nature Materials, 2017, 16(1): 101-108.
[49] YANG J J, STRUKOV D B, STEWART D R. Memristive devices for computing[J]. Nature Nanotechnology, 2013, 8(1): 13-24.
[50] JO S H. Nanoscale memristor device as synapse in neuromorphic systems[J]. Nano Letters, 2010, 10(4): 1297-1301.
[51] OHNO T. Short-term plasticity and long-term potentiation mimicked in single inorganic synapses[J]. Nature Materials, 2011, 10(8): 591-595.
[52] WANG Z R, JOSHI S, SAVEL’EV S E, et al. Memristors with diffusive dynamics as synaptic emulators for neuromorphic computing[J]. Nature Materials, 2017, 16(1): 101-108.


