Search | VHL Regional Portal

1.

Policy Gradient-Based Core Placement Optimization for Multichip Many-Core Systems.

Myung, Wooshik; Lee, Donghyun; Song, Chenhang; Wang, Guanrui; Ma, Cheng.

IEEE Trans Neural Netw Learn Syst ; 34(8): 4529-4543, 2023 Aug.

Article in English | MEDLINE | ID: mdl-34644256

ABSTRACT

As many deep neural network models become deeper and more complex, processing devices with stronger computing performance and communication capability are required. Following this trend, the dependence on multichip many-core systems that have high parallelism and reasonable transmission costs is on the rise. In this work, in order to improve routing performance of the system, such as routing runtime and power consumption, we propose a reinforcement learning (RL)-based core placement optimization approach, considering application constraints, such as deadlock caused by multicast paths. We leverage the capability of deep RL from indirect supervision as a direct nonlinear optimizer, and the parameters of the policy network are updated by proximal policy optimization. We treat the routing topology as a network graph, so we utilize a graph convolutional network to embed the features into the policy network. A single-step environment is designed, so all cores are placed simultaneously. To handle the high-dimensional action space, we use continuous values matching the number of cores as the output of the policy network and discretize them again to obtain the new placement. For multichip system mapping, we developed a community detection algorithm. We use several datasets of multilayer perceptron and convolutional neural networks to evaluate our agent. We compare the optimal results obtained by our agent with other baselines under different multicast conditions. Our approach achieves a significant reduction of routing runtime, communication cost, and average traffic load, along with deadlock-free performance for inner chip data transmission. The traffic of interchip routing is also significantly reduced after integrating the community detection algorithm into our agent.
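The abstract does not say how the continuous policy outputs are discretized into a placement. One common way to turn one real-valued score per core into a valid one-to-one assignment is to rank the scores, sketched below. The function name `scores_to_placement` and the rank-based rule are assumptions made for illustration, not the authors' method.

```python
def scores_to_placement(scores):
    """Map one continuous score per logical core to a distinct slot index.

    Assumption: the core with the lowest score gets slot 0, the next lowest
    gets slot 1, and so on, which always yields a valid permutation.
    """
    # Logical core indices ordered by ascending score (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    placement = [0] * len(scores)
    for slot, core in enumerate(order):
        placement[core] = slot
    return placement


# Hypothetical policy output for four cores.
placement = scores_to_placement([0.7, -1.2, 0.1, 2.5])
print(placement)
```

Ranking guarantees that no two cores share a slot, so a single forward pass of a policy can propose a complete placement.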

2.

Neuromorphic computing chip with spatiotemporal elasticity for multi-intelligent-tasking robots.

Ma, Songchen; Pei, Jing; Zhang, Weihao; Wang, Guanrui; Feng, Dahu; Yu, Fangwen; Song, Chenhang; Qu, Huanyu; Ma, Cheng; Lu, Mingsheng; Liu, Faqiang; Zhou, Wenhao; Wu, Yujie; Lin, Yihan; Li, Hongyi; Wang, Taoyi; Song, Jiuru; Liu, Xue; Li, Guoqi; Zhao, Rong; Shi, Luping.

Sci Robot ; 7(67): eabk2948, 2022 06 15.

Article in English | MEDLINE | ID: mdl-35704609

ABSTRACT

Recent advances in artificial intelligence have enhanced the abilities of mobile robots in dealing with complex and dynamic scenarios. However, to enable computationally intensive algorithms to be executed locally in multitask robots with low latency and high efficiency, innovations in computing hardware are required. Here, we report TianjicX, a neuromorphic computing hardware that can support true concurrent execution of multiple cross-computing-paradigm neural network (NN) models with various coordination manners for robotics. With spatiotemporal elasticity, TianjicX can support adaptive allocation of computing resources and scheduling of execution time for each task. Key to this approach is a high-level model, "Rivulet," which bridges the gap between robotic-level requirements and hardware implementations. It abstracts the execution of NN tasks through distribution of static data and streaming of dynamic data to form the basic activity context, adopts time and space slices to achieve elastic resource allocation for each activity, and performs configurable hybrid synchronous-asynchronous grouping. Thereby, Rivulet is capable of supporting independent and interactive execution. Building on Rivulet with hardware design for realizing spatiotemporal elasticity, a 28-nanometer TianjicX neuromorphic chip with event-driven, high parallelism, low latency, and low power was developed. Using a single TianjicX chip and a specially developed compiler stack, we built a multi-intelligent-tasking mobile robot, Tianjicat, to perform a cat-and-mouse game. Multiple tasks, including sound recognition and tracking, object recognition, obstacle avoidance, and decision-making, can be concurrently executed. Compared with NVIDIA Jetson TX2, latency is substantially reduced by 79.09 times, and dynamic power is reduced by 50.66%.

Subject(s)

Artificial Intelligence; Robotics; Algorithms; Elasticity; Neural Networks, Computer

3.

Brain-inspired global-local learning incorporated with neuromorphic computing.

Wu, Yujie; Zhao, Rong; Zhu, Jun; Chen, Feng; Xu, Mingkun; Li, Guoqi; Song, Sen; Deng, Lei; Wang, Guanrui; Zheng, Hao; Ma, Songchen; Pei, Jing; Zhang, Youhui; Zhao, Mingguo; Shi, Luping.

Nat Commun ; 13(1): 65, 2022 01 10.

Article in English | MEDLINE | ID: mdl-35013198

ABSTRACT

There are two principal approaches for learning in artificial intelligence: error-driven global learning and neuroscience-oriented local learning. Integrating them into one network may provide complementary learning capabilities for versatile learning scenarios. At the same time, neuromorphic computing holds great promise, but still needs plenty of useful algorithms and algorithm-hardware co-designs to fully exploit its advantages. Here, we present a neuromorphic global-local synergic learning model by introducing a brain-inspired meta-learning paradigm and a differentiable spiking model incorporating neuronal dynamics and synaptic plasticity. It can meta-learn local plasticity and receive top-down supervision information for multiscale learning. We demonstrate the advantages of this model in multiple different tasks, including few-shot learning, continual learning, and fault-tolerance learning in neuromorphic vision sensors. It achieves significantly higher performance than single-learning methods. We further implement the model in the Tianjic neuromorphic platform by exploiting algorithm-hardware co-designs and prove that the model can fully utilize neuromorphic many-core architecture to develop hybrid computation paradigm.

4.

End-to-End Implementation of Various Hybrid Neural Networks on a Cross-Paradigm Neuromorphic Chip.

Wang, Guanrui; Ma, Songchen; Wu, Yujie; Pei, Jing; Zhao, Rong; Shi, Luping.

Front Neurosci ; 15: 615279, 2021.

Article in English | MEDLINE | ID: mdl-33603643

ABSTRACT

Integration of computer-science oriented artificial neural networks (ANNs) and neuroscience oriented spiking neural networks (SNNs) has emerged as a highly promising direction to achieve further breakthroughs in artificial intelligence through complementary advantages. This integration needs to support individual modeling of ANNs and SNNs as well as their hybrid modeling, which not only simultaneously calculates single-paradigm networks but also converts their different information representations. It remains challenging to realize effective calculation and signal conversion on the existing dedicated hardware platforms. To solve this problem, we propose an end-to-end mapping framework for implementing various hybrid neural networks on many-core neuromorphic architectures based on the cross-paradigm Tianjic chip. We construct hardware configuration schemes for four typical signal conversions and establish a global timing adjustment mechanism among different heterogeneous modules. Experimental results show that our framework can implement these hybrid models with low execution latency and low power consumption with nearly no accuracy degradation. This work provides a new approach of developing hybrid neural network models for brain-inspired computing chips and further tapping the potential of these models.

5.

A hybrid and scalable brain-inspired robotic platform.

Zou, Zhe; Zhao, Rong; Wu, Yujie; Yang, Zheyu; Tian, Lei; Wu, Shuang; Wang, Guanrui; Yu, Yongchao; Zhao, Qi; Chen, Mingwang; Pei, Jing; Chen, Feng; Zhang, Youhui; Song, Sen; Zhao, Mingguo; Shi, Luping.

Sci Rep ; 10(1): 18160, 2020 10 23.

Article in English | MEDLINE | ID: mdl-33097742

ABSTRACT

Recent years have witnessed tremendous progress of intelligent robots brought about by mimicking human intelligence. However, current robots are still far from being able to handle multiple tasks in a dynamic environment as efficiently as humans. To cope with complexity and variability, further progress toward scalability and adaptability is essential for intelligent robots. Here, we report a brain-inspired robotic platform implemented by an unmanned bicycle that exhibits scalability of network scale, quantity and diversity to handle the changing needs of different scenarios. The platform adopts rich coding schemes and a trainable and scalable neural state machine, enabling flexible cooperation of hybrid networks. In addition, an embedded system is developed using a cross-paradigm neuromorphic chip to facilitate the implementation of diverse neural networks in spike or non-spike form. The platform achieved various real-time tasks concurrently in different real-world scenarios, providing a new pathway to enhance robots' intelligence.

6.

A system hierarchy for brain-inspired computing.

Zhang, Youhui; Qu, Peng; Ji, Yu; Zhang, Weihao; Gao, Guangrong; Wang, Guanrui; Song, Sen; Li, Guoqi; Chen, Wenguang; Zheng, Weimin; Chen, Feng; Pei, Jing; Zhao, Rong; Zhao, Mingguo; Shi, Luping.

Nature ; 586(7829): 378-384, 2020 10.

Article in English | MEDLINE | ID: mdl-33057220

ABSTRACT

Neuromorphic computing draws inspiration from the brain to provide computing technology and architecture with the potential to drive the next wave of computer engineering [1-13]. Such brain-inspired computing also provides a promising platform for the development of artificial general intelligence [14,15]. However, unlike conventional computing systems, which have a well-established computer hierarchy built around the concept of Turing completeness and the von Neumann architecture [16-18], there is currently no generalized system hierarchy or understanding of completeness for brain-inspired computing. This affects the compatibility between software and hardware, impairing the programming flexibility and development productivity of brain-inspired computing. Here we propose 'neuromorphic completeness', which relaxes the requirement for hardware completeness, and a corresponding system hierarchy, which consists of a Turing-complete software-abstraction model and a versatile abstract neuromorphic architecture. Using this hierarchy, various programs can be described as uniform representations and transformed into the equivalent executable on any neuromorphic complete hardware; that is, it ensures programming-language portability, hardware completeness and compilation feasibility. We implement toolchain software to support the execution of different types of program on various typical hardware platforms, demonstrating the advantage of our system hierarchy, including a new system-design dimension introduced by the neuromorphic completeness. We expect that our study will enable efficient and compatible progress in all aspects of brain-inspired computing systems, facilitating the development of various applications, including artificial general intelligence.

7.

Towards artificial general intelligence with hybrid Tianjic chip architecture.

Pei, Jing; Deng, Lei; Song, Sen; Zhao, Mingguo; Zhang, Youhui; Wu, Shuang; Wang, Guanrui; Zou, Zhe; Wu, Zhenzhi; He, Wei; Chen, Feng; Deng, Ning; Wu, Si; Wang, Yu; Wu, Yujie; Yang, Zheyu; Ma, Cheng; Li, Guoqi; Han, Wentao; Li, Huanglong; Wu, Huaqiang; Zhao, Rong; Xie, Yuan; Shi, Luping.

Nature ; 572(7767): 106-111, 2019 08.

Article in English | MEDLINE | ID: mdl-31367028

ABSTRACT

There are two general approaches to developing artificial general intelligence (AGI) [1]: computer-science-oriented and neuroscience-oriented. Because of the fundamental differences in their formulations and coding schemes, these two approaches rely on distinct and incompatible platforms [2-8], retarding the development of AGI. A general platform that could support the prevailing computer-science-based artificial neural networks as well as neuroscience-inspired models and algorithms is highly desirable. Here we present the Tianjic chip, which integrates the two approaches to provide a hybrid, synergistic platform. The Tianjic chip adopts a many-core architecture, reconfigurable building blocks and a streamlined dataflow with hybrid coding schemes, and can not only accommodate computer-science-based machine-learning algorithms, but also easily implement brain-inspired circuits and several coding schemes. Using just one chip, we demonstrate the simultaneous processing of versatile algorithms and models in an unmanned bicycle system, realizing real-time object detection, tracking, voice control, obstacle avoidance and balance control. Our study is expected to stimulate AGI development by paving the way to more generalized hardware platforms.

8.

Truly Concomitant and Independently Expressed Short- and Long-Term Plasticity in a Bi₂O₂Se-Based Three-Terminal Memristor.

Zhang, Ziyang; Li, Tianran; Wu, Yujie; Jia, Yinjun; Tan, Congwei; Xu, Xintong; Wang, Guanrui; Lv, Juan; Zhang, Wei; He, Yuhan; Pei, Jing; Ma, Cheng; Li, Guoqi; Xu, Haizheng; Shi, Luping; Peng, Hailin; Li, Huanglong.

Adv Mater ; 31(3): e1805769, 2019 Jan.

Article in English | MEDLINE | ID: mdl-30461090

ABSTRACT

Concomitance of diverse synaptic plasticity across different timescales produces complex cognitive processes. To achieve comparable cognitive complexity in memristive neuromorphic systems, devices that are capable of emulating short-term (STP) and long-term plasticity (LTP) concomitantly are essential. In existing memristors, however, STP and LTP can only be induced selectively because of the inability to be decoupled using different loci and mechanisms. In this work, the first demonstration of truly concomitant STP and LTP is reported in a three-terminal memristor that uses independent physical phenomena to represent each form of plasticity. The emerging layered material Bi₂O₂Se is used for memristors for the first time, opening up the prospects for ultrathin, high-speed, and low-power neuromorphic devices. The concerted action of STP and LTP allows full-range modulation of the transient synaptic efficacy, from depression to facilitation, by stimulus frequency or intensity, providing a versatile device platform for neuromorphic function implementation. A heuristic recurrent neural circuitry model is developed to simulate the intricate "sleep-wake cycle autoregulation" process, in which the concomitance of STP and LTP is posited as a key factor in enabling this neural homeostasis. This work sheds new light on the development of generic memristor platforms for highly dynamic neuromorphic computing.

Subject(s)

Biomimetic Materials; Bismuth; Electrical Equipment and Supplies; Selenium Compounds; Action Potentials; Animals; Equipment Design; Neural Networks, Computer; Neuronal Plasticity; Neurons/physiology; Time Factors

9.

Fast Object Tracking on a Many-Core Neural Network Chip.

Deng, Lei; Zou, Zhe; Ma, Xin; Liang, Ling; Wang, Guanrui; Hu, Xing; Liu, Liu; Pei, Jing; Li, Guoqi; Xie, Yuan.

Front Neurosci ; 12: 841, 2018.

Article in English | MEDLINE | ID: mdl-30505264

ABSTRACT

Fast object tracking on embedded devices is of great importance for applications such as autonomous driving, unmanned aerial vehicles, and intelligent monitoring. However, most previous general solutions have failed to reach this goal because of (i) the high computational complexity and heterogeneous operation steps of the tracking models and (ii) parallelism-limited and bloated hardware platforms (e.g., CPU/GPU). Although previously proposed devices leverage neural dynamics and near-data processing for efficient tracking, their flexibility is limited by tight integration with the vision sensor, and their effectiveness on various video datasets is yet to be fully demonstrated. On the other hand, the many-core architecture, with massive parallelism and optimized memory locality, has recently been widely applied to improve the performance of flexibly executing neural networks. This motivates us to adapt and map an object tracking model based on attractor neural networks with continuous and smooth attractor dynamics onto neural network chips for fast tracking. In order to make the model hardware friendly, we add a local-connection restriction. We analyze the tracking accuracy and observe that the model achieves comparable results on typical video datasets. Then, we design a many-core neural network architecture with several computation and transformation operations to support the model. Moreover, by discretizing the continuous dynamics to the corresponding discrete counterpart, designing a slicing scheme for efficient topology mapping, and introducing a constant-restricted scaling chain rule for data quantization, we build a complete mapping framework to implement the tracking model on the many-core architecture. We fabricate a many-core neural network chip to evaluate the real execution performance. Results show that a single chip is able to accommodate the whole tracking model, and a fast tracking speed of nearly 800 FPS (frames per second) can be achieved. This work enables high-speed object tracking on embedded devices which normally have limited resources and energy.
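The abstract mentions discretizing continuous dynamics but does not give the model equations. As a generic illustration only, the sketch below applies a forward-Euler step to an assumed leaky dynamics tau * du/dt = -u + drive. The equation, the function names, and the parameters `dt` and `tau` are assumptions made for illustration and are not taken from the paper.

```python
def euler_step(u, drive, dt=0.1, tau=1.0):
    """One forward-Euler update of tau * du/dt = -u + drive (assumed form)."""
    return [ui + (dt / tau) * (-ui + di) for ui, di in zip(u, drive)]


def simulate(u0, drive, steps, dt=0.1, tau=1.0):
    """Iterate the discrete update for a fixed number of steps."""
    u = list(u0)
    for _ in range(steps):
        u = euler_step(u, drive, dt, tau)
    return u


# With a constant drive, the discrete state relaxes toward the drive values.
final_state = simulate([0.0, 0.0], [1.0, -0.5], steps=200)
print(final_state)
```

A fixed-step update of this kind needs only multiply-accumulate operations per neuron, which is the sort of computation that maps naturally onto many-core neural network hardware.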
