Processing-in-memory: current trends in the development of the technology

Author: T. V. Snytnikova

Journal: Problems of Informatics (Problemy Informatiki) @problem-info

Section: Applied Information Technologies

Published in issue: 3 (60), 2023.

Open access

Moving data between the central processor and main memory is a first-order obstacle to improving the performance, scalability, and energy efficiency of modern systems. Computer systems employ a range of methods to reduce the overhead of data movement, from traditional mechanisms to newer approaches such as Processing-in-Memory (PIM). These methods fall into two broad classes: processing-near-memory (PNM), where computation is performed in dedicated processing elements, and processing-using-memory (PUM), where computation is performed inside the memory array itself by exploiting the internal analog operating properties of the memory device. This paper discusses the PIM architecture paradigm and surveys PUM architectures based on parallel DRAM operations and associative processors.
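To make the PUM idea concrete, the sketch below is an illustrative software model (not vendor code, and not from the paper) of Ambit-style triple-row activation in DRAM, as described in the Seshadri et al. references: simultaneously activating three rows makes each bitline settle to the bitwise majority of the three cells, and fixing the third "control" row to all zeros or all ones turns that majority into a row-wide AND or OR in a single step. The row width and helper names here are assumptions chosen for the example.

```python
import numpy as np

# Toy model of Ambit-style bulk bitwise PUM operations.
# Activating three DRAM rows at once causes charge sharing, so each
# bitline settles to MAJ(A, B, C) -- the bitwise majority of the rows.

ROW_BITS = 8 * 1024  # hypothetical row width: an 8 Kbit DRAM row

def triple_row_activate(a, b, c):
    """Model of charge sharing across three rows: bitwise majority."""
    return (a & b) | (a & c) | (b & c)

rng = np.random.default_rng(0)
row_a = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)
row_b = rng.integers(0, 2, ROW_BITS, dtype=np.uint8)

zeros = np.zeros(ROW_BITS, dtype=np.uint8)  # control row preset to 0
ones = np.ones(ROW_BITS, dtype=np.uint8)    # control row preset to 1

row_and = triple_row_activate(row_a, row_b, zeros)  # MAJ(A, B, 0) = A AND B
row_or = triple_row_activate(row_a, row_b, ones)    # MAJ(A, B, 1) = A OR B

assert np.array_equal(row_and, row_a & row_b)
assert np.array_equal(row_or, row_a | row_b)
```

The point of the model is the source of PUM's parallelism: one activation computes the operation across every bitline of the row at once, with no data ever crossing the memory bus.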


In-memory computing architectures, associative processors, pdram

Short URL: https://sciup.org/143181005

IDR: 143181005   |   UDC: 004.272   |   DOI: 10.24412/2073-0667-2023-3-37-54



References for "Processing-in-memory: current trends in the development of the technology"

  • Boroumand A. et al. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks // Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '18), Association for Computing Machinery, New York, NY, USA, 2018, P. 316-331.
  • Mutlu O. et al. Processing Data Where It Makes Sense: Enabling In-Memory Computation // Microprocessors and Microsystems, 2019. V. 67, P. 28-41.
  • Ghose S. et al. The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption // Beyond-CMOS Technologies for Next Generation Computer Design, 2019.
  • Mutlu O. et al. Enabling Practical Processing in and near Memory for Data-Intensive Computing, 2019, P. 1-4.
  • Ghose S. et al. Processing-in-Memory: A Workload-Driven Perspective // IBM Journal of Research and Development, 2019, V. 63, N 6, P. 3:1-3:19.
  • Siegl P. et al. Data-Centric Computing Frontiers: A Survey on Processing-in-Memory // Proceedings of the Second International Symposium on Memory Systems (MEMSYS '16), 2016, P. 295-308.
  • Wulf W. A., McKee S. A. Hitting the Memory Wall: Implications of the Obvious // SIGARCH Comput. Archit. News, 1995, V. 23, P. 20-24.
  • Alshahrani R. The Path to Exascale Computing // https://bit.ly/3CIzcll, 2015.
  • Oliveira G. F. et al. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks // IEEE Access, 2021, V. 9, P. 134457-134502.
  • Mutlu O. et al. A modern primer on processing in memory // Emerging Computing: From Devices to Systems: Looking Beyond Moore and Von Neumann. Singapore: Springer Nature Singapore, 2022. P. 171-243.
  • Santoro G., Turvani G., Graziano M. New logic-in-memory paradigms: An architectural and technological perspective // Micromachines, 2019, V. 10. N 6. P. 368.
  • Singh G. et al. Near-memory computing: Past, present, and future // Microprocessors and Microsystems, 2019, V. 71, P. 102868.
  • Seshadri V., Mutlu O. The Processing Using Memory Paradigm: In-DRAM Bulk Copy, Initialization, Bitwise AND and OR // arXiv preprint arXiv:1610.09603, 2016.
  • Seshadri V., Mutlu O. Simple operations in memory to reduce data movement // Advances in Computers, Elsevier, 2017, V. 106, P. 107-166.
  • Kim J. S. et al. GRIM-filter: Fast seed filtering in read mapping using emerging memory technologies // arXiv preprint arXiv:1708.04329, 2017.
  • Kim J. S. et al. GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies // BMC genomics, 2018, V. 19, N 2. P. 23-40.
  • Boroumand A. et al. LazyPIM: An efficient cache coherence mechanism for processing-in-memory // IEEE Computer Architecture Letters, 2016, V. 16. N 1. P. 46-50.
  • Hashemi M., Mutlu O., Patt Y. N. Continuous runahead: Transparent hardware acceleration for memory intensive workloads // 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, 2016, P. 1-12.
  • Hashemi M. et al. Accelerating dependent cache misses with an enhanced memory controller // ACM SIGARCH Computer Architecture News, 2016, V. 44. N 3. P. 444-455.
  • Seshadri V. et al. Gather-scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses // Proceedings of the 48th International Symposium on Microarchitecture, 2015, P. 267-280.
  • Pattnaik A. et al. Scheduling techniques for GPU architectures with processing-in-memory capabilities // Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016, P. 31-44.
  • Zhang D. et al. TOP-PIM: Throughput-oriented programmable processing in memory // Proceedings of the 23rd international symposium on High-performance parallel and distributed computing, 2014, P. 85-98.
  • Ahn J. et al. A scalable processing-in-memory accelerator for parallel graph processing // Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015, P. 105-117.
  • Gao M., Kozyrakis C. HRL: Efficient and flexible reconfigurable logic for near-data processing // 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, 2016, P. 126-137.
  • Ahn J. et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture // ACM SIGARCH Computer Architecture News, 2015, V. 43, N 3S, P. 336-348.
  • Lee J. H., Sim J., Kim H. BSSync: Processing near memory for machine learning workloads with bounded staleness consistency models // 2015 International Conference on Parallel Architecture and Compilation (PACT), IEEE, 2015, P. 241-252.
  • Nai L. et al. Graphpim: Enabling instruction-level pim offloading in graph computing frameworks // 2017 IEEE International symposium on high performance computer architecture (HPCA), IEEE, 2017, P. 457-468.
  • Seshadri V. et al. Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology // Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017, P. 273-287.
  • Seshadri V. et al. Fast bulk bitwise AND and OR in DRAM // IEEE Computer Architecture Letters, 2015, V. 14, N 2, P. 127-131.
  • Kang H. B., Hong S. K. One-Transistor Type DRAM // US Patent 7701751, 2009.
  • Lu S. L., Lin Y. C., Yang C. L. Improving DRAM latency with dynamic asymmetric subarray // Proceedings of the 48th International Symposium on Microarchitecture, 2015, P. 255-266.
  • Gao F., Tziantzioulis G., Wentzlaff D. ComputeDRAM: In-memory compute using off-the-shelf DRAMs // Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, P. 100-113.
  • Seshadri V. et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization // Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013. P. 185-197.
  • Kim J. S. et al. D-RaNGe: Using commodity DRAM devices to generate true random numbers with low latency and high throughput // 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, 2019, P. 582-595.
  • Hajinazar N. et al. SIMDRAM: a framework for bit-serial SIMD processing using DRAM // Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, P. 329-345.
  • Olgun A. et al. QUAC-TRNG: High-throughput true random number generation using quadruple row activation in commodity DRAM chips // 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), IEEE, 2021, P. 944-957.
  • Ferreira J. D. et al. pLUTo: In-DRAM lookup tables to enable massively parallel general-purpose computation // arXiv preprint arXiv:2104.07699, 2021.
  • Olgun A. et al. PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM // ACM Transactions on Architecture and Code Optimization, 2022, V. 20, N 1, P. 1-31.
  • Yaglikci A. G. et al. HiRA: Hidden Row Activation for Reducing Refresh Latency of Off-the-Shelf DRAM Chips // 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE, 2022, P. 815-834.
  • Garzon E. et al. AM4: MRAM crossbar based CAM/TCAM/ACAM/AP for in-memory computing // IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2023, V. 13, N 1, P. 408-421.
  • Snytnikova T. V. Development of associative parallel architectures // Problems of Informatics, 2019, N 2, P. 36-50.
  • Martyshkin A. I., Perekusikhina A. N. A specialized FPGA-based associative coprocessor hardware module for computing systems with a reconfigurable structure // XXI vek: itogi proshlogo i problemy nastoyashchego plyus, 2019, V. 8, N 3(47), P. 42-50.
  • Bondarenko M. F., Khakhanov V. I., Litvinova E. I. The structure of a logical associative multiprocessor // Avtomatika i telemekhanika (Automation and Remote Control), 2012, N 10, P. 71-92.
  • Gaiduk S. et al. The PRUS spherical multiprocessor for solving Boolean equations // Radioelektronika i informatika, 2004, N 4(29), P. 69-78.
  • Yantir H. E. et al. An ultra-area-efficient 1024-point in-memory FFT processor // Micromachines, 2019, V. 10, N 8, P. 509-514.
  • Kaplan R., Yavits L., Ginosar R. BioSEAL: In-memory biological sequence alignment accelerator for large-scale genomic data // Proceedings of the 13th ACM International Systems and Storage Conference, 2020, P. 36-48.
  • Sander C., Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment //Proteins: Structure, Function, and Bioinformatics, 1991, V. 9, N 1, P. 56-68.
  • Hanhan R. et al. Edam: edit distance tolerant approximate matching content addressable memory // Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022, P. 495-507. 
  • Zhong H. et al. ASMCap: An Approximate String Matching Accelerator for Genome Sequence Analysis Based on Capacitive Content Addressable Memory // arXiv preprint arXiv:2302.07478, 2023.
  • Snytnikova T. V., Nepomniaschaya A. Sh. Solving graph problems on the STAR machine implemented on graphics accelerators // Prikladnaya diskretnaya matematika (Applied Discrete Mathematics), 2016, V. 3(33), P. 98-115.
  • Snytnikova T. V. Implementing the associative computing model on GPUs: a library of basic STAR language procedures // Vychislitel'nye metody i programmirovanie (Numerical Methods and Programming), 2018, V. 19, P. 85-95.