Using fused multiply-adders in a vector dataflow processor

Free access

A processor with a dataflow architecture can execute up to 16 instructions per cycle, compared with 4-6 instructions per cycle for the best von Neumann architecture processors. Simulation of the vector dataflow processor showed that its performance on a matrix multiplication program can be raised to 256 flops per clock while issuing fewer than 8 instructions per clock, and that performance stays close to peak for much smaller matrices. The paper examines the advantages and disadvantages of using pipelined fused ("doubled") multiply-adders for vector processing in this processor instead of separate floating-point multipliers and adders.

Key words and phrases: supercomputer, vector processor, dataflow architecture, performance evaluation, fine-grained parallelism, fused (dual) arithmetic
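The figures quoted in the abstract can be related with some simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the usual convention that one fused multiply-add counts as 2 flops. It shows how many flops each instruction must deliver on average to reach 256 flops per clock at 8 instructions per clock, and how fusing halves the number of arithmetic operations in a naive matrix product.

```python
# Illustrative sketch; names are hypothetical, not from the paper.
# Assumption: one fused multiply-add (FMA) counts as 2 flops.

def flops_per_instruction(flops_per_clock, instr_per_clock):
    """Average flops each issued instruction must deliver."""
    return flops_per_clock / instr_per_clock


def matmul_op_counts(n):
    """Arithmetic operations issued for a naive n x n matrix product."""
    inner_steps = n * n * n         # one multiply-accumulate per (i, j, k)
    separate_ops = 2 * inner_steps  # a multiply followed by an add
    fused_ops = inner_steps         # a single fused multiply-add
    return separate_ops, fused_ops


def matmul_fused(a, b):
    """Naive square matrix product written as a chain of multiply-adds."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # Stands in for one FMA. Plain Python rounds the product and
                # the sum separately, whereas a hardware FMA rounds once.
                acc = a[i][k] * b[k][j] + acc
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(flops_per_instruction(256, 8))
    print(matmul_op_counts(4))
    print(matmul_fused([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

Since the abstract reports fewer than 8 instructions per clock, each instruction would have to average more than 32 flops, which under the 2-flops-per-FMA assumption corresponds to more than 16 fused operations per vector instruction.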


Short URL: https://sciup.org/14336167

IDR: 14336167

Research article