Mali Graphics Processing Units (GPUs) use 16x16 pixel tile-based rendering to minimize external memory accesses, keeping the entire working set for a tile in fast on-chip memory that is tightly coupled to the shader core. Processing takes two passes. The first executes all the geometry processing and generates a tile list data structure recording the primitives that contribute to each tile. The second executes all the fragment processing tile by tile, so each write to memory is a tile in its final state. The Mali architecture benefits some operations, such as anti-aliasing, while other operations, such as tessellation, are handled more efficiently by GPUs that support full-frame processing.
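The two-pass flow described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not how Mali hardware implements binning: the function names `bin_triangles` and `fragment_pass` are hypothetical, triangles are given as three (x, y) screen-space points, and tiles are selected by a conservative bounding-box test, so a tile may be listed even if the triangle does not actually cover any of its pixels.

```python
import math
from collections import defaultdict

TILE = 16  # tile edge in pixels, matching the 16x16 tiles in the text


def bin_triangles(triangles, width, height):
    """Pass 1 (geometry): build a tile list mapping (tx, ty) -> triangle indices.

    Hypothetical helper: uses a bounding-box overlap test, which is
    conservative (it can list a tile the triangle only nearly touches).
    """
    tile_lists = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the framebuffer.
        x0 = max(math.floor(min(xs)), 0)
        x1 = min(math.floor(max(xs)), width - 1)
        y0 = max(math.floor(min(ys)), 0)
        y1 = min(math.floor(max(ys)), height - 1)
        if x0 > x1 or y0 > y1:
            continue  # entirely off-screen
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                tile_lists[(tx, ty)].append(idx)
    return dict(tile_lists)


def fragment_pass(tile_lists):
    """Pass 2 (fragment): visit each non-empty tile exactly once.

    Yields (tile, primitive indices). A real renderer would shade the
    tile in on-chip memory here and write the finished tile out once.
    """
    for tile in sorted(tile_lists):
        yield tile, tile_lists[tile]


triangles = [
    [(0, 0), (20, 0), (0, 20)],      # bounding box spans four tiles
    [(40, 40), (47, 40), (40, 47)],  # fits inside a single tile
]
tile_lists = bin_triangles(triangles, 64, 64)
for tile, prims in fragment_pass(tile_lists):
    print(tile, prims)
```

Because each tile is visited once in the second pass, the number of external memory writes in this model equals the number of non-empty tiles, regardless of how many primitives overlap a tile.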

Mali GPUs use compression to reduce bandwidth. Adaptive Scalable Texture Compression (ASTC) offers a variety of bit rate and colour component options so that content developers can manage the quality/bandwidth trade-off. Arm Frame Buffer Compression (AFBC) is used when transferring data between IP blocks within an SoC design. It uses a block-based approach for spatially coordinated compression: fixed headers identifying tile positions in the buffer are sent first, followed by the compressed tiles.
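To make the ASTC bit rate options concrete, the sketch below computes bits per pixel and compressed size for a few block footprints. It relies on background from the ASTC format that is not stated in the text above: every compressed block occupies 128 bits, and 2D block footprints range from 4x4 to 12x12 texels. The function names are hypothetical.

```python
ASTC_BLOCK_BITS = 128  # assumption from the ASTC format: fixed-size blocks


def astc_bits_per_pixel(block_w, block_h):
    """Bit rate for a given 2D block footprint (texels per block)."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width, height, block_w, block_h):
    """Compressed size of one mip level; partial edge blocks are stored whole."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * (ASTC_BLOCK_BITS // 8)


for w, h in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    bpp = astc_bits_per_pixel(w, h)
    size = astc_texture_bytes(1024, 1024, w, h)
    print(f"{w}x{h}: {bpp:.2f} bpp, {size} bytes for 1024x1024")
```

Larger footprints lower the bit rate and therefore the bandwidth, at the cost of image quality, which is the trade-off the paragraph above refers to.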

Projects Using This Technology

Competition 2024
Competition: Hardware Implementation

Interference Detection and Mitigation Accelerator for Automotive Radar SoCs
Competition 2024
Competition: Collaboration/Education

A digital audio dynamic range compression accelerator for mixed-signal SoC
Competition 2023
Competition: Hardware Implementation

A 28nm Motion-Control SoC with ARM Cortex-M3 MCU for Autonomous Mobile Robots
Known Good Die
P. N. Whatmough et al. 2019 Symposium on VLSI Circuits

16nm SoC with A53 and eFPGA for flexible acceleration

Experts and Interested People


Research Area
Digital Circuits and Architectures
Graduate Student Researcher
Research Area
Hardware Acceleration


I think characterising the types of accelerators, such as graphics accelerators and ML accelerators, might be a good idea, to make it easier to distinguish which accelerators are used where.

Along with this, I think we should have a standardised table specifying bus compatibility and throughput specifications/limitations for each accelerator, to make it easier for researchers to make an educated decision about which accelerator to use in their project.
