The Memory Chip That Thinks: How Processing-in-Memory Is Attacking the Bottleneck Slowing Down AI

Staff Writer, Contributing Writer at UMI Groups | Jun 28, 2026

Imagine a library where every time you want to do math, you have to carry the books out of the building, do your calculations in a separate room, and then carry everything back. Now imagine if the library itself could do the math — right there on the shelves, without all that carrying. That, in essence, is what Processing-in-Memory (PIM) is trying to accomplish inside your computer's hardware.

It sounds like a small technical detail, but it sits at the heart of one of the biggest challenges facing artificial intelligence today: processors can compute far faster than memory can supply them with data.

What Is the Memory Wall — and Why Should You Care?

Modern computers have two very different kinds of components: processors (CPUs and GPUs), which perform calculations at extraordinary speed, and memory (primarily DRAM — Dynamic Random-Access Memory), which stores the data those processors need. In principle, faster processors should mean faster computers. In practice, there's a catch.

Processors have been getting faster far more quickly than the connections between processors and memory. This means that even the most powerful chip can end up sitting idle, waiting for data to arrive. The 'memory wall' refers to the growing performance gap between CPU/GPU processing speed and the rate at which data can be transferred from DRAM, a phenomenon first formally described by Wulf and McKee in a 1995 ACM SIGARCH paper.
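To make the gap concrete, here is a minimal Python sketch of the kind of average-access-time model used in the memory wall argument: a fraction of accesses hit a fast cache, and the rest pay the full DRAM latency. The hit rate, the one-cycle cache, and the three DRAM latencies are illustrative assumptions, not measurements of any real chip.

```python
def average_access_time(hit_rate: float, cache_time: float, memory_time: float) -> float:
    """Average cost of one memory access, in processor cycles.

    Hits are served by the cache; misses pay the full DRAM latency.
    """
    return hit_rate * cache_time + (1.0 - hit_rate) * memory_time


# Illustrative assumptions: 99% of accesses hit a cache that answers in 1 cycle.
HIT_RATE = 0.99
CACHE_CYCLES = 1.0


def access_time_table(memory_cycle_options=(10, 100, 1000)):
    """Average access time for several assumed DRAM latencies (in cycles)."""
    return {
        cycles: average_access_time(HIT_RATE, CACHE_CYCLES, cycles)
        for cycles in memory_cycle_options
    }


if __name__ == "__main__":
    for dram_cycles, avg in access_time_table().items():
        print(f"DRAM latency {dram_cycles:>4} cycles -> average access {avg:.2f} cycles")
```

Even with a 99% hit rate, the average cost of an access grows roughly tenfold in this toy model as the DRAM latency, measured in processor cycles, rises from 10 to 1,000. That is the pattern the memory wall describes: as processors speed up relative to memory, the rare misses come to dominate.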

For most of computing history, this was a manageable annoyance. Software engineers used tricks — caches, prefetching, data layout optimizations — to hide the delay. But AI changed the equation dramatically.

Why AI Makes the Memory Wall So Much Worse

Training or running a large AI model — say, a language model or an image recognition system — involves doing the same basic mathematical operation (multiply a number, add it to another number) billions or trillions of times. These operations aren't particularly hard for a modern chip. What's hard is the data.

AI models contain millions or billions of numerical parameters (called weights) that must be read from memory, used in a calculation, and often written back. The processor is fast. The memory bus — the highway connecting memory to the processor — is the traffic jam. No matter how many lanes you add to the highway, when you're moving that much data that often, you hit a fundamental physical limit on bandwidth (how much data can flow per second) and energy (moving data across that highway costs power, often more than the calculation itself).
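A back-of-the-envelope estimate shows why the bus, not the arithmetic, sets the pace. The sketch below compares the time to do the math for one matrix-vector multiply with the time to stream the weights from memory. The hardware figures (100 TFLOP/s of compute, 1 TB/s of memory bandwidth) and the layer size are hypothetical round numbers chosen for illustration, and the model ignores the much smaller input and output vectors.

```python
def layer_times(rows: int, cols: int, bytes_per_weight: int,
                peak_flops: float, bandwidth_bytes_per_s: float):
    """Return (compute_seconds, memory_seconds) for one matrix-vector multiply.

    Assumes every weight is read from memory exactly once and used in
    one multiply and one add.
    """
    flops = 2 * rows * cols                        # multiply + add per weight
    weight_bytes = rows * cols * bytes_per_weight  # bytes streamed over the bus
    return flops / peak_flops, weight_bytes / bandwidth_bytes_per_s


# Hypothetical accelerator and layer, for illustration only.
compute_s, memory_s = layer_times(
    rows=8192, cols=8192, bytes_per_weight=2,
    peak_flops=100e12, bandwidth_bytes_per_s=1e12,
)

if __name__ == "__main__":
    print(f"compute time: {compute_s * 1e6:.2f} microseconds")
    print(f"memory time:  {memory_s * 1e6:.2f} microseconds")
    print(f"memory takes {memory_s / compute_s:.0f}x longer than the math")
```

Under these assumptions the processor could finish the arithmetic about 100 times faster than the weights can arrive, so it spends most of its time waiting.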

This is not a software problem you can optimize away. It's a physics problem baked into the architecture of how computers are built. PIM is an attempt to redesign that architecture at its roots.

How Processing-in-Memory Actually Works

The core idea of PIM is disarmingly simple: instead of sending data to the processor, bring some of the processor to the data. You add small compute units — simple arithmetic circuits, logic gates, or more sophisticated processors — directly onto the memory chip itself, or as close to it as physically possible.

When a calculation can be done entirely within the memory chip, the result is returned without ever traversing the slow external bus. The data never has to "leave the library." This has two immediate benefits: speed (no waiting on the bus) and energy efficiency (moving data short distances inside a chip costs a fraction of the power of sending it across a board).
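The effect on bus traffic can be sketched with a simple byte count. The toy model below compares how much data crosses the external bus for one matrix-vector multiply in a conventional design versus an idealized PIM design where the weights never leave the memory chip. It is an illustration under stated assumptions (weights already resident in memory, 2 bytes per value, no caching or batching), not a description of any vendor's device.

```python
def bus_bytes_conventional(rows: int, cols: int, bytes_per_value: int = 2) -> int:
    """Bytes crossing the external bus when the processor does the math.

    The weight matrix and the input vector travel to the processor,
    and the output vector travels back.
    """
    return (rows * cols + cols + rows) * bytes_per_value


def bus_bytes_pim(rows: int, cols: int, bytes_per_value: int = 2) -> int:
    """Bytes crossing the external bus in an idealized PIM design.

    The weights stay inside the memory chip; only the input vector goes in
    and the output vector comes out.
    """
    return (cols + rows) * bytes_per_value


if __name__ == "__main__":
    rows = cols = 4096  # hypothetical layer size
    conventional = bus_bytes_conventional(rows, cols)
    pim = bus_bytes_pim(rows, cols)
    print(f"conventional: {conventional:,} bytes over the bus")
    print(f"idealized PIM: {pim:,} bytes over the bus")
    print(f"reduction: {conventional // pim}x")
```

For this layer size the idealized PIM design moves roughly 2,000 times fewer bytes over the bus. Real devices will fall short of that ideal, but the count shows where the speed and energy benefits come from.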

Different Flavors of PIM

PIM isn't one single technology — it's a design philosophy implemented in several ways:

  • Processing Near Memory: Compute units are placed on the same chip package as the memory arrays, connected by very short, wide internal wires. They don't sit inside the memory cells themselves, but they're close enough to dramatically reduce data travel distance.
  • Processing In Memory: In the most radical version, computation happens directly within or between the memory cells themselves, exploiting the physical properties of the cells to perform logic operations. This is harder to manufacture but offers the most dramatic energy savings.
  • Smart Memory Controllers: A middle-ground approach where the memory controller — the chip that manages memory operations — is given enough intelligence to handle certain tasks autonomously without involving the main CPU.

What Real Products Look Like Today

PIM has moved well beyond academic research papers. The major memory manufacturers are shipping or actively developing products built on these principles.

Samsung's HBM-PIM

HBM stands for High Bandwidth Memory, a type of memory already used in high-end GPUs and AI accelerators because it stacks memory chips vertically to achieve much higher bandwidth than conventional DRAM. Samsung took this a step further by embedding programmable compute units directly into the HBM layers. According to the company's published data, HBM-PIM delivered roughly a 2x performance improvement and a 70% reduction in system energy consumption in AI inference workloads, compared with conventional HBM2.

Those are striking numbers. A 70% energy reduction matters enormously at data center scale, where power consumption directly translates to operating cost and environmental impact. And doubling performance without changing the rest of the system — just by rethinking how memory works — illustrates how severe the bottleneck was in the first place.
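The two headline figures can be combined with a little arithmetic. The sketch below assumes that the 70% figure refers to energy used for the same workload and that the 2x figure refers to how quickly that workload finishes; both readings are assumptions about how the numbers were measured, not details confirmed by the published data.

```python
def relative_power(speedup: float, energy_reduction: float) -> float:
    """Average power draw relative to the baseline for the same workload.

    Power is energy divided by time, so the ratio is
    (1 - energy_reduction) / (1 / speedup).
    """
    energy_ratio = 1.0 - energy_reduction
    time_ratio = 1.0 / speedup
    return energy_ratio / time_ratio


def work_per_joule_gain(energy_reduction: float) -> float:
    """How much more work each joule buys once energy per task drops."""
    return 1.0 / (1.0 - energy_reduction)


if __name__ == "__main__":
    power = relative_power(speedup=2.0, energy_reduction=0.70)
    gain = work_per_joule_gain(energy_reduction=0.70)
    print(f"average power while running: {power:.0%} of baseline")
    print(f"work per joule: {gain:.2f}x baseline")
```

Read this way, the system would finish the job in half the time while drawing about 60% of the baseline power, and each joule would buy a bit more than three times as much work.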

SK Hynix's AiM

SK Hynix has developed its own AiM (Accelerator-in-Memory) DRAM product, designed specifically to accelerate AI and machine learning workloads by placing processing units adjacent to memory arrays. The name is deliberate: the product treats the memory chip itself as a first-class AI accelerator rather than a passive data store. By co-designing the memory and the compute logic from the ground up for AI workloads, SK Hynix is betting that the memory chip will become as important to AI systems as the GPU.

Micron's CXL Approach

Micron has taken a somewhat different path. Rather than embedding compute units inside the memory itself, the company is focused on giving memory a smarter, more active role in the larger system. Its approach centers on Compute Express Link (CXL) memory expansion, which lets memory participate more actively in compute pipelines across a standardized interconnect.

CXL is an industry-standard protocol. Think of it as a shared language that lets processors, accelerators, and memory modules communicate more efficiently and flexibly. By building CXL-capable memory, Micron is positioning memory not as a passive recipient of commands from a processor, but as an active participant in a compute system that might include many different chips from different manufacturers.

Why This Matters Beyond Speed

Energy Efficiency at Scale

Data centers running AI workloads consume enormous amounts of electricity — a significant and growing share of global power usage. Much of that energy doesn't go into useful computation; it goes into moving data back and forth. PIM attacks this directly by doing computation where the data already lives, drastically cutting the number of expensive data transfers.

A New Way to Think About Hardware

For decades, computer architects assumed a clean separation: memory stores, processors compute. AI is blurring that line. PIM represents a broader shift toward heterogeneous computing — systems where different specialized components (CPUs, GPUs, neural processing units, smart memory) each handle the tasks they're best suited for, communicating through standardized interfaces like CXL.

This matters for how future AI systems will be designed. Instead of simply throwing more GPU power at a model and hoping the memory can keep up, architects can co-design the memory and compute layers together, tuning the system to the specific data access patterns of AI workloads.

The Limits PIM Doesn't Solve

It's worth being honest about what PIM doesn't fix. The compute units embedded in memory chips are simpler and less flexible than a full GPU or CPU — they're good at the specific operations AI inference needs (matrix-vector multiplications, additions), but not general-purpose. Manufacturing these chips is harder and more expensive than conventional DRAM. And the software ecosystem — the programming tools, frameworks, and compilers that AI researchers rely on — needs to be updated to take advantage of PIM's capabilities, which is a slow process.

PIM is not a silver bullet. It's a targeted solution to a specific, well-defined problem: the memory wall in data-intensive workloads. Within that scope, it's one of the most promising architectural ideas in hardware design right now.

The Bottom Line

The memory wall has been a known problem since Wulf and McKee formally described it in a 1995 ACM SIGARCH paper, but AI has turned a slow-burning architectural challenge into an urgent crisis. Processing-in-Memory is the industry's most direct answer: redesign the memory chip itself so it can think, not just store.

Whether you're a researcher training large models, an engineer designing data center infrastructure, or simply someone curious about why AI hardware is so hard to build, PIM represents a fundamental rethinking of a boundary that computers have lived with for decades. The memory chip is no longer just a warehouse. It's becoming part of the factory floor.
