NPUs: The Rise of Neural Processing Units

Neural Processing Units (NPUs) have been around for a while, but they’ve recently entered the mainstream. So what exactly are NPUs, why do we need them, and when should we use them?

To understand their role, let’s first take a look at the processors we’ve relied on until now.

The Evolution of Processors

For much of computing history, the Central Processing Unit (CPU) has been the workhorse of general-purpose computing. CPUs handle everything from running operating systems to executing applications. However, they process tasks sequentially—one operation at a time on one piece of data at a time. Even with multi-core CPUs, each core still follows a sequential approach.

Then came Graphics Processing Units (GPUs). Originally designed for rendering graphics, GPUs excel at handling the mathematical operations needed for 2D and 3D images. These operations often involve applying the same calculation to massive amounts of data—such as processing thousands or millions of pixels simultaneously. To achieve this, GPUs contain thousands of smaller cores that work in parallel.

It didn’t take long for AI and machine learning researchers to realise that GPUs were also well-suited for AI workloads. Training and running AI models require massive computational power, and GPUs’ parallel processing capabilities significantly accelerate these tasks. However, GPUs come with a major drawback: they consume a lot of power. Some modern GPUs use over 600 watts when running at full capacity—comparable to a small electric heater—making them expensive to operate and generating substantial heat.

Enter Neural Processing Units

NPUs first emerged around 2016; Google’s Tensor Processing Unit (TPU), announced that year, was an early and prominent example. Unlike CPUs, which are designed for general-purpose computing, and GPUs, which provide generic parallel processing, NPUs are purpose-built for AI and machine learning tasks. They specialise in executing neural network operations, such as multiply-add calculations, with exceptional speed and efficiency.
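To see why multiply-add dominates, consider a single dense neural-network layer stripped down to its core arithmetic. This is a plain-Python illustration with made-up weights and inputs, not code for any particular NPU:

```python
# Every output of a dense layer is a chain of multiply-add (MAC) operations.
# NPUs dedicate silicon to performing huge numbers of these in parallel.

inputs  = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, 0.3],   # one row of weights per output neuron
           [0.4, 0.5, 0.6]]
biases  = [0.0, 1.0]

outputs = []
for row, bias in zip(weights, biases):
    acc = bias
    for x, w in zip(inputs, row):
        acc += x * w          # the multiply-add at the heart of AI workloads
    outputs.append(acc)

print(outputs)  # ≈ [0.45, 1.9]
```

A real model repeats this pattern millions of times per inference, which is why hardware built around the MAC operation pays off so dramatically.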

The key advantage of NPUs is their low power consumption. While GPUs deliver impressive AI performance, they are power-hungry. NPUs, on the other hand, provide high-speed AI processing with significantly lower energy usage. This makes them ideal for AI inference—running trained models in real-world applications, including on small hardware such as smartphones. However, when it comes to training AI models, GPUs still hold the edge.

As AI continues to advance, NPUs are poised to play a crucial role, offering a balance of speed and efficiency that bridges the gap between power-hungry GPUs and the slower, more general-purpose CPUs.

Real-World Applications

NPUs are now available in many hardware platforms, including small form-factor computers such as the OrangePi 5 Pro.

I have recently completed a successful project using Rockchip NPUs (the RK3568 and RK3588). The NPU toolkits are good, but initially tricky to work with. Contact me to see if I can help with your NPU-based AI project.

Genetic Algorithms, Particle Swarm Optimisation, Evolutionary Computing

Genetic algorithms (GAs) are a search and optimisation technique inspired by the ideas of “survival of the fittest” in populations of individuals, as well as other genetic concepts such as crossover and mutation.

GAs can often find good solutions very quickly – even in complex, multi-dimensional, non-linear problem “spaces” where other algorithms struggle badly.

Successfully applying a Genetic Algorithm to a problem involves steps such as:

  • Identifying whether the problem “space” is suited to a GA.
  • Encoding the problem into a “genome” that the GA can work with.
  • Writing a GA (or using a standard library).
  • Defining and writing a fitness function.
  • Avoiding pitfalls such as a weak random number generator, or an encoding with big “step” values that can block incremental improvements.
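The steps above can be sketched in a few dozen lines. This is a minimal, illustrative GA solving the classic “OneMax” toy problem (maximise the number of 1-bits in a bit string); all parameter values and function names are my own illustrative choices:

```python
import random

GENOME_LENGTH = 32
POP_SIZE = 50
GENERATIONS = 100
MUTATION_RATE = 1.0 / GENOME_LENGTH  # expected one bit-flip per child

def fitness(genome):
    # The fitness function: count of 1-bits. The GA tries to maximise this.
    return sum(genome)

def make_genome():
    # Encoding: a fixed-length list of bits.
    return [random.randint(0, 1) for _ in range(GENOME_LENGTH)]

def tournament(pop, k=3):
    # Tournament selection: the fittest of k random individuals is chosen.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point crossover between two parents.
    point = random.randint(1, GENOME_LENGTH - 1)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in genome]

def run_ga():
    pop = [make_genome() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(POP_SIZE)]
    return max(pop, key=fitness)

best = run_ga()
print(fitness(best))  # typically reaches the optimum of 32
```

Real problems differ mainly in the encoding and the fitness function; the selection/crossover/mutation loop stays essentially the same.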

Unlike with neural networks, where I favour a pre-written open-source library, with Genetic Algorithms I prefer to write my own – the algorithm itself is small and simple, and it is best to have control over the aspects mentioned above.

I have used my own GAs as part of commercial projects mentioned elsewhere on this website, including Computer Vision and other data analysis projects.

I have also implemented other evolutionary computing algorithms, such as variations of Particle Swarm Optimisation and Ant Colony Optimisation. Each algorithm has its own “class” of problem that it solves better than most other algorithms.
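For comparison, here is a minimal Particle Swarm Optimisation sketch, minimising the simple “sphere” function f(x) = Σxᵢ². The coefficient values are common textbook defaults, not tuned for any particular problem:

```python
import random

DIM = 4
SWARM_SIZE = 30
ITERATIONS = 200
W, C1, C2 = 0.7, 1.5, 1.5  # inertia, cognitive and social coefficients

def f(x):
    # Objective to minimise: the sphere function, optimum at the origin.
    return sum(v * v for v in x)

def pso():
    pos = [[random.uniform(-5, 5) for _ in range(DIM)]
           for _ in range(SWARM_SIZE)]
    vel = [[0.0] * DIM for _ in range(SWARM_SIZE)]
    pbest = [p[:] for p in pos]       # each particle's best position so far
    gbest = min(pbest, key=f)[:]      # best position found by the swarm

    for _ in range(ITERATIONS):
        for i in range(SWARM_SIZE):
            for d in range(DIM):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia + pull towards personal and
                # global bests, each scaled by a random factor.
                vel[i][d] = (W * vel[i][d]
                             + C1 * r1 * (pbest[i][d] - pos[i][d])
                             + C2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso()
print(f(best))  # converges very close to 0
```

Where a GA explores by recombining discrete genomes, PSO moves a swarm of continuous candidate points through the search space – which is why each suits a different class of problem.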

Please email me to discuss your project and we’ll see if I can help.