# Artificial Intelligence - The Power Problem
Why the current trend in power consumption is troubling and some potential solutions
# Introduction
We are currently being overwhelmed with data. Data storage and processing infrastructure lag behind data production: of the 7.5 septillion gigabytes collected every day, over 55% is under-utilized or not utilized at all. The power and data requirements of deep learning algorithms make them ecologically and economically unviable for large-scale training and deployment. When training a single AI model can emit as much carbon as five cars over their lifetimes, and the compute required doubles every three and a half months, it is clear our current trajectory is unsustainable. Drastically reducing the resource requirements of machine learning and artificial intelligence algorithms, and of the systems they run on, is a pressing problem that must be solved in order to allow greater equity in the availability of this technology. In this article, I will explore three technologies which, if adopted, can deliver significant energy savings for artificial intelligence and machine learning systems.
There are many reasons why we need to improve the energy efficiency of AI technologies. For off-grid applications, or applications in areas with unstable power grids, increased energy efficiency is necessary to make artificial intelligence viable at all. Most current AI systems require too much energy to operate remotely, as suitable battery power is infeasible given the significant cost of designing and manufacturing large-capacity batteries, and progress in battery technology has not kept pace with the computational needs of modern AI. Lowering the energy cost of AI algorithms will in turn lower the cost of the hardware needed to run these models, since batteries and components grow rapidly in cost at larger capacities. With lower hardware and energy costs, many AI applications that were previously uneconomical will become viable, especially applications that rely on large distributed networks of sensors. Lowering the economic barrier to entry will also stimulate further research by academic institutions and small firms with less access to the expensive, proprietary hardware required for modern research.
We also know that more efficient algorithms exist; we simply have not discovered them yet. Much of the recent progress in AI has taken place in neural network models. These models are a simplified representation of the human brain and learn new information in a loosely similar way. However, increases in the accuracy of these models have consistently required exponential increases in training data and training iterations, which means each new state-of-the-art system requires far more power than the previous generation. In nearly every application, from object detection to voice recognition, we have not yet reached human levels of accuracy, despite these systems consuming tens of thousands of times more power than the human brain: the brain requires approximately 5 watts for its computational processes, whereas large AI systems can draw on the order of 100,000 watts. As we come to better understand how the human brain operates, we will hopefully discover new methods of computation that significantly improve the efficiency of AI algorithms.
# Analog Signal Processing
Let us look at the current state of data collection for environmental sensors, one of the foremost use cases of artificial intelligence outside of natural language processing research. In current systems, environmental sensors such as microphones, motion detectors, accelerometers, and gyroscopes are “always-on”, meaning they are constantly collecting the maximum amount of data possible and therefore operating at maximum power draw. All of that data must then be stored, transported, and analyzed, and each of these steps consumes additional energy for every increase in the amount of data collected.

There is an emerging alternative: analog signal processing. With analog signal processing, an AI model can be trained to identify “interesting” patterns directly in analog signals, using a fraction of the energy required to convert the data from analog to digital. This works by moving a portion of the analysis to the point of data collection, rather than performing it exclusively on a centralized server after the data has already been gathered. Selecting only the “interesting” data allows AI pipelines to operate with far greater energy efficiency, since they only need to store, transport, and analyze data that has been pre-labeled as “interesting”. This has security benefits as well: because less data is collected, and the data that is collected serves a specific purpose, there is a lower risk of capturing unnecessary sensitive information.

To illustrate this, consider voice recognition devices such as Amazon’s Alexa. Using the standard, inefficient approach, the device’s microphone is “always-on”, collecting, digitizing, transporting, and analyzing every audio signal in its environment. Yet a device like this should only be listening for the phrase “Hi, Alexa” and the command sentence that follows it. Under the inefficient approach, all data is assumed to potentially contain that phrase, so the remote server must analyze everything in search of it. With analog signal processing, an AI model at the point of collection can instead be trained to recognize the analog audio waveform associated with “Hi, Alexa”. The “interesting” data is identified before it is ever converted from analog to digital (an extremely energy-hungry process); only that data needs to be digitized and sent to the centralized server for further analysis. This eliminates the vast majority of sensor data from consideration and reduces the storage and energy costs of the device. It also limits the amount of personal data Amazon holds on its servers, since audio outside the “interesting” window never needs to be collected.
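To make the idea concrete, here is a minimal, hypothetical sketch of that pre-filtering logic in Python. It does not model a real analog front end; it simply simulates an always-on stream in which a cheap “interest” detector (a simple energy threshold plus a placeholder wake-pattern score) decides which segments are worth digitizing and forwarding. The function names, thresholds, and template are illustrative assumptions, not an actual device API.

```python
import numpy as np

SAMPLE_RATE = 16_000          # samples per second (assumed)
FRAME_SECONDS = 0.5           # analyze the stream in half-second frames
ENERGY_THRESHOLD = 0.01       # illustrative "interesting" energy cutoff
WAKE_SCORE_THRESHOLD = 0.8    # illustrative wake-pattern confidence cutoff


def wake_pattern_score(frame: np.ndarray) -> float:
    """Placeholder for a tiny model that scores how much a frame resembles
    the 'Hi, Alexa' waveform. A real analog front end would compute
    something like this in the analog domain, before digitization."""
    # Illustrative stand-in: correlate the frame with a stored template.
    template = np.sin(np.linspace(0, 20 * np.pi, frame.size))
    return float(abs(np.dot(frame, template)) /
                 (np.linalg.norm(frame) * np.linalg.norm(template) + 1e-9))


def filter_stream(stream: np.ndarray) -> list[np.ndarray]:
    """Return only the frames worth digitizing and sending upstream."""
    frame_len = int(SAMPLE_RATE * FRAME_SECONDS)
    interesting = []
    for start in range(0, len(stream) - frame_len + 1, frame_len):
        frame = stream[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        if energy < ENERGY_THRESHOLD:
            continue                      # silence: never digitized or transmitted
        if wake_pattern_score(frame) >= WAKE_SCORE_THRESHOLD:
            interesting.append(frame)     # only this fraction goes to the server
    return interesting


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ten_seconds = rng.normal(0, 0.005, SAMPLE_RATE * 10)   # mostly quiet background
    kept = filter_stream(ten_seconds)
    print(f"Forwarded {len(kept)} of {10 / FRAME_SECONDS:.0f} frames")
```

In this toy run almost every frame is discarded at the sensor, which is exactly the effect the analog approach is after: only the rare “interesting” frame pays the cost of digitization, transport, and server-side analysis.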
# Federated Learning
Another technique that is slowly being introduced to AI applications is federated learning, a system in which AI models are trained on multiple devices across a network rather than on a single centralized server. In non-federated applications, node devices collect training data, which is then sent to a centralized server; the server trains a model on the data collected from all nodes. With federated learning, AI models are sent between the nodes and the centralized server instead of raw training data. It works as follows. First, each node collects its own training data and trains its own model. If this were the only step, it would be a severe disadvantage compared to the traditional technique, since a single node only sees a fraction of the network's total training data. However, each node then sends its trained model to the centralized server, which aggregates all of the models into a single, more accurate model. That model is redistributed to each node, and the cycle repeats. This is advantageous because a trained model is significantly smaller than the raw training data, so sending it across the network yields significant energy savings. There are security benefits as well, since the raw data, which may contain sensitive information, no longer needs to be sent to an outside entity for training. It is still possible to attempt to reverse engineer a model to reconstruct its training data, but with additional techniques, such as adding noise to the updates, that risk can be minimized; this is an important area for further research.
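As a concrete, deliberately simplified illustration, here is a sketch of the aggregation cycle described above, in the style of federated averaging. It assumes each node trains a small linear model locally and ships only its weight vector and sample count to the server; the model form, learning rate, and weighting-by-sample-count rule are assumptions of this sketch, not a prescription for a real deployment.

```python
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """One node: a few epochs of gradient descent on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w -= lr * grad
    return w

def federated_average(updates: list[tuple[np.ndarray, int]]) -> np.ndarray:
    """Server: weight each node's model by the number of samples it saw."""
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# --- simulated rounds: three nodes, one shared global model, no raw data shared ---
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
global_w = np.zeros(2)

for round_idx in range(10):
    updates = []
    for _ in range(3):                                  # each node trains locally
        X = rng.normal(size=(50, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=50)
        updates.append((local_update(global_w, X, y), len(y)))
    global_w = federated_average(updates)               # only models cross the network

print("aggregated weights:", np.round(global_w, 2))     # approaches [2.0, -1.0]
```

The key point is in the inner loop: the arrays `X` and `y` never leave the node; only the much smaller weight vector does, which is where the bandwidth, energy, and privacy benefits come from.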
# Phase-Change Memory (PCM) Devices
The final technology that I will discuss today is still under development: memristor technology, and specifically phase-change memory devices. When it becomes widely available, it has the potential to expand the viability of AI technology even more than the previous two. Memristors are a powerful tool for AI because of their unique electrical characteristics. AI computations rely almost exclusively on linear algebra and matrix multiplication. Using digital computation, matrix multiplication has a time complexity of O(n^3) and, with significant optimizations, can begin to approach O(n^2). However, by exploiting the physical properties of analog electrical circuits, a matrix-vector multiplication can be performed in constant time, O(1): instead of the computational cost growing rapidly as the matrices get larger, the operation completes in a single analog step.

Matrix computations are performed on a physical circuit composed of a grid of resistors. The row and column position of each resistor corresponds to a row and column of the matrix, and its resistance corresponds to the value at that position in the matrix. The voltages fed into the grid represent the vector being multiplied by the matrix, and the output currents correspond to the values of the output vector.

While a device with this design is already possible, and relatively easy to implement, the current limiting factor is our ability to update the resistor values the way weights are updated during backpropagation in AI model training. Backpropagation is the crucial step in training: the error between the model's output and the expected output is computed, and the matrix values are updated to minimize that error on the next test case. With standard resistors this is not possible without physically replacing each individual resistor, which is extremely impractical. This is where phase-change memory devices will have a huge impact: phase-change memory allows all of the resistance values to be updated in place by applying an electric current through the device. These devices are built from cutting-edge nanomaterials whose resistivity can be set by passing an electrical current through them.
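To show what the crossbar computes, here is a small numerical sketch. It models the grid as a conductance matrix and applies Ohm's law and Kirchhoff's current law: each output current is the sum of voltage times conductance down a column, which is exactly a matrix-vector product. The scaling constant and the idea of "programming" conductances with a write pulse are simplified assumptions; a real PCM crossbar has nonidealities (limited precision, conductance drift, wire resistance) that this sketch ignores.

```python
import numpy as np

class CrossbarSketch:
    """Toy model of an analog crossbar: weights stored as conductances."""

    def __init__(self, weights: np.ndarray, g_max: float = 1e-4):
        # "Program" each cell's conductance proportional to the weight value.
        # A real PCM device would do this with electrical write pulses.
        self.scale = g_max / np.max(np.abs(weights))
        self.G = weights * self.scale          # conductance grid (siemens)

    def multiply(self, v: np.ndarray) -> np.ndarray:
        """Apply input voltages to the rows; read output currents on the columns.
        Physically this happens in one analog step, regardless of matrix size."""
        currents = self.G.T @ v                # I = G^T V (Ohm + Kirchhoff)
        return currents / self.scale           # convert currents back to weight units

    def update(self, delta: np.ndarray):
        """Backpropagation-style weight update: nudge the conductances in place,
        which is what phase-change memory makes practical."""
        self.G += delta * self.scale

# --- usage: the analog result matches the ordinary digital matrix product ---
W = np.array([[1.0, -2.0],
              [0.5,  3.0],
              [2.0,  1.0]])
x = np.array([0.2, -1.0, 0.4])

xbar = CrossbarSketch(W)
print("crossbar:", xbar.multiply(x))
print("digital :", W.T @ x)
```

The `update` method is the step that standard resistors cannot perform: without a device whose resistance can be rewritten electrically, every training iteration would require physically swapping components.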
# Conclusion
The great thing about these three technologies is that they are not mutually exclusive; in fact, they deliver the greatest gains when used together, since each targets a different portion of an AI system's deployment. Let us return to the example of Amazon Alexa. We have already seen how analog signal processing improves energy efficiency, so now let us look at how federated learning and phase-change memory devices can extend those savings. First, phase-change memory will have an enormous impact wherever machine learning and artificial intelligence computation is performed, whether in analog signal processing or on a centralized server; it reduces the power cost of the analog analysis network at the beginning of the data pipeline as well as the cost of training a larger model on a centralized server. Connecting those two locations is federated learning: the preselected "interesting" data is used to train the AI model on the node device (again using phase-change memory for improved efficiency), the trained model is sent over the network to the centralized server, each node's model is aggregated into a single voice recognition model, and that model is redistributed to each node device, improving accuracy without the need to access large amounts of private raw data collected from other devices.
I hope this was a helpful overview of some of the technologies being developed to reduce the power costs of implementing AI. I urge anyone interested in building their own distributed AI network to explore these technologies further and take advantage of their power-saving benefits.