What is Machine Learning? (From a hardware engineer's point of view)
There are already too many articles and blog posts on this topic, so explaining it from scratch would only add unnecessary traffic to the Internet. The most commonly referenced starting points are Google's TensorFlow tutorial and the PyTorch tutorial, and those who want to dig deeper will run into Andrew Ng's Stanford course (Coursera offers Andrew Ng's excellent lectures, and I strongly recommend taking them). Most tutorials begin by saying that anyone can use AI easily without knowing the mathematical details. I think that is partly right and partly wrong. You should know at least basic linear algebra and calculus, such as matrix addition/multiplication and derivatives, before you can go one step further. Even though I completed tutorials like MNIST without any hurdles, I felt so lost afterwards that I never figured out what to do next. Most hardware engineers probably feel the same way. So I decided to dig into what is inside Machine Learning in order to turn it into a hardware implementation.
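Just to be concrete about the level of math I mean by "matrix addition/multiplication and derivatives", here is a minimal NumPy sketch (the values are arbitrary and purely illustrative):

    import numpy as np

    # A tiny "layer": multiply an input vector by a weight matrix and add a bias.
    W = np.array([[0.2, -0.5],
                  [0.8,  0.1]])   # 2x2 weight matrix (arbitrary values)
    x = np.array([1.0, 2.0])      # input vector
    b = np.array([0.1, -0.3])     # bias vector

    y = W @ x + b                 # matrix-vector multiplication plus addition

    # A derivative, approximated numerically: d/dx of x^2 at x = 3 is ~6.
    f = lambda v: v ** 2
    h = 1e-6
    dfdx = (f(3.0 + h) - f(3.0)) / h
    print(y, dfdx)

If this looks manageable, you already have enough math to follow the rest of this series.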
First off, the terminology is confusing. Some people call it AI, some call it Machine Learning (ML from here on), and some call it Deep Neural Network (DNN). From a hardware engineer's point of view, I prefer to say ML, because I want to limit the scope to learning that takes place within a machine built by humans. I don't like the term AI very much, because it sounds like an artificial intelligence, Terminator-style, that grows by adding intelligence to itself. Software people seem to like the idea of AI more, since by the nature of their work they don't think much about the hardware underneath. It seems right to view DNN as a component technology that makes AI and ML possible.
So what does doing ML actually involve? After properly wiring together the models provided by TensorFlow or PyTorch, is it just a matter of repeating trial and error until you get the desired result? At a minimum, you need to know the vector and matrix concepts of linear algebra, and also understand the activation function, the loss function, the optimization criterion (MSE, MMSE, etc.), and which optimizer to use. On top of that, there are many kinds of DNNs and related models, such as CNN, RNN, Autoencoder, SVM, and so on, too many to count. Do we need to know all of this to say we can do ML? If you try to study, understand, and dive into all of it from the beginning, it will be hard to keep up even after years of study.
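To at least put those terms in context, here is roughly where each of them appears in a typical Keras model definition. This is a minimal sketch, not a model from this post; the layer sizes and choices are arbitrary:

    import tensorflow as tf

    # A tiny fully-connected network: each term from the paragraph above shows up once.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),  # activation function
        tf.keras.layers.Dense(1),
    ])

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # optimizer
        loss="mse",                                               # loss function / MSE criterion
    )
    # model.fit(x_train, y_train, epochs=10)  # training = repeated optimizer steps on the loss

The point is that the frameworks hide these pieces behind a few lines of code, which is exactly why tutorials feel easy and yet leave you unsure of what is really happening.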
To narrow things down and look at this from a hardware engineer's point of view, what do we actually need to know? It helps to split the goal into two broad categories. First, will both training the model and inference be performed in hardware? Second, will the hardware only perform inference on a model that has already been trained? Now consider what "performing in hardware" means. What kind of hardware is around us? The most common are general-purpose CPUs, GPUs with excellent parallel processing capability, DSPs that are strong in multiply-and-accumulate (often with a VLIW structure), and Google's recently announced Tensor Processing Units (TPUs), which are built to perform tensor operations well. Along similar lines there are NPUs, accelerators specialized for a particular function, and FPGAs. The CPU/GPU combination is the most commonly used hardware, and it is the best platform for both building a model and performing inference; the rest is hardware that has emerged more recently. Then the question arises: why did Google build TPUs, and why did Intel and AMD buy FPGA companies, when the CPU/GPU combination is already in wide commercial use, with many well-trained engineers, libraries accumulated over the years, and a very active open-source community?
From what I understand, there are two main reasons. First, running a very large ML workload requires a lot of CPUs/GPUs in the data center. That means a lot of electricity for the processing itself, huge cooling costs to deal with the heat that is unavoidable with CMOS semiconductors, and on top of that a lot of memory and storage. To reduce this burden, when a model has already been validated and no new training is required, a hardware accelerator that can do the same processing with less power than a CPU/GPU is expected to make a big contribution to expanding ML. Recently I saw an article reporting that many cloud service providers are pursuing performance improvements and cost reductions by using Xilinx FPGA NICs (Network Interface Cards) more actively in the data center. Data preprocessing probably plays the bigger role in those NICs, but ML workloads are expected to be offloaded to them as well. In the United States, investment firms are increasingly hiring FPGA engineers: a huge amount of market data flows in from all over the world, and uploading all of the raw data to servers would cost a fortune in processing, so FPGAs are used heavily for primary filtering right at the data-receiving front end. Amazon already offers FPGA-equipped compute instances on AWS, charged at an hourly rate, which can be used to serve an already-verified ML model. Going one step further, the AWS Marketplace lets custom-built FPGA images be sold or licensed, and a development platform is provided so that images can be built and uploaded remotely using Xilinx's Vitis environment. From Xilinx's point of view, this has a lock-in effect, since only Xilinx devices can be used in that environment, and from AWS's point of view it helps accommodate a wide range of cloud users with a smaller investment, so the two benefit each other.
The second reason to run ML on hardware other than a CPU/GPU, beyond the data-center issues above, is the need to push computing out to the edge, where demand is going to be huge. Edge computing can be defined in many ways, but imagine an edge very close to the consumer. The first things that come to mind are IoT devices: cameras, sensors, and industrial and automotive applications. For example, if an application monitors sensor data in real time and takes action when a problem occurs, the current approach is to upload all of the raw data to the cloud and make the decision there. Control then goes over the Internet, a huge amount of communication power is consumed, and the data concentrated in the cloud grows rapidly as the number of devices increases. If the communication power is greater than the processing power it would take to run the ML model in the device's own processor, you will of course choose to run the ML directly at the edge. So research groups such as TinyML are studying how ML can run on very small processors with very little memory. With the same approach, TensorFlow Lite makes it possible to run ML on RISC microcontroller-class CPUs such as ARM's Cortex-M series. ST-Micro also provides a tool in its own development environment that takes a TensorFlow or Keras model directly and automatically generates software that runs on its microcontrollers. Strictly speaking, Google's TPU can also be seen as targeting edge devices, but its power consumption still limits how small a device it can go into. Meanwhile, in the FPGA camp, Lattice and others offer low-power, low-cost ML solutions using small FPGAs: for example, FPGAs dedicated to detecting wake words such as Amazon Echo's "Alexa", or the feature Lenovo showed at CES earlier this year that unlocks a laptop using an FPGA.
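To make the TensorFlow Lite path mentioned above concrete, converting a Keras model into the small .tflite flatbuffer that TensorFlow Lite for Microcontrollers consumes looks roughly like this. This is a sketch only; the model here is an untrained stand-in for whatever network you actually trained:

    import tensorflow as tf

    # A stand-in model; in practice this would be your already-trained network.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(3,)),
        tf.keras.layers.Dense(1),
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable size/quantization optimizations
    tflite_model = converter.convert()

    # The resulting flatbuffer is what gets compiled into (or loaded by) the
    # microcontroller firmware, e.g. via TensorFlow Lite for Microcontrollers.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)

The heavy lifting of training still happens on a CPU/GPU; only the shrunken inference artifact travels to the edge device.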
That explanation was a bit long, but the conclusion is this: on the hardware side, the thing to focus on is how to perform inference on an already-created model efficiently, rather than how to build the model well. In that case, for the complex machinery used during modeling, such as the optimizer, the loss function, derivatives, and so on, it is enough to understand what they are. All you then need to know for inference is what remains: the basic vector/matrix operations, the activation functions, and the network structure used by each type of DNN. Doesn't it feel like a big weight has already been lifted? As a tiny preview, the sketch below shows what inference for a single layer boils down to; in the next post, we'll work through this with a fuller practical example.
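Here is that preview: the inference step of one fully-connected layer, stripped of all framework code. This is a minimal NumPy sketch with made-up weights; in a real system the weights and bias would come from the already-trained model:

    import numpy as np

    def relu(x):
        # Activation function: the only nonlinearity in this layer.
        return np.maximum(0.0, x)

    # Weights and bias would be exported from a trained model; here they are arbitrary.
    W = np.array([[0.5, -1.2,  0.3],
                  [0.7,  0.4, -0.9]])   # 2 outputs, 3 inputs
    b = np.array([0.1, -0.2])
    x = np.array([1.0, 0.5, -1.5])      # one input sample

    y = relu(W @ x + b)                 # inference = matrix multiply + add + activation
    print(y)

Everything a hardware implementation has to do for this layer is right there: a matrix-vector multiply, a vector add, and a simple nonlinear function.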