How to build your machine learning PC
Today, machine learning can run on almost any device. However, when it comes to training neural networks, the hardware you choose largely determines both how much money you will spend and how long the network will take to learn.
There are two major options when choosing hardware to train convolutional networks. On the one hand, you can use services like AWS, Azure, or Google Cloud. These services allow you to create a custom server, configuring the amount of RAM, the number of CPUs, and the graphics card (GPU) you need. However, if you have ever calculated how much these services cost, you will have seen that they are not cheap: you can pay almost €300 to use a server with a GPU for 24 hours. There are ways to reduce the cost of these services, such as using preemptible servers. This is not a perfect solution, since the service provider reserves the right to shut down the server whenever it needs that hardware. This forces you to write fault-tolerant software, so you reduce the cost of the hardware on the one hand but increase the cost of the software on the other.
An alternative to using cloud services is to buy your own PC, specifically configured for machine learning. This will mean significant savings, since cloud services use server graphics cards that are considerably more expensive, and they offer extras such as high availability that do not benefit you much when training convolutional networks. In addition, using your own PC makes software development easier, since you work with the data locally and can view graphical information more easily.
In this article we explain what components you should buy to train your own models while saving money. We will choose every part of the PC ourselves, without considering preconfigured computers, which are usually more expensive. The goal is to achieve high performance using consumer hardware, all with a budget of €3000.
The good news is that a PC has only 8 components that we must choose: GPU, CPU, RAM, hard drive, motherboard, power supply, cooling, and case.
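To put the cloud price and the budget side by side, here is a minimal sketch of the break-even arithmetic. It uses only the two rough figures from this article (about €300 per 24 hours of cloud GPU time and a €3000 budget). The function names are illustrative, and electricity, resale value, and cheaper preemptible pricing are deliberately ignored.

```python
# Break-even sketch: hours of cloud GPU time that cost as much as the PC.
# Both prices are the article's rough figures, not quotes from any provider.

def cloud_cost_per_hour(price_per_day_eur: float) -> float:
    """Convert a daily rental price into an hourly one."""
    return price_per_day_eur / 24.0


def break_even_hours(pc_price_eur: float, cloud_price_per_day_eur: float) -> float:
    """Hours of cloud usage whose cost equals the purchase price of the PC."""
    return pc_price_eur / cloud_cost_per_hour(cloud_price_per_day_eur)


if __name__ == "__main__":
    hours = break_even_hours(pc_price_eur=3000.0, cloud_price_per_day_eur=300.0)
    print(f"Break-even after about {hours:.0f} hours of training")
```

Under these assumptions the PC pays for itself after a few hundred hours of training, which is why the comparison favors your own hardware if you train regularly.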
Graphics card or GPU
The GPU is the component that will do most of the heavy computation when you train a convolutional network. Primarily, these calculations include convolutions, matrix multiplications, and activation functions.
The GPU you choose must meet three fundamental requirements. First of all, it must have enough memory to store all the information used during training. This information includes the gradients, that is, how much each of the neural network parameters (connection weights, convolution filter values...) contributes to the error. With this information stored, the training process can adjust the parameters of the neural network so that it works a little better in each training cycle. The second requirement is processing speed, which typically goes hand in hand with the amount of memory the GPU includes. The processing speed determines the number of images processed per second, so a faster GPU will take less time to train, allowing you to run more experiments and increasing your productivity. Finally, deep learning libraries such as PyTorch or TensorFlow use CUDA, a platform developed by NVIDIA to perform parallel computing on graphics cards, so the graphics card you buy must be from this brand to ensure maximum compatibility.
Roughly every two years, NVIDIA releases a new series of GPUs, with the 20 series being the current family of graphics cards. Within this series, the model with the most memory and the highest processing speed is the 2080 Ti, with 13.45 TFLOPS (trillion floating-point operations per second) and 11 GB of GDDR6 memory. Its price is approximately €1300.
Because the GPU is the most important component when it comes to training convolutional networks, this article would grow a lot if it included graphics card comparisons. For this reason, I recommend that you pay attention to the following blog entry, where different GPUs are compared based on their performance and price.
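As a rough illustration of the memory requirement, the sketch below estimates how much GPU memory the parameters, their gradients, and the optimizer state occupy. It assumes 32-bit floats, and the 25-million-parameter network in the usage example is a hypothetical figure. It does not count the intermediate activations, which usually take up the larger share of memory during training.

```python
# Rough estimate of the GPU memory taken by parameters and their gradients.
# Assumes float32 values; activations are NOT included in this estimate.

BYTES_PER_FLOAT32 = 4


def parameter_memory_gb(num_parameters: int, optimizer_slots: int = 0) -> float:
    """Memory for weights, gradients and optimizer state, in GB.

    optimizer_slots is the number of extra values the optimizer keeps per
    parameter: 0 for plain SGD, 1 for SGD with momentum, 2 for Adam.
    """
    copies = 2 + optimizer_slots  # weights + gradients + optimizer state
    return num_parameters * copies * BYTES_PER_FLOAT32 / 1e9


if __name__ == "__main__":
    # Hypothetical network with 25 million parameters, trained with Adam.
    print(f"{parameter_memory_gb(25_000_000, optimizer_slots=2):.2f} GB")
```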
Processor or CPU
During the training of neural networks, one of the most important functions of the CPU is to execute all the operations of the data "path", which goes from reading data from the hard disk to moving it to the GPU so that the convolutional network operations can be executed.
The CPU plays a key role in the training of deep learning architectures since, in many cases, it is in charge of decoding, normalizing, and applying other preprocessing techniques. If the CPU is not fast enough, data will take longer to reach the GPU, leaving the graphics card idle while it waits and slowing down training.
The importance of the CPU increases if the data must be preprocessed in each training iteration. A solution that is often used, especially when the computer does not have enough RAM to store the entire data set, is to read the data from the hard drive at each training iteration. This increases the workload of the CPU, since the processor must convert the data to the corresponding format before the convolutional network is executed. If you have programmed your network with Keras, you can use the tf.data.Dataset API to optimize the data path. This API allows you to load data with the CPU while the GPU executes the convolutional network, and to parallelize the execution of preprocessing algorithms. In addition, TensorFlow and PyTorch use vector instruction sets such as SSE or AVX, so you must have a compatible CPU to get the most out of these libraries.
In terms of brands, the leading manufacturer in market share is Intel. This brand releases a new generation of CPUs roughly every year, and it is currently on its 10th generation. Each generation has 3 main models called i3, i5, and i7, ordered from lowest to highest performance. You must also be careful, since each model has variants. For example, two variants of the i7 processor are the i7-10700, which does not allow changing the clock frequency (overclocking), and the i7-10700K, which does. The i7-10700K processor is priced at around €440, which leaves us with a total of €1740 so far.
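The idea of preparing the next data on the CPU while the GPU is busy can be sketched with the Python standard library alone. This is not the tf.data.Dataset API, only a simplified stand-in for the same principle. Here preprocess and train_step are hypothetical placeholders that sleep instead of doing real work, and the buffer size of 4 is an arbitrary choice.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the data stream


def preprocess(sample):
    """Placeholder for CPU work such as decoding and normalizing an image."""
    time.sleep(0.005)
    return sample * 2


def train_step(batch):
    """Placeholder for the GPU executing the network on one batch."""
    time.sleep(0.005)
    return batch


def prefetching_loader(samples, buffer_size=4):
    """Yield preprocessed samples while a background thread prepares more."""
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for sample in samples:
            buffer.put(preprocess(sample))  # blocks when the buffer is full
        buffer.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _SENTINEL:
            return
        yield item


def run_sequential(samples):
    """Preprocess and train one sample at a time, with no overlap."""
    return [train_step(preprocess(s)) for s in samples]


def run_prefetched(samples):
    """Train on samples that were preprocessed in the background."""
    return [train_step(b) for b in prefetching_loader(samples)]
```

Both functions produce the same results, but in run_prefetched the preprocessing of the next samples overlaps with the current training step. With real CPU-bound Python preprocessing, threads are limited by the interpreter lock, which is one reason to rely on a library pipeline instead of a hand-written one.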
RAM
RAM is a component that you should also choose taking into account what type of model you want to train. In many cases you will have a simple model and a small data set, which allows you to load everything into RAM. Storing all the data in RAM will, in many cases, reduce the time spent training, as fewer data preprocessing operations will be performed.
However, the aforementioned consumer processors have a relatively low limit on the amount of RAM they can use. According to the specifications of the chosen processor, we cannot use more than 128GB of RAM on our computer. If we combine this with the fact that DDR4 RAM costs more than €5 per GB, you could spend a significant amount of money only to find that in many cases you have more memory than you need, while in others you still do not have enough. The latter case can be mitigated by using TensorFlow to optimize the data path.
A reasonable decision is to keep the amount of RAM in the 32-64GB range so that the price of the PC does not skyrocket (even more). For this configuration, we will use two 16GB sticks of the Corsair Vengeance LPX DDR4 3200 model. This adds €170, making a total of €1,910.
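A quick way to decide how much RAM you need is to estimate the size of the decoded data set and compare it with the memory you plan to buy. The sketch below does that arithmetic. The data set of 100,000 images of 512x512 pixels and the 8GB reserved for the operating system and the training process are hypothetical values, and the price per GB is the rough figure quoted above.

```python
# Does a decoded image data set fit in RAM, and what would the RAM cost?
# All inputs in the example are illustrative assumptions.

def dataset_size_gb(num_images, height, width, channels=3, bytes_per_value=1):
    """Size of a data set stored as dense arrays (uint8 by default), in GB."""
    return num_images * height * width * channels * bytes_per_value / 1e9


def fits_in_ram(dataset_gb, ram_gb, reserved_gb=8.0):
    """True if the data set fits after reserving memory for everything else."""
    return dataset_gb <= ram_gb - reserved_gb


def ram_cost_eur(capacity_gb, price_per_gb_eur=5.0):
    """Approximate price of a given amount of RAM."""
    return capacity_gb * price_per_gb_eur


if __name__ == "__main__":
    size = dataset_size_gb(num_images=100_000, height=512, width=512)
    for ram in (32, 64, 128):
        print(f"{ram} GB: fits={fits_in_ram(size, ram)}, cost=EUR {ram_cost_eur(ram):.0f}")
```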
Hard drive
The hard drive is used to store the training data. Traditionally, the most frequently checked metric on hard drives was their storage capacity. This changed with the advent of solid-state drives, which greatly increase the speed of reading and writing data.
When you train machine learning algorithms, you have to take both metrics into account. On the one hand, the hard drive must have enough capacity to store all the training data. On the other hand, its read speed must be high enough to keep the GPU busy all the time. You must also take into account that in some cases you will use the TFRecord format to save data, which increases the space occupied, not only because the data is duplicated, but also because you will be saving it directly in matrix form. Finally, check the speed of the hard drive you want to buy: although M.2 drives have grown in popularity, the M.2 format is not synonymous with speed. Read speeds range from around 500MB/s for SATA-based M.2 drives to about 3.5GB/s for PCIe 3.0 NVMe drives, which surpasses the access speed of first-generation DDR RAM.
In this PC we will use the Samsung 970 EVO Plus (1TB) model, which has an approximate price of €250. This brings the total to €2160.
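To check whether a drive can keep the GPU busy, you can compare its read speed with the amount of data the GPU consumes per second. The following is a minimal sketch of that comparison. The rate of 500 images per second and the drive speeds in the usage example are hypothetical, and images are assumed to be stored uncompressed in matrix form.

```python
# Read speed needed to feed the GPU with uncompressed images.
# The throughput and drive speeds used in the example are assumptions.

def raw_image_bytes(height, width, channels=3, bytes_per_value=1):
    """Bytes occupied by one image stored directly in matrix form."""
    return height * width * channels * bytes_per_value


def required_read_speed_mb_s(images_per_second, bytes_per_image):
    """Sustained read speed, in MB/s, needed to deliver the images in time."""
    return images_per_second * bytes_per_image / 1e6


def disk_keeps_up(read_speed_mb_s, images_per_second, bytes_per_image):
    """True if the drive can deliver data as fast as the GPU consumes it."""
    return read_speed_mb_s >= required_read_speed_mb_s(images_per_second, bytes_per_image)


if __name__ == "__main__":
    image_bytes = raw_image_bytes(512, 512)
    needed = required_read_speed_mb_s(500, image_bytes)
    print(f"Needed: {needed:.0f} MB/s")
    print("Slow drive keeps up:", disk_keeps_up(150, 500, image_bytes))
    print("Fast drive keeps up:", disk_keeps_up(2500, 500, image_bytes))
```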
Motherboard
The motherboard is the component to which you connect the CPU, GPU, and RAM so that they work together. It must be compatible with 10th-generation Intel CPUs, have a PCI Express x16 slot to plug the GPU into, and support DDR4 RAM.
Something to keep in mind is that although a PCI Express slot is physically x16, electrically it may be working as x8, or even x4. This means a lower data transfer speed and can penalize the training time. For this reason, you must be careful when choosing a motherboard and read the manufacturer's specifications closely. The PCI Express configuration becomes more important when you want to use multiple graphics cards, since it is common for the second x16 slot on a motherboard to run at a slower speed.
For this PC we have chosen MSI's MPG Z490 Gaming Plus motherboard. This motherboard ticks all the boxes: it uses Intel's Z490 chipset, which is compatible with the 10th generation, has one x16 slot working in x16 mode (although the second x16 slot works in x4 mode), and has 4 slots for DDR4 RAM. This motherboard costs €190, totaling €2,350 so far.
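The effect of the lane count can be quantified with the theoretical bandwidth of the link. The sketch below assumes PCI Express 3.0, which transfers 8 gigatransfers per second per lane with 128b/130b encoding. Real transfers are somewhat slower because of protocol overhead, so treat the results as upper bounds.

```python
# Theoretical PCI Express 3.0 bandwidth for different lane counts.
# Assumes PCIe 3.0; other generations use different transfer rates.

PCIE3_TRANSFERS_PER_SECOND = 8e9          # 8 GT/s per lane
PCIE3_ENCODING_EFFICIENCY = 128 / 130     # 128b/130b line encoding


def pcie3_bandwidth_gb_s(lanes: int) -> float:
    """Peak one-directional bandwidth of a PCIe 3.0 link, in GB/s."""
    bits_per_second = PCIE3_TRANSFERS_PER_SECOND * PCIE3_ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    for lanes in (16, 8, 4):
        print(f"x{lanes}: {pcie3_bandwidth_gb_s(lanes):.2f} GB/s")
```

A slot wired as x4 therefore offers a quarter of the bandwidth of a true x16 slot, which matters most when batches are large or when several GPUs exchange data.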
Power supply
The power supply is responsible for providing current to the different components of the computer. Its main characteristics are the output power in watts, the efficiency, and the number of connectors.
First of all, you must choose a power supply that is capable of powering all the components. The chosen GPU consumes 300W according to the manufacturer's specifications, and all other components combined are in the 250-300W range. This means that you must assume a consumption of 600W, but it does not mean that you can buy a 600W power supply: you still have to account for the efficiency and verify that the chosen model has enough connectors to power everything.
The efficiency of a power supply is indicated by the "80 Plus" label and its White, Bronze, Silver, Gold, Platinum, and Titanium variants. The efficiency of 80 Plus power supplies goes from 82% for the White models to 94% for the Titanium ones. If you take the worst case, an efficiency just above 80%, and use it as a safety margin, the calculation gives 600W/80% = 750W.
Finally, not all 750W power supplies will have a sufficient number of connectors to power the equipment. The critical point of this PC is the GPU, which needs 3 PCIe connectors (6+2).
One power supply that ticks all the boxes is Corsair's TX750M. It offers 750W, has 4 PCIe connectors, and carries the 80 Plus Gold efficiency rating. It costs €120, which puts the PC at €2470.
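The sizing rule used in this section can be written as a short check. The sketch below reproduces the article's calculation (total consumption divided by a worst-case efficiency of 80%) and then verifies the wattage and the number of PCIe connectors. The function names are illustrative, and the figures are the ones quoted above.

```python
# Power supply sizing as described in this section.
# Consumption figures and connector counts are the article's values.

def required_psu_watts(component_watts, worst_case_efficiency=0.80):
    """Total consumption divided by the worst-case efficiency."""
    return sum(component_watts) / worst_case_efficiency


def psu_is_suitable(psu_watts, psu_pcie_connectors, needed_watts, needed_pcie_connectors):
    """True if the power supply has enough power and enough PCIe connectors."""
    return psu_watts >= needed_watts and psu_pcie_connectors >= needed_pcie_connectors


if __name__ == "__main__":
    needed = required_psu_watts([300, 300])  # GPU + all other components
    print(f"Recommended rating: {needed:.0f} W")
    print("750 W unit with 4 connectors is suitable:",
          psu_is_suitable(750, 4, needed, needed_pcie_connectors=3))
```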
Cooling
When the CPU starts preprocessing the data, its temperature will start to rise. If the temperature exceeds 90 °C, its speed will be automatically reduced to allow it to cool down and prevent damage.
CPUs usually come with a small fan. This fan is aimed at users who will not leave the PC working for several hours in a row. However, if you train your network on a PC with this fan, it could get hot enough to reduce performance. If CPU performance drops, it could end up becoming a bottleneck, resulting in an underutilized graphics card, and lengthening training time.
An alternative is third-party heatsinks, or liquid cooling kits. These components offer higher CPU cooling performance, and make less noise. Also, if you want to overclock, a liquid cooling kit will allow you to push the performance a bit further.
For this PC we have chosen Corsair's Hydro H100x liquid cooling kit. This kit has an approximate price of €90, bringing the total to €2560.
Case
The case houses the different components of the PC. Its main requirement is size: it must be compatible with the ATX motherboard format, it must be long enough to fit the graphics card, and it must have room to mount the radiator and fans. As an end user, you can also choose whether you want it to have a side window, front ports, or even sound-deadening panels.
In this article we have chosen the Corsair Carbide 275Q model. It has enough space for the GPU and ventilation, and it includes acoustic panels that will reduce the noise of the PC by a couple of decibels. This case costs €85, which means a total of €2645.
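As a final check on the budget, the approximate prices quoted in each section can be added up. The sketch below does so and compares the result with the €3000 budget set at the beginning. The dictionary keys are just labels for the components chosen in this article.

```python
# Sum of the approximate component prices quoted in the article, in euros.

COMPONENT_PRICES_EUR = {
    "GPU": 1300,
    "CPU": 440,
    "RAM": 170,
    "Hard drive": 250,
    "Motherboard": 190,
    "Power supply": 120,
    "Cooling": 90,
    "Case": 85,
}


def total_cost(prices):
    """Total price of all components."""
    return sum(prices.values())


def remaining_budget(prices, budget_eur=3000):
    """Money left over after buying every component."""
    return budget_eur - total_cost(prices)


if __name__ == "__main__":
    print(f"Total: EUR {total_cost(COMPONENT_PRICES_EUR)}")
    print(f"Remaining: EUR {remaining_budget(COMPONENT_PRICES_EUR)}")
```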
What models can you train with this PC?
The hardware we have chosen will be enough to train most networks. You will be able to train classification networks like ResNet or Xception, segmentation networks like U-Net or DeepLab, and object detection networks like YOLO. In general, you will be able to use "minibatches" of images up to 512x512 pixels. Beyond this size, there comes a point where not even a single image fits in the GPU memory, because all the intermediate results of the processing are kept in order to correct the error. Still, there will be many networks that you can train with images close to 512x512 in size. For this you will need to use a reduced batch size and make some small adjustments, such as replacing batch normalization with batch renormalization.
There will also be problems that you cannot tackle with this PC. For example, you won't be able to train a Transformer architecture from scratch, due to the large amount of data that is typically used. According to current records, to train BERT in one hour you would need 1472 NVIDIA V100 GPUs. Still, you can download a pretrained model and fine-tune it with your dataset.
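The relationship between image size, batch size, and GPU memory can be sketched as follows. The example assumes a convolutional network whose intermediate results grow in proportion to the number of pixels. The reference figure of 0.5GB per 256x256 image and the 1GB of fixed memory are hypothetical, so measure your own network before relying on these numbers.

```python
# How image size limits the batch size on a GPU with a fixed amount of memory.
# The reference memory figures in the example are hypothetical.

def activation_memory_gb(side_px, reference_side_px, reference_memory_gb):
    """Scale per-image activation memory with the number of pixels."""
    return reference_memory_gb * (side_px / reference_side_px) ** 2


def max_batch_size(gpu_memory_gb, fixed_memory_gb, per_image_memory_gb):
    """Largest batch whose activations fit next to the fixed memory cost."""
    available = gpu_memory_gb - fixed_memory_gb
    return max(0, int(available // per_image_memory_gb))


if __name__ == "__main__":
    for side in (256, 512, 1024, 2048):
        per_image = activation_memory_gb(side, reference_side_px=256, reference_memory_gb=0.5)
        batch = max_batch_size(gpu_memory_gb=11, fixed_memory_gb=1, per_image_memory_gb=per_image)
        print(f"{side}x{side}: {per_image:.1f} GB per image, batch size {batch}")
```

Doubling the side of the image multiplies the memory per image by four, which is why the batch size shrinks quickly and eventually reaches zero.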
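To get a feeling for the scale of that record, the GPU count can be converted into the time a single GPU would need. This is only an order-of-magnitude sketch. It assumes perfect scaling across GPUs and, optimistically, that your card is as fast as a V100 and has enough memory for the model. The relative_speed parameter is a hypothetical knob for correcting the first assumption.

```python
# Convert "N GPUs for H hours" into days on a single GPU.
# Assumes perfect linear scaling, which real clusters do not reach.

def single_gpu_days(num_gpus, hours, relative_speed=1.0):
    """Days a single GPU would need for the same amount of work.

    relative_speed is the speed of your GPU relative to the reference GPU
    (1.0 means equally fast, 0.5 means half as fast).
    """
    gpu_hours = num_gpus * hours
    return gpu_hours / relative_speed / 24


if __name__ == "__main__":
    print(f"About {single_gpu_days(1472, 1):.0f} days on one equally fast GPU")
```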