While in most cases a simple pip install tensorflow
works just fine, certain hardware configurations may be incompatible with the precompiled tensorflow package. In this brief tutorial I will build the latest tensorflow 2.3.1 python package from source. This tutorial may also be helpful for those who want to run the latest tensorflow version on older GPUs, because support for older hardware has been removed from the precompiled packages since 2.3.0.
Prepare the building environment
Obtain the following docker container:
docker pull tensorflow/tensorflow:devel-gpu
Choose a place and create a directory that you will share with the container. In my case, I will use /home/alexandr/temp/tensorflow
. Then enter the working directory and start the docker container
cd /home/alexandr/temp/tensorflow
docker run -it -w /tensorflow_src -v $(pwd):/share tensorflow/tensorflow:devel-gpu bash
Update the repository within the container and choose the latest stable branch (at the time of writing, that was 2.3)
git pull
git checkout r2.3
Next, upgrade pip and install a few python dependencies
/usr/bin/python3 -m pip install --upgrade pip
pip3 install six numpy wheel keras_applications keras_preprocessing
Figure out the CPU limitations of the target machine
You need to tell the compiler which instructions to avoid in the final binaries. This is not an obvious step, as these limitations are machine-specific. You may want to consult the internet and even use some trial and error to see which flags are required to make the binaries stable on your particular machine.
If you are compiling on the target machine, -march=native
should suffice, as it enables all instructions your CPU supports. If you cross-compile, as I do in this example, you need to dig deeper.
One approach is to look at grep flags /proc/cpuinfo | head -n 1
which shows the CPU feature flags. In my case, I have a laptop where the latest tensorflow works out of the box and a desktop PC with a GPU where it does not. Comparing the lists from the two machines, I find that the desktop PC lacks the following items: 'ida', 'bmi2', 'smep', 'rtm', 'bmi1', 'fma', 'f16c', 'hle', 'avx2', 'smx', 'adx', 'avx', 'mpx'.
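You can automate this comparison by dumping each machine's flag list to a file and diffing the two with comm. A minimal sketch (the file names laptop_flags.txt and desktop_flags.txt are my choice, nothing standard):

```shell
# Dump this machine's CPU feature flags, one per line, sorted
# (comm requires sorted input); the first token "flags:" is filtered out:
grep -m1 '^flags' /proc/cpuinfo | tr -s ' ' '\n' | grep -v ':' | sort > laptop_flags.txt

# Produce desktop_flags.txt the same way on the other machine, copy it here,
# then list the features the laptop has but the desktop lacks:
# comm -23 laptop_flags.txt desktop_flags.txt
```

For the two machines in this example, the comm output would be exactly the list of missing features shown above.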
Here we can see that ida
stands for Intel Dynamic Acceleration and is part of the CPU’s Thermal and Power Management, so it is not very likely to be the breaking factor in my case. In the same vein, the smep, rtm, hle, smx,
and mpx
features are unlikely to affect tensorflow execution. I could not easily find whether f16c
and adx
are used by the tensorflow binaries.
On the other hand, avx
and avx2
(Advanced Vector Extensions), bmi1 and bmi2
(1st and 2nd bit manipulation instruction sets), and fma
(fused multiply-add) seem to be quite important for tensorflow. Therefore, I will use the following combination of flags to build the tensorflow binaries
-march=native -mno-avx -mno-avx2 -mno-fma -mno-bmi -mno-bmi2
Configure tensorflow’s build chain
In the docker container execute
python3 configure.py
to start a configuration manager. For most questions you can just select the default answer.
One important question is about the compute capability of your GPU. Since the default option may not include your GPU type, it is better to check it ahead of time here and enter it in the field provided. If you misspecify your GPU compute capability, you are likely to get the following error message when you attempt to use CUDA in tensorflow:
InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid
When you get to the question about optimization flags, enter the flags we came up with in the previous section. In my case, I also added the -Wno-sign-compare
flag. Once you are done, you can start the build process by running
bazel build //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=16384
If bazel complains about its version, it will likely also provide you with a one-liner to update it.
The build process requires a substantial amount of RAM, especially on machines with many cores, so you may want to limit RAM usage with the flag --local_ram_resources=16384
. In my case, I limit it to 16 GiB out of the 24 GiB available on the machine. Another way of limiting resource usage is to restrict the number of parallel build jobs with the flag --jobs 4
.
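Note that --local_ram_resources takes its value in megabytes, which is why 16384 corresponds to 16 GiB. If you would rather derive the cap from the machine you are building on, here is a sketch (the two-thirds ratio is my arbitrary choice, not a bazel recommendation):

```shell
# Read total RAM in kB from the kernel, convert to MB,
# and keep two thirds of it for bazel:
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "--local_ram_resources=$(( total_kb / 1024 * 2 / 3 ))"
```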
The build will take quite a long time.
Prepare and install a python package
Execute the following command to assemble a python package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /share
Now you should see a .whl
file in the mounted directory on the host machine. Copy that file to the target machine and then install it with pip
. The filename indicates which python version you should use on the target machine. If you have a different version, you can use conda environments to create a separate environment for tensorflow with the required version of python.
conda create -n "tensorflow2" python=3.6
conda activate tensorflow2
pip install tensorflow-2.3.1-cp36-cp36m-linux_x86_64.whl
Now you should be able to import and use tensorflow in this new environment.
That is it!