GPU-accelerated Tensor Networks

After participating in the Global AI Hackathon San Diego (June 23-25, 2017), where I implemented my own Python classes for Deep Neural Networks (with Theano), I decided to “relax” by trying to keep abreast of the latest developments in theoretical physics, watching YouTube videos of lectures on the IHÉS channel (Institut des Hautes Études Scientifiques).

After watching Barbon’s introductory talk, I was convinced that numerical computations involving tensor networks are ripe for GPU acceleration. As a first step, I implemented the first few iterations of the construction of a matrix product state (MPS) for a 1-dim. quantum many-body system – which involves applying singular value decomposition (SVD) and (dense) matrix multiplication to exponentially large matrices (2^L entries of complex double-precision numbers) – using CUDA C/C++, cuBLAS, and cuSOLVER, with the entire computation taking place on the GPU (to eliminate slow CPU-GPU memory transfers). Two iterations for L=16 complete in about 1.5 seconds on an NVIDIA GTX 980 Ti.
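To make the iteration concrete, here is a NumPy sketch of the same splitting-off procedure (the CUDA implementation performs these SVDs with cuSOLVER and cuBLAS on-device; this serial version is only for checking small cases):

```python
import numpy as np

# A NumPy sketch of the MPS construction described above: split one
# site at a time off a dense 2**L state vector by reshape + SVD.
def mps_from_state(psi, L, chi_max=None):
    """Return MPS tensors A[i] of shape (r_left, 2, r_right); truncation
    to bond dimension chi_max is optional (exact MPS if None)."""
    tensors = []
    r = 1                                     # current left bond dimension
    M = psi.reshape(1, -1)
    for _ in range(L - 1):
        M = M.reshape(r * 2, -1)              # group the next spin with the left bond
        U, S, Vh = np.linalg.svd(M, full_matrices=False)
        if chi_max is not None:               # optional truncation step
            U, S, Vh = U[:, :chi_max], S[:chi_max], Vh[:chi_max, :]
        tensors.append(U.reshape(r, 2, -1))
        M = S[:, None] * Vh                   # carry S @ Vh to the next site
        r = len(S)
    tensors.append(M.reshape(r, 2, 1))
    return tensors
```

Contracting the tensors back together reproduces psi exactly when no truncation is applied, which makes a handy correctness check before scaling up.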

I’ve placed that code in this subdirectory. See the files there, and the verification of simple cases (before scaling up) with Python/NumPy in cuSOLVERgesvd.ipynb.

Tensor networks have largely been developed within the last decade; three applications are particularly interesting:

  • quantum many-body physics: while the Hilbert space grows exponentially with the number of spins in the system, the methods of tensor networks, from MPS to so-called PEPS, both of which involve applying SVD, QR decomposition, etc., reduce the state space that the system’s ground state could possibly occupy. Tensor networks have become a powerful tool for condensed matter physicists in the numerical simulation of quantum many-body physics problems, from high-temperature superconductors to strongly interacting ultracold atom gases. cf. 1
  • Machine Learning: there are use cases for supervised learning and feature extraction with tensor networks. cf. 2, 3
  • Quantum Gravity: wormholes (the Einstein-Rosen (ER) bridge) and condensates of entangled quantum pairs (Einstein-Podolsky-Rosen (EPR) pairs) have been conjectured to be intimately connected – the accumulation of a large density of EPR pairs (S >> 1) seems to generate a wormhole (ER), the so-called ER = EPR relation. This relation is suggested by the AdS/CFT conjecture. Tensor network representations have been applied to various entangled CFT states – large-scale GPU-accelerated numerical computation of these tensor network representations and their dynamics could provide useful (and unprecedented) simulations of the gravity dual (graviton) in the bulk, through AdS/CFT. cf. 4

I believe there is valuable work to be done on GPU acceleration of tensor networks. I am asking here for help with two things: 1. colleagues, advisors, and mentors to collaborate with, so as to obtain useful feedback, and 2. support – namely financial support for stipend(s), hardware, and software (nVidia? The Simons Foundation?). Any help with meeting or being placed in contact with helpful persons would be appreciated. Thanks!


  1. Ulrich Schollwoeck. The density-matrix renormalization group in the age of matrix product states. Annals of Physics 326, 96 (2011). arXiv:1008.3477 [cond-mat.str-el]
  2. Johann A. Bengua, Ho N. Phien, Hoang D. Tuan, and Minh N. Do. Matrix Product State for Feature Extraction of Higher-Order Tensors. arXiv:1503.00516 [cs.CV]
  3. E. Miles Stoudenmire, David J. Schwab. Supervised Learning with Quantum-Inspired Tensor Networks. arXiv:1605.05775 [stat.ML]
  4. Juan Maldacena, Leonard Susskind. Cool horizons for entangled black holes. arXiv:1306.0533 [hep-th]

Machine Learning (ML) and Deep Learning material, including CUDA C/C++ (utilizing and optimizing with CUDA C/C++)

(Incomplete) Table of Contents

  • GPU-accelerated Tensor Networks
  • “Are Neural Networks a black box?” My take.
  • Log
  • CUDA C/C++ stuff (utilizing CUDA and optimizing CUDA C/C++ code)
  • Fedora Linux installation of Docker for nVidia’s DIGITS – my experience
  • Miscellaneous Links

A lot has already been said about Machine Learning (ML), Deep Learning, and Neural Networks.  Note that this blog post (which I’ll update infrequently) is the “mirror” of my github repository github: ernestyalumni/MLgrabbag . Go to the github repo for the latest updates, code, and jupyter notebooks.

A few things bother me, and I sought to rectify them myself:

  • There ought to be a clear dictionary between the mathematical formulation and the Python scikit-learn, Theano, and TensorFlow implementations.  When I see a math equation, I want to see how to implement it, immediately.  If I were in class lectures, then with the preponderance of sample data available, I ought to be able to play with examples immediately.
  • Someone ought to generalize the mathematical formulation, drawing from algebra, category theory, and differential geometry/topology.
  • CPUs have been a disappointment (see actual gamer benchmarks for Kaby Lake on YouTube); everything ought to be written in parallel for the GPU.  And if you’re using a wrapper that’s almost as fast as CUDA C/C++, guess what?  You ought to rewrite the thing in CUDA C/C++.

So what I’ve started doing is putting up my code and notes for these courses:

The github repository MLgrabbag should have all my material for this.  I’m cognizant that there are already plenty of notes and solutions out there.  What I’m trying to do, as above, is

  1. write the code in Python’s scikit-learn and Theano, first and foremost,
  2. generalize the mathematical formulation,
  3. implement it on the GPU.

I think those aspects are valuable, and I don’t see anyone else offering either such a clear implementation or real examples (not toy examples).
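As a tiny example of the kind of math-to-implementation “dictionary” I have in mind (an illustrative sketch, not taken from MLgrabbag): the linear regression normal equation theta = (X^T X)^{-1} X^T y maps directly to NumPy:

```python
import numpy as np

# The "dictionary" idea in miniature (an illustrative sketch):
# math:  theta = (X^T X)^{-1} X^T y   (linear regression normal equation)
# code:
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, 50)])  # design matrix with bias column
true_theta = np.array([2.0, -3.0])
y = X @ true_theta                          # noiseless targets, for a clean check
theta = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations directly
```

With noiseless targets the solver recovers true_theta exactly (up to floating point), which is the point: the equation and the code line up one-to-one.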

GPU-accelerated Tensor Networks

Go here:

Are neural networks a “black box”? My take.

I was watching the webinar HPC Exascale and AI, given by Tom Gibbs for nVidia, and the first question in the Q&A was whether neural networks are a “black box”: how could anything be learned about the data presented (experimental or from simulation) if it’s unknown what neural networks do?

Here is my take on the question and how I’d push back.

For artificial neural networks (ANNs), or the so-called “fully-connected layers” of Convolutional Neural Networks (CNNs), Hornik et al. (1991) had already shown that neural networks act as universal function approximators, in that they converge uniformly to a function mapping the input data X to output y. The proof should delight pure math majors in that it employs the Stone-Weierstrass theorem. The necessary size of the network is not known a priori; it simply must be sufficiently large. But that a sufficiently large neural network can converge uniformly to an approximation of the function that maps input data X to output y should be very comforting (and confidence-building in the technique).

For CNNs, an insight struck me because I wrote a lot of incompressible Navier-Stokes equation solvers for Computational Fluid Dynamics (CFD) with finite-difference methods in CUDA C/C++: stencil operations in CUDA (or numerical computation in general) are exactly what the finite-difference method needs to compute gradients and, further, the Hessian (second-order partial derivatives). CNNs formally perform exactly these stencil operations, with the “weights” of the finite difference left arbitrary (adjustable). Each successive convolution “layer” can take a higher-order (partial) derivative of the previous one, exactly as stencil operations for finite differences do. This is also evidenced by how, with each successive convolution “layer”, the total size of a block “shrinks” (if we’re not padding the boundaries), exactly as with the stencil operation for finite differences.
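The correspondence is concrete: a 1-dim. convolution with the fixed kernel (1, -2, 1) is the standard second-derivative finite-difference stencil, and, just as with an unpadded convolution layer, the output shrinks by (kernel size - 1). An illustrative NumPy sketch (not from my CFD code):

```python
import numpy as np

# Convolution as a finite-difference stencil (an illustrative sketch):
# the kernel (1, -2, 1) is the classic second-derivative stencil; a CNN
# layer is the same operation with the weights left trainable.
h = 0.01
x = np.arange(0.0, 1.0, h)
f = x**3                                   # sample function on the grid
stencil = np.array([1.0, -2.0, 1.0])       # fixed weights: the f'' stencil
# 'valid' convolution: output shrinks by (kernel size - 1),
# just like an unpadded convolution layer
f_xx = np.convolve(f, stencil, mode='valid') / h**2
```

For a cubic, the central second difference is exact, so f_xx matches the analytic second derivative 6x at the interior grid points.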

CNNs learn first-order and successively higher-order gradients, Hessians, and partial derivatives as features of the input data. The formal mathematical structure for the whole sequence of partial derivatives over a whole set of input data is the jet bundle. I would argue that jet bundles should be the mathematical structure to consider for CNNs.

Nevertheless, in short: ANNs, or the “fully-connected layers”, were already shown by Hornik et al. (1991) to be universal approximators of the function that maps input data X to output data y, and CNNs learn the gradients and higher-order derivatives associated with an image (how the colors change across the grid) or video. They’re not as black-box as a casual observer might think.



  • 20170209 Week 2 Linear Regression stuff for Coursera’s ML by Ng implemented in Python numpy, and some in Theano, see sklearn_ML.ipynb and theano_ML.ipynb, respectively.

CUDA C/C++ stuff (utilizing CUDA and optimizing CUDA C/C++ code)

cuSOLVER – Singular Value Decomposition (SVD), with and without CUDA unified memory management

I implemented simple examples illustrating Singular Value Decomposition (SVD) both with and without CUDA unified memory management, starting from the examples in the CUDA Toolkit Documentation.

Find those examples in the moreCUDA/CUSOLVER subdirectory of my CompPhys github repository.
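The verification step is straightforward in NumPy; the snippet below is the kind of check done in cuSOLVERgesvd.ipynb (the 3×2 matrix is just an illustrative example, not necessarily the one in the notebook):

```python
import numpy as np

# NumPy cross-check for an SVD routine: factor a small matrix and
# confirm A = U * diag(S) * V^H before trusting the GPU result at scale.
# The matrix A is an illustrative example.
A = np.array([[1.0, 2.0],
              [4.0, 5.0],
              [2.0, 1.0]])
U, S, Vh = np.linalg.svd(A, full_matrices=False)
A_rec = (U * S) @ Vh        # reconstruct A from its factors
```

NumPy returns the singular values sorted in descending order, which is the same convention as cuSOLVER’s gesvd, so results can be compared element-wise.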

Fedora Linux installation of Docker for nVidia’s DIGITS – my experience

I wanted to share my experience with installing Docker on Fedora Linux because I wanted to run nVidia’s DIGITS; I really want to make Docker work for Fedora Linux Workstation (23 as of today, 20170825; I will install 25 soon), but I’m having a few issues, some related to Docker, some related to Fedora:

  1. For some reason, on a user (non-admin) account, when I do dnf list, I obtain the following error:
    1. ImportError: dynamic module does not define init function (PyInit__posixsubprocess)

Nevertheless, I did the following to install DIGITS:

git clone

python install


Miscellaneous Links




I will try to collect my notes and solutions on math and physics, and links to them here.

Open-source; PayPal only

From the beginning of 2016, I decided to cease all explicit crowdfunding for any of my materials on physics, math. I failed to raise any funds from previous crowdfunding efforts. I decided that if I was going to live in abundance, I must lose a scarcity attitude. I am committed to keeping all of my material open-sourced. I give all my stuff for free.

In the beginning of 2017, I received a very generous donation, through PayPal, from a reader from Norway who found these notes useful. If you find these notes useful, feel free to donate directly and easily through PayPal, which won’t go through a third party such as Indiegogo, Kickstarter, or Patreon.

Otherwise, under the open-source MIT license, feel free to copy, edit, paste, make your own versions, share, use as you wish.

Algebraic Geometry

(symbolic computational) Algebraic Geometry with Sage Math on a jupyter notebook


I did a Google search for “Sage Math groebner” and came across Martin Albrecht’s slides on “Groebner Bases” (22 October 2013).  I implemented fully in Sage Math all the topics on the slides up to the F4 algorithm.  In particular, I implemented in Sage Math/Python the generalized division algorithm and Buchberger’s algorithm, both with and without the first criterion (I did plenty of Google searches and couldn’t find anyone who had a working implementation in Sage Math/Python).  Another bonus is the interactivity of having it in a jupyter notebook.  If this jupyter notebook helps you (reader), students, or colleagues, that’d be good; I picked up the basics and foundations of computational algebraic geometry quickly (over the weekend) from looking at the slides and working them out running Sage Math in a jupyter notebook.

I’ll update the github file as much as I can as I’m going through Cox, Little, O’Shea (2015), Ideals, Varieties, and Algorithms, and implementing what I need from there.
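As a flavor of what the notebook implements (a minimal pure-Python sketch, not the Sage Math code itself), here are the generalized division algorithm and an unoptimized Buchberger’s algorithm in two variables with lex order:

```python
from fractions import Fraction

# A minimal pure-Python sketch of the generalized division algorithm and
# Buchberger's algorithm in two variables with lex order (x > y) -- in
# the spirit of the Sage/Python notebook described above, not a copy of
# it.  A polynomial is a dict mapping exponent tuples (a, b), meaning
# x**a * y**b, to Fraction coefficients.

def lt(f):
    """Leading (monomial, coefficient) of f; tuple comparison is lex order."""
    m = max(f)
    return m, f[m]

def mul_term(c, m, g):
    """Return c * x^m * g."""
    return {(e[0] + m[0], e[1] + m[1]): c * a for e, a in g.items()}

def sub(f, g):
    """Return f - g, dropping zero coefficients."""
    out = dict(f)
    for e, a in g.items():
        out[e] = out.get(e, Fraction(0)) - a
        if out[e] == 0:
            del out[e]
    return out

def divide(f, G):
    """Generalized division algorithm: remainder of f on division by the list G."""
    r, f = {}, dict(f)
    while f:
        m, c = lt(f)
        for g in G:
            mg, cg = lt(g)
            if m[0] >= mg[0] and m[1] >= mg[1]:      # lt(g) divides lt(f)
                f = sub(f, mul_term(c / cg, (m[0] - mg[0], m[1] - mg[1]), g))
                break
        else:                                        # no leading term divides: move lt(f) to r
            r[m] = c
            del f[m]
    return r

def s_poly(f, g):
    """S-polynomial of f and g."""
    (mf, cf), (mg, cg) = lt(f), lt(g)
    L = (max(mf[0], mg[0]), max(mf[1], mg[1]))       # lcm of leading monomials
    return sub(mul_term(1 / cf, (L[0] - mf[0], L[1] - mf[1]), f),
               mul_term(1 / cg, (L[0] - mg[0], L[1] - mg[1]), g))

def buchberger(F):
    """Buchberger's algorithm, without the first-criterion optimization."""
    G = [{e: Fraction(c) for e, c in f.items()} for f in F]
    pairs = [(i, j) for i in range(len(G)) for j in range(i)]
    while pairs:
        i, j = pairs.pop()
        r = divide(s_poly(G[i], G[j]), G)
        if r:                                        # nonzero remainder joins the basis
            pairs += [(k, len(G)) for k in range(len(G))]
            G.append(r)
    return G
```

By definition, G is a Gröbner basis exactly when every S-polynomial of pairs in G reduces to zero under the division algorithm, which makes the result easy to verify.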

Differential Geometry and Differential Topology dump (DGDT_dump.tex and DGDT_dump.pdf)

I continue to take notes on differential geometry and differential topology and their relation to physics, with an emphasis on topological quantum field theory.  I dump all my notes and thoughts immediately into the LaTeX file and compiled pdf, here and here.  I don’t try to polish or organize these notes in any way, as I am learning at my own pace.  I’ve put this out there, with a permanent home on github, to invite anyone to copy, edit, reorganize, and use these notes in any way they’d like (the power of crowdsourcing).


20170423 update.

I have been reviewing holonomy by reading Conlon (2008), Clarke and Santoro (2012, 1206.3170 [math.DG]), and Schreiber and Waldorf (2007, 0705.0452 [math.DG]) concurrently.  I’ve already put these notes on my github repository mathphysics , in DGDT_dump.tex and DGDT_dump.pdf.






Computational Physics (CompPhys), Computational Fluid Dynamics (CFD)

I went through Ch. 10 of Hjorth-Jensen (2015) and wrote up C++ scripts illustrating all the (serial) PDE solvers: forward Euler, backward Euler, Crank-Nicolson, and the Jacobi method.

Cpp/progs/ch10pde of CompPhys github repository
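As a flavor of the simplest of those schemes (an illustrative Python sketch, not one of the C++ scripts), forward Euler for the 1-dim. diffusion equation:

```python
import numpy as np

# An illustrative Python sketch (not one of the C++ scripts) of the
# simplest scheme: forward Euler for the 1-dim. diffusion equation
# u_t = u_xx on [0, 1] with Dirichlet boundaries u(0, t) = u(1, t) = 0.
def forward_euler_heat(u0, dx, dt, n_steps):
    alpha = dt / dx**2              # explicit scheme: stable only for alpha <= 1/2
    u = u0.copy()
    for _ in range(n_steps):
        # second-derivative stencil on the interior points
        u[1:-1] += alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u
```

With u0 = sin(pi x) the exact solution is exp(-pi^2 t) sin(pi x), which makes a handy correctness check for any of the schemes.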

Lid-driven cavity with incompressible, viscous fluid on a 512×512 staggered grid, in CUDA C++11, with finite difference method for 2-dim., unsteady Navier-Stokes equations solver



Compare this with p. 69 of Ch. 5, Example Applications, of Griebel, Dornsheifer, and Neunhoeffer.


Michael Griebel, Thomas Dornsheifer, Tilman Neunhoeffer.  Numerical Simulation in Fluid Dynamics: A Practical Introduction (Monographs on Mathematical Modeling and Computation).  SIAM.  1997.

Cantera installation tips (on Fedora Linux, namely Fedora 23 Workstation Linux)

I spent an obscene amount of time documenting my installation of Cantera on Fedora 23 Workstation Linux in my github repository subdirectory cantera_install_tips, in Markdown. I’ll try copying the Markdown in here, in WordPress. Otherwise, go here: github:Propulsion/cantera_stuff/cantera_install_tips/

Cantera Installation Tips

Installing Cantera on Fedora Linux – straight from the github repository, all the way to compilation with scons – was nontrivial, mostly because of the installation prerequisites, which, in retrospect, can be installed easily if one knows what they are in terms of Fedora/CentOS/RedHat dnf.

The files in this subdirectory (./):

  • cantera_install_success – A verbose but complete Terminal log of the Cantera installation on Fedora Workstation 23 Linux, from git clone (cloning the github repository for Cantera directly) all the way to a successful scons install.
  • ClassThermoPhaseExam.cpp – cf. Computing Thermodynamic Properties, Class ThermoPhase, Cantera C++ Interface User’s Guide. A simple, complete program that creates an object representing a gas mixture and prints its temperature.
  • chemeqex.cpp – cf. Chemical Equilibrium Example Program, Cantera C++ Interface User’s Guide. The equilibrate method is called to set the gas to a state of chemical equilibrium, holding temperature and pressure fixed.
  • verysimplecppprog.cpp

Installation Prerequisites, à la Fedora Linux (Fedora/CentOS/RedHat dnf)

While the Cantera main page’s Compilation Guide gives the packages in terms of Ubuntu/Debian’s package manager:

g++ python scons libboost-all-dev libsundials-serial-dev

and for the python module

cython python-dev python-numpy python-numpy-dev

for other Linux distributions/flavors, the same libraries have different names under different package managers, and some libraries come already installed with the “stock” OS while others don’t (as I found in my situation). For example, Cantera’s main page, for Ubuntu/Debian installation (compilation), neglects boost because it’s already installed there – which I found wasn’t the case for Fedora 23 Workstation Linux.

Installation Prerequisites for Fedora 23 Workstation Linux (do these dnf installs first and the installation with scons will go more smoothly).

I found that you can’t get away from doing dnf install on an administrator account – be sure to be on a sudo-enabled or admin account to be able to do dnf installs. Also, I found that compiling Cantera had to be done on a sudo-enabled or administrator account; in particular, access needs to be granted to root directories such as /opt/ (more on that later).

Also, in general, you’d want to install the developer version of each library as well, usually suffixed with -devel, mostly because the header files will be placed in the right /usr/* subdirectory so they can be included when compiling C++ files or installing.

  • g++ and gcc – For something else (namely the CUDA Toolkit), I had already successfully installed, via dnf install, gcc 5, the C++ compiler with support for the new C++11/C++14 standards. The C++11 standard is necessary for compiling C++ files using Cantera (so the flag -std=c++11 is needed with g++).
  • scons – be sure to install scons – it seems there is a push to use scons, a Python program, for installation and (package) compilation, as opposed to (old-school) CMake or Make.
  • boost – Boost is a set of free, peer-reviewed, portable C++ source libraries.
sudo dnf install boost.x86_64
sudo dnf install boost-devel.x86_64
  • lapack – LAPACK, the Linear Algebra PACKage. Don’t take it for granted that lapack is already installed (I had to troubleshoot this myself, beyond the Cantera main page documentation, and find where it is). I had to install it because the Cantera scons build revealed it was missing:
dnf list lapack*  # find lapack in dnf
sudo dnf install lapack.x86_64
sudo dnf install lapack-devel.x86_64
  • blas – BLAS, the Basic Linear Algebra Subprograms. Don’t take it for granted that blas is already installed (I had to troubleshoot this myself, beyond the Cantera main page documentation, and find where it is). I had to install it because the Cantera scons build revealed it was missing:
dnf list blas*  # find blas in dnf
sudo dnf install blas.x86_64
sudo dnf install blas-devel.x86_64
  • python-devel – Following the spirit of installing the developer’s version of a library along with the library itself (so that the headers and symbolic links are saved onto the respective root /usr/* subdirectories and your system knows how to include the files), you’d want to install the Python developer’s libraries.
sudo dnf install python-devel

On this note, for Fedora Linux, I did not find with dnf list either python-numpy or python-numpy-dev, which supposedly are found in Ubuntu/Debian – an example of how the Fedora/CentOS/RedHat package manager differs from Ubuntu/Debian’s.
  • sundials – SUNDIALS has the (essential) nonlinear solvers.

sudo dnf install sundials.x86_64
sudo dnf install sundials-devel.x86_64

Clean install, from git clone to scons install

git clone

scons build -j12

scons build by itself is OK; I added the flag -j12 (correct me if I’m wrong) to parallelize the compilation over 12 cores. So if you’re on a quad-core CPU, you’d do -j4.
scons test
In my experience, if all the necessary libraries and prerequisite software are installed, then scons test should result in all tests passing, none failing.
sudo scons install

There’s no getting around using sudo for scons install.

A successful sudo scons install should end up looking like this at the very end:

Cantera has been successfully installed.

File locations:

  applications                /usr/local/bin
  library files               /usr/local/lib64
  C++ headers                 /usr/local/include
  samples                     /usr/local/share/cantera/samples
  data files                  /usr/local/share/cantera/data 
  Python 2 package (cantera)  /usr/local/lib64/python2.7/site-packages
  Python 2 samples            /usr/local/lib64/python2.7/site-packages/cantera/examples 
  setup script                /usr/local/bin/setup_cantera

The setup script configures the environment for Cantera. It is recommended that
you run this script by typing:

  source /usr/local/bin/setup_cantera

before using Cantera, or else include its contents in your shell login script.

scons: done building targets.

It’s good to know where all the files were installed.

Compiling very simple C++ programs as a sanity check (that Cantera was installed)

The Cantera main page’s C++ Interface User’s Guide, under Compiling Cantera C++ Programs, gives three ways to compile C++ programs: pkg-config, SCons, or Make.

However, from a brief perusal of Cantera.mak, you’ll see that the flags involved are numerous, daunting, and complicated:

# Required Cantera libraries
CANTERA_CORE_LIBS=-pthread -L/usr/local/lib64 -lcantera

CANTERA_CORE_LIBS_DEP = /usr/local/lib64/libcantera.a


CANTERA_CORE_FTN=-L/usr/local/lib64 -lcantera_fortran -lcantera


CANTERA_FORTRAN_SYSLIBS=-lpthread -lstdc++

#            BOOST



CANTERA_SUNDIALS_LIBS= -lsundials_cvodes -lsundials_ida -lsundials_nvecserial

Do you need sundials all the time? Does anyone (still) program in Fortran (in 2016)? Do we really need to include the /usr/local/lib64 directory every time? What’s the minimal number of flags needed?

Thus, in this repository’s subdirectory, I included the simple programs that I was able to compile without a complicated Makefile such as Cantera.mak.

I found these compilation commands worked:

g++ -std=c++11 verysimplecppprog.cpp -o verysimplecppprog -lcantera -l pthread
g++ -std=c++11 chemeqex.cpp -o chemeqex -lcantera -l pthread
g++ -std=c++11 ClassThermoPhaseExam.cpp -o ClassThermoPhaseExam -lcantera -l pthread

These flags also worked, but seemed unnecessary:

g++ -std=c++11 chemeqex.cpp -o chemeqex -lcantera -L/usr/local/lib64 -lsundials_cvodes -lsundials_ida -lsundials_nvecserial -L/usr/local/lib -l pthread

Troubleshooting installation/(installation) errors that pop up

  • fatal error: Python.h: No such file or directory

fatal error: Python.h: No such file or directory
scons: *** [build/temp-py/_cantera2.os] Error 1

I found that I had to dnf install python-devel to get the header files installed onto the appropriate /usr/* root subdirectories.
  • scons: *** [/usr/local/include/cantera/Edge.h] /usr/local/include/cantera/Edge.h: Permission denied

Do sudo scons install.
  • error: could not create `/usr/local/lib64/python2.7': Permission denied

Do sudo scons install.
  • scons: *** [/opt/cantera] /opt/cantera: Permission denied
scons: building terminated because of errors.

Do sudo scons install.

Troubleshooting C++ compilation/(C++ compilation) errors that pop up

I realized that I needed to include the Cantera library with the -lcantera flag when compiling with g++.
Package cantera was not found in the pkg-config search path.

Package cantera was not found in the pkg-config search path.
Perhaps you should add the directory containing `cantera.pc'
to the PKG_CONFIG_PATH environment variable
No package 'cantera' found
verysimplecppprog.cpp:9:29: fatal error: cantera/Cantera.h: No such file or directory
compilation terminated.

In my experience, I found that pkg-config, even though installed, didn’t work in compiling a simple program.
/usr/lib64/ error adding symbols: DSO missing from command line

I Google searched for this webpage:
cf. “error adding symbols: DSO missing from command line” while compiling g13-driver, ask ubuntu

From this page, I saw the use of the line LIBS = -lusb-1.0 -l pthread, and the idea of using the flag -l pthread ended up being the solution.
/usr/include/c++/5.3.1/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
You must include -std=c++11 to use the new C++11 standard. Indeed:

/usr/include/c++/5.3.1/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
In file included from /usr/local/include/cantera/base/fmt.h:2:0,
                 from /usr/local/include/cantera/base/ctexceptions.h:14,
                 from /usr/local/include/cantera/thermo/Phase.h:12,
                 from /usr/local/include/cantera/thermo/ThermoPhase.h:14,
                 from /usr/local/include/cantera/thermo.h:12,

So you’ll have to compile like this:

g++ -std=c++11

and include this flag in Makefiles.
usr/bin/ld: cannot find -l
Include the -lcantera flag in the C++ compilation.

Images gallery (that may help you with your installation process; it can be daunting)

dnf list boost-*

sudo dnf install boost-devel.x86_64

dnf list lapack*  # find lapack in dnf
sudo dnf install lapack-devel.x86_64

sudo dnf install python-devel

sudo dnf install sundials.x86_64
sudo dnf install sundials-devel.x86_64

git clone

fatal error: Python.h: No such file or directory
scons: *** [build/temp-py/_cantera2.os] Error 1

Successful installation/compilation (what we want, what it should look like)

scons build

scons test

sudo scons install

There’s no way, I found, of getting around having to use sudo for scons install – you’ll have to be logged in on a sudo-enabled or administrator account.

sudo scons install fixes these errors:

scons: *** [/usr/local/include/cantera/Edge.h] /usr/local/include/cantera/Edge.h: Permission denied
error: could not create `/usr/local/lib64/python2.7': Permission denied

Hillis/Steele and Blelloch (i.e. Prefix) scan(s) methods implemented in parallel on the GPU w/ CUDA C++11

In the subdirectory scan in Lesson Code Snippets 3 is an implementation in CUDA C++11 and C++11, with global memory, of the Hillis/Steele (inclusive) scan, Blelloch (prefix; exclusive) scan(s), each in both parallel and serial implementation.
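For reference, the two scans look like this in serial Python (sketches that mirror the log-step structure of the parallel versions; the CUDA C++11 code in the subdirectory is the actual implementation):

```python
# Serial Python sketches of the two scans (the CUDA C++11 code in the
# subdirectory is the real implementation; these are for checking results).

def hillis_steele_scan(a):
    """Inclusive scan: at step d, each element adds the element d to its left."""
    out = list(a)
    d = 1
    while d < len(out):
        out = [out[i] + (out[i - d] if i >= d else 0) for i in range(len(out))]
        d *= 2
    return out

def blelloch_scan(a):
    """Exclusive (prefix) scan via up-sweep (reduce) then down-sweep.
    The length of a must be a power of two."""
    n, out = len(a), list(a)
    d = 1
    while d < n:                        # up-sweep: build partial sums in a tree
        for i in range(2 * d - 1, n, 2 * d):
            out[i] += out[i - d]
        d *= 2
    out[n - 1] = 0                      # identity element at the root
    d = n // 2
    while d >= 1:                       # down-sweep: propagate prefixes back down
        for i in range(2 * d - 1, n, 2 * d):
            # swap, accumulating the left child's old value into the right
            out[i - d], out[i] = out[i], out[i] + out[i - d]
        d //= 2
    return out
```

The Hillis/Steele scan is step-efficient (log n steps) but does O(n log n) work, while the Blelloch scan is work-efficient at O(n) – the trade-off the lesson is about.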

As you can see, for large float arrays, the parallel implementation in CUDA C++11 on a GeForce GTX 980 Ti smokes the serial run on the CPU (my CPU is an Intel Xeon E5-1650 v3 @ 3.50GHz × 12).


I have a thorough write-up on my fork of Udacity’s cs344 on github.

Note that I was learning about the Hillis/Steele and Blelloch (i.e. prefix) scan methods in conjunction with Udacity’s cs344, Lesson 3 – Fundamental GPU Algorithms (Reduce, Scan, Histogram), i.e. Unit 3. I have a write-up of the notes I took on these scans, formulating them mathematically, in my big CompPhys.pdf, Computational Physics notes.

I accidentally `dnf update` on Fedora 23 w/ NVidia GTX 980 Ti & prop. drivers & new kernel trashed my video output for the 2nd time; here’s how I recovered my system

20161031. Note that another, similar (i.e. only a few minor changes) version of this post, in Markdown format, is in my MLgrabbag github repository.

Oops.  I was on an administrator account and I accidentally ran

dnf update



dnf update , #fedoralinux #Fedora #linux I’m always so very wary about doing this, because I’ve set up my Linux setup to be as minimal (stock?) as possible with installs and dependencies. In particular, I’ve set up Fedora Linux to use the proprietary @nvidiageforce @nvidia drivers, NOT the open-source and not-so-good (they WILL trash your video output and get you to Fedora’s own blue screen of death) #negativo drivers. And I’ve changed around and added symbolic links manually into the root system’s collection of libraries involving #cuda, so it’ll make my C++ programming and library inclusion at make time easier. I cringe if dnf update automatically installs negativo or “accidentally” cleans up my symbolic links or breaks dependencies with CUDA.

A video posted by Ernest Yeung (@ernestyalumni) on Oct 30, 2016 at 7:49pm PDT


I had done this before and written about this before, in the post Fedora 23 workstation (Linux)+NVIDIA GeForce GTX 980 Ti: my experience, log of what I do (and find out).


I relied upon 2 webpages for the critical, almost life-saving terminal commands to recover video output and the previous working “good” kernel – they were such a life-saver that they’re worth repeating, and I’ve saved an html copy of the 2 pages in the MLgrabbag github repository:

See what video card is there and all kernels installed and present, respectively

lspci | grep VGA
lspci | grep -E "VGA|3D"
lspci | grep -i "VGA" 

uname -a

Remove the offending kernel that was automatically installed by dnf update

Critical commands:

rpm -qa | grep ^kernel

uname -r

sudo yum remove kernel-core-4.7.9-100.fc23.x86_64 kernel-devel-4.7.9-100.fc23.x86_64 kernel-modules-4.7.9-100.fc23.x86_64 kernel-4.7.9-100.fc23.x86_64 kernel-headers-4.7.9-100.fc23.x86_64

Install NVidia drivers to, at least, recover video output

While at the terminal prompt (in low-resolution), change to the directory where you had downloaded the NVidia drivers (hopefully it’s there somewhere already on your hard drive because you wouldn’t have web browser capability without video output):

sudo sh ./

dnf install gcc
dnf install dkms acpid
dnf install kernel-headers

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

cd /etc/sysconfig
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

dnf list xorg-x11-drv-nouveau

dnf remove xorg-x11-drv-nouveau
cd /boot

## Backup old initramfs nouveau image ##
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau20161031.img

(in the last command, the output file’s name is arbitrary)

## Create new initramfs image ##
dracut /boot/initramfs-$(uname -r).img $(uname -r)
systemctl set-default

At this point, you’ll notice that dnf update and its subsequent removal would’ve trashed your C++ setup.

cf. stackexchange: gcc: error trying to exec 'cc1': execvp: No such file or directory When compile program with popen in php

For at this point, I tried to do a make of a C++ project I had:

[topolo@localhost MacCor1d_gfx]$ make
/usr/local/cuda/bin/nvcc -std=c++11 -g -G -Xcompiler "-Wall -Wno-deprecated-declarations" -L/usr/local/cuda/samples/common/lib/linux/x86_64 -lglut -lGL -lGLU -dc -o main.o
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Makefile:21: recipe for target 'main.o' failed
make: *** [main.o] Error 1

So you’ll have to do

dnf install gcc-c++

Might as well, while we’re at it, update NVidia proprietary drivers and CUDA Toolkit

Updating the NVidia proprietary driver – similar to installing, but remember you have to go into the low-resolution, no-video-driver terminal (command-line prompt)

chmod +x
systemctl set-default

Updating CUDA Toolkit (8.0)

Download CUDA Toolkit (8.0)

Then follow the instructions. If the driver is already updated before using the “.run” installation, then choose no to installing drivers – otherwise, I chose yes and the defaults for all the options.

The Linux installation guide for CUDA Toolkit 8.0 is actually very thorough, comprehensive, and easy to use. Let’s look at the Post-Installation Actions, the Environment Setup:

The PATH variable needs to include /usr/local/cuda-8.0/bin

To add this path to the PATH variable:

$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-8.0/lib64 on a 64-bit system, or /usr/local/cuda-8.0/lib on a 32-bit system

To change the environment variables for 64-bit operating systems:

    $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64\

Indeed, prior to adding to the PATH variable, I was getting errors when I typed nvcc at the command line. After doing this:

[propdev@localhost ~]$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
[propdev@localhost ~]$ env | grep '^PATH'
[propdev@localhost ~]$ nvcc
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc fatal   : No input files specified; use option --help for more information
[propdev@localhost ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

I obtain what I desired – I can use nvcc at the command line.

To get the samples that use OpenGL, be sure to have glut and/or freeglut installed:

dnf install freeglut freeglut-devel

Now for some bloody reason (please let me know), the command

    $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64\

still didn’t allow my CUDA programs to utilize the libraries in that lib64 subdirectory of the CUDA Toolkit. It seems the programs, or the OS, weren’t seeing the link that should be there in /usr/lib64.

What did work was in here, cannot open shared object file, with the solution at the end from atv, with an answer originally from txbob (most likely Robert Crovella).

Solved. Finally I did:

sudo echo "/usr/local/cuda-7.0/lib64" > /etc/
sudo ldconfig

Thanks a lot txbob!

This is what I did:

[root@localhost ~]# sudo echo "/usr/local/cuda-8.0/lib64" > /etc/
[root@localhost ~]# sudo ldconfig
ldconfig: /usr/local/cuda-7.5/lib64/ is not a symbolic link

and it worked; C++ programs compile with my make files.

Also, files using nvrtc, including those in the Samples for the 8.0 Toolkit, compiled and worked.

Fun Nvidia video card version information, details



Running nvidia-smi at the command prompt gave me this:

[propdev@localhost ~]$ nvidia-smi
Mon Oct 31 15:28:30 2016       
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 980 Ti  Off  | 0000:03:00.0      On |                  N/A |
|  0%   50C    P8    22W / 275W |    423MiB /  6077MiB |      1%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0      1349    G   /usr/libexec/Xorg                               50MiB |
|    0     19440    G   /usr/libexec/Xorg                              162MiB |
|    0     19645    G   /usr/bin/gnome-shell                           127MiB |
|    0     24621    G   /usr/libexec/Xorg                                6MiB |