I accidentally `dnf update` on Fedora 23 w/ NVidia GTX 980 Ti & prop. drivers & new kernel trashed my video output for the 2nd time; here’s how I recovered my system

20161031. Note that another, similar (i.e. only a few minor changes), version of this post, in Markdown format, is on my MLGrabbag github repository, MLGrabbag README.md

Oops.  I was on an administrator account and I accidentally ran


dnf update


dnf update , #fedoralinux #Fedora #linux I’m always so very wary about doing this, because I’ve configured my Linux setup to be as minimal (stock?) as possible with installs and dependencies. In particular, I’ve set up Fedora Linux to use the proprietary @nvidiageforce @nvidia drivers, NOT the open-source and not so good (they WILL trash your video output and get you to Fedora’s own blue screen of death) #negativo drivers. And I’ve changed around and added symbolic links manually into the root system’s collection of libraries involving #cuda, so that header and library inclusion is easier when I run make on my C++ projects. I cringe if dnf update automatically installs negativo or “accidentally” cleans up my symbolic links or breaks dependencies with CUDA.

A video posted by Ernest Yeung (@ernestyalumni) on Oct 30, 2016 at 7:49pm PDT


I had done this before and written about it in the post Fedora 23 workstation (Linux)+NVIDIA GeForce GTX 980 Ti: my experience, log of what I do (and find out).

Fix

I relied upon 2 webpages for the critical, almost life-saving, terminal commands to recover video output and the previous, working “good” kernel. They were such a life-saver that they’re worth repeating, and I’ve saved an HTML copy of the 2 pages onto the MLgrabbag github repository:

See what video card is there and which kernel is currently running, respectively

lspci | grep VGA
lspci | grep -E "VGA|3D"
lspci | grep -i "VGA" 

uname -a

Remove the offending kernel that was automatically installed by dnf update

Critical commands:

rpm -qa | grep ^kernel

uname -r

sudo yum remove kernel-core-4.7.9-100.fc23.x86_64 kernel-devel-4.7.9-100.fc23.x86_64 kernel-modules-4.7.9-100.fc23.x86_64 kernel-4.7.9-100.fc23.x86_64 kernel-headers-4.7.9-100.fc23.x86_64
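The package names in the removal command above can be collected from the `rpm -qa` listing rather than typed by hand. Here is a minimal sketch, assuming the package list arrives on stdin; `kernel_pkgs_for_version` is a hypothetical helper name for illustration, not part of rpm or dnf:

```shell
# Hypothetical helper: read package names on stdin (e.g. from `rpm -qa`) and
# print only the kernel* packages that belong to the given kernel version.
kernel_pkgs_for_version() {
    grep '^kernel' | grep -F -e "-$1"
}

# Intended use on the affected system (not run here):
#   rpm -qa | kernel_pkgs_for_version 4.7.9-100.fc23.x86_64
# Review the printed list, and compare the version against `uname -r`,
# before handing the names to the remove command.
```

Printing the list first, instead of piping it straight into a removal, leaves a chance to confirm that only the newly installed kernel is about to go.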

Install NVidia drivers to, at least, recover video output

While at the terminal prompt (in low-resolution), change to the directory where you had downloaded the NVidia drivers (hopefully it’s there somewhere already on your hard drive because you wouldn’t have web browser capability without video output):

sudo sh ./NVIDIA-Linux-x86_64-361.42.run
reboot

dnf install gcc
dnf install dkms acpid
dnf install kernel-headers

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

cd /etc/sysconfig
grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

dnf list xorg-x11-drv-nouveau

dnf remove xorg-x11-drv-nouveau
cd /boot

## Backup old initramfs nouveau image ##
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau20161031.img

(In the last command, the name of the backup file is arbitrary.)
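Since the backup name is arbitrary, one option is to derive it from the original file name plus a date stamp. This is only a sketch; `initramfs_backup_name` is a hypothetical helper, and the `nouveau` prefix simply mirrors the name used above:

```shell
# Hypothetical helper: build a backup name by inserting a tag before ".img".
#   $1 = original image path, $2 = tag to insert
initramfs_backup_name() {
    printf '%s-%s.img\n' "${1%.img}" "$2"
}

# Print the name that would be used today for the running kernel's image.
initramfs_backup_name "/boot/initramfs-$(uname -r).img" "nouveau$(date +%Y%m%d)"
```

The printed path could then serve as the second argument of the `mv` command.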

## Create new initramfs image ##
dracut /boot/initramfs-$(uname -r).img $(uname -r)
systemctl set-default multi-user.target
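One detail about the `echo "blacklist nouveau" >> ...` step above: it appends a new line every time it is run, so going through this recovery a second time leaves duplicate entries. Below is a minimal sketch of an append that skips lines already present; `append_once` is a hypothetical helper name, and the demonstration uses a scratch file rather than /etc/modprobe.d/blacklist.conf:

```shell
# Hypothetical helper: append a line to a file only if that exact line is
# not already there.
#   $1 = line to add, $2 = file
append_once() {
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstration against a scratch file: the second call adds nothing.
conf=$(mktemp)
append_once "blacklist nouveau" "$conf"
append_once "blacklist nouveau" "$conf"
cat "$conf"
rm -f "$conf"
```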

At this point, you’ll notice that dnf update and its subsequent removal would’ve trashed your C++ setup.

cf. stackexchange: gcc: error trying to exec 'cc1': execvp: No such file or directory When compile program with popen in php

For at this point, I tried to do a make of a C++ project I had:

[topolo@localhost MacCor1d_gfx]$ make
/usr/local/cuda/bin/nvcc -std=c++11 -g -G -Xcompiler "-Wall -Wno-deprecated-declarations" -L/usr/local/cuda/samples/common/lib/linux/x86_64 -lglut -lGL -lGLU -dc main.cu -o main.o
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
Makefile:21: recipe for target 'main.o' failed
make: *** [main.o] Error 1

So you’ll have to do

dnf install gcc-c++
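To check for this problem before running make, gcc can be asked where its C++ front end lives: `gcc -print-prog-name=cc1plus` prints a full path when cc1plus is installed and only the bare name when it is not. Here is a small sketch built on that behavior; `component_installed` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the argument is an absolute path to an
# executable file. A bare name such as "cc1plus" means gcc could not find it.
component_installed() {
    case "$1" in
        /*) [ -x "$1" ] ;;
        *)  return 1 ;;
    esac
}

# Intended use (not run here, since it needs gcc):
#   component_installed "$(gcc -print-prog-name=cc1plus)" \
#       || echo "cc1plus is missing; try: dnf install gcc-c++"
```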

Might as well, while we’re at it, update NVidia proprietary drivers and CUDA Toolkit

Updating the NVidia proprietary driver is similar to installing it, but remember that you first have to drop to the low-resolution, text-only terminal prompt (no video driver loaded):

chmod +x NVIDIA-Linux-x86_64-367.57.run
systemctl set-default multi-user.target
reboot

./NVIDIA-Linux-x86_64-367.57.run
systemctl set-default graphical.target
reboot

Updating CUDA Toolkit (8.0)

Download CUDA Toolkit (8.0)

Then follow the instructions. If the driver has already been updated before you run the “.run” installer, choose no when asked to install the driver; otherwise, I chose yes and accepted the default for all the options.

The Linux installation guide for CUDA Toolkit 8.0 is actually very thorough, comprehensive, and easy to use. Let’s look at the Post-Installation Actions, the Environment Setup:

The PATH variable needs to include /usr/local/cuda-8.0/bin

To add this path to the PATH variable:

$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-8.0/lib64 on a 64-bit system, or /usr/local/cuda-8.0/lib on a 32-bit system

To change the environment variables for 64-bit operating systems:

    $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
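The `${VAR:+:${VAR}}` piece of these commands expands to a colon plus the old value only when the variable is already set and non-empty. Without it, an empty variable would leave a trailing colon, and an empty entry in PATH or LD_LIBRARY_PATH is treated as the current directory. A minimal demonstration follows, using a scratch variable named `DEMO` (an illustrative name) so the real PATH is left alone:

```shell
# Case 1: the variable is empty, so nothing is appended after the new directory.
DEMO=""
empty_case="/usr/local/cuda-8.0/bin${DEMO:+:${DEMO}}"

# Case 2: the variable has a value, so ":<old value>" is appended.
DEMO="/usr/bin"
set_case="/usr/local/cuda-8.0/bin${DEMO:+:${DEMO}}"

printf '%s\n' "$empty_case" "$set_case"
```

The first line printed has no trailing colon; the second is the new directory followed by `:/usr/bin`.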

Indeed, prior to adding this directory to the PATH variable, I was getting errors when I typed nvcc at the command line. After doing this:

[propdev@localhost ~]$ export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
[propdev@localhost ~]$ env | grep '^PATH'
PATH=/usr/local/cuda-8.0/bin:/home/propdev/anaconda2/bin:/home/propdev/anaconda2/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/propdev/.local/bin:/home/propdev/bin
[propdev@localhost ~]$ nvcc
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc fatal   : No input files specified; use option --help for more information
[propdev@localhost ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

I obtain what I desired – I can use nvcc at the command line.
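An `export` like this lasts only for the current shell session, so it usually ends up in a startup file such as ~/.bashrc, where a plain prepend can add the same directory again on every nested shell (the PATH printed above already shows anaconda2/bin twice). Here is a sketch of a prepend that skips directories already present; `path_prepend` is a hypothetical helper name:

```shell
# Hypothetical helper: print a colon-separated list with the directory in
# front, unless the list already contains it.
#   $1 = directory to add, $2 = current list (may be empty)
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# For example, in ~/.bashrc:
PATH=$(path_prepend /usr/local/cuda-8.0/bin "$PATH")
export PATH
```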

To get the samples that use OpenGL, be sure to have glut and/or freeglut installed:

dnf install freeglut freeglut-devel

Now for some bloody reason (please let me know), the command

    $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

still didn’t let my CUDA programs use the libraries in that lib64 subdirectory of the CUDA Toolkit. It seems that the programs, or the OS, weren’t seeing the link that should be there in /usr/lib64.

What did work was the solution at the end of the thread libcublas.so.7.0: cannot open shared object file, posted by atv, with an answer originally from txbob (most likely Robert Crovella of github):

Solved. Finally I did:

sudo echo "/usr/local/cuda-7.0/lib64" > /etc/ld.so.conf.d/cuda.conf
sudo ldconfig

Thanks a lot txbob!

This is what I did:

[root@localhost ~]# sudo echo "/usr/local/cuda-8.0/lib64" > /etc/ld.so.conf.d/cuda.conf
[root@localhost ~]# sudo ldconfig
ldconfig: /usr/local/cuda-7.5/lib64/libcudnn.so.5 is not a symbolic link

and it worked; C++ programs compile with my make files.
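One caveat about the `sudo echo ... > file` form: the `>` redirection is performed by the calling shell, not by `sudo`, so it only succeeds when that shell is already root, as it was in my transcript above. A common workaround is to pipe into `tee` instead. This is a minimal sketch; `write_line_as` and its optional third argument are hypothetical names for illustration, and the demonstration writes to a scratch file rather than /etc/ld.so.conf.d/cuda.conf:

```shell
# Hypothetical helper: write one line to a file through tee.
#   $1 = line, $2 = target file, $3 = optional command prefix such as sudo
write_line_as() {
    printf '%s\n' "$1" | ${3:-} tee "$2" > /dev/null
}

# Demonstration against a scratch file. On the real system the call would be
#   write_line_as "/usr/local/cuda-8.0/lib64" /etc/ld.so.conf.d/cuda.conf sudo
# followed by `sudo ldconfig`.
scratch=$(mktemp)
write_line_as "/usr/local/cuda-8.0/lib64" "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Afterwards, `ldconfig -p | grep cuda` is one way to confirm that the linker cache picked up the CUDA libraries.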

Also, programs that use nvrtc, including those in the Samples for the 8.0 Toolkit, compiled and ran.

Fun Nvidia video card version information, details

Doing

nvidia-smi

at the command prompt gave me this:

[propdev@localhost ~]$ nvidia-smi
Mon Oct 31 15:28:30 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:03:00.0      On |                  N/A |
|  0%   50C    P8    22W / 275W |    423MiB /  6077MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1349    G   /usr/libexec/Xorg                               50MiB |
|    0     19440    G   /usr/libexec/Xorg                              162MiB |
|    0     19645    G   /usr/bin/gnome-shell                           127MiB |
|    0     24621    G   /usr/libexec/Xorg                                6MiB |
+-----------------------------------------------------------------------------+
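If only the driver version is needed, for example to confirm after a reboot that 367.57 really is the driver in use, it can be pulled out of the banner line. This is a rough sketch that assumes the banner keeps the `Driver Version: X.Y` layout shown above; `driver_version` is a hypothetical helper name:

```shell
# Hypothetical helper: read nvidia-smi's text output on stdin and print the
# number that follows "Driver Version:".
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9.][0-9.]*\).*/\1/p'
}

# Sample banner line copied from the output above.
echo '| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |' | driver_version
```

On a live system the input would come from `nvidia-smi | driver_version`. nvidia-smi can also report this field directly via `--query-gpu=driver_version --format=csv,noheader`, which avoids parsing the table.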