Learn from my mistakes. Getting to this screen in Fedora 23 (Linux) is a mini-nightmare.
Table of Contents
- Oh no Fedora! Something has gone wrong; A problem has occurred (with Nvidia drm, rpm and nouveau drivers with a new Fedora kernel); panic, and how I recovered my system
- Installation of NVIDIA CUDA on Fedora 23 Workstation (Linux)
- Sage Math: Installing programs on Fedora 23 (Linux)
- TeX Live install for LaTeX
- Good intentions; bad advice i.e. DON’T follow these commands carelessly in Fedora 23 (Linux)
- You don’t need to downgrade X11!!!
- Make a USB (live) boot disk of your distro
My workstation's hardware:
- Processor: Intel Xeon E5-1650 v3 Haswell 3.5 GHz (3.8 GHz Turbo Boost) 140W 15MB L3 Cache 6 Core
- Motherboard: MSI X99A SLI PLUS LGA 2011-v3 Intel X99 SATA 6Gb/s USB 3.1 USB 3.0 ATX Intel Motherboard
- Memory: 32GB (8x4GB) 288-Pin DDR4 2133 (PC4-17000) Desktop Memory
- Power: 650W Deepcool ATX12V SLI Ready CrossFire Ready 80 PLUS GOLD Certified Active PFC Power Supply
- Video Card 1: NVIDIA GeForce GTX 980 Ti 6GB 384-Bit GDDR5 Video Card
First, I wanted a workstation to try out some CFD computations, parallelized across the processor(s) and the GPU (hence the NVIDIA GeForce GTX 980 Ti, $649.99!), and some Deep Learning/Machine Learning computations, again parallelized on the GPU. With the NVIDIA card, I also wanted to learn CUDA. Second, I wanted to build Sage Math from source (it requires a whopping 6 GB of hard drive space), and I needed a more capable computer to deal with building Sage Math all the time. Third, I wanted a workstation dedicated to Linux, because a lot of the scientific/numerical computation programs install, compile, and run “better” on Linux. I went with Fedora Linux after doing a Google search and reading about, more or less, the “best linux distro for scientific/numerical computation” (e.g. on Quora).
By the way, I am interested in CFD, Deep Learning/Machine Learning computations, and computational physics, hence this new workstation, solely because I am currently “seeking opportunities in propulsion development” (i.e. I really want to work at the new companies of the commercial space industry, such as SpaceX, Virgin Galactic, and Blue Origin) and am trying to develop my skill set to help out in that area.
Coming back to this WordPress post: it will be continuously updated, just like my other posts on TQFT, General Relativity, Propulsion (for aerospace stuff), and Computers. I wanted to focus on 4 main topics and collect all my writings into 4 blog posts, 1 for each topic, to allow for deeper insight rather than firing off cursory blog posts that spam followers. For instance, the Computers post is a running log of various tips on programming, software, and installation, and the Gravity post has what I pick up on GR, or links to it, including links to my github repo. This post links back to the Computers post, because it is my experience dealing with computers, so you can always easily navigate to it from my simple menu with only 4 topics: TQFT, Gravity, Propulsion, Computers. PS. I wish WordPress had a github-like way of doing version control on blog posts, and a way to publish, or push, blog posts and media files from the command line instead of the browser. I’m finding my github repositories much easier (and more fun) to update, either from the command line or the browser.
Now, I was, and still am, exclusively a Mac OS X/iOS user (I find myself forgetting how to use Windows, as I haven’t used it in a long time; I used to edit my Windows registry for fun), and switching to Fedora Linux has so far been a huge learning curve. I’m going to go ahead and write down tips, hints, advice, and things that I’ve learned, even if they might be rudimentary or too simple (or silly) for advanced users, because they were not simple to me (and hopefully they will help others)!
Oh no Fedora! Something has gone wrong; A problem has occurred (with Nvidia drm, rpm and nouveau drivers with a new Fedora kernel); panic, and how I recovered my system
I had Fedora 23 Workstation (Linux) up and running, and on the fresh install I first installed the NVIDIA proprietary drivers by simply following the instructions on their official website and in the driver installer itself.
Much later in the day, Fedora asked me to upgrade, via the Software program in Activities, and I did that with dnf system upgrade.
Now when I turn on the computer, it can’t start X, i.e. the GUI, and the screen sometimes flickers.
Taking a look at the built-in EFI boot(er) menu (I tried again and again to reinstall from a Live USB disk, but it didn’t work because the machine went straight to this built-in EFI): Fedora (4.2.3) 23 is the original kernel, and Fedora (4.4.8) is the offending kernel (right after it installed and the machine restarted automatically, Fedora’s X graphics environment stopped working). None of the 3 boot options can load the graphics environment, and I wasn’t sure how to check which driver or package install was bad so that I could remove it, reconfigure, and try again. With each of the 3 options I kept getting the same failed screen until I pressed Ctrl+Alt+F2 to switch to a text console.
Also, I was receiving error messages when I booted up and couldn’t get into my X11 (graphical) windowing environment; I was stuck at the low-resolution command line.
My Xorg.0.log said:
(EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
(EE) NVIDIA: system's kernel log for additional error messages and
(EE) NVIDIA: consult the NVIDIA README for details.
(EE) No devices detected.
Fatal server error:
(EE) no screens found(EE)
The (EE) marker flags the lines where errors occurred during startup.
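When the desktop will not come up, pulling just the error lines out of the X log is a quick first check from the Ctrl+Alt+F2 console. Below is a minimal bash sketch. The log locations it tries are assumptions: rootless X (which GDM uses on Fedora 22 and later) writes to ~/.local/share/xorg/Xorg.0.log, while an X server running as root writes to /var/log/Xorg.0.log. The function name xorg_errors is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: print only the error lines, those marked "(EE)", from an Xorg log.
# Assumed locations: rootless X logs to ~/.local/share/xorg/Xorg.0.log,
# a root X server to /var/log/Xorg.0.log. Paths given as arguments are tried first.
xorg_errors() {
  local log
  for log in "$@" "$HOME/.local/share/xorg/Xorg.0.log" /var/log/Xorg.0.log; do
    [ -r "$log" ] || continue
    echo "== $log"
    # Anchor on an optional "[ timestamp ] " prefix so the "Markers:" legend at the
    # top of the log, which also mentions "(EE) error", is not printed.
    grep -E '^(\[[^]]*\] )?\(EE\)' "$log"
    return 0
  done
  echo "no readable Xorg.0.log found" >&2
  return 1
}
```

For example, `xorg_errors` with no arguments prints the errors from the first log it can read, and `xorg_errors /path/to/saved/Xorg.0.log` checks a copy saved elsewhere. For the kernel log that the NVIDIA message points to, `journalctl -k` or `dmesg` show it.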
What happened to me has also happened to other people when they used (or, after a kernel update, were switched over to) the nouveau drivers (the open-source driver for NVIDIA cards) for their NVIDIA GTX video card; see, e.g., “Nvidia drivers not loading correctly on Fedora 23”. However, I would not follow the advice given elsewhere to downgrade X11, nor the advice given in that stackexchange question.
Instead, what worked for me was following, to the letter, the If not true then false Fedora 23/22/21 nVidia Drivers Install Guide. This guide worked for me for reinstalling the proprietary NVIDIA drivers after a conflicting kernel upgrade or an accidental install of the nouveau or nvidia-drm drivers. Go there.
I also ended up back at the official NVIDIA Linux 64-bit drivers page, especially its Additional Information subpage with instructions on how to install the proprietary drivers; that page is what helped me install the driver the first time, and then reinstall it. Also, keep in mind that you can uninstall with the same installer (run as root) by adding the --uninstall flag:
sh ./NVIDIA-Linux-x86_64-346.35.run --uninstall
Before those steps, I removed the kernel that came with the last major change (the upgrade said it needed to install and restart, and that restart led to the “Oh no” screen of death). Following the labtestproject.com guide on removing a Fedora kernel, I ran the following command to list the installed kernel packages:
rpm -qa | grep ^kernel
You want to be sure that you’re not removing the kernel you’re currently running; this prints its version:
uname -r
Finally, the removal:
sudo yum remove kernel-4.4.8-300.fc23.x86_64 kernel-headers-4.4.8-300.fc23.x86_64 kernel-devel-4.4.8-300.fc23.x86_64 kernel-core-4.4.8-300.fc23.x86_64 kernel-modules-4.4.8-300.fc23.x86_64
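Picking the right package names out of the `rpm -qa | grep ^kernel` listing by hand is error-prone, and it is how you end up removing the kernel you are running. A small bash filter can do the comparison against `uname -r` for you; this is a sketch, and removable_kernel_pkgs is a hypothetical helper name, not part of rpm or dnf.

```bash
#!/usr/bin/env bash
# Sketch: list installed kernel packages that do NOT belong to the running kernel,
# i.e. the candidates for removal.
#   stdin: package names, one per line, as printed by `rpm -qa`
#   $1:    the running kernel release, as printed by `uname -r`
removable_kernel_pkgs() {
  local running="$1"
  # Keep only kernel* packages, drop any whose name contains the running
  # release string, and sort for a stable, readable list.
  grep '^kernel' | grep -v -F -- "$running" | LC_ALL=C sort
}
```

For example, `rpm -qa | removable_kernel_pkgs "$(uname -r)"` should print the 4.4.8 packages while the 4.2.3 kernel is running. Read the list before handing any of it to `sudo dnf remove`; kernel-headers, for one, may not carry exactly the same version as the kernel.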
Then I uninstalled (from the command line) and reinstalled (following the If not true then false guide) the NVIDIA drivers.
Finally, you’d want to check things like the video card and its driver version:
lspci | grep VGA
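`lspci | grep VGA` identifies the card but not which kernel driver has claimed it, which is exactly the nvidia-versus-nouveau question here. `lspci -k` prints a “Kernel driver in use:” line under each device, and a short awk filter can pick out the one for the graphics card. This bash sketch assumes the card is listed as a “VGA compatible controller”, as in the grep above; vga_driver_in_use is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: read `lspci -k` output on stdin and print the kernel driver bound to the
# VGA controller (e.g. "nvidia" or "nouveau").
vga_driver_in_use() {
  awk '
    # Device header lines start in column 1; indented lines belong to that device.
    /^[^ \t]/ { in_vga = ($0 ~ /VGA compatible controller/) }
    in_vga && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use:[ \t]*/, "")
      print
    }
  '
}
```

`lspci -k | vga_driver_in_use` should print nvidia when the proprietary driver is active and nouveau when the open-source one has taken over. With the proprietary driver loaded, `nvidia-smi` or `cat /proc/driver/nvidia/version` also report the driver version.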
So in conclusion, my advice, from my experience of almost losing my X11 (X, startx) graphical windowing environment, is to
- Be extremely careful about doing a dnf or yum system upgrade or kernel upgrade, and watch which dependencies get installed when you install a new program
- If you run into trouble, check dnf history to see what steps to (manually) undo
- In my case, I had to uninstall the new, offending kernel off the built-in EFI boot(er), following http://www.labtestproject.com/using_linux/remove_fedora_kernel.html
- Uninstall and reinstall the proprietary NVIDIA Linux driver; just follow what its installer says.
Whoops, the NVIDIA GTX 980 Ti did not like that last Fedora 23 upgrade; I have no idea which entry in the logs shows what Fedora 23 didn’t like, and so my GUI, or X (startx), isn’t starting. Unfortunately, the only thing left to do is to reinstall from a Live USB boot disk.
I did this to find out where my USB disk is on my Mac OS X:
diskutil list
I made a note of which /dev/diskn it is (n is 1, 2, 3, etc., e.g. /dev/disk2) and which entry number (e.g. it said #2: SANDISKCRUZ; SANDISKCRUZ is the name I gave the disk when I formatted the USB stick, so #2 it is).
After downloading the 64-bit iso I needed for Fedora, I did, e.g.
sudo dd if=Fedora_Live-Workstation-x86_64-23-10.iso of=/dev/rdisk2s2
I read that using the ‘r’ (raw device) in ‘rdisk2s2’ speeds things up, since it bypasses the buffer cache. Note, though, that a bootable Live image normally goes onto the whole disk (e.g. /dev/rdisk2) rather than onto a partition such as rdisk2s2, because the ISO carries its own partition table.
This process took about 86 minutes on a Late-2013 MacBook Pro (!!!). To check the status I pressed Ctrl-T, and it gave me the records in, records out, and total bytes transferred. I tried pkill and sending a signal with kill, but couldn’t work that out.
cf. “How to Copy an ISO to a USB Drive from Mac OS X with dd” (a super useful article), “is ‘dd’ command taking too long?”, and “Show progress of dd command” (which clarified many things; the author shares his experience using dd).
Also, keep in mind the official Fedora documentation for making a Live USB.
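The dd steps above can be wrapped in a small bash function that refuses a partition such as /dev/disk2s2, unmounts the stick, writes to the raw whole-disk device, and ejects it. This is a sketch for macOS under stated assumptions: the disk name comes from `diskutil list`, DRY_RUN=1 only prints the commands, and write_iso_to_usb and run are hypothetical helper names. The bs=1048576 (1 MiB) block size is the main change from the command above; dd's default 512-byte blocks are a likely reason a copy takes over an hour.

```bash
#!/usr/bin/env bash
# Sketch (macOS): write a Fedora Live ISO to a *whole* USB stick with dd.
# Assumptions: DISK is the /dev/diskN that `diskutil list` shows for the stick;
# DRY_RUN=1 makes every step print its command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

write_iso_to_usb() {
  local iso="$1" disk="$2" raw
  case "$disk" in
    /dev/disk[0-9]*s[0-9]*)
      # A slice such as /dev/disk2s2: the ISO brings its own partition table,
      # so it has to go onto the whole disk.
      echo "use the whole disk, not a partition: $disk" >&2; return 1 ;;
    /dev/disk[0-9]*) ;;
    *) echo "expected a /dev/diskN device, got: $disk" >&2; return 1 ;;
  esac
  raw="/dev/r${disk#/dev/}"            # /dev/disk2 -> /dev/rdisk2 (raw device node)
  run diskutil unmountDisk "$disk" || return 1
  # 1 MiB blocks instead of the 512-byte default, written as a plain byte count so
  # it does not depend on which size suffixes this dd accepts.
  run sudo dd if="$iso" of="$raw" bs=1048576 || return 1
  run diskutil eject "$disk"
}
```

For example, `DRY_RUN=1 write_iso_to_usb Fedora-Live-Workstation-x86_64-23-10.iso /dev/disk2` prints the three commands to review; drop DRY_RUN=1 to run them. While dd runs, Ctrl-T in that terminal (macOS's SIGINFO) should report progress, and from a second terminal `sudo pkill -INFO dd` should do the same; sudo is needed there because dd itself runs as root.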
Off to try to reinstall with this USB disk…
…And it didn’t help. I discovered on my own that Titan workstation computers keep Fedora 23 Workstation Linux on the built-in EFI (UEFI is the firmware interface that replaced the old BIOS). No matter how many times I tried to boot from the USB disk, by changing the boot order or by disabling the SATA disk drive (or any disk drive), the workstation booted directly into the built-in EFI boot loader for Fedora 23 Workstation. Aaaaaaaaaaaaaaahhhh. AAAArggggg.
See the above section, Oh no Fedora! Something has gone wrong; A problem has occurred (with Nvidia drm, rpm and nouveau drivers with a new Fedora kernel); panic, and how I recovered my system, to see how I manually recovered my X (graphical) environment from the command line.
Installation of NVIDIA CUDA on Fedora 23 Workstation (Linux)
Installation of NVIDIA’s CUDA Toolkit on a Fedora 23 Workstation was nontrivial; part of the reason is that 7.5 appears to be the latest version of the CUDA Toolkit (as of 20160512), and 7.5 officially supports only Fedora 21. Also, out of the box, this 7.5 version supports the C compiler gcc only up to version 4.9, not gcc 5. But there’s no reason why the later versions (Fedora 23 as opposed to Fedora 21, gcc 5 vs. gcc 4.*) cannot be used: I got CUDA to work on my setup, including the samples. I did find, though, that I had to make some nontrivial symbolic links.
I wanted to install CUDA for Udacity’s Intro to Parallel Programming. In the very first lesson, or video, Intro to the Class, the only instructions given for running CUDA locally were links to the official NVIDIA documentation, in particular the guide for Linux.
But one only needs to do a Google search and read some forum posts to learn that installing CUDA, on Windows, Mac, or Linux, is highly nontrivial.
I’ll point out how I did it and refer to the links that helped me: some whose instructions you simply follow to the letter, others whose instructions you should follow but modify to suit your (my) system, and what NOT to do (from my experience).
Gist: a short summary of the steps (without full details) to just get CUDA to work (no graphics)
My install procedure assumes you are using the latest proprietary NVIDIA Accelerated Graphics Drivers for Linux. I removed and/or blacklisted any other open-source versions of nvidia drivers, and in particular blacklisted nouveau. See my blog post for details and description.
- Download the latest CUDA Toolkit (appears to be 7.5 as of 20160512). For my setup, I clicked on the boxes Linux for Operating System, x86_64 for Architecture, Fedora for Distribution, 21 for Version (the only one there), and runfile (local) for Installer Type (it was the first option that appeared). Then I modified the instructions on their webpage, which are:
- Run `sudo sh cuda_7.5.18_linux.run`
- Follow the command-line prompts.
Instead, I did
$ sudo sh cuda_7.5.18_linux.run --override
The --override flag let the installer go ahead with gcc 5, so I did not have to downgrade to gcc 4.*.
Here is how I selected my options at the command-line prompts (and part of the result):
$ sudo sh cuda_7.5.18_linux.run --override
Do you accept the previously read EULA? (accept/decline/quit): accept
You are attempting to install on an unsupported configuration. Do you wish to continue? ((y)es/(n)o) [ default is no ]: yes
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 352.39? ((y)es/(n)o/(q)uit): n
Install the CUDA 7.5 Toolkit? ((y)es/(n)o/(q)uit): y
Enter Toolkit Location [ default is /usr/local/cuda-7.5 ]:
Do you want to install a symbolic link at /usr/local/cuda? ((y)es/(n)o/(q)uit): y
Install the CUDA 7.5 Samples? ((y)es/(n)o/(q)uit): y
Enter CUDA Samples Location [ default is /home/[yournamehere] ]: /home/[yournamehere]/Public
Installing the CUDA Toolkit in /usr/local/cuda-7.5 ...
Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /home/[yournamehere]/Public ...
Copying samples to /home/[yournamehere]/Public/NVIDIA_CUDA-7.5_Samples now...
Finished copying samples.
Again, Fedora 23 was not a supported configuration, but I wished to continue. I had already installed NVIDIA Accelerated Graphics Driver for Linux (that’s how I was seeing my X graphical environment) but it was a later version 361.* and I did not want to uninstall it and then reinstall, which was recommended by other webpages (I had already gone through the mini-nightmare of reinstalling these drivers before, which can trash your X11 environment that you depend on for a functioning GUI).
- Continuing, this was also printed out by CUDA’s installer:
Installing the CUDA Samples in /home/[yournamehere]/Public ...
Copying samples to /home/[yournamehere]/Public/NVIDIA_CUDA-7.5_Samples now...
Finished copying samples.
= Summary =
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-7.5
Samples: Installed in /home/[yournamehere]/Public, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-7.5/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-7.5/lib64, or, add /usr/local/cuda-7.5/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-7.5/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-7.5/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 352.00 is required for CUDA 7.5 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_7123.log
- To make sure that “PATH includes /usr/local/cuda-7.5/bin”, I did
$ export PATH=/usr/local/cuda-7.5/bin:$PATH
as suggested by Chapter 6 of CUDA_Getting_Started_Linux.pdf
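`export` only changes PATH for the current shell, so the setting disappears in every new Terminal window. A common fix is to put the line in ~/.bashrc; the bash sketch below adds a guard so that re-sourcing ~/.bashrc does not keep prepending the same directory. path_prepend is a hypothetical helper, not something the CUDA installer provides.

```bash
#!/usr/bin/env bash
# Sketch: prepend a directory to a colon-separated variable such as PATH, but only
# if it is not already there, so sourcing ~/.bashrc repeatedly does not grow PATH.
path_prepend() {
  local var="$1" dir="$2" cur
  cur="${!var:-}"                          # current value of the variable named in $var
  case ":$cur:" in
    *":$dir:"*) ;;                         # already present: leave it alone
    *) export "$var=$dir${cur:+:$cur}" ;;  # no stray ':' when the variable was empty
  esac
}
```

In ~/.bashrc, after the function definition, `path_prepend PATH /usr/local/cuda-7.5/bin` should make `nvcc --version` work in every new terminal.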
- Dealing with the LD_LIBRARY_PATH, I did this: I created a new text file called cuda.conf in /etc/ld.so.conf.d (open up your favorite text editor; I used emacs, from inside that directory):
sudo emacs cuda.conf
and I pasted the directory
/usr/local/cuda-7.5/lib64
(since my setup is 64-bit) into this text file. I did this because my /etc/ld.so.conf file includes the files in /etc/ld.so.conf.d, i.e. it says
include ld.so.conf.d/*.conf
Make sure this change is picked up by running
sudo ldconfig
I check the status of this “linking” with the echo command each time I reboot, log back in, or start a new Terminal window (note that the export PATH line above only lasts for the current shell).
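The /etc/ld.so.conf.d steps above can be scripted, together with a check that the linker cache really picked up the CUDA runtime library. This route does not touch the LD_LIBRARY_PATH variable, so `echo $LD_LIBRARY_PATH` will not show it; `ldconfig -p` lists what the dynamic linker knows instead. A bash sketch, assuming the default /usr/local/cuda-7.5 location; ld_cache_has and setup_cuda_ld_conf are made-up names.

```bash
#!/usr/bin/env bash
# Sketch: register CUDA's 64-bit libraries with the dynamic linker through
# /etc/ld.so.conf.d, then confirm the linker cache can find the CUDA runtime.
# Assumes the default toolkit location /usr/local/cuda-7.5 from the installer.

# Succeeds if `ldconfig -p` output (read on stdin) lists a library whose name
# starts with $1, e.g. libcudart.so. The first field of each entry is the name.
ld_cache_has() {
  awk -v lib="$1" 'index($1, lib) == 1 { found = 1 } END { exit !found }'
}

setup_cuda_ld_conf() {
  # The write into /etc needs root: tee runs under sudo, its stdout copy is discarded.
  echo /usr/local/cuda-7.5/lib64 | sudo tee /etc/ld.so.conf.d/cuda.conf > /dev/null || return 1
  sudo ldconfig || return 1              # rebuild /etc/ld.so.cache
  if ldconfig -p | ld_cache_has libcudart.so; then
    echo "libcudart found in the linker cache"
  else
    echo "libcudart not in the linker cache yet" >&2
    return 1
  fi
}
```

Run `setup_cuda_ld_conf` once from your own account; it calls sudo only for the two steps that need root.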
- Patch the host_config.h header file. To use gcc 5 instead of gcc 4.*, I needed to patch host_config.h because I kept receiving errors. What worked for me was doing this to the file. Original version:
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)
#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
#endif /* __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9) */
Commented-out version (these 3 lines)
// #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)
// #error -- unsupported GNU version! gcc versions later than 4.9 are not supported!
// #endif /* __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9) */
Afterwards, I did not have any problems with C compiler (gcc) incompatibility (yet).
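The same three-line edit can be applied with sed instead of by hand, which is handy if the toolkit gets reinstalled. A bash sketch, assuming GNU sed and the header at /usr/local/cuda-7.5/include/host_config.h (the text does not give the path, so check yours); patch_host_config is a hypothetical name, and it keeps an untouched copy as host_config.h.orig.

```bash
#!/usr/bin/env bash
# Sketch: comment out the gcc-version guard in CUDA 7.5's host_config.h, the same
# three-line edit as above, keeping a backup copy first. Assumes GNU sed.
patch_host_config() {
  local header="${1:-/usr/local/cuda-7.5/include/host_config.h}"
  # Back up only once, so a second run cannot overwrite the pristine copy.
  [ -e "$header.orig" ] || cp -- "$header" "$header.orig" || return 1
  # Each expression matches one of the three lines as quoted in the text and
  # prefixes it with "// " ('&' is the matched text). In a basic regex '|', '('
  # and ')' are literal, so only the '*' of the C comment needs escaping.
  sed -i \
    -e 's@^#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)@// &@' \
    -e 's@^#error -- unsupported GNU version! gcc versions later than 4.9 are not supported!@// &@' \
    -e 's@^#endif /\* __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9) \*/@// &@' \
    "$header"
}
```

Run it as root (e.g. from `sudo -i`) for the real header. Since the patterns only match the uncommented lines, running it a second time changes nothing.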
- At this point CUDA runs without problems if no graphics capabilities are needed. For instance, as a sanity check, I built `deviceQuery` from the samples installed with CUDA and ran it (my samples are in ~/Public, the location I chose in the installer):
$ cd ~/Public/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
$ make -j12
$ ./deviceQuery
If your output looks something like this, success!
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 980 Ti"
CUDA Driver Version / Runtime Version 8.0 / 7.5
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 6143 MBytes (6441730048 bytes)
(22) Multiprocessors, (128) CUDA Cores/MP: 2816 CUDA Cores
GPU Max Clock rate: 1076 MHz (1.08 GHz)
Memory Clock rate: 3505 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 3145728 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 980 Ti
Result = PASS
- Getting the other samples to run, and getting CUDA to have graphics capabilities: soft symbolic links to the existing libraries.
The flow, or general procedure, I ended up following was to use `locate` to find the relevant `*.so.*` or `*.h` file for the missing library or missing header, respectively, and then to make soft symbolic links to them with the `ln -s` command. I found that some of the samples included by NVIDIA expect the graphics libraries (GL, GLU, X11, glut, etc.) in different directories than other samples do.
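The locate-then-`ln -s` procedure can be wrapped for the common case in the installer's warnings: libGLU.so, libX11.so, libXi.so, and libXmu.so are the unversioned names the linker wants, while the runtime packages usually ship only versioned files such as libGLU.so.1. A bash sketch; link_unversioned is a hypothetical helper. Installing Fedora's -devel packages (mesa-libGLU-devel, libX11-devel, libXi-devel, libXmu-devel) should create these links properly and is probably the cleaner route.

```bash
#!/usr/bin/env bash
# Sketch: create the unversioned "libfoo.so" name that a linker flag such as -lGLU
# looks for, pointing it at an existing versioned file "libfoo.so.N...".
# Assumes GNU find and sort (Fedora's defaults). Needs root for /usr/lib64.
# Usage: link_unversioned LIBDIR LIBNAME, e.g. link_unversioned /usr/lib64 libGLU.so
link_unversioned() {
  local libdir="$1" name="$2" target
  # Already there (as a file or a symlink)? Leave it alone.
  if [ -e "$libdir/$name" ] || [ -L "$libdir/$name" ]; then
    echo "exists: $libdir/$name"
    return 0
  fi
  # Highest version of libfoo.so.* in that directory; sort -V orders version numbers.
  target=$(find "$libdir" -maxdepth 1 -name "$name.*" -printf '%f\n' | sort -V | tail -n 1)
  if [ -z "$target" ]; then
    echo "no versioned $name.* found in $libdir" >&2
    return 1
  fi
  # Relative link target, like the links that Fedora's -devel packages install.
  ln -s "$target" "$libdir/$name" || return 1
  echo "linked: $libdir/$name -> $target"
}
```

As root, `link_unversioned /usr/lib64 libGLU.so`, and likewise for each library the installer reported as missing.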
To be continued; see the README.md file in my github repo MLgrabbag for the latest update (HTML is a pain compared to Markdown, and I don’t want to download any more programs to convert Markdown to HTML; I’m already doing a lot of installing).
Sage Math: Installing programs on Fedora 23 (Linux)
First, I tried following the Sage Math instructions for installing from source. I followed the steps up to Step-by-Step installation procedure, General procedure, step 4, “Read the README.txt”. I use emacs, so in the directory where sage-7.1 is, I ran
emacs README.md
and, following the section More Detailed Instructions to Build from Source in README.md, I did
export MAKE="make -j14"
because my processor has 12 logical processors (6 cores with Hyper-Threading), which I found out with the following:
$ less /proc/cpuinfo
$ cat /proc/cpuinfo | grep processor | wc -l
and so I use 14 jobs, a couple more than the 12 processors. It took about 25-30 minutes.
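The job count can be computed instead of typed. This bash sketch counts the “processor” lines in /proc/cpuinfo as above and adds two extra jobs, matching the -j14 used here for 12 logical processors; two extra is just this setup's choice, not a Sage requirement, and make_jobs_setting is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: derive a MAKE setting from the number of logical CPUs (one "processor"
# line per logical CPU in /proc/cpuinfo) plus a configurable number of extra jobs.
make_jobs_setting() {
  local cpuinfo="${1:-/proc/cpuinfo}" extra="${2:-2}" n
  n=$(grep -c '^processor' "$cpuinfo") || return 1   # grep -c exits 1 if it counts 0
  echo "make -j$((n + extra))"
}
```

`export MAKE="$(make_jobs_setting)"` then sets MAKE before the build; `nproc` from GNU coreutils usually reports the same count as the grep.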
However, it failed to build, again and again, even for libraries I had successfully installed through Anaconda's conda (from Continuum), such as git-2.6 and matplotlib. So now I am trying to follow the instructions Eric Gourgoulhon (LUTH) gave me for building Sage Math from the git develop version, which I had already recorded in my Computers post, under Starting or beginning developing (i.e. contributing code) to a major open-source project, in this case, Sage Math.
If you’re getting errors when building Sage Math from github using Gourgoulhon’s and my instructions:
Check the errors you’re receiving and the suggested log files. In the “root” sage directory (the one containing the source directory src) there is a logs/pkgs directory with the logs of all installed or failed packages. In my particular case, flask_babel-0.9.log failed; reading the log, it was a “Download Error!”, so it was probably a problem with my internet connection (I’ve had service interruptions with Time Warner Cable as my provider, and I cannot recommend TWC).
Then run make again from the main sage directory, being careful not to overwrite the previously successful package builds.
Also, I was able to install Sage Math from the pre-built Linux binaries.
In my experience, either following the instructions Eric Gourgoulhon and I give in my “Computers” blog post, to build straight from the git development version, or using the pre-built binaries, on Mac OS X and Fedora Linux, is the way to go for installing Sage Math; the official building-from-source instructions haven’t worked for me.
TeX Live install for LaTeX
This was straightforward. I did this:
sudo yum install texlive-scheme-full
Good intentions; bad advice i.e. DON’T follow these commands carelessly in Fedora 23 (Linux)
You may be (at least I certainly was) in a rush to fix something, so you are furiously doing Google searches, searching forum posts, and trying any kind of command(s) to fix the problem. Here, I collect commands NOT to run (casually).
dnf -y upgrade
Don’t run dnf upgrade casually, because NVIDIA’s proprietary drivers may conflict with the latest kernel. This has happened to others.
You don’t need to downgrade X11!!!
I found that I didn’t need to downgrade my X11 as advised in the article “NVIDIA – Incompatible with Fedora 23 Xorg – and a Workaround..”, as the latest NVIDIA drivers did just fine.
How not to replace nouveau drivers in Fedora 23
I wouldn’t do it this way (it didn’t work for me), in particular the crucial step #4 of theirs, blacklisting nouveau in /etc/modprobe.d with the commands
echo 'blacklist nouveau' >> /etc/modprobe.d/disable-nouveau.conf
echo 'nouveau modeset=0' >> /etc/modprobe.d/disable-nouveau.conf
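If you do try the modprobe.d route anyway, two details in the commands as quoted may explain a failure: a module-parameter line needs the `options` keyword (a bare `nouveau modeset=0` is not valid modprobe.d syntax), and the commands need a root shell, since prefixing them with sudo would not help when the `>>` redirection is performed by your own unprivileged shell. A bash sketch that writes the file in one go, so rerunning it does not append duplicate lines; write_nouveau_blacklist is a made-up name, and this is not the method that worked here (that was the install guide mentioned next).

```bash
#!/usr/bin/env bash
# Sketch: write a modprobe.d file that keeps nouveau from loading.
# "options" is required for module parameters in modprobe.d syntax, and ">"
# (not ">>") makes the function safe to rerun.
write_nouveau_blacklist() {
  local conf="${1:-/etc/modprobe.d/disable-nouveau.conf}"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}
```

As root, `write_nouveau_blacklist && dracut --force` should also rebuild the initramfs, so the blacklist applies at early boot, before the root filesystem's /etc/modprobe.d is read.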
Instead, what worked for me, as linked and written about above, was following, to the letter, the If !1 0 Fedora 23/22/21 nVidia Drivers Install Guide.
While on that note, the advice in the fedoraforum post entitled “[SOLVED] Oh no! Something has gone wrong” didn’t help me. I was thinking of trying to reinstall Fedora into the built-in EFI, but the post “how to install Fedora 11 in EFI shell and GPT partition?” didn’t help.
I had the same problem, with a similar log, as described in “FC22: nvidia kernel module loads, but X can’t initialize GPU”, but the fix that the member StefanJ proposed didn’t help in my situation.
grep processor /proc/cpuinfo
cat /proc/cpuinfo | less
Also, other system information:
cat /proc/meminfo | less
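For a compact summary instead of paging through both files with less, a short bash sketch can print the CPU model and the total memory. The paths are parameters so it can be pointed at saved copies; hw_summary is a made-up name, and it assumes the usual Linux /proc formats (“model name : ...” and “MemTotal: ... kB”).

```bash
#!/usr/bin/env bash
# Sketch: print the CPU model and total RAM from /proc (Linux only).
hw_summary() {
  local cpuinfo="${1:-/proc/cpuinfo}" meminfo="${2:-/proc/meminfo}"
  # Every logical CPU repeats "model name", so stop after the first one.
  awk -F': *' '/^model name/ { print "CPU model: " $2; exit }' "$cpuinfo"
  # MemTotal is reported in kB (KiB); 1048576 KiB = 1 GiB.
  awk '/^MemTotal:/ { printf "Memory: %.1f GiB\n", $2 / 1048576 }' "$meminfo"
}
```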
Getting the error “Can’t create transaction lock” with rpm:
“Try running your command as root. It worked for me.” –phathutshezo