Machine Learning (ML), Deep Learning stuff; including CUDA C/C++ stuff (utilizing and optimizing with CUDA C/C++)

(Incomplete) Table of Contents

  • GPU-accelerated Tensor Networks
  • “Are Neural Networks a black box?” My take.
  • Log
  • CUDA C/C++ stuff (utilizing CUDA and optimizing CUDA C/C++ code)
  • Fedora Linux installation of Docker for nVidia’s DIGITS – my experience
  • Miscellaneous Links

A lot has already been said about Machine Learning (ML), Deep Learning, and Neural Networks.  Note that this blog post (which I’ll update infrequently) is the “mirror” of my github repository, ernestyalumni/MLgrabbag. Go to the github repo for the latest updates, code, and jupyter notebooks.

A few things bother me that I sought to rectify myself:

  • There ought to be a clear dictionary between the mathematical formulation and its implementation in Python’s scikit-learn, Theano, and TensorFlow.  I see math equations; here’s how to implement them, immediately.  I mean, if I were in class lectures, and with the preponderance of sample data, I ought to be able to play with examples immediately.
  • Someone ought to generalize the mathematical formulation, drawing from algebra, category theory, and differential geometry/topology.
  • CPUs have been a disappointment (see actual gamer benchmarks for Kaby Lake on YouTube); everything ought to be written in parallel for the GPU.  And if you’re using a wrapper that’s only about as fast as CUDA C/C++, guess what?  You ought to rewrite the thing in CUDA C/C++.

So what I’ve started doing is put up my code and notes for these courses:

The github repository MLgrabbag should have all my stuff for it.  I’m cognizant that there are already plenty of notes and solutions out there.  What I’m trying to do is to, as above,

  1. write the code in Python’s scikit-learn and Theano, first and foremost,
  2. generalize the mathematical formulation,
  3. implement on the GPU

I think those aspects are valuable, and I don’t see anyone else with either such a clear implementation or real examples (not toy examples).

GPU-accelerated Tensor Networks

Go here:

Are neural networks a “black box”? My take.

I was watching a webinar, HPC Exascale and AI, given by Tom Gibbs for nVidia, and the first question in the Q&A was whether neural networks are a “black box”: how could anything be learned about the data presented (experimental or from simulation) if it’s unknown what neural networks do?

Here is my take on the question and how I’d push back.

For artificial neural networks (ANN), or the so-called “fully-connected layers” of Convolutional Neural Networks (CNN), Hornik et al. (1991) had already shown that neural networks act as universal function approximators, in that a neural network can converge uniformly to a function mapping the input data X to output y. The proof should delight pure math majors in that it employs the Stone-Weierstrass theorem. The necessary network size is not known in advance (the classical result is stated in terms of the number of hidden units rather than the number of layers L); it simply must be sufficiently large. But that a sufficiently large neural network can converge uniformly to an approximate function that maps input data X to output y should be very comforting (and confidence-building in the technique).
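To make the “universal approximator” statement concrete, here is a minimal sketch of my own (not the Stone-Weierstrass argument from the paper). It hand-picks the weights of a network with a single hidden layer of sigmoids so that the output traces a staircase through a target function on [0, 1], and then checks that the worst-case (uniform) error shrinks as hidden units are added. The names build_network, sup_error, and target, the steepness parameter, and the choice of sin(2πx) as the target are all assumptions made for illustration; nothing is trained here.

```python
import math


def sigmoid(z):
    """Logistic function, written so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_hidden, steepness=20.0):
    """Hand-pick weights for one hidden layer so the output is a staircase through f.

    Hidden unit i is a steep sigmoid centred midway between two neighbouring
    grid points; its output weight is the jump in f between those points.
    """
    knots = [i / n_hidden for i in range(n_hidden + 1)]
    bias_out = f(knots[0])
    units = []
    for left, right in zip(knots, knots[1:]):
        w_in = steepness * n_hidden           # steeper as the grid gets finer
        b_in = -w_in * 0.5 * (left + right)   # centre the step at the midpoint
        w_out = f(right) - f(left)            # height of this step
        units.append((w_in, b_in, w_out))

    def network(x):
        hidden = (w_out * sigmoid(w_in * x + b_in) for w_in, b_in, w_out in units)
        return bias_out + sum(hidden)

    return network


def sup_error(f, g, n_samples=2001):
    """Largest |f(x) - g(x)| over an evenly spaced sample of [0, 1]."""
    grid = (i / (n_samples - 1) for i in range(n_samples))
    return max(abs(f(x) - g(x)) for x in grid)


def target(x):
    return math.sin(2.0 * math.pi * x)


# Uniform error for progressively wider hidden layers.
errors = {n: sup_error(target, build_network(target, n)) for n in (10, 50, 200)}
for n, err in errors.items():
    print(n, err)
```

The point of the sketch is only that the sup-norm error can be driven down by making the network larger; how large it has to be for a given tolerance depends on the target function.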

For CNNs, this was an insight that struck me because I wrote a lot of incompressible Navier-Stokes equation solvers for Computational Fluid Dynamics (CFD) with finite-difference methods in CUDA C/C++: stencil operations in CUDA (or numerical computation in general) are needed for the finite-difference method to compute gradients and, further, the Hessian (second-order partial derivatives). CNNs formally do exactly these stencil operations, with the “weights” of the finite difference being arbitrary (adjustable). Each successive convolution “layer” takes a higher-order (partial) derivative of the previous one; this is exactly what stencil operations for finite differences do as well. This is also evidenced by how, with each successive convolution “layer”, the total size of a block “shrinks” (if we’re not padding the boundaries), exactly as with the stencil operation for finite differences.
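The correspondence can be checked in a few lines. The sketch below is my own illustration in plain Python (not code from the CFD solvers): it treats a 1D convolution layer with no padding as a sliding weighted sum, fixes the weights to the forward-difference stencil (-1, 1)/h, and applies it twice. The result matches one pass of the second-difference stencil (1, -2, 1)/h², and each pass shortens the array by one. The function name correlate_valid, the grid spacing h, and the test function x³ are assumptions for illustration, and the nonlinearity a real CNN puts between layers is left out.

```python
def correlate_valid(signal, kernel):
    """Slide kernel over signal with no padding, as an unpadded conv layer does.

    Like most deep learning "convolutions", this does not flip the kernel.
    """
    width = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + width]))
        for i in range(len(signal) - width + 1)
    ]


h = 0.1                                  # grid spacing
xs = [i * h for i in range(11)]          # 11 grid points on [0, 1]
samples = [x ** 3 for x in xs]           # f(x) = x^3, so f''(x) = 6x

first_diff = [-1.0 / h, 1.0 / h]                             # forward difference
second_diff = [1.0 / h ** 2, -2.0 / h ** 2, 1.0 / h ** 2]    # central second difference

layer1 = correlate_valid(samples, first_diff)   # approximates f'
layer2 = correlate_valid(layer1, first_diff)    # approximates f''
direct = correlate_valid(samples, second_diff)  # f'' in a single pass

# The block shrinks by one entry per unpadded layer.
print(len(samples), len(layer1), len(layer2))   # 11 10 9
```

In a CNN the stencil weights are learned rather than fixed, so a layer is free to settle on something other than a derivative; the sketch only shows that derivative stencils are among the things it can represent.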

CNNs learn first-order and successively higher-order gradients, Hessians, and partial derivatives as features from the input data. The formal mathematical structure for the whole sequence of partial derivatives over a whole set of input data is the jet bundle. I would argue that this (jet bundles) should be the mathematical structure to consider for CNNs.
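For readers who haven’t met jets before, here is the standard local-coordinate picture, written as a sketch. The symbols n, m, k, and f are my notation for illustration: for an image one could take n = 2 (the grid coordinates) and m the number of color channels. The k-jet of f at a point x collects the value of f together with all of its partial derivatives up to order k:

```latex
% k-jet at x of a smooth map f : \mathbb{R}^n \to \mathbb{R}^m, in local coordinates
j^k_x f = \left( x,\; f(x),\; \frac{\partial f}{\partial x^{i}}(x),\;
  \frac{\partial^2 f}{\partial x^{i}\,\partial x^{j}}(x),\; \dots,\;
  \frac{\partial^k f}{\partial x^{i_1}\cdots\partial x^{i_k}}(x) \right)
```

The jet bundle is what you get by gathering these k-jets over all points x.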

Nevertheless, in short, ANNs or the “fully-connected layers” were already shown by Hornik et al. (1991) to be universal function approximators for the function that maps input data X to output data y. CNNs are learning the gradients and higher-order derivatives associated with the image (and how the colors change across the grid) or video. They’re not as much of a black box as a casual observer might think.



Log

  • 20170209 Week 2 Linear Regression stuff for Coursera’s ML by Ng implemented in Python numpy, and some in Theano; see sklearn_ML.ipynb and theano_ML.ipynb, respectively.
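As a taste of the “math to code” dictionary I have in mind, here is a minimal sketch of batch gradient descent for linear regression. It is not the notebook code: it is written in plain Python so it runs with no dependencies, it assumes the usual least-squares cost J(θ) = (1/2m) Σ (θ·x_i − y_i)² with the update θ_j := θ_j − (α/m) Σ (θ·x_i − y_i) x_ij, and the function names, learning rate, iteration count, and toy data (points on the line y = 1 + 2x) are all made up for illustration.

```python
def predict(theta, x_row):
    """Hypothesis h(x) = theta . x for one row of features."""
    return sum(t * x for t, x in zip(theta, x_row))


def cost(theta, X, y):
    """Least-squares cost J(theta) = (1 / 2m) * sum of squared residuals."""
    m = len(y)
    return sum((predict(theta, row) - yi) ** 2 for row, yi in zip(X, y)) / (2.0 * m)


def gradient_descent(X, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent: every step uses all m training examples."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iters):
        residuals = [predict(theta, row) - yi for row, yi in zip(X, y)]
        # dJ/dtheta_j = (1/m) * sum_i residual_i * x_ij
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Toy data on the line y = 1 + 2x; the leading 1.0 in each row is the intercept feature.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]

theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))
```

Each line of the update rule maps onto one line of the loop, which is the kind of one-to-one correspondence I’d like to see written down more often. A numpy version would replace the inner sums with matrix products.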

CUDA C/C++ stuff (utilizing CUDA and optimizing CUDA C/C++ code)

cuSOLVER – Singular Value Decomposition (SVD), with and without CUDA unified memory management

I implemented simple examples illustrating Singular Value Decomposition (SVD) both with and without CUDA unified memory management, starting from the examples in the CUDA Toolkit Documentation.

Find those examples in the moreCUDA/CUSOLVER subdirectory of my CompPhys github repository.

Fedora Linux installation of Docker for nVidia’s DIGITS – my experience

I wanted to share my experience installing Docker on Fedora Linux, which I did in order to run nVidia’s DIGITS. I really want to make Docker work on Fedora Linux Workstation (23 as of today, 20170825; I will install 25 soon), but I’m having a few issues, some related to Docker, some related to Fedora:

  1. For some reason, in a user (non-admin) account, when I run dnf list, I get the following error:
    1. ImportError: dynamic module does not define init function (PyInit__posixsubprocess)

Nevertheless, I did the following to install DIGITS:

git clone

python install


Miscellaneous Links



