Machine Learning (ML), Deep Learning stuff; including CUDA C/C++ stuff (utilizing and optimizing with CUDA C/C++)

A lot has already been said about Machine Learning (ML), Deep Learning, and Neural Networks.  Note that this blog post (which I’ll infrequently update) is the “mirror” to my github repository github: ernestyalumni/MLgrabbag . Go to the github repo for the most latest updates, code, and jupyter notebooks.

A few things bother me that I sought to rectify myself:

  • There ought to be a clear dictionary between the mathematical formulation, Python’s sci-kit learn, Theano, and Tensorflow implementation.  I see math equations; here’s how to implement it, immediately.  I mean, if I was in class lectures, and with the preponderance of sample data, I ought to be able to play with examples immediately.
  • Someone ought to generalize the mathematical formulation, drawing from algebra, category theory, and differential geometry/topology.
  • CPUs have been a disappointment (see actual gamer benchmarks for Kaby Lake on YouTube); everything ought to be written in parallel for the GPU.  And if you’re using a wrapper that’s almost as fast as CUDA C/C++ or about as fast as CUDA C/C++, guess what?  You ought to rewrite the thing in CUDA C/C++.

So what I’ve started doing is put up my code and notes for these courses:

The github repository MLgrabbag should have all my stuff for it.  I’m cognizant that there are already plenty of notes and solutions out there.  What I’m trying to do is to, as above,

  1. write the code in Python’s sci-kit learn and Theano, first and foremost,
  2. generalize the mathematical formulation,
  3. implement on the GPU

I think those aspects are valuable and I don’t see anyone else have either such a clear implementation or real examples (not toy examples).

GPU-accelerated Tensor Networks

Go here:


  • 20170209 Week 2 Linear Regression stuff for Coursera’s ML by Ng implemented in Python numpy, and some in Theano, see sklearn_ML.ipynb and theano_ML.ipynb, respectively.

CUDA C/C++ stuff (utilizing CUDA and optimizing CUDA C/C++ code)

cuSOLVER – Singular Value Decomposition (SVD), with and without CUDA unified memory management

I implemented simple examples illustrating Singular Value Decomposition (SVD) both with and without CUDA unified memory management, starting from the examples in the CUDA Toolkit Documentation.

Find those examples in the moreCUDA/CUSOLVER subdirectory of my CompPhys github repository.


Miscellaneous Links




One thought on “Machine Learning (ML), Deep Learning stuff; including CUDA C/C++ stuff (utilizing and optimizing with CUDA C/C++)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s