TransPy documentation

Arrange a Training Session & Training Tasks

  • Arrange a training session with us.

    • Book a session by contacting us by this email address: transpy-contact@inf.ed.ac.uk. In the training sessions, we will introduce you to our system. Write down your full name as well as your gmail address. The session will take place on Google hangout and we will invite you to join a few minutes before the start of the session. If you are having any issues, please contact us at this email address: transpy-contact@inf.ed.ac.uk
    • We will invite you to join the session you booked, please be online (log in to your gmail) 5 minutes before your booked session.
    • Please read the training tasks below and the following documentation before joining a session. You are encouraged to try these examples beforehand as well.
  • Training tasks.

    Basic:

    • Task 0: log in to the docker image of our system.

    • Task 1: Hello World. write a program that prints “Hello World!” to your computer screen.

    • Task 2: Vector-Add

      • Task 2.1: write a program that adds two numpy 1D arrays of integers element-wise using our ``transpy``’s built-in functions. For an example of inputs like [1,2,3,4] and [2,3,4,5], the output will be [3,5,7,9].
      • Task 2.2: an effective way to improve the performance of a piece of code on our system is to run the code as much as possible on accelerators. To measure how much of a piece of code gets executed on accelerators, we use this code snippet for coverage measurements. The higher the coverage is, the better. This task requires you to measure code coverage on your code developed in Task 2.1.

    Beginner:

    • Task 3: Dot Product.

      • Task 3.1: implement dot product of two numpy 1D arrays of integers using three ways: implement the dot function manually by an explicit for-loop, using numpy or transpy built-in functions. For an example of inputs like [1,2,3,4] and [2,3,4,5], the output of a dot product will be 40.
      • Task 3.2: it is recommended to use functions in our transpy library instead of hand-written code whenever possible. The functions in our transpy library are guaranteed to run on accelerators, which usually results in better performance. In this task you should measure the code coverage on your code developed in Task 3.1, and check which one of the three versions results in better code coverage.

    Intermediate:

    • Task 4: dense M x v. Use our transpy’s built-in functions to complete the following code.

      # Dense Matrix Multiply Vector
      import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
      import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
      
      dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                       [0, 1, 0, 1, 0, 1, 0, 1],
                       [1, 0, 0, 0, 0, 1, 1, 1],
                       [1, 0, 1, 0, 0, 0, 0, 1],
                       [0, 0, 0, 0, 1, 0, 1, 1],
                       [0, 1, 1, 0, 0, 0, 1, 0],
                       [1, 1, 0, 0, 0, 0, 1, 1],
                       [0, 1, 1, 0, 0, 0, 1, 1]])
      vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]])
      
      # Please compute dense * vector using dot() of transpy.numpy
      # User code
        ................
      # User code
      

    The result should be [28,22,21,11,19,19,20,22]

    • Task 5: sparse M x v. Use our transpy’s built-in functions to complete the following code.

      # Sparse Matrix Multiply Vector
      import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
      import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
      
      dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                       [0, 1, 0, 1, 0, 1, 0, 1],
                       [1, 0, 0, 0, 0, 1, 1, 1],
                       [1, 0, 1, 0, 0, 0, 0, 1],
                       [0, 0, 0, 0, 1, 0, 1, 1],
                       [0, 1, 1, 0, 0, 0, 1, 0],
                       [1, 1, 0, 0, 0, 0, 1, 1],
                       [0, 1, 1, 0, 0, 0, 1, 1]])
      vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]])
      
      # Please convert dense matrix to csr_matrix using sp.csr_matrix()
      # Calculte sparse * vector using np.dot()
      # User code
        ................
      # User code
      

    The result should be [28,22,21,11,19,19,20,22]

    • Task 6: Graph. Use our transpy’s built-in functions to complete the following code.

      # Graph algorithms using ``transpy.scipy.sparse.csgraph``, which is similar to ``scipy.sparse.csgraph``.
      import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
      import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
      import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph``
      
      Graph = sp.csr_matrix(np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                                     [0, 1, 0, 1, 0, 1, 0, 1],
                                     [1, 0, 0, 0, 0, 1, 1, 1],
                                     [1, 0, 1, 0, 0, 0, 0, 1],
                                     [0, 0, 0, 0, 1, 0, 1, 1],
                                     [0, 1, 1, 0, 0, 0, 1, 0],
                                     [1, 1, 0, 0, 0, 0, 1, 1],
                                     [0, 1, 1, 0, 0, 0, 1, 1]]))
      
      # BFS Algorithm using csg.breadth_first_tree()
      # User code
        ...........
      # User code
      

    The result should be (0, 1) 1.0 (0, 3) 1.0 (0, 4) 1.0 (0, 6) 1.0 (1, 5) 1.0 (1, 7) 1.0 (3, 2) 1.0

    • Task 7: PyTorch. Implement the following algorithm.
      1. Creates two random Tensors
      2. Multiplies them together
      3. Adds a constant value
      4. Takes the mean of the result
      5. Then, compute the gradients of the computational graph you have created through the above.
    • Task 7.1: Implement Task 7.1 using transpy.torch

    Advanced:

    • Optional Task: please complete the following neural network code using our transpy’s built-in functions. In this task, you can check the main functions whether transpy built-in functions have better code coverage.

      import transpy.numpy as np
      import transpy.profile
      
      # Initial profilling API
      pr = transpy.profile.Profile()
      
      # N is batch size; D_in is input dimension;
      # H is hidden dimension; D_out is output dimension.
      N, D_in, H, D_out = 64, 1000, 100, 10
      
      # Create random input and output data
      x = np.random.randn(N, D_in)
      y = np.random.randn(N, D_out)
      
      # Randomly initialize weights
      w1 = np.random.randn(D_in, H)
      w2 = np.random.randn(H, D_out)
      
      learning_rate = 1e-6
      
      # Insert start point you want to profilling
      pr.enable()
      for t in range(5):
          # Forward pass: compute predicted y
          h = np.dot(x,w1)
          h_relu = np.maximum(h, 0)
          # Please insert your code here,
          # calculate y_pred = n_relu * w2, please using np.dot()
          ....user code......
      
      
          # Compute and print loss
          loss = np.square(y_pred - y).sum()
          print(t, loss)
      
          # Backprop to compute gradients of w1 and w2 with respect to loss
          grad_y_pred = 2.0 * (y_pred - y)
          # Please insert your code here,
          # calculate grad_w2 = h_relu.T * grad_y_pred, please using np.dot()
          ....user code......
      
          grad_h_relu = np.dot(grad_y_pred,w2.T)
          grad_h = grad_h_relu.copy()
          grad_h[h < 0] = 0
          grad_w1 = np.dot(x.T,grad_h)
      
          # Update weights
          w1 -= learning_rate * grad_w1
          w2 -= learning_rate * grad_w2
      pr.disable()
      pr.coverage()
      pr.energy()
      

    If you want to check our solutions to the above training tasks, please click the solutions.

Three things you need to know about TransPy

1. TransPy is python.

TransPy is our library for accelerating applications on our “Transmuter” hardware. Using TransPy is easy - it’s completely based on existing python packages, including NumPy, SciPy, and PyTorch. This means that, initially, you can write regular python, using your regular tools etc, and simply remember to import TransPy.

2. How to write effective TransPy.

To get the best performance, wherever possible, express your algorithms using standard library operations which capture whole algorithms (e.g. BFS) operators which capture work with whole data structures (e.g. vector add, dot) and use our coverage checker feedback to see how well you are doing.

3. What to avoid.

In contrast, it is a bad idea, performance-wise, to code with your own explicit loops to traverse data structures, if a single operator can express the same computation. This will be obvious if our tool gives you surprisingly bad coverage feedback. For example, don’t write a triply nested loop for matrix multiply when you can call a pre-prepared library routine!

Tutorials & Getting Help

Since programming with transpy is just programming with python, numpy, scipy and pytorch, we recommend the following online materials to strengthen your understanding of these topics:

Numpy/SciPy

PyTorch

To assist during familiarization with the mechanics of our system, in addition to the discussions and examples below, we offer the following videos: Basic Transpy and profiling

We welcome feedback and queries at any time, using the Slack channels and Wiki FAQ provided by Parenthetic.

Getting up and Running

You may be provided with alternative instructions for accessing the programming environment. If so, please disregard the following instructions, and go directly to ‘Using Transpy’

Building/installing the docker image

You will receive instructions from Parenthetic detailing how to access our system. Once you are logged in, simply run the bash startEval.sh to start the docker environment

Running programs in the docker container

Using the supplied example.py program, we can run it simply by executing `` python example.py``.

Running programs in the docker container - Advanced Users

Programs can also be run directly from the remote machine, without needing the startEval.sh script. You can do this be executing the following command on the remote server: sudo docker run --privileged  -it --rm --net=host -v /data:/data   sdhph1-eval1 <your_args> <script_name>

For example, to run a python program called test.py, you would call: sudo docker run --privileged  -it --rm --net=host -v /data:/data   sdhph1-eval1 python test.py

The -v flag here copies the current working directory on the host to the /data directory in the container; this approach is useful for getting test data into the system (note that this is what the startEval.sh script does too).

The --rm flag removes the container after the process exits.

We strongly recommend getting familiar with the docker commands and understanding what’s going on here.

Please see the docker documentation for more details.

Giving docker more shared memory

In certain situations you may need to allocate more shared memory to docker. This will be manifested by python errors relating to memory allocation. To address this issue, add ``–shm-size <amount> `` to your docker command, for example:

sudo docker run --privileged  --shm-size 32G -it --rm --net=host -v /data:/data   sdhph1-eval1 <your_args> <script_name>

Using TransPy

Porting existing code

All you have to do to port existing python code using numpy, scipy, or pytorch is change the import statement.

For numpy change :

import numpy

to:

import transpy.numpy

For scipy change:

import scipy

to:

import transpy.scipy

For pytorch change:

import torch

to:

import transpy.torch

Coverage and performance estimation

This returns an estimate of transmuter execution coverage (i.e. the percentage of the overall execution which would be mapped on to our transmuter hardware) and a GOPS/Watt prediction. TransPy includes a profiling module, which you can use by running your program with -m transpy.profile.

For example, consider we have two programs. The first which is implemented using standard SciPy, and the second which is implemented using TransPy.

First:

# sci_graph.py

import numpy as np
import cProfile

from scipy.sparse import csc_matrix,csr_matrix
G = np.random.randint(0,2,size=(1000,1000))
csr = csr_matrix(G)

from scipy.sparse.csgraph import breadth_first_order
bfs_csr = breadth_first_order(csr,0)

Running this program as python3 -m transpy.profile sci_graph.py results in the following output, showing zero coverage and no performance estimation for Transmuter:

Profiling Information:
======================================================
Transmuter Coverage: 0.000
======================================================

Device Performance Estimate
======================================================
0.00 GOPS/W
======================================================

Second:

# trans_graph.py

import transpy.numpy as np
import cProfile

from transpy.scipy.sparse import csc_matrix,csr_matrix
G = np.random.randint(0,2,size=(1000,1000))
csr = csr_matrix(G)

from transpy.scipy.sparse.csgraph import breadth_first_order
bfs_csr = breadth_first_order(csr,0)

Running this program as python3 -m transpy.profile trans_graph.py results in the following output, showing coverage and a performance estimate for Transmuter:

Profiling Information:
======================================================
Transmuter Coverage: 0.829
======================================================

Device Performance Estimate:
======================================================
100.12 GOPS/W
======================================================

Focused coverage

In addition to the above profiling approach, we can also focus profiling on particular sections or lines of code. To do this, we will use the Profile class from the module transpy.profile

To begin:

import transpy.profile
p = transpy.profile.Profile()

To start profiling, we call the enable method, and the disable method to stop profiling:

p = transpy.profile.Profile()
p.enable()

a = tp.random.rand(1000,1000)
b = tp.random.rand(1000,1000)

c = a.dot(b)
p.disable()

To produce the same output as previously, we have the coverage and energy methods:

p.coverage()
p.energy()

produces

Profiling Information:
======================================================
Transmuter Coverage: 0.680
======================================================
Device Performance Estimate:
======================================================
96.953 GOPS/W
======================================================

To view detailed profiling information, use the prof_stats method:

p.prof_stats()

outputs something like:

                80 function calls (79 primitive calls) in 4.510 seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    2.960    2.960    2.960    2.960 {built-in method _transmuter.finish}
     1    1.489    1.489    1.489    1.489 {built-in method _transmuter.config}
     2    0.026    0.013    0.026    0.013 {method 'rand' of 'mtrand.RandomState' objects}
     3    0.021    0.007    0.021    0.007 {built-in method _transmuter.array_float32}
     1    0.006    0.006    0.006    0.006 {built-in method _transmuter.launch}
     1    0.004    0.004    4.482    4.482 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/core.py:157(_dot_mm)
     1    0.001    0.001    0.001    0.001 {method 'astype' of 'numpy.ndarray' objects}
     3    0.001    0.000    0.001    0.000 {built-in method _transmuter.array_int32}
     3    0.000    0.000    0.000    0.000 {built-in method numpy.array}
     1    0.000    0.000    0.000    0.000 {built-in method _transmuter.setReconfig}
     3    0.000    0.000    0.001    0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:19(array_int32)
     3    0.000    0.000    0.021    0.007 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:22(array_float32)
   4/3    0.000    0.000    4.510    1.503 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/array.py:5(wrapper)
     1    0.000    0.000    4.484    4.484 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/core.py:114(dot)
     1    0.000    0.000    1.490    1.490 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:3(__init__)
     1    0.000    0.000    0.000    0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/profile/core.py:33(disable)
     1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
     1    0.000    0.000    4.484    4.484 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/array.py:167(dot)
     4    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
     1    0.000    0.000    0.000    0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:38(setReconfig)
     6    0.000    0.000    0.000    0.000 {built-in method _transmuter.setSchedArg}
     1    0.000    0.000    0.006    0.006 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:41(launch)
     1    0.000    0.000    2.960    2.960 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:44(finish)
     1    0.000    0.000    0.000    0.000
     /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:13(loadSchedule)

This is the same output as the built-in python cProfile; please look at the official docs.

How to debug your program (or development workflow)

First and foremost, you should eliminate any errors coming directly from your python code. Write your program using standard python, with support of the earlier mentioned packages (numpy, scipy, pytorch). Any standard python tools which you normally use for debugging will work well here.

Supported packages

Any python code that you write will work using our libraries, however only specific parts will be accelerated, therefore, you may sometimes need to tweak your code to get it to run on Transmuter. Unsupported operations will simply fall back to the standard implementation of the library.

For example, instead of writing a matrix multiplication using for loops, you would want to express your matrix multiplication using the NumPy dot operator.

NumPy

Writing NumPy programs for Transmuter is easy. You write your NumPy programs as normal, and then you just need to replace import numpy with import transpy.numpy

The interesting parts that we accelerate on Transmuter are the various operations performed on NumPy arrays. For instance, in the following example, we accelerate np.dot on Transmuter.

import numpy as np # This is the import we will change.

a = np.random.randint(0,1000,size=(8,8)) b =
np.random.randint(0,1000,size=(8,8))

c = np.dot(a,b) # This is the most time consuming operation, which we want to accelerate on Transmuter!

To perform the multiplication on Transmuter, you just need to replace the first line. The full program would look like this:

import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.

a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))

c = np.dot(a,b) # Since we changed the import, this now runs on Transmuter!

Please note, that the Matrix datatype is deprecated in NumPy, and here we follow suit. Please implement your matrices as multidimensional arrays!

NumPy Mathematical functions

Python numpy packages provide a library of functions to perform mathematical operations, such as sin(x). The following code snippet shows a typical usage of NumPy math function, which computes sin(x) on a 2-dimensional array element-wise.

import numpy as np

#initialize a random 2-dimensional array
A = np.random.rand(10,10)
B = np.sin(A)

#other example usages:
B = np.sin(1.67)
B = np.sin(np.array( [1,2,3] ))
B = np.sin(np.array( [[1,2,3],[4,5,6]] ))

Our transpy.numpy package implements part of numpy’s mathematical functions, which allows to execute those functions on our Transmuter accelerator. The following code snippet illustrates how to use our transpy.numpy package to perform mathematical operations. One only need to import transpy.numpy, the rest of code is the same as using python mathematical package.

import transpy.numpy as np

#initialize a random 2-dimensional array
A = np.random.rand(10,10)
B = np.sin(A)

#other example usages:
B = np.sin(1.67)
B = np.sin(np.array( [1,2,3] ))
B = np.sin(np.array( [[1,2,3],[4,5,6]] ))

The usage of other supported numpy math functions in our transpy.numpy package is the same as sin(x). The following table shows the list of numpy mathematical functions currently supported by our transpy.numpy package ( For the full list of the numpy mathematical functions, please follow this link ). The input data types supported are scalar (int32, float32, float64), and array with element type (int32, float32, float64)

Current NumPy math function supported by TransPy
Function Supported or not Function Supported or not
sin \(\checkmark\) logaddexp2  
cos \(\checkmark\) l0  
tan \(\checkmark\) sinc \(\checkmark\)
arcsin \(\checkmark\) signbit \(\checkmark\)
arccos \(\checkmark\) copysign  
arctan \(\checkmark\) frexp  
hypot   ldexp  
arctan2   nextafter  
degrees \(\checkmark\) spacing  
radians \(\checkmark\) lcm  
unwrap   gcd  
deg2rad \(\checkmark\) add \(\checkmark\)
rad2deg \(\checkmark\) reciprocal \(\checkmark\)
sinh \(\checkmark\) positive \(\checkmark\)
cosh \(\checkmark\) negative \(\checkmark\)
tanh \(\checkmark\) multiply \(\checkmark\)
arcsinh \(\checkmark\) divide \(\checkmark\)
arccosh \(\checkmark\) power \(\checkmark\)
arctanh \(\checkmark\) subtract \(\checkmark\)
around   true_divide  
round_   floor_divide  
rint \(\checkmark\) float_power  
fix \(\checkmark\) fmod  
floor \(\checkmark\) mod  
ceil \(\checkmark\) modf  
trunc \(\checkmark\) remainder  
prod   divmod  
sum \(\checkmark\) angle  
nanprod   real  
nansum   image  
cumprod   conj  
cumsum   convolve  
nancumprod   clip  
nancumsum   sqrt \(\checkmark\)
diff \(\checkmark\) cbrt \(\checkmark\)
ediff1d   square \(\checkmark\)
gradient   absolute \(\checkmark\)
cross   fabs \(\checkmark\)
trapz   sign \(\checkmark\)
exp \(\checkmark\) heaviside  
expm1 \(\checkmark\) maximum  
exp2 \(\checkmark\) minimum  
log \(\checkmark\) fmax  
log10 \(\checkmark\) fmin  
log2 \(\checkmark\) nan_to_num  
log1p \(\checkmark\) real_if_close  
logaddexp   interp  

SciPy

Just like NumPy, writing SciPy programs for Transmuter is easy. You write your SciPy programs as normal, and then you just need to replace import scipy with import transpy.scipy

A sparse matrix multiplication using SciPy would look like this:

import numpy as np
from scipy.sparse import csr_matrix
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))

a_csr = csr_matrix(a)
b_csr = csr_matrix(b)

c_csr = a_csr * b_csr

To accelerate it using Transmuter, you should change the imports:

import transpy.numpy as np # use transpy.numpy here
from transpy.scipy.sparse import csr_matrix # use transpy.scipy.sparse here
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))

a_csr = csr_matrix(a)
b_csr = csr_matrix(b)

c_csr = a_csr * b_csr

PyTorch

Our PyTorch implementation accelerates the ATen library, which is used all across PyTorch, most notably when designing neural networks. ATen is also used extensively in autograd, meaning that backward gradient computation is supported just as forward passes in your neural networks.

import torch

N = 100

x = torch.ones(N,N,requires_grad=True)

y = x + 2
z = y * y * 3
z2 = y * x + 43
z3 = z2.pow(2)

out = z3.mean()
out.backward()
print(x.grad)

To accelerate this on Transmuter do:

import transpy.torch as torch # import transpy.torch here

N = 100

x = torch.ones(N,N,requires_grad=True)

y = x + 2 # The addition is accelerated on Transmuter
z = y * y * 3 # The multiplication is accelerated on Transmuter
z2 = y * x + 43 # Both multiplication and addition are accelerated on Transmuter
z3 = z2.pow(2) # Pow is accelerated on Transmuter

out = z3.mean() # mean is accelerated on Transmuter
out.backward() # backward() calls a dozens of ATen operations, which are all accelearted on Transmuter
print(x.grad)

Please note, our environment requires that you run pytorch with a single thread! This should happen by default, but, if you’re porting any code that multithreads pytorch, make sure the following is set, before you start executing any computational code:

torch.set_num_threads(1)

Graphs

SciPy provides a library of graph algorithms, called scipy.sparse.csgraph. Graph algorithms use sparse matrix or adjacency matrix representations. To use our TransPy graph algorithms, replace import scipy.sparse.csgraph by transpy.scipy.sparse.csgraph

Graph Representation

This module uses graphs which are stored in a matrix format. A graph with N nodes can be represented by an (N x N) adjacency matrix G. If there is a connection from node i to node j, then G[i, j] = w, where w is the weight of the connection. For nodes i and j which are not connected, the value depends on the representation.

  • Compressed Sparse Row Matrix (CSR, size = N*N)
  • Compressed Sparse Column Matrix (CSC, size = N*N)
  • Adjacency Matrix( adj, size = N*N)
# Graph data preparation using `transpy.numpy` and `transpy.scipy`.
# Here, to generate random int between 0 and 1, similar to adj_matrix.
>>> import transpy.numpy as np
>>> G = np.random.randint(0,2,size=(8,8))
>>> print(G)
[[0 0 1 1 1 0 1 0]
 [1 1 0 0 0 0 1 1]
 [1 0 0 0 1 0 1 1]
 [1 1 1 1 1 0 0 0]
 [0 0 0 1 1 1 1 1]
 [1 0 0 0 1 0 1 0]
 [0 1 1 0 1 0 0 0]
 [0 0 1 1 0 1 0 1]]
>>> from transpy.scipy.sparse import csc_matrix, csr_matrix
>>> csc = csc_matrix(G)
>>> csr = csr_matrix(G)

Graph Algorithms API

The following table shows the list of graph functions currently supported by our TransPy package. ( For the full list of the scipy.sparse.csgraph graph functions, please follow this graphlink. )

Function Graph Input Type
breadth_first_order directed graph and undirected graph
breadth_first_tree directed graph and undirected graph
depth_first_order directed graph and undirected graph
depth_first_tree directed graph and undirected graph
minimum_spanning_tree undirected graph
connected_components undirected graph
shortest_path directed graph and undirected graph
dijkstra directed graph and undirected graph
pagerank directed graph and undirected graph

breadth_first_order

Return a breadth-first ordering starting with specified node.

  • breadth_first_order (graph, i_start, return_predecessors=True)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • i_start: int, the index of starting node
      • return_predecessors: bool, optional. If True (default), then return the predecessor array
    • Return:

      • node_array: ndarray, one dimension. The breadth-first list of nodes, starting with specified node. The length of node_array is the number of nodes reachable from the specified node.

      • predecessors: ndarray, one dimension. Return only if return_predecessors is True. The length-N list of predecessors of each node in a breadth-first tree. If node j is in the tree, then its parent is given by predecessors[j]. If node j is not in the tree ( and for the parent node) then predecessors[j]=-9999

        >>> from transpy.scipy.sparse.csgraph import breadth_first_order  # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> breadth_first_order(G,0)
        .............(TM Printing)
         (array([0, 2, 3, 4, 6, 7, 1, 5], dtype=int32), array([-9999,     3,     0,     0,     0,     4,     0,     2],
          dtype=int32))
        

breadth_first_tree

Return the tree generated by a breadth-first search. Note that a breadth-first tree from a specified node is unique.

  • breadth_first_tree (graph, i_start)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
    • i_start: int, the index of starting node

    • Returns:

      • cstree : csr matrix The N x N directed CSR of the breadth- first tree drawn from csgraph, starting at the specified node.

        >>> from transpy.scipy.sparse.csgraph import breadth_first_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> tree_csr = breadth_first_tree(G,0)
        >>> tree_csr.toarray()
        array([[0., 0., 1., 1., 1., 0., 1., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 1.],
           [0., 1., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 1., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.]])
        

depth_first_order

Return a depth-first ordering starting with specified node.

  • depth_first_order (graph, i_start, return_predecessors=True)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • i_start: int, the index of starting node
      • return_predecessors: bool, optional. If True (default), then return the predecessor array
    • Return:

      • node_array: ndarray, one dimension. The depth-first list of nodes, starting with specified node. The length of node_array is the number of nodes reachable from the specified node.

      • predecessors: ndarray, one dimension. Return only if return_predecessors is True. The length-N list of predecessors of each node in a depth-first tree. If node j is in the tree, then its parent is given by predecessors[j]. If node j is not in the tree ( and for the parent node) then predecessors[j]=-9999

        >>> from transpy.scipy.sparse.csgraph import depth_first_order  # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> depth_first_order(G,0)
        .............(TM Printing)
        (array([0, 2, 4, 3, 1, 6, 7, 5], dtype=int32), array([-9999,     3,     0,     4,     2,     7,     1,     1],
          dtype=int32))
        

depth_first_tree

Return a tree generated by a depth-first search. Note that a tree generated by a depth-first search is not unique: it depends on the order that the children of each node are searched.

  • depth_first_tree (graph,i_start)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
    • i_start: int, the index of starting node

    • Returns:

      • cstree : csr matrix The N x N directed CSR of the depth-first tree drawn from csgraph, starting at the specified node.

        >>> from transpy.scipy.sparse.csgraph import depth_first_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> tree_csr = depth_first_tree(G,0)
        .............(TM Printing)
        >>> tree_csr.toarray()
        array([[0., 0., 1., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 1., 1.],
           [0., 0., 0., 0., 1., 0., 0., 0.],
           [0., 1., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 1., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 1., 0., 0.]])
        

minimum_spanning_tree

Return a minimum spanning tree of an undirected graph. A minimum spanning tree is a graph consisting of the subset of edges which together connect all connected nodes, while minimizing the total sum of weights on the edges. This is computed using Kruskal’s algorithm.

  • minimum_spanning_tree (graph, overwrite = False)
    • Parameters:
      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix . The N x N matrix representing an undirected graph over N nodes.
      • overwrite: bool, optional if true, then parts of the input graph will be overwritten for efficiency.
    • Returns:
      • span_tree : csr matrix The N x N compressed-sparse representation of the undirected minimum spanning tree over the input.
    • Notes: This routine uses undirected graphs as input and output. That is, if graph[i, j] and graph[j, i] are both zero, then nodes i and j do not have an edge connecting them. If either is nonzero, then the two are connected by the minimum nonzero value of the two.
>>> from transpy.scipy.sparse.csgraph import minimum_spanning_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
>>> tree_csr = tcsg.minimum_spanning_tree(G)
.............(TM Printing)
>>> tree_csr.toarray()
array([[0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 1, 1, 0, 1, 0, 0, 0],
       [0, 0, 1, 1, 0, 0, 0, 0]])

connected_components

Analyze the connected components of a graph

  • connected_components (graph, return_labels=False)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • return_labels: bool, optional. If True, then return the root labels for each of the connected components. * For directed graphs, our implementation finds strongly connected components. Nodes i and j are *strongly connected* if a path exists both from i to j and from j to i. Nodes i and j are *weakly connected* if only one of these paths exists.
    • Return:

      • n_components: int, The number of connected components

      • labels: ndarray, the length-N of root labels of the connected components It returns the root vertex for each connected components.

        # Our implementation is using ***strongly connected*** algorithm for directed graphs now.
        # For undirected graph, there is no difference between strongly connected and weakly connected.
        >>> from transpy.scipy.sparse.csgraph import connected_components  # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> connected_components(G,return_labels=True)
        .............(TM Printing)
        (1, array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32))
        

shortest_path

Perform a shortest-path search on a positive graph using Dijkstra’s algorithm.

  • shortest_path (graph, indices=None, unweighted = False)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • indices: int, compute the paths for the points at the given indices. * unweighted: bool, optional If True, then find unweighted distances. That is, rather than finding the path between each point such that the sum of weights is minimized, find the path such that the number of edges is minimized.
    • Return:

      • dist: ndarray, one dimension. The array of distances between graph nodes and given indices. dist[j] shortest distance from the given indices point to point j along the graph Now, if one node has infinite distance, the output is value INT_MAX(2147483647).

        >>> from transpy.scipy.sparse.csgraph import shortest_path # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
        >>> shortest_path(G,indices=0)
        .............(TM Printing)
        array([0 2 1 1 1 2 1 2], dtype=int32)
        

dijkstra

Dijkstra’s algorithm.

  • dijkstra (graph, indices=None, unweighted=False)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • indices: int, compute the paths for the points at the given indices. * unweighted: bool, optional If True, then find unweighted distances. That is, rather than finding the path between each point such that the sum of weights is minimized, find the path such that the number of edges is minimized.
    • Return:

      • dist: ndarray, one dimension. The array of distances between graph nodes and given indices. dist[j] shortest distance from the given indices point to point j along the graph Now, if one node has infinite distance, the output is value INT_MAX(2147483647).

        >>> import transpy.numpy as np
        >>> G = np.random.randint(0,10,size=(8,8))
        >>> G
        tparray([[5, 9, 1, 4, 7, 5, 4, 2],
         [5, 6, 9, 3, 7, 6, 6, 2],
         [6, 3, 6, 8, 6, 9, 2, 4],
         [4, 4, 0, 9, 4, 7, 0, 9],
         [6, 0, 9, 2, 2, 0, 1, 6],
         [9, 2, 6, 3, 4, 7, 2, 5],
         [1, 7, 1, 7, 2, 4, 9, 4],
         [6, 1, 4, 3, 8, 6, 5, 5]])
        >>> from transpy.scipy.sparse import csc_matrix, csr_matrix
        >>> csr = csr_matrix(G)
        >>> from transpy.scipy.sparse.csgraph import dijkstra
        >>> dijkstra(G,indices = 0, ***unweighted = False***)
        .............(TM Printing)
        Execution time (ns):
        456718
        .array([0, 3, 1, 4, 5, 5, 3, 2], dtype=int32)
        >>> dijkstra(G,indices = 0, ***unweighted = True***)
        .............(TM Printing)
        Execution time (ns):
        541917
        .array([0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
        

pagerank

Return the PageRank of the nodes in the graph. It computes a ranking of the nodes in the graph G based on the structure of the incoming links.

  • pagerank (graph,alpha=0.85,weight=‘1’)

    • Parameters:

      • graph: “N*N”, csc matrix, csr matrix or adjacency matrix
      • alpha: float, optional. Damping parameter for PageRank, default = 0.85
      • weight: Edge data key to use as weight. TransPy currently only supports unweighted graphs for pagerank.
    • Return: ndarray, one dimension,

      • pagerank values of nodes.
    • PageRank of TransPy can stop computing with two conditions:

      • if values cannot coverage: up to max_iteration 100

      • if values can converge: for all nodes, |P_curr - P_prev| < err, stops.

        # alpha = 0,85 (default)
        # now, weight values are set to 1.
        >>> from transpy.scipy.sparse.csgraph import pagerank
        >>> pagerank(G,alpha=0.85,weight='1')
        .............(TM Printing)
        array( [0.11082783 0.10226965 0.13102683 0.12137377 0.18530285 0.07718898
         0.14524692 0.12676316], dtype=float32)
        

Examples

Linear Algebra

Dense

Dense linear algebra should be performed using NumPy.

A matrix-matrix multiplication example was shown in the NumPy section of this document.

Other NumPy functions and operators work in the same way.

For example, an element-wise matrix multiplication is performed like this:

import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.

a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))

c = a * b

Or the exp function can be used like this to square each element in the marix:

import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.

a = np.random.randint(0,1000,size=(8,8))

c = np.exp(a,2)

Sparse

Sparse linear algebra should be performed using NumPy and SciPy together.

A sparse matrix vector multiplication (SpMV) example was shown in the NumPy section of this document.

Other NumPy functions and operators work in the same way.

For example, SpMV is performed like this:

import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``

a = sp.csr_matrix(np.random.randint(0,1000,size=(8,8)))
b = np.random.randint(0,1000,size=(8,1))

c = a.dot(b)

Graphs

  • TransPy graph API

If you use any high-level graph algorithms, particularly if they are from any of the following libraries, please try the Transpy graph functions.

A graph workflow uses networkx

import networkx as nx

def get_BFS_tree(G, src):
    # networkx bfs function
    G_BFS = nx.bfs_tree(G, src)

    return G_BFS

To accelerate this on Transmuter do:

import transpy.scipy.sparse.csgraph as csg
import networkx as nx

def get_BFS_tree(G, src):
    # Transfer networkx graph to graph matrix
    scipy_G = nx.to_numpy_matrix(G)
    # TransPy graph interface
    scipy_G_BFS = csg.breadth_first_tree(scipy_G,src)
    # Transfer graph matrix to network graph format
    G_BFS = nx.Graph(scipy_G_BFS)

    return G_BFS

A graph workflow uses scipy.sparse.csgraph

import scipy.sparse.csgraph as csg
import numpy as np

def compute_row_stats(i, adj_mat_original, weighted):
    # the real distances in the graph
    true_dist_row = csg.dijkstra(adj_mat_original, indices=[i], unweighted=(not weighted), directed=False).squeeze()

    return true_dist_row

To accelerate this on Transmuter do:

import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph`` here.
import transpy.numpy as np     # We change ``numpy`` to ``transpy.numpy`` here.

def compute_row_stats(i, adj_mat_original, weighted):
    # the real distances in the graph
    true_dist_row = csg.dijkstra(adj_mat_original, indices=[i], unweighted=(not weighted), directed=False).squeeze()

    return true_dist_row
  • If TransPy cannot fit your problems with existing APIs.

You can write regular python using our TransPy standard library, including transpy.numpy, transpy.scipy and transpy.pytorch. To ensure effective performance of your program, please use bulk-data operations which capture work with whole data structures (e.g., vector add, dot). You also can use our coverage checker feedback to see how well you are doing.

For example, when we try to traverse a vertex list to update values of active vertex.

import transpy.numpy as np

update_value = (1-0.35)/0.35
r = np.ones(100)
r[0:60]=0

# Active vertex array list to check active source per iteration
active_vertex = np.zeros(100)
active_vertex[20:50]=1
p = np.full((100,),1/100)

# Transpy can update neighbors of active vertices without ``for`` loop
# using our parallelism matrix multiplication.
# It equals the sequential traverse for loop :
# for idx in range(active_vertex.shape[0]):
#     if active_vertex[idx]!=0:
#        p[idx]+=update_value*r[idx]

vertex_update = update_value * r * active_vertex
p += vertex_update

If traverse a graph using CSR format, we can use matrix multiplication to seepup this process.

import transpy.numpy as np
import transpy.scipy.sparse as sp

update_value = (1-0.35)/0.35
# Graph represents as CSR format
# csr.indptr stores the source indexs
# csr.indices stores the destinations of sources
csr = sp.csr_matrix(np.random.randint(0,2,size=(100,100)))

r = np.ones(100)
r[0:60]=0

# Active vertex array list to check active source per iteration
active_vertex = np.zeros(100)
active_vertex[20:50]=1
p = np.full((100,),1/100)

upt_src = update_value * r

# Transpy can update neighbors of active vertices without ``for`` loop
# using our parallelism matrix multiplication.
# It equals the sequential traverse for loop :
# for idx in range(active_vertex.shape[0]):
#     if active_vertex[idx]!=0:
#        ngh = csr.indices[csr.indptr[idx]:csr.indptr[idx+1]]
#        for dst in ngh:
#           p[dst]+=upt_src[idx]


p += csr.T.dot(upt_src*active_vertex)

The above algorithm only needs the relationships of graph, so it should be normalised to unweighted adj_matrix (only contains 0 and 1) during using matrix multiplication. If the data of matrix is required, you do not need normalise this matrix.

About all, BFS can be written by some basic operators of TransPy using spmv

import transpy.numpy as np
import transpy.scipy.sparse as sp

#define your own graph update function

def update_func(frontier,status,bfs_list,pos):
    for i in range (0,frontier.shape[0]):
        if (frontier[i,0]!=0 and i!=bfs_list[0,0] and status[0,i]!=1):
              bfs_list[0,pos]=i
              status[0,i]=1
              pos=pos+1
    return frontier, status, bfs_list, pos

def bfs(graph,start):

    #Initial active vertex list for BFS iteration traversal
    N=graph.shape[0]
    frontier = np.zeros((N,1))

    #Active start vertex as root vertex
    frontier[start,0]=1

    #Define vertex visited status array
    #and active start vertex as visited
    status = np.zeros((1,N))
    status[0,start]=1

    #Breadth first search order list
    #define the start vertex as initial vertex
    bfs_list = np.zeros((1,N))
    bfs_list[0,0]=start

    #define the index of order_list
    pos = 1

    # Graph algorithm main loop using spmv method
    # If the source vertex i is active, frontier[i]=1
    # Use transpose graph data to multiply frontier vector
    # Get the active destination vertex list as new frontier
    # New frontier will be the input for next iteration (active vertex list)

    while (not (status==1).all()):
          # Most graph algorihtms can use Matrix multiplication or SpMV to parallelism
          frontier = sp.csr_matrix(graph.T) * frontier
          # You can change update_func() for updating vertex by yourself, e.g., shortest_path
          frontier, status, bfs_list, pos = update_func(frontier,status,bfs_list,pos)

    return bfs_list

Neural Networks - Learning PyTorch With Examples

Please note, this series of examples has been adapted from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

Warm-up: NumPy:

Before introducing PyTorch, we will first implement the network using numpy.

NumPy provides an n-dimensional array object, and many functions for manipulating these arrays. NumPy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However we can easily use NumPy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using NumPy operations:

# -*- coding: utf-8 -*-
import transpy.numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(5):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

PyTorch: Tensors

PyTorch is another great framework used for implementing modern neural networks.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they’re also useful as a generic tool for scientific computing.

Here we use PyTorch Tensors to fit a two-layer network to random data. Like the NumPy example above we need to manually implement the forward and backward passes through the network:

import transpy.torch as torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;                                                                                                                                                                     # H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(5):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

PyTorch: Tensors and autograd

In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large complex networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.

This sounds complicated, it’s pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor that has x.requires_grad=True then x.grad is another Tensor holding the gradient of x with respect to some scalar value.

Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:

import transpy.torch as torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(5):
    # Forward pass: compute predicted y using operations on Tensors; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the a scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating weights
        w1.grad.zero_()
        w2.grad.zero_()

PyTorch::nn

In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.

In this example we use the nn package to implement our two-layer network:

# -*- coding: utf-8 -*-
import transpy.torch as torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-4
for t in range(5):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can access its gradients like we did before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

PyTorch: optim

Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters (with torch.no_grad() or .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, etc.

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.

In this example we will use the nn package to define our model as before, but we will optimize the model using the Adam algorithm provided by the optim package:

# -*- coding: utf-8 -*-
import transpy.torch as torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(5):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers( i.e, not overwritten) whenever .backward()
    # is called. Checkout docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

Tips on improving coverage

1. Effective TransPy.

To get the best performance, wherever possible, express your algorithms using standard library operations which capture whole algorithms (e.g. BFS) operators which capture work with whole data structures (e.g. vector add, dot) and use our coverage checker feedback to see how well you are doing.

2. What to avoid.

In contrast, it is a bad idea, performance-wise, to code with your own explicit loops to traverse data structures, if a single operator can express the same computation. This will be obvious if our tool gives you surprisingly bad coverage feedback. For example, don’t write a triply nested loop for matrix multiply when you can call a pre-prepared library routine! Loops can be replaced with calls to functions like numpy’s sum or apply_along_axis, but still ensure that your code cannot be simplified further to a simpler function call.

3. Use the profiler.

The profiler will help guide your optimizations. Use it to identify hotspots in your program and to push more code into transpy calls.

Solutions to the Training Tasks

The following code is the solution to the training tasks.

Basic:

  • Task 1: Hello World.

    print("Hello World!")
    
  • Task 2: Vector-Add

    • Task 2.1:

      import transpy.numpy as np
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( a + b )
      
    • Task 2.2:

      import transpy.numpy as np
      
      # initialize profiler for code coverage measurement
      import transpy.profile
      p = transpy.profile.Profile()
      
      # mark the start of code coverage measurement
      p.enable()
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( a + b )
      
      # mark the end of code coverage measurement
      p.disable()
      
      # print the value of code coverage
      p.coverage()
      

Beginner:

  • Task 3: Dot Product.

    • Task 3.1: The three code snippets in the following implement dot product by manual for-loop, using numpy and transpy.

      import numpy as np
      
      def mydot(a, b):
          c = 0
          for i,j in zip(a,b):
              c += i*j
          return c
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( mydot(a,b) )
      
      import numpy as np
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( np.dot(a,b) )
      
      import transpy.numpy as np
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( np.dot(a,b) )
      
    • Task 3.2: The three code snippets in the following measures code coverage for the above three versions of dot product implementation, and only using transpy yeild positive coverage values as functions in transpy run on accelerators, which is the only correct way to code dot product among the three versions and in general using transpy in coding as much as possible is recommended in our system.

      import numpy as np
      
      # initialize profiler for code coverage measurement
      import transpy.profile
      p = transpy.profile.Profile()
      
      def mydot(a, b):
          c = 0
          for i,j in zip(a,b):
              c += i*j
          return c
      
      # mark the start of code coverage measurement
      p.enable()
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( mydot(a,b) )
      
      # mark the end of code coverage measurement
      p.disable()
      
      # print the value of code coverage
      p.coverage()
      
      import numpy as np
      
      # initialize profiler for code coverage measurement
      import transpy.profile
      p = transpy.profile.Profile()
      
      # mark the start of code coverage measurement
      p.enable()
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( np.dot(a,b) )
      
      # mark the end of code coverage measurement
      p.disable()
      
      # print the value of code coverage
      p.coverage()
      
      import transpy.numpy as np
      
      # initialize profiler for code coverage measurement
      import transpy.profile
      p = transpy.profile.Profile()
      
      # mark the start of code coverage measurement
      p.enable()
      
      a = np.array([1,2,3,4])
      b = np.array([2,3,4,5])
      
      print( np.dot(a,b) )
      
      # mark the end of code coverage measurement
      p.disable()
      
      # print the value of code coverage
      p.coverage()
      

Intermediate:

  • Task 4: dense M x v. User our transpy’s built-in functions to complete the following dense matrix multiply vector.

    import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
    import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
    
    dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                     [0, 1, 0, 1, 0, 1, 0, 1],
                     [1, 0, 0, 0, 0, 1, 1, 1],
                     [1, 0, 1, 0, 0, 0, 0, 1],
                     [0, 0, 0, 0, 1, 0, 1, 1],
                     [0, 1, 1, 0, 0, 0, 1, 0],
                     [1, 1, 0, 0, 0, 0, 1, 1],
                     [0, 1, 1, 0, 0, 0, 1, 1]])
    vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]])
    
    # Please compute dense * vector using dot() of transpy
    # Solution for dense M x v
    dense_vector = dense.dot(vector)
    

dense_vector: [28,22,21,11,19,19,20,22]

  • Task 5: sparse M x v. Use our transpy’s built-in functions to complete the following sparse matrix multiply vector.

    import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
    import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
    
    dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                     [0, 1, 0, 1, 0, 1, 0, 1],
                     [1, 0, 0, 0, 0, 1, 1, 1],
                     [1, 0, 1, 0, 0, 0, 0, 1],
                     [0, 0, 0, 0, 1, 0, 1, 1],
                     [0, 1, 1, 0, 0, 0, 1, 0],
                     [1, 1, 0, 0, 0, 0, 1, 1],
                     [0, 1, 1, 0, 0, 0, 1, 1]])
    vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]])
    
    # Please convert dense matrix to csr_matrix using sp.csr_matrix()
    # Calculte sparse * vector using np.dot()
    # Solution for sparse M x v
    sparse = sp.csr_matrix(dense)
    sparse_vector = sparse.dot(vector)
    
sparse_vector: [28,22,21,11,19,19,20,22]
  • Task 6: Graph. Use our transpy’s built-in functions to complete the following code.

    # Graph algorithms using ``transpy.scipy.sparse.csgraph``, which is similar to ``scipy.sparse.csgraph``.
    import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
    import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
    import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph``
    
    Graph = sp.csr_matrix(np.array([[0, 1, 0, 1, 1, 0, 1, 0],
                                   [0, 1, 0, 1, 0, 1, 0, 1],
                                   [1, 0, 0, 0, 0, 1, 1, 1],
                                   [1, 0, 1, 0, 0, 0, 0, 1],
                                   [0, 0, 0, 0, 1, 0, 1, 1],
                                   [0, 1, 1, 0, 0, 0, 1, 0],
                                   [1, 1, 0, 0, 0, 0, 1, 1],
                                   [0, 1, 1, 0, 0, 0, 1, 1]]))
    
    # BFS Algorithm using csg.breadth_first_tree()
    # Solution for bfs
    bfs_tree = csg.breadth_first_tree(Graph,0)
    
    print(bfs_tree)
    (0, 1)       1.0
    (0, 3)       1.0
    (0, 4)       1.0
    (0, 6)       1.0
    (1, 5)       1.0
    (1, 7)       1.0
    (3, 2)       1.0
    
  • Task 7: pytorch

    import torch
    
    N = 100
    
    x = torch.Tensor(N,N)
    x.requires_grad = True
    y = torch.Tensor(N,N)
    y.requires_grad = True
    
    z = x * y
    
    z1 = z + 4
    
    out = z1.mean()
    out.backward()
    
    print(x.grad)
    print(y.grad)
    
    • Task 7.1: transpy.pytorch

      import transpy.torch as torch
      
      N = 100
      
      x = torch.Tensor(N,N)
      x.requires_grad = True
      y = torch.Tensor(N,N)
      y.requires_grad = True
      
      z = x * y
      
      z1 = z + 4
      
      out = z1.mean()
      out.backward()
      
      print(x.grad)
      print(y.grad)
      

Advanced:

  • Optional Task: please complete the following neural network code using our transpy’s built-in functions. In this task, you can check the main functions whether transpy built-in functions have better code coverage.

    import transpy.numpy as np
    import transpy.profile
    
    # Initial profilling API
    pr = transpy.profile.Profile()
    
    # N is batch size; D_in is input dimension;
    # H is hidden dimension; D_out is output dimension.
    N, D_in, H, D_out = 64, 1000, 100, 10
    
    # Create random input and output data
    x = np.random.randn(N, D_in)
    y = np.random.randn(N, D_out)
    
    # Randomly initialize weights
    w1 = np.random.randn(D_in, H)
    w2 = np.random.randn(H, D_out)
    
    learning_rate = 1e-6
    
    # Insert start point you want to profilling
    pr.enable()
    for t in range(5):
        # Forward pass: compute predicted y
        h = np.dot(x,w1)
        h_relu = np.maximum(h, 0)
        y_pred = np.dot(h_relu,w2)
    
        # Compute and print loss
        loss = np.square(y_pred - y).sum()
        print(t, loss)
    
        # Backprop to compute gradients of w1 and w2 with respect to loss
        grad_y_pred = 2.0 * (y_pred - y)
        grad_w2 = np.dot(h_relu.T,grad_y_pred)
        grad_h_relu = np.dot(grad_y_pred,w2.T)
        grad_h = grad_h_relu.copy()
        grad_h[h < 0] = 0
        grad_w1 = np.dot(x.T,grad_h)
    
        # Update weights
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    
    # Insert end point you want to profilling
    pr.disable()
    pr.coverage()
    pr.energy()
    

When executing the above transpy code, a higher code coverage value should be observed compared to numpy functions. The details are similar as following:

Profiling Information:
======================================================
Transmuter Coverage: 0.785
======================================================
Device Performance Estimate:
======================================================
96.705 GOPS/W
======================================================

Indices and tables