TransPy documentation¶
Arrange a Training Session & Training Tasks¶
Arrange a training session with us.
- Book a session by contacting us by this email address: transpy-contact@inf.ed.ac.uk. In the training sessions, we will introduce you to our system. Write down your full name as well as your gmail address. The session will take place on Google hangout and we will invite you to join a few minutes before the start of the session. If you are having any issues, please contact us at this email address: transpy-contact@inf.ed.ac.uk
- We will invite you to join the session you booked, please be online (log in to your gmail) 5 minutes before your booked session.
- Please read the training tasks below and the following documentation before joining a session. You are encouraged to try these examples beforehand as well.
Training tasks.
Basic:
Task 0: log in to the docker image of our system.
Task 1: Hello World. write a program that prints “Hello World!” to your computer screen.
Task 2: Vector-Add
- Task 2.1: write a program that adds two
numpy1D arrays of integers element-wise using our ``transpy``’s built-in functions. For an example of inputs like [1,2,3,4] and [2,3,4,5], the output will be [3,5,7,9]. - Task 2.2: an effective way to improve the performance of a piece of code on our system is to run the code as much as possible on accelerators. To measure how much of a piece of code gets executed on accelerators, we use this code snippet for coverage measurements. The higher the coverage is, the better. This task requires you to measure code coverage on your code developed in Task 2.1.
- Task 2.1: write a program that adds two
Beginner:
Task 3: Dot Product.
- Task 3.1: implement dot product of two
numpy1D arrays of integers using three ways: implement the dot function manually by an explicit for-loop, usingnumpyortranspybuilt-in functions. For an example of inputs like [1,2,3,4] and [2,3,4,5], the output of a dot product will be 40. - Task 3.2: it is recommended to use functions in our
transpylibrary instead of hand-written code whenever possible. The functions in ourtranspylibrary are guaranteed to run on accelerators, which usually results in better performance. In this task you should measure the code coverage on your code developed in Task 3.1, and check which one of the three versions results in better code coverage.
- Task 3.1: implement dot product of two
Intermediate:
Task 4: dense M x v. Use our
transpy’s built-in functions to complete the following code.# Dense Matrix Multiply Vector import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]]) vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]]) # Please compute dense * vector using dot() of transpy.numpy # User code ................ # User code
The result should be [28,22,21,11,19,19,20,22]
Task 5: sparse M x v. Use our
transpy’s built-in functions to complete the following code.# Sparse Matrix Multiply Vector import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]]) vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]]) # Please convert dense matrix to csr_matrix using sp.csr_matrix() # Calculte sparse * vector using np.dot() # User code ................ # User code
The result should be [28,22,21,11,19,19,20,22]
Task 6: Graph. Use our
transpy’s built-in functions to complete the following code.# Graph algorithms using ``transpy.scipy.sparse.csgraph``, which is similar to ``scipy.sparse.csgraph``. import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph`` Graph = sp.csr_matrix(np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]])) # BFS Algorithm using csg.breadth_first_tree() # User code ........... # User code
The result should be (0, 1) 1.0 (0, 3) 1.0 (0, 4) 1.0 (0, 6) 1.0 (1, 5) 1.0 (1, 7) 1.0 (3, 2) 1.0
- Task 7: PyTorch. Implement the following algorithm.
- Creates two random Tensors
- Multiplies them together
- Adds a constant value
- Takes the mean of the result
- Then, compute the gradients of the computational graph you have created through the above.
- Task 7.1: Implement Task 7.1 using transpy.torch
Advanced:
Optional Task: please complete the following neural network code using our
transpy’s built-in functions. In this task, you can check the main functions whether transpy built-in functions have better code coverage.import transpy.numpy as np import transpy.profile # Initial profilling API pr = transpy.profile.Profile() # N is batch size; D_in is input dimension; # H is hidden dimension; D_out is output dimension. N, D_in, H, D_out = 64, 1000, 100, 10 # Create random input and output data x = np.random.randn(N, D_in) y = np.random.randn(N, D_out) # Randomly initialize weights w1 = np.random.randn(D_in, H) w2 = np.random.randn(H, D_out) learning_rate = 1e-6 # Insert start point you want to profilling pr.enable() for t in range(5): # Forward pass: compute predicted y h = np.dot(x,w1) h_relu = np.maximum(h, 0) # Please insert your code here, # calculate y_pred = n_relu * w2, please using np.dot() ....user code...... # Compute and print loss loss = np.square(y_pred - y).sum() print(t, loss) # Backprop to compute gradients of w1 and w2 with respect to loss grad_y_pred = 2.0 * (y_pred - y) # Please insert your code here, # calculate grad_w2 = h_relu.T * grad_y_pred, please using np.dot() ....user code...... grad_h_relu = np.dot(grad_y_pred,w2.T) grad_h = grad_h_relu.copy() grad_h[h < 0] = 0 grad_w1 = np.dot(x.T,grad_h) # Update weights w1 -= learning_rate * grad_w1 w2 -= learning_rate * grad_w2 pr.disable() pr.coverage() pr.energy()
If you want to check our solutions to the above training tasks, please click the solutions.
Three things you need to know about TransPy¶
1. TransPy is python.
TransPy is our library for accelerating applications on our “Transmuter” hardware. Using TransPy is easy - it’s completely based on existing python packages, including NumPy, SciPy, and PyTorch. This means that, initially, you can write regular python, using your regular tools etc, and simply remember to import TransPy.
2. How to write effective TransPy.
To get the best performance, wherever possible, express your algorithms using standard library operations which capture whole algorithms (e.g. BFS) operators which capture work with whole data structures (e.g. vector add, dot) and use our coverage checker feedback to see how well you are doing.
3. What to avoid.
In contrast, it is a bad idea, performance-wise, to code with your own explicit loops to traverse data structures, if a single operator can express the same computation. This will be obvious if our tool gives you surprisingly bad coverage feedback. For example, don’t write a triply nested loop for matrix multiply when you can call a pre-prepared library routine!
Tutorials & Getting Help¶
Since programming with transpy is just programming with python, numpy, scipy and pytorch, we recommend the following online materials to strengthen your understanding of these topics:
Numpy/SciPy
To assist during familiarization with the mechanics of our system, in addition to the discussions and examples below, we offer the following videos: Basic Transpy and profiling
We welcome feedback and queries at any time, using the Slack channels and Wiki FAQ provided by Parenthetic.
Getting up and Running¶
You may be provided with alternative instructions for accessing the programming environment. If so, please disregard the following instructions, and go directly to ‘Using Transpy’
Building/installing the docker image¶
You will receive instructions from Parenthetic detailing how to access our system. Once you are logged in, simply run the bash startEval.sh to start the docker environment
Running programs in the docker container¶
Using the supplied example.py program, we can run it simply by executing `` python example.py``.
Running programs in the docker container - Advanced Users¶
Programs can also be run directly from the remote machine, without needing the startEval.sh script. You can do this be executing the following command on the remote server:
sudo docker run --privileged -it --rm --net=host -v /data:/data sdhph1-eval1 <your_args> <script_name>
For example, to run a python program called test.py, you would call:
sudo docker run --privileged -it --rm --net=host -v /data:/data sdhph1-eval1 python test.py
The -v flag here copies the current working directory on the host to
the /data directory in the container; this approach is useful for
getting test data into the system (note that this is what the
startEval.sh script does too).
The --rm flag removes the container after the process exits.
We strongly recommend getting familiar with the docker commands and understanding what’s going on here.
Please see the docker documentation for more details.
Using TransPy¶
Porting existing code¶
All you have to do to port existing python code using numpy, scipy, or
pytorch is change the import statement.
For numpy change :
import numpy
to:
import transpy.numpy
For scipy change:
import scipy
to:
import transpy.scipy
For pytorch change:
import torch
to:
import transpy.torch
Coverage and performance estimation¶
This returns an estimate of transmuter execution coverage (i.e. the percentage of the overall execution which would be mapped on to our transmuter hardware) and a GOPS/Watt prediction.
TransPy includes a profiling module, which you can use by running your program with -m transpy.profile.
For example, consider we have two programs. The first which is implemented using standard SciPy, and the second which is implemented using TransPy.
First:
# sci_graph.py
import numpy as np
import cProfile
from scipy.sparse import csc_matrix,csr_matrix
G = np.random.randint(0,2,size=(1000,1000))
csr = csr_matrix(G)
from scipy.sparse.csgraph import breadth_first_order
bfs_csr = breadth_first_order(csr,0)
Running this program as python3 -m transpy.profile sci_graph.py results in the following output, showing zero coverage and no performance estimation for Transmuter:
Profiling Information:
======================================================
Transmuter Coverage: 0.000
======================================================
Device Performance Estimate
======================================================
0.00 GOPS/W
======================================================
Second:
# trans_graph.py
import transpy.numpy as np
import cProfile
from transpy.scipy.sparse import csc_matrix,csr_matrix
G = np.random.randint(0,2,size=(1000,1000))
csr = csr_matrix(G)
from transpy.scipy.sparse.csgraph import breadth_first_order
bfs_csr = breadth_first_order(csr,0)
Running this program as python3 -m transpy.profile trans_graph.py results in the following output, showing coverage and a performance estimate for Transmuter:
Profiling Information:
======================================================
Transmuter Coverage: 0.829
======================================================
Device Performance Estimate:
======================================================
100.12 GOPS/W
======================================================
Focused coverage¶
In addition to the above profiling approach, we can also focus profiling on
particular sections or lines of code. To do this, we will use the Profile
class from the module transpy.profile
To begin:
import transpy.profile
p = transpy.profile.Profile()
To start profiling, we call the enable method, and the disable method to
stop profiling:
p = transpy.profile.Profile()
p.enable()
a = tp.random.rand(1000,1000)
b = tp.random.rand(1000,1000)
c = a.dot(b)
p.disable()
To produce the same output as previously, we have the coverage and
energy methods:
p.coverage()
p.energy()
produces
Profiling Information:
======================================================
Transmuter Coverage: 0.680
======================================================
Device Performance Estimate:
======================================================
96.953 GOPS/W
======================================================
To view detailed profiling information, use the prof_stats method:
p.prof_stats()
outputs something like:
80 function calls (79 primitive calls) in 4.510 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 2.960 2.960 2.960 2.960 {built-in method _transmuter.finish}
1 1.489 1.489 1.489 1.489 {built-in method _transmuter.config}
2 0.026 0.013 0.026 0.013 {method 'rand' of 'mtrand.RandomState' objects}
3 0.021 0.007 0.021 0.007 {built-in method _transmuter.array_float32}
1 0.006 0.006 0.006 0.006 {built-in method _transmuter.launch}
1 0.004 0.004 4.482 4.482 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/core.py:157(_dot_mm)
1 0.001 0.001 0.001 0.001 {method 'astype' of 'numpy.ndarray' objects}
3 0.001 0.000 0.001 0.000 {built-in method _transmuter.array_int32}
3 0.000 0.000 0.000 0.000 {built-in method numpy.array}
1 0.000 0.000 0.000 0.000 {built-in method _transmuter.setReconfig}
3 0.000 0.000 0.001 0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:19(array_int32)
3 0.000 0.000 0.021 0.007 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:22(array_float32)
4/3 0.000 0.000 4.510 1.503 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/array.py:5(wrapper)
1 0.000 0.000 4.484 4.484 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/core.py:114(dot)
1 0.000 0.000 1.490 1.490 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:3(__init__)
1 0.000 0.000 0.000 0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/profile/core.py:33(disable)
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
1 0.000 0.000 4.484 4.484 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transpy/numpy/array.py:167(dot)
4 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:38(setReconfig)
6 0.000 0.000 0.000 0.000 {built-in method _transmuter.setSchedArg}
1 0.000 0.000 0.006 0.006 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:41(launch)
1 0.000 0.000 2.960 2.960 /Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:44(finish)
1 0.000 0.000 0.000 0.000
/Users/magnus/.local/share/virtualenvs/transpyszrdsPQS/lib/python3.7/site-packages/transmuter/core.py:13(loadSchedule)
This is the same output as the built-in python cProfile; please look at the
official docs.
How to debug your program (or development workflow)¶
First and foremost, you should eliminate any errors coming directly from your python code.
Write your program using standard python, with support of the earlier mentioned
packages (numpy, scipy, pytorch). Any standard python tools which you normally use for
debugging will work well here.
Supported packages¶
Any python code that you write will work using our libraries, however only specific parts will be accelerated, therefore, you may sometimes need to tweak your code to get it to run on Transmuter. Unsupported operations will simply fall back to the standard implementation of the library.
For example, instead of writing a matrix multiplication using for loops, you would
want to express your matrix multiplication using the NumPy dot operator.
NumPy¶
Writing NumPy programs for Transmuter is easy. You write your NumPy
programs as normal, and then you just need to replace import numpy
with import transpy.numpy
The interesting parts that we accelerate on Transmuter are the various operations performed on NumPy arrays. For instance, in the following example, we accelerate np.dot on Transmuter.
import numpy as np # This is the import we will change.
a = np.random.randint(0,1000,size=(8,8)) b =
np.random.randint(0,1000,size=(8,8))
c = np.dot(a,b) # This is the most time consuming operation, which we want to accelerate on Transmuter!
To perform the multiplication on Transmuter, you just need to replace the first line. The full program would look like this:
import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))
c = np.dot(a,b) # Since we changed the import, this now runs on Transmuter!
Please note, that the Matrix datatype is deprecated in NumPy, and
here we follow suit. Please implement your matrices as multidimensional
arrays!
NumPy Mathematical functions¶
Python numpy packages provide a library of functions to perform mathematical operations,
such as sin(x). The following code snippet shows a typical usage
of NumPy math function, which computes sin(x) on a 2-dimensional array element-wise.
import numpy as np #initialize a random 2-dimensional array A = np.random.rand(10,10) B = np.sin(A) #other example usages: B = np.sin(1.67) B = np.sin(np.array( [1,2,3] )) B = np.sin(np.array( [[1,2,3],[4,5,6]] ))
Our transpy.numpy package implements part of numpy’s mathematical functions,
which allows to execute those functions on our Transmuter accelerator.
The following code snippet illustrates how to use our transpy.numpy package
to perform mathematical operations.
One only need to import transpy.numpy,
the rest of code is the same as using python mathematical package.
import transpy.numpy as np #initialize a random 2-dimensional array A = np.random.rand(10,10) B = np.sin(A) #other example usages: B = np.sin(1.67) B = np.sin(np.array( [1,2,3] )) B = np.sin(np.array( [[1,2,3],[4,5,6]] ))
The usage of other supported numpy math functions in our transpy.numpy package
is the same as sin(x).
The following table shows the list of numpy mathematical functions currently supported by our transpy.numpy package
( For the full list of the numpy mathematical functions,
please follow this link ).
The input data types supported are scalar (int32, float32, float64),
and array with element type (int32, float32, float64)
| Function | Supported or not | Function | Supported or not |
|---|---|---|---|
| sin | \(\checkmark\) | logaddexp2 | |
| cos | \(\checkmark\) | l0 | |
| tan | \(\checkmark\) | sinc | \(\checkmark\) |
| arcsin | \(\checkmark\) | signbit | \(\checkmark\) |
| arccos | \(\checkmark\) | copysign | |
| arctan | \(\checkmark\) | frexp | |
| hypot | ldexp | ||
| arctan2 | nextafter | ||
| degrees | \(\checkmark\) | spacing | |
| radians | \(\checkmark\) | lcm | |
| unwrap | gcd | ||
| deg2rad | \(\checkmark\) | add | \(\checkmark\) |
| rad2deg | \(\checkmark\) | reciprocal | \(\checkmark\) |
| sinh | \(\checkmark\) | positive | \(\checkmark\) |
| cosh | \(\checkmark\) | negative | \(\checkmark\) |
| tanh | \(\checkmark\) | multiply | \(\checkmark\) |
| arcsinh | \(\checkmark\) | divide | \(\checkmark\) |
| arccosh | \(\checkmark\) | power | \(\checkmark\) |
| arctanh | \(\checkmark\) | subtract | \(\checkmark\) |
| around | true_divide | ||
| round_ | floor_divide | ||
| rint | \(\checkmark\) | float_power | |
| fix | \(\checkmark\) | fmod | |
| floor | \(\checkmark\) | mod | |
| ceil | \(\checkmark\) | modf | |
| trunc | \(\checkmark\) | remainder | |
| prod | divmod | ||
| sum | \(\checkmark\) | angle | |
| nanprod | real | ||
| nansum | image | ||
| cumprod | conj | ||
| cumsum | convolve | ||
| nancumprod | clip | ||
| nancumsum | sqrt | \(\checkmark\) | |
| diff | \(\checkmark\) | cbrt | \(\checkmark\) |
| ediff1d | square | \(\checkmark\) | |
| gradient | absolute | \(\checkmark\) | |
| cross | fabs | \(\checkmark\) | |
| trapz | sign | \(\checkmark\) | |
| exp | \(\checkmark\) | heaviside | |
| expm1 | \(\checkmark\) | maximum | |
| exp2 | \(\checkmark\) | minimum | |
| log | \(\checkmark\) | fmax | |
| log10 | \(\checkmark\) | fmin | |
| log2 | \(\checkmark\) | nan_to_num | |
| log1p | \(\checkmark\) | real_if_close | |
| logaddexp | interp |
SciPy¶
Just like NumPy, writing SciPy programs for Transmuter is easy. You write your SciPy
programs as normal, and then you just need to replace import scipy
with import transpy.scipy
A sparse matrix multiplication using SciPy would look like this:
import numpy as np
from scipy.sparse import csr_matrix
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))
a_csr = csr_matrix(a)
b_csr = csr_matrix(b)
c_csr = a_csr * b_csr
To accelerate it using Transmuter, you should change the imports:
import transpy.numpy as np # use transpy.numpy here
from transpy.scipy.sparse import csr_matrix # use transpy.scipy.sparse here
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))
a_csr = csr_matrix(a)
b_csr = csr_matrix(b)
c_csr = a_csr * b_csr
PyTorch¶
Our PyTorch implementation accelerates the ATen library, which is used all across PyTorch, most
notably when designing neural networks. ATen is also used extensively in autograd, meaning
that backward gradient computation is supported just as forward passes in your neural networks.
import torch
N = 100
x = torch.ones(N,N,requires_grad=True)
y = x + 2
z = y * y * 3
z2 = y * x + 43
z3 = z2.pow(2)
out = z3.mean()
out.backward()
print(x.grad)
To accelerate this on Transmuter do:
import transpy.torch as torch # import transpy.torch here
N = 100
x = torch.ones(N,N,requires_grad=True)
y = x + 2 # The addition is accelerated on Transmuter
z = y * y * 3 # The multiplication is accelerated on Transmuter
z2 = y * x + 43 # Both multiplication and addition are accelerated on Transmuter
z3 = z2.pow(2) # Pow is accelerated on Transmuter
out = z3.mean() # mean is accelerated on Transmuter
out.backward() # backward() calls a dozens of ATen operations, which are all accelearted on Transmuter
print(x.grad)
Please note, our environment requires that you run pytorch with a single thread! This should happen by default, but, if you’re porting any code that multithreads pytorch, make sure the following is set, before you start executing any computational code:
torch.set_num_threads(1)
Graphs¶
SciPy provides a library of graph algorithms, called scipy.sparse.csgraph. Graph algorithms use sparse matrix or adjacency matrix representations. To use our TransPy graph algorithms, replace import scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
Graph Representation¶
This module uses graphs which are stored in a matrix format. A graph with N nodes can be represented by an (N x N) adjacency matrix G. If there is a connection from node i to node j, then G[i, j] = w, where w is the weight of the connection. For nodes i and j which are not connected, the value depends on the representation.
- Compressed Sparse Row Matrix (CSR, size = N*N)
- Compressed Sparse Column Matrix (CSC, size = N*N)
- Adjacency Matrix( adj, size = N*N)
# Graph data preparation using `transpy.numpy` and `transpy.scipy`.
# Here, to generate random int between 0 and 1, similar to adj_matrix.
>>> import transpy.numpy as np
>>> G = np.random.randint(0,2,size=(8,8))
>>> print(G)
[[0 0 1 1 1 0 1 0]
[1 1 0 0 0 0 1 1]
[1 0 0 0 1 0 1 1]
[1 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 1]
[1 0 0 0 1 0 1 0]
[0 1 1 0 1 0 0 0]
[0 0 1 1 0 1 0 1]]
>>> from transpy.scipy.sparse import csc_matrix, csr_matrix
>>> csc = csc_matrix(G)
>>> csr = csr_matrix(G)
Graph Algorithms API¶
The following table shows the list of graph functions currently supported by our TransPy package.
( For the full list of the scipy.sparse.csgraph graph functions, please follow this graphlink. )
Function Graph Input Type breadth_first_order directed graph and undirected graph breadth_first_tree directed graph and undirected graph depth_first_order directed graph and undirected graph depth_first_tree directed graph and undirected graph minimum_spanning_tree undirected graph connected_components undirected graph shortest_path directed graph and undirected graph dijkstra directed graph and undirected graph pagerank directed graph and undirected graph
breadth_first_order¶
Return a breadth-first ordering starting with specified node.
breadth_first_order(graph, i_start, return_predecessors=True)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- i_start: int, the index of starting node
- return_predecessors: bool, optional. If True (default), then return the predecessor array
Return:
node_array: ndarray, one dimension. The breadth-first list of nodes, starting with specified node. The length of node_array is the number of nodes reachable from the specified node.
predecessors: ndarray, one dimension. Return only if return_predecessors is True. The length-N list of predecessors of each node in a breadth-first tree. If node j is in the tree, then its parent is given by predecessors[j]. If node j is not in the tree ( and for the parent node) then predecessors[j]=-9999
>>> from transpy.scipy.sparse.csgraph import breadth_first_order # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> breadth_first_order(G,0) .............(TM Printing) (array([0, 2, 3, 4, 6, 7, 1, 5], dtype=int32), array([-9999, 3, 0, 0, 0, 4, 0, 2], dtype=int32))
breadth_first_tree¶
Return the tree generated by a breadth-first search. Note that a breadth-first tree from a specified node is unique.
breadth_first_tree(graph, i_start)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
i_start: int, the index of starting node
Returns:
cstree : csr matrix The N x N directed CSR of the breadth- first tree drawn from csgraph, starting at the specified node.
>>> from transpy.scipy.sparse.csgraph import breadth_first_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> tree_csr = breadth_first_tree(G,0) >>> tree_csr.toarray() array([[0., 0., 1., 1., 1., 0., 1., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 1.], [0., 1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.]])
depth_first_order¶
Return a depth-first ordering starting with specified node.
depth_first_order(graph, i_start, return_predecessors=True)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- i_start: int, the index of starting node
- return_predecessors: bool, optional. If True (default), then return the predecessor array
Return:
node_array: ndarray, one dimension. The depth-first list of nodes, starting with specified node. The length of node_array is the number of nodes reachable from the specified node.
predecessors: ndarray, one dimension. Return only if return_predecessors is True. The length-N list of predecessors of each node in a depth-first tree. If node j is in the tree, then its parent is given by predecessors[j]. If node j is not in the tree ( and for the parent node) then predecessors[j]=-9999
>>> from transpy.scipy.sparse.csgraph import depth_first_order # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> depth_first_order(G,0) .............(TM Printing) (array([0, 2, 4, 3, 1, 6, 7, 5], dtype=int32), array([-9999, 3, 0, 4, 2, 7, 1, 1], dtype=int32))
depth_first_tree¶
Return a tree generated by a depth-first search. Note that a tree generated by a depth-first search is not unique: it depends on the order that the children of each node are searched.
depth_first_tree(graph,i_start)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
i_start: int, the index of starting node
Returns:
cstree : csr matrix The N x N directed CSR of the depth-first tree drawn from csgraph, starting at the specified node.
>>> from transpy.scipy.sparse.csgraph import depth_first_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> tree_csr = depth_first_tree(G,0) .............(TM Printing) >>> tree_csr.toarray() array([[0., 0., 1., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 1., 1.], [0., 0., 0., 0., 1., 0., 0., 0.], [0., 1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 1., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0.]])
minimum_spanning_tree¶
Return a minimum spanning tree of an undirected graph. A minimum spanning tree is a graph consisting of the subset of edges which together connect all connected nodes, while minimizing the total sum of weights on the edges. This is computed using Kruskal’s algorithm.
minimum_spanning_tree(graph, overwrite = False)- Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix . The N x N matrix representing an undirected graph over N nodes.
- overwrite: bool, optional if true, then parts of the input graph will be overwritten for efficiency.
- Returns:
- span_tree : csr matrix The N x N compressed-sparse representation of the undirected minimum spanning tree over the input.
- Notes: This routine uses undirected graphs as input and output. That is, if graph[i, j] and graph[j, i] are both zero, then nodes i and j do not have an edge connecting them. If either is nonzero, then the two are connected by the minimum nonzero value of the two.
- Parameters:
>>> from transpy.scipy.sparse.csgraph import minimum_spanning_tree # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph
>>> tree_csr = tcsg.minimum_spanning_tree(G)
.............(TM Printing)
>>> tree_csr.toarray()
array([[0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 1, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0, 0]])
connected_components¶
Analyze the connected components of a graph
connected_components(graph, return_labels=False)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- return_labels: bool, optional. If True, then return the root labels for each of the connected components. * For directed graphs, our implementation finds strongly connected components. Nodes i and j are *strongly connected* if a path exists both from i to j and from j to i. Nodes i and j are *weakly connected* if only one of these paths exists.
Return:
n_components: int, The number of connected components
labels: ndarray, the length-N of root labels of the connected components It returns the root vertex for each connected components.
# Our implementation is using ***strongly connected*** algorithm for directed graphs now. # For undirected graph, there is no difference between strongly connected and weakly connected. >>> from transpy.scipy.sparse.csgraph import connected_components # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> connected_components(G,return_labels=True) .............(TM Printing) (1, array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32))
shortest_path¶
Perform a shortest-path search on a positive graph using Dijkstra’s algorithm.
shortest_path(graph, indices=None, unweighted = False)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- indices: int, compute the paths for the points at the given indices. * unweighted: bool, optional If True, then find unweighted distances. That is, rather than finding the path between each point such that the sum of weights is minimized, find the path such that the number of edges is minimized.
Return:
dist: ndarray, one dimension. The array of distances between graph nodes and given indices. dist[j] shortest distance from the given indices point to point j along the graph Now, if one node has infinite distance, the output is value INT_MAX(2147483647).
>>> from transpy.scipy.sparse.csgraph import shortest_path # we change scipy.sparse.csgraph by transpy.scipy.sparse.csgraph >>> shortest_path(G,indices=0) .............(TM Printing) array([0 2 1 1 1 2 1 2], dtype=int32)
dijkstra¶
Dijkstra’s algorithm.
dijkstra(graph, indices=None, unweighted=False)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- indices: int, compute the paths for the points at the given indices. * unweighted: bool, optional If True, then find unweighted distances. That is, rather than finding the path between each point such that the sum of weights is minimized, find the path such that the number of edges is minimized.
Return:
dist: ndarray, one dimension. The array of distances between graph nodes and given indices. dist[j] shortest distance from the given indices point to point j along the graph Now, if one node has infinite distance, the output is value INT_MAX(2147483647).
>>> import transpy.numpy as np >>> G = np.random.randint(0,10,size=(8,8)) >>> G tparray([[5, 9, 1, 4, 7, 5, 4, 2], [5, 6, 9, 3, 7, 6, 6, 2], [6, 3, 6, 8, 6, 9, 2, 4], [4, 4, 0, 9, 4, 7, 0, 9], [6, 0, 9, 2, 2, 0, 1, 6], [9, 2, 6, 3, 4, 7, 2, 5], [1, 7, 1, 7, 2, 4, 9, 4], [6, 1, 4, 3, 8, 6, 5, 5]]) >>> from transpy.scipy.sparse import csc_matrix, csr_matrix >>> csr = csr_matrix(G) >>> from transpy.scipy.sparse.csgraph import dijkstra >>> dijkstra(G,indices = 0, ***unweighted = False***) .............(TM Printing) Execution time (ns): 456718 .array([0, 3, 1, 4, 5, 5, 3, 2], dtype=int32) >>> dijkstra(G,indices = 0, ***unweighted = True***) .............(TM Printing) Execution time (ns): 541917 .array([0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
pagerank¶
Return the PageRank of the nodes in the graph. It computes a ranking of the nodes in the graph G based on the structure of the incoming links.
pagerank(graph,alpha=0.85,weight=‘1’)Parameters:
- graph: “N*N”, csc matrix, csr matrix or adjacency matrix
- alpha: float, optional. Damping parameter for PageRank, default = 0.85
- weight: Edge data key to use as weight. TransPy currently only supports unweighted graphs for pagerank.
Return: ndarray, one dimension,
- pagerank values of nodes.
PageRank of TransPy can stop computing with two conditions:
if values cannot coverage: up to max_iteration 100
if values can converge: for all nodes, |P_curr - P_prev| < err, stops.
# alpha = 0,85 (default) # now, weight values are set to 1. >>> from transpy.scipy.sparse.csgraph import pagerank >>> pagerank(G,alpha=0.85,weight='1') .............(TM Printing) array( [0.11082783 0.10226965 0.13102683 0.12137377 0.18530285 0.07718898 0.14524692 0.12676316], dtype=float32)
Examples¶
Linear Algebra¶
Dense¶
Dense linear algebra should be performed using NumPy.
A matrix-matrix multiplication example was shown in the NumPy section of this document.
Other NumPy functions and operators work in the same way.
For example, an element-wise matrix multiplication is performed like this:
import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.
a = np.random.randint(0,1000,size=(8,8))
b = np.random.randint(0,1000,size=(8,8))
c = a * b
Or the exp function can be used like this to square each element in the marix:
import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.
a = np.random.randint(0,1000,size=(8,8))
c = np.exp(a,2)
Sparse¶
Sparse linear algebra should be performed using NumPy and SciPy together.
A sparse matrix vector multiplication (SpMV) example was shown in the NumPy section of this document.
Other NumPy functions and operators work in the same way.
For example, SpMV is performed like this:
import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``.
import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse``
a = sp.csr_matrix(np.random.randint(0,1000,size=(8,8)))
b = np.random.randint(0,1000,size=(8,1))
c = a.dot(b)
Graphs¶
- TransPy graph API
If you use any high-level graph algorithms, particularly if they are from any of the following libraries, please try the Transpy graph functions.
A graph workflow uses networkx
import networkx as nx
def get_BFS_tree(G, src):
# networkx bfs function
G_BFS = nx.bfs_tree(G, src)
return G_BFS
To accelerate this on Transmuter do:
import transpy.scipy.sparse.csgraph as csg
import networkx as nx
def get_BFS_tree(G, src):
# Transfer networkx graph to graph matrix
scipy_G = nx.to_numpy_matrix(G)
# TransPy graph interface
scipy_G_BFS = csg.breadth_first_tree(scipy_G,src)
# Transfer graph matrix to network graph format
G_BFS = nx.Graph(scipy_G_BFS)
return G_BFS
A graph workflow uses scipy.sparse.csgraph
import scipy.sparse.csgraph as csg
import numpy as np
def compute_row_stats(i, adj_mat_original, weighted):
# the real distances in the graph
true_dist_row = csg.dijkstra(adj_mat_original, indices=[i], unweighted=(not weighted), directed=False).squeeze()
return true_dist_row
To accelerate this on Transmuter do:
import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph`` here.
import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy`` here.
def compute_row_stats(i, adj_mat_original, weighted):
# the real distances in the graph
true_dist_row = csg.dijkstra(adj_mat_original, indices=[i], unweighted=(not weighted), directed=False).squeeze()
return true_dist_row
- If TransPy cannot fit your problems with existing APIs.
You can write regular python using our TransPy standard library, including transpy.numpy, transpy.scipy and transpy.pytorch. To ensure effective performance of your program, please use bulk-data operations which capture work with whole data structures (e.g., vector add, dot). You also can use our coverage checker feedback to see how well you are doing.
For example, when we try to traverse a vertex list to update values of active vertex.
import transpy.numpy as np
update_value = (1-0.35)/0.35
r = np.ones(100)
r[0:60]=0
# Active vertex array list to check active source per iteration
active_vertex = np.zeros(100)
active_vertex[20:50]=1
p = np.full((100,),1/100)
# Transpy can update neighbors of active vertices without ``for`` loop
# using our parallelism matrix multiplication.
# It equals the sequential traverse for loop :
# for idx in range(active_vertex.shape[0]):
# if active_vertex[idx]!=0:
# p[idx]+=update_value*r[idx]
vertex_update = update_value * r * active_vertex
p += vertex_update
If traverse a graph using CSR format, we can use matrix multiplication to seepup this process.
import transpy.numpy as np
import transpy.scipy.sparse as sp
update_value = (1-0.35)/0.35
# Graph represents as CSR format
# csr.indptr stores the source indexs
# csr.indices stores the destinations of sources
csr = sp.csr_matrix(np.random.randint(0,2,size=(100,100)))
r = np.ones(100)
r[0:60]=0
# Active vertex array list to check active source per iteration
active_vertex = np.zeros(100)
active_vertex[20:50]=1
p = np.full((100,),1/100)
upt_src = update_value * r
# Transpy can update neighbors of active vertices without ``for`` loop
# using our parallelism matrix multiplication.
# It equals the sequential traverse for loop :
# for idx in range(active_vertex.shape[0]):
# if active_vertex[idx]!=0:
# ngh = csr.indices[csr.indptr[idx]:csr.indptr[idx+1]]
# for dst in ngh:
# p[dst]+=upt_src[idx]
p += csr.T.dot(upt_src*active_vertex)
The above algorithm only needs the relationships of graph, so it should be normalised to unweighted adj_matrix (only contains 0 and 1) during using matrix multiplication. If the data of matrix is required, you do not need normalise this matrix.
About all, BFS can be written by some basic operators of TransPy using spmv
import transpy.numpy as np
import transpy.scipy.sparse as sp
#define your own graph update function
def update_func(frontier,status,bfs_list,pos):
for i in range (0,frontier.shape[0]):
if (frontier[i,0]!=0 and i!=bfs_list[0,0] and status[0,i]!=1):
bfs_list[0,pos]=i
status[0,i]=1
pos=pos+1
return frontier, status, bfs_list, pos
def bfs(graph,start):
#Initial active vertex list for BFS iteration traversal
N=graph.shape[0]
frontier = np.zeros((N,1))
#Active start vertex as root vertex
frontier[start,0]=1
#Define vertex visited status array
#and active start vertex as visited
status = np.zeros((1,N))
status[0,start]=1
#Breadth first search order list
#define the start vertex as initial vertex
bfs_list = np.zeros((1,N))
bfs_list[0,0]=start
#define the index of order_list
pos = 1
# Graph algorithm main loop using spmv method
# If the source vertex i is active, frontier[i]=1
# Use transpose graph data to multiply frontier vector
# Get the active destination vertex list as new frontier
# New frontier will be the input for next iteration (active vertex list)
while (not (status==1).all()):
# Most graph algorihtms can use Matrix multiplication or SpMV to parallelism
frontier = sp.csr_matrix(graph.T) * frontier
# You can change update_func() for updating vertex by yourself, e.g., shortest_path
frontier, status, bfs_list, pos = update_func(frontier,status,bfs_list,pos)
return bfs_list
Neural Networks - Learning PyTorch With Examples¶
Please note, this series of examples has been adapted from https://pytorch.org/tutorials/beginner/pytorch_with_examples.html
Warm-up: NumPy:¶
Before introducing PyTorch, we will first implement the network using numpy.
NumPy provides an n-dimensional array object, and many functions for manipulating these arrays. NumPy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However we can easily use NumPy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using NumPy operations:
# -*- coding: utf-8 -*-
import transpy.numpy as np
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
learning_rate = 1e-6
for t in range(5):
# Forward pass: compute predicted y
h = x.dot(w1)
h_relu = np.maximum(h, 0)
y_pred = h_relu.dot(w2)
# Compute and print loss
loss = np.square(y_pred - y).sum()
print(t, loss)
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h_relu = grad_y_pred.dot(w2.T)
grad_h = grad_h_relu.copy()
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)
# Update weights
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
PyTorch: Tensors¶
PyTorch is another great framework used for implementing modern neural networks.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they’re also useful as a generic tool for scientific computing.
Here we use PyTorch Tensors to fit a two-layer network to random data. Like the NumPy example above we need to manually implement the forward and backward passes through the network:
import transpy.torch as torch
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension; # H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)
learning_rate = 1e-6
for t in range(5):
# Forward pass: compute predicted y
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
# Compute and print loss
loss = (y_pred - y).pow(2).sum().item()
print(t, loss)
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
# Update weights using gradient descent
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
PyTorch: Tensors and autograd¶
In the above examples, we had to manually implement both the forward and backward passes of our neural network. Manually implementing the backward pass is not a big deal for a small two-layer network, but can quickly get very hairy for large complex networks.
Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.
This sounds complicated, it’s pretty simple to use in practice. Each Tensor represents a node in a computational graph. If x is a Tensor that has x.requires_grad=True then x.grad is another Tensor holding the gradient of x with respect to some scalar value.
Here we use PyTorch Tensors and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass through the network:
import transpy.torch as torch
dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
learning_rate = 1e-6
for t in range(5):
# Forward pass: compute predicted y using operations on Tensors; these
# are exactly the same operations we used to compute the forward pass using
# Tensors, but we do not need to keep references to intermediate values since
# we are not implementing the backward pass by hand.
y_pred = x.mm(w1).clamp(min=0).mm(w2)
# Compute and print loss using operations on Tensors.
# Now loss is a Tensor of shape (1,)
# loss.item() gets the a scalar value held in the loss.
loss = (y_pred - y).pow(2).sum()
print(t, loss.item())
# Use autograd to compute the backward pass. This call will compute the
# gradient of loss with respect to all Tensors with requires_grad=True.
# After this call w1.grad and w2.grad will be Tensors holding the gradient
# of the loss with respect to w1 and w2 respectively.
loss.backward()
# Manually update weights using gradient descent. Wrap in torch.no_grad()
# because weights have requires_grad=True, but we don't need to track this
# in autograd.
# An alternative way is to operate on weight.data and weight.grad.data.
# Recall that tensor.data gives a tensor that shares the storage with
# tensor, but doesn't track history.
# You can also use torch.optim.SGD to achieve this.
with torch.no_grad():
w1 -= learning_rate * w1.grad
w2 -= learning_rate * w2.grad
# Manually zero the gradients after updating weights
w1.grad.zero_()
w2.grad.zero_()
PyTorch::nn¶
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Tensors and computes output Tensors, but may also hold internal state such as Tensors containing learnable parameters. The nn package also defines a set of useful loss functions that are commonly used when training neural networks.
In this example we use the nn package to implement our two-layer network:
# -*- coding: utf-8 -*-
import transpy.torch as torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
for t in range(5):
# Forward pass: compute predicted y by passing x to the model. Module objects
# override the __call__ operator so you can call them like functions. When
# doing so you pass a Tensor of input data to the Module and it produces
# a Tensor of output data.
y_pred = model(x)
# Compute and print loss. We pass Tensors containing the predicted and true
# values of y, and the loss function returns a Tensor containing the
# loss.
loss = loss_fn(y_pred, y)
print(t, loss.item())
# Zero the gradients before running the backward pass.
model.zero_grad()
# Backward pass: compute gradient of the loss with respect to all the learnable
# parameters of the model. Internally, the parameters of each Module are stored
# in Tensors with requires_grad=True, so this call will compute gradients for
# all learnable parameters in the model.
loss.backward()
# Update the weights using gradient descent. Each parameter is a Tensor, so
# we can access its gradients like we did before.
with torch.no_grad():
for param in model.parameters():
param -= learning_rate * param.grad
PyTorch: optim¶
Up to this point we have updated the weights of our models by manually mutating the Tensors holding learnable parameters (with torch.no_grad() or .data to avoid tracking history in autograd). This is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, etc.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
In this example we will use the nn package to define our model as before, but we will optimize the model using the Adam algorithm provided by the optim package:
# -*- coding: utf-8 -*-
import transpy.torch as torch
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.ReLU(),
torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(5):
# Forward pass: compute predicted y by passing x to the model.
y_pred = model(x)
# Compute and print loss.
loss = loss_fn(y_pred, y)
print(t, loss.item())
# Before the backward pass, use the optimizer object to zero all of the
# gradients for the variables it will update (which are the learnable
# weights of the model). This is because by default, gradients are
# accumulated in buffers( i.e, not overwritten) whenever .backward()
# is called. Checkout docs of torch.autograd.backward for more details.
optimizer.zero_grad()
# Backward pass: compute gradient of the loss with respect to model
# parameters
loss.backward()
# Calling the step function on an Optimizer makes an update to its
# parameters
optimizer.step()
Tips on improving coverage¶
1. Effective TransPy.
To get the best performance, wherever possible, express your algorithms using standard library operations which capture whole algorithms (e.g. BFS) operators which capture work with whole data structures (e.g. vector add, dot) and use our coverage checker feedback to see how well you are doing.
2. What to avoid.
In contrast, it is a bad idea, performance-wise, to code with your own explicit loops to traverse data structures, if a single operator can express the same computation. This will be obvious if our tool gives you surprisingly bad coverage feedback. For example, don’t write a triply nested loop for matrix multiply when you can call a pre-prepared library routine! Loops can be replaced with calls to functions like numpy’s sum or apply_along_axis, but still ensure that your code cannot be simplified further to a simpler function call.
3. Use the profiler.
The profiler will help guide your optimizations. Use it to identify hotspots in your program and to push more code into transpy calls.
Solutions to the Training Tasks¶
The following code is the solution to the training tasks.
Basic:
Task 1: Hello World.
print("Hello World!")Task 2: Vector-Add
Task 2.1:
import transpy.numpy as np a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( a + b )Task 2.2:
import transpy.numpy as np # initialize profiler for code coverage measurement import transpy.profile p = transpy.profile.Profile() # mark the start of code coverage measurement p.enable() a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( a + b ) # mark the end of code coverage measurement p.disable() # print the value of code coverage p.coverage()Beginner:
Task 3: Dot Product.
Task 3.1: The three code snippets in the following implement dot product by manual for-loop, using
numpyandtranspy.import numpy as np def mydot(a, b): c = 0 for i,j in zip(a,b): c += i*j return c a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( mydot(a,b) )import numpy as np a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( np.dot(a,b) )import transpy.numpy as np a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( np.dot(a,b) )Task 3.2: The three code snippets in the following measures code coverage for the above three versions of dot product implementation, and only using
transpyyeild positive coverage values as functions intranspyrun on accelerators, which is the only correct way to code dot product among the three versions and in general usingtranspyin coding as much as possible is recommended in our system.import numpy as np # initialize profiler for code coverage measurement import transpy.profile p = transpy.profile.Profile() def mydot(a, b): c = 0 for i,j in zip(a,b): c += i*j return c # mark the start of code coverage measurement p.enable() a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( mydot(a,b) ) # mark the end of code coverage measurement p.disable() # print the value of code coverage p.coverage()import numpy as np # initialize profiler for code coverage measurement import transpy.profile p = transpy.profile.Profile() # mark the start of code coverage measurement p.enable() a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( np.dot(a,b) ) # mark the end of code coverage measurement p.disable() # print the value of code coverage p.coverage()import transpy.numpy as np # initialize profiler for code coverage measurement import transpy.profile p = transpy.profile.Profile() # mark the start of code coverage measurement p.enable() a = np.array([1,2,3,4]) b = np.array([2,3,4,5]) print( np.dot(a,b) ) # mark the end of code coverage measurement p.disable() # print the value of code coverage p.coverage()Intermediate:
Task 4: dense M x v. User our
transpy’s built-in functions to complete the following dense matrix multiply vector.import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]]) vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]]) # Please compute dense * vector using dot() of transpy # Solution for dense M x v dense_vector = dense.dot(vector)dense_vector: [28,22,21,11,19,19,20,22]
Task 5: sparse M x v. Use our
transpy’s built-in functions to complete the following sparse matrix multiply vector.import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` dense = np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]]) vector = np.array([[3],[6],[5],[6],[8],[7],[8],[3]]) # Please convert dense matrix to csr_matrix using sp.csr_matrix() # Calculte sparse * vector using np.dot() # Solution for sparse M x v sparse = sp.csr_matrix(dense) sparse_vector = sparse.dot(vector)sparse_vector: [28,22,21,11,19,19,20,22]
Task 6: Graph. Use our
transpy’s built-in functions to complete the following code.# Graph algorithms using ``transpy.scipy.sparse.csgraph``, which is similar to ``scipy.sparse.csgraph``. import transpy.numpy as np # We change ``numpy`` to ``transpy.numpy``. import transpy.scipy.sparse as sp # We change ``scipy.sparse`` to ``transpy.scipy.sparse`` import transpy.scipy.sparse.csgraph as csg # We change ``scipy.sparse.csgraph`` to ``transpy.scipy.sparse.csgraph`` Graph = sp.csr_matrix(np.array([[0, 1, 0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0, 1, 1]])) # BFS Algorithm using csg.breadth_first_tree() # Solution for bfs bfs_tree = csg.breadth_first_tree(Graph,0) print(bfs_tree) (0, 1) 1.0 (0, 3) 1.0 (0, 4) 1.0 (0, 6) 1.0 (1, 5) 1.0 (1, 7) 1.0 (3, 2) 1.0Task 7: pytorch
import torch N = 100 x = torch.Tensor(N,N) x.requires_grad = True y = torch.Tensor(N,N) y.requires_grad = True z = x * y z1 = z + 4 out = z1.mean() out.backward() print(x.grad) print(y.grad)
Task 7.1: transpy.pytorch
import transpy.torch as torch N = 100 x = torch.Tensor(N,N) x.requires_grad = True y = torch.Tensor(N,N) y.requires_grad = True z = x * y z1 = z + 4 out = z1.mean() out.backward() print(x.grad) print(y.grad)Advanced:
Optional Task: please complete the following neural network code using our
transpy’s built-in functions. In this task, you can check the main functions whether transpy built-in functions have better code coverage.import transpy.numpy as np import transpy.profile # Initial profilling API pr = transpy.profile.Profile() # N is batch size; D_in is input dimension; # H is hidden dimension; D_out is output dimension. N, D_in, H, D_out = 64, 1000, 100, 10 # Create random input and output data x = np.random.randn(N, D_in) y = np.random.randn(N, D_out) # Randomly initialize weights w1 = np.random.randn(D_in, H) w2 = np.random.randn(H, D_out) learning_rate = 1e-6 # Insert start point you want to profilling pr.enable() for t in range(5): # Forward pass: compute predicted y h = np.dot(x,w1) h_relu = np.maximum(h, 0) y_pred = np.dot(h_relu,w2) # Compute and print loss loss = np.square(y_pred - y).sum() print(t, loss) # Backprop to compute gradients of w1 and w2 with respect to loss grad_y_pred = 2.0 * (y_pred - y) grad_w2 = np.dot(h_relu.T,grad_y_pred) grad_h_relu = np.dot(grad_y_pred,w2.T) grad_h = grad_h_relu.copy() grad_h[h < 0] = 0 grad_w1 = np.dot(x.T,grad_h) # Update weights w1 -= learning_rate * grad_w1 w2 -= learning_rate * grad_w2 # Insert end point you want to profilling pr.disable() pr.coverage() pr.energy()
When executing the above transpy code, a higher code coverage value should be observed compared to numpy functions. The details are similar as following:
Profiling Information: ====================================================== Transmuter Coverage: 0.785 ====================================================== Device Performance Estimate: ====================================================== 96.705 GOPS/W ======================================================