Torch and Linear Regression on GPU
September 10, 2015

There are a few packages that can help with implementing RNNs and their need for high-performance computation. I like Caffe the most, but it can be challenging, especially when it comes to adding new code, as you need to deal with C++ and CUDA. There is also Theano, but I am not a great fan of its heavy computation-graph optimisation, especially at evaluation time. And there is Torch, based on Lua, which is ... well, I don't know yet what Torch can do at this stage ...
Hence this post will be about implementing linear regression using CUDA tensors and Torch 7. The example below is loosely based on the Torch 7 and iTorch demos.
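To make explicit what we are about to fit (my notation, not something taken from the demos): ordinary least squares looks for weights $w$ and a bias $b$ minimising the mean squared error

$$L(w, b) = \frac{1}{N} \sum_{i=1}^{N} \left( x_i^\top w + b - y_i \right)^2,$$

which is exactly what an nn.Linear layer followed by nn.MSECriterion expresses.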
require 'cutorch';
require 'cunn';
require 'optim';
torch.setdefaulttensortype( 'torch.FloatTensor' )
logger = optim.Logger( paths.concat('.', 'train.log') )
For this exercise we will use a fairly large data set: a million rows of two normally distributed features, with targets generated from the matrix A plus Gaussian noise of mean 3 and unit standard deviation.
x_len = 1000000
x_width = 2
X = torch.CudaTensor( x_len, x_width ):normal()
A = torch.CudaTensor{ {1}, {2} }
Y = torch.mm( X, A ) + torch.CudaTensor( x_len, 1 ):normal( 3.0, 1.0 )
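A quick check, not part of the original demo, just to confirm the data actually lives on the GPU and has the expected shape:
print( #X )          -- 1000000 x 2
print( X:type() )    -- 'torch.CudaTensor'
print( #Y )          -- 1000000 x 1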
Let's define a linear layer to express our regression. The nn package will take care of deriving the gradients as well as of the forward and backward passes.
lin_layer = nn.Linear( (#X)[2], (#Y)[2] )
model = nn.Sequential()
model:add( lin_layer )
model:cuda()
criterion = nn.MSECriterion()
criterion:cuda()
params, dl_dparams = model:getParameters()
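Before wiring up the optimiser, here is a small sanity check (my addition, not from the demos) that makes the forward/backward contract concrete: a single pass over the first five rows, printing the loss and the gradients that nn accumulates inside the linear layer.
local xs = X[{ {1, 5}, {} }]
local ys = Y[{ {1, 5}, {} }]
model:zeroGradParameters()                        -- clear any accumulated gradients
local out = model:forward( xs )                   -- 5x1 predictions
local loss = criterion:forward( out, ys )         -- mean squared error
model:backward( xs, criterion:backward( out, ys ) )
print( loss )
print( lin_layer.gradWeight )                     -- dloss/dW
print( lin_layer.gradBias )                       -- dloss/db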
sgd_params = {
   learningRate = 1e-3,
   learningRateDecay = 1e-4,
   weightDecay = 0,
   momentum = 0
}
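A note on these settings, based on my reading of the optim.sgd source rather than on the original demos: learningRateDecay shrinks the effective step roughly as learningRate / ( 1 + t * learningRateDecay ), where t counts the evaluations performed so far, while weightDecay and momentum are simply switched off here.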
epochs = 100
batch_size = 50000
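With a million rows and a batch size of 50,000, each epoch runs 20 mini-batches, so 100 epochs amount to 2,000 SGD updates in total.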
function train( X, Y )
   local current_loss = 0
   local n_batches = math.floor( (#X)[1] / batch_size )
   -- mini-batch input / target buffers
   local inputs = torch.CudaTensor( batch_size, x_width )
   local targets = torch.CudaTensor( batch_size, 1 )
   -- we won't shuffle here, as an element-wise loop is too slow in Lua;
   -- instead we start each epoch from a random offset
   local offset = math.floor( torch.uniform( 0, batch_size - 1 ) )
   -- for each mini-batch
   for t = 1, (#X)[1], batch_size do
      -- clamp the slice so every mini-batch has exactly batch_size rows
      local x_start = math.min( t + offset, (#X)[1] - batch_size + 1 )
      local x_end = x_start + batch_size - 1
      inputs:copy( X[{ {x_start, x_end}, {} }] )
      targets:copy( Y[{ {x_start, x_end}, {} }] )
      -- eval function to minimise
      local feval = function( params_new )
         -- clean up
         collectgarbage()
         if params ~= params_new then
            params:copy( params_new )
         end
         -- reset gradients (gradients are always accumulated, to accommodate batch methods)
         dl_dparams:zero()
         -- evaluate the loss function and its derivative wrt the parameters
         local outputs = model:forward( inputs )
         local loss = criterion:forward( outputs, targets )
         local backprop = criterion:backward( outputs, targets )
         model:backward( inputs, backprop )
         -- return loss and dloss/dparams
         return loss, dl_dparams
      end
      -- run one SGD step on this mini-batch
      local _, fs = optim.sgd( feval, params, sgd_params )
      current_loss = current_loss + fs[1]
   end
   -- average the loss over the number of mini-batches, not the batch size
   current_loss = current_loss / n_batches
   logger:add{ ['training_error'] = current_loss }
   return current_loss
end
time = sys.clock()
local cumm_loss = 0.
for i = 1, epochs do
   cumm_loss = train( X, Y )
end
print( 'Final loss = ' .. cumm_loss )
-- time taken
time = sys.clock() - time
print( "Time per epoch = " .. (time / epochs) .. '[s]')
Let's take a look at the recovered parameters. They should be close to the matrix A, with the bias picking up the mean of the noise, i.e. roughly ( 1, 2, 3 ):
print( params )
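Since params is just the flattened view returned by getParameters, we can equally well inspect the linear layer directly; the weight row should recover A and the bias the mean of the noise:
print( lin_layer.weight )   -- expected to be close to  1  2
print( lin_layer.bias )     -- expected to be close to  3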
Not bad. Here's a chart of the MSE as a function of the epoch number:
Plot = require 'itorch.Plot'
for name, list in pairs( logger.symbols ) do
   y = torch.Tensor( list )
   x = torch.linspace( 1, #list, #list )
   plot = Plot():line( x, y, 'blue', name ):legend(true):title('MSE'):draw()
end
plot:redraw()