# SVD of a real matrix in CUDA

The calculation of the Singular Value Decomposition (SVD) of a matrix is at the basis of many computations and approaches in applied science. One example is the regularized solution of linear systems of equations. Another is Principal Component Analysis.

Many times, the applications requiring the SVD calculation deal with large matrices and/or request the SVD computation in an iterative process.

Fortunately, the SVD can be quickly computed in CUDA using the routines provided in the cuSOLVER library. Below, we provide a representative example relevant in common situations.

Before proceeding further, two points to remember:

• gesvd routines assume Nrows >= Ncols. Of course, the case Nrows < Ncols can be dealt with by matrix transposition, for example by cublas<t>geam();
• column major ordering is assumed.

The full example is contained in the SVD.cu file.

The first step towards SVD calculation in CUDA is to initialize the cuSOLVER, as is required for any other routine of the cuSOLVER library:

```cusolverDnHandle_t solver_handle;cusolverDnCreate(&solver_handle);
```

The second step is to allocate the buffer space required by the SVD calculation routine:

```cusolveSafeCall(cusolverDnDgesvd_bufferSize(solver_handle, Nrows, Ncols, &work_size));double *work;  gpuErrchk(cudaMalloc(&work, work_size * sizeof(double)));
```

The buffer space dimension is calculated by cusolverDnDgesvd_bufferSize.
The third step is the final SVD calculation:

```cusolveSafeCall(cusolverDnDgesvd(solver_handle, 'A', 'A', Nrows, Ncols, d_A, Nrows, d_S, d_U, Nrows, d_V, Ncols, work, work_size, NULL, devInfo));int devInfo_h = 0;

gpuErrchk(cudaMemcpy(&devInfo_h, devInfo, sizeof(int), cudaMemcpyDeviceToHost));

if (devInfo_h != 0) std::cout << "Unsuccessful SVD execution\n\n";
```

Note that cusolverDnDgesvd returns the matrix V transposed. Also, U is the matrix of the left singular vectors, namely, the singular vectors spanning the output space of the matrix, while V is the matrix of the right singular vectors, namely, the singular vectors spanning the input space of the matrix.

For the example at SVD.cu, the output is:

```Singular values
d_S = 314.891390715956
d_S = 14.4118456776669
d_S = 0.842607258487678
d_S = 0.0277423461350813
d_S = 0.000710344049837318

U[0,0]=-0.026900
U[1,0]=-0.043393
U[2,0]=-0.091636
U[3,0]=-0.181098
U[4,0]=-0.319440
U[5,0]=-0.513289
U[6,0]=-0.768565

U[0,1]=-0.366199
U[1,1]=-0.428964
U[2,1]=-0.477524
U[3,1]=-0.454763
U[4,1]=-0.323752
U[5,1]=-0.055840
U[6,1]=0.372983

U[0,2]=0.747999
U[1,2]=0.316640
U[2,2]=-0.072851
U[3,2]=-0.312084
U[4,2]=-0.353727
U[5,2]=-0.162134
U[6,2]=0.293467

U[0,3]=-0.530903
U[1,3]=0.565260
U[2,3]=0.392143
U[3,3]=-0.039990
U[4,3]=-0.324668
U[5,3]=-0.264268
U[6,3]=0.260770

U[0,4]=0.152553
U[1,4]=-0.590073
U[2,4]=0.467756
U[3,4]=0.350750
U[4,4]=-0.204309
U[5,4]=-0.423010
U[6,4]=0.256984

U[0,5]=0.007732
U[1,5]=-0.054514
U[2,5]=0.051584
U[3,5]=0.281387
U[4,5]=-0.717133
U[5,5]=0.607745
U[6,5]=-0.177468

U[0,6]=-0.021961
U[1,6]=0.207773
U[2,6]=-0.618898
U[3,6]=0.677639
U[4,6]=-0.081169
U[5,6]=-0.298327
U[6,6]=0.136131

V[0,0]=-0.349562
V[0,1]=-0.395802
V[0,2]=-0.442097
V[0,3]=-0.488660
V[0,4]=-0.535638

V[1,0]=0.637619
V[1,1]=0.405149
V[1,2]=0.114078
V[1,3]=-0.224424
V[1,4]=-0.604909

V[2,0]=0.635177
V[2,1]=-0.323686
V[2,2]=-0.505308
V[2,3]=-0.213766
V[2,4]=0.436743

V[3,0]=-0.258432
V[3,1]=0.705292
V[3,2]=-0.279236
V[3,3]=-0.497617
V[3,4]=0.331934

V[4,0]=0.031788
V[4,1]=-0.277461
V[4,2]=0.676925
V[4,3]=-0.646162
V[4,4]=0.215062
```

These results can be compared with the following simple Matlab code:

```clear
allclose
allclc
Nrows = 7;
Ncols = 5;
A = zeros(Nrows, Ncols);

for j = 0 : Nrows - 1
for i = 0 : Ncols - 1
A(j + 1, i + 1) = (i + j * j) * sqrt(i + j);
end
end

[U, S, V] = svd(A);
```

which outputs:

```314.8914
14.4118
0.8426
0.0277
0.0007

U =
-0.0269 0.3662 -0.7480 0.5309 0.1526 0.0076 -0.0220
-0.0434 0.4290 -0.3166 -0.5653 -0.5901 -0.0528 0.2082
-0.0916 0.4775 0.0729 -0.3921 0.4678 0.0466 -0.6193
-0.1811 0.4548 0.3121 0.0400 0.3508 0.2868 0.6754
-0.3194 0.3238 0.3537 0.3247 -0.2043 -0.7178 -0.0754
-0.5133 0.0558 0.1621 0.2643 -0.4230 0.6053 -0.3032
-0.7686 -0.3730 -0.2935 -0.2608 0.2570 -0.1764 0.1375

V =
-0.3496 -0.6376 -0.6352 0.2584 0.0318
-0.3958 -0.4051 0.3237 -0.7053 -0.2775
-0.4421 -0.1141 0.5053 0.2792 0.6769
-0.4887 0.2244 0.2138 0.4976 -0.6462
-0.5356 0.6049 -0.4367 -0.3319 0.2151
```

As it can be seen, the CUDA and Matlab results coincide for the singular values, but not for the singular vectors. However, the singular vectors span the same spaces.

## 3 thoughts on “SVD of a real matrix in CUDA”

1. Vahid says:

When I run this code, the performance’s not as good as expected. After profiling it with nvprof, I found out that there’re so many HtoD and DtoH data transfer (with chuck size of only few bytes), thus killing the performance. Do you have any insight into the reason for this behavior? I used CUDA 8 on a K40m

1. OrangeOwl says:

Hi, the SVD routine from cuSOLVER is known not to have good performance, especially as compared to MKL. This is recognized by Joe Eaton, manager of cuSOLVER library in this post: https://devtalk.nvidia.com/default/topic/830894/cusolverdncgesvd-performance-vs-mkl/. If you need better performance, you have two possibilities: Using CULA or using MAGMA. The former library is not for free, while the latter does. We have recently used MAGMA’s SVD routines and experienced very good performance against MKL.

2. Abraham Nieto says:

which sentences did U use for compiling an executing this code???

i don’t know how