The CUDA_mex_host_to_device GitHub directory provides an example of how to create a mex function that executes on the GPU when the real input data reside on the host and the final results are returned to the host.
The first thing to do is to recover the pointer to the first element of the real data from the Matlab input array/matrix:
double *h_input = mxGetPr(prhs[0]);
We can also recover the number of elements of the input variable (which can also be a matrix) as the product of its numbers of rows and columns:
int numElements = (int)(mxGetM(prhs[0]) * mxGetN(prhs[0]));
After that, we have to move the host data to the device and perform the computations on the device.
Finally, we have to create the real output Matlab array/matrix:
plhs[0] = mxCreateDoubleMatrix(1, numElements, mxREAL);
Then we recover the pointer to the first element of the real output array/matrix with mxGetPr(plhs[0]) and move the results from the GPU to the CPU.
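The steps above can be put together in a single mexFunction. The following is only a minimal sketch, not the code in the repository: the kernel squareKernel (which squares each element), the helper checkCuda, the error identifiers and the block size of 256 are assumptions made for illustration, since the text does not say which computation the example performs on the device.

```cuda
#include "mex.h"
#include <cuda_runtime.h>

// Hypothetical kernel (an assumption, not taken from the repository):
// each thread squares one element of the input.
__global__ void squareKernel(const double *d_in, double *d_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_out[i] = d_in[i] * d_in[i];
}

// Hypothetical helper: stop the mex call with a readable message if a CUDA call fails.
// Note that mexErrMsgIdAndTxt does not return, so device buffers allocated
// before the failure are not freed in this simplified sketch.
static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        mexErrMsgIdAndTxt("CUDAmex:cudaError", "%s failed: %s",
                          what, cudaGetErrorString(err));
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Accept exactly one real, double-precision input.
    if (nrhs != 1 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgIdAndTxt("CUDAmex:input", "One real double array is expected.");

    // Pointer to the host data and number of elements (rows times columns).
    const double *h_input = mxGetPr(prhs[0]);
    int numElements = (int)(mxGetM(prhs[0]) * mxGetN(prhs[0]));
    size_t numBytes = (size_t)numElements * sizeof(double);

    // Real output row vector, allocated on the host by Matlab.
    plhs[0] = mxCreateDoubleMatrix(1, numElements, mxREAL);
    double *h_output = mxGetPr(plhs[0]);
    if (numElements == 0) return;   // nothing to compute for an empty input

    // Move the host data to the device.
    double *d_input = NULL, *d_output = NULL;
    checkCuda(cudaMalloc((void **)&d_input, numBytes), "cudaMalloc (input)");
    checkCuda(cudaMalloc((void **)&d_output, numBytes), "cudaMalloc (output)");
    checkCuda(cudaMemcpy(d_input, h_input, numBytes, cudaMemcpyHostToDevice),
              "host-to-device copy");

    // Perform the computation on the device.
    const int blockSize = 256;
    const int gridSize = (numElements + blockSize - 1) / blockSize;
    squareKernel<<<gridSize, blockSize>>>(d_input, d_output, numElements);
    checkCuda(cudaGetLastError(), "kernel launch");

    // Move the results from the GPU back to the CPU and release device memory.
    checkCuda(cudaMemcpy(h_output, d_output, numBytes, cudaMemcpyDeviceToHost),
              "device-to-host copy");
    cudaFree(d_input);
    cudaFree(d_output);
}
```

In this sketch the output is created before the device work so that an empty input simply returns an empty result. The device-to-host cudaMemcpy also waits for the kernel to finish, so no explicit synchronization is added.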
Once compiled, the mex function can be used in the following way:
a = 1 : 10; b = CUDA_mex_host_to_device_real(a);
where a is the input array residing on the host and b is the result array, which also resides on the host.