Many cost functionals are expressed as a summation of a number of terms. Examples include the:

**Sphere function**

**Rosenbrock function**

**Styblinski-Tang function**

In all these cases, the cost function can be evaluated by a reduction or, better, by a transformation followed by a reduction.

Typically, problems with a large number of parameters to be optimized are handled by *local optimization*.

In this case, CUDA Thrust provides `thrust::transform_reduce`, which serves the purpose well.
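The transform-plus-reduction pattern that `thrust::transform_reduce` implements on the device (apply a unary functor to each element, then combine the results with a binary operator) can be sketched on the host with C++17's `std::transform_reduce`. Since each Rosenbrock term touches both `x[i]` and `x[i+1]`, this sketch transforms over an index sequence rather than over the data directly (Thrust code would typically use a counting or zip iterator for the same purpose):

```cpp
#include <functional>
#include <numeric>   // std::transform_reduce, std::iota
#include <vector>

// Host-side sketch: the Rosenbrock functional as a transformation
// (index -> per-term value) followed by a sum reduction.
double rosenbrock(const std::vector<double>& x) {
    // Each term needs x[i] and x[i+1], so we transform over indices 0..N-2.
    std::vector<size_t> idx(x.size() - 1);
    std::iota(idx.begin(), idx.end(), 0);
    return std::transform_reduce(
        idx.begin(), idx.end(),
        0.0,                 // initial value of the reduction
        std::plus<>{},       // binary reduction operator
        [&x](size_t i) {     // unary transformation: one term of the sum
            const double t1 = x[i + 1] - x[i] * x[i];
            const double t2 = 1.0 - x[i];
            return 100.0 * t1 * t1 + t2 * t2;
        });
}
```

Note that `thrust::transform_reduce` takes its arguments in a different order (unary op, then initial value, then binary op), but the underlying pattern is the same.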

On our GitHub website, we provide an example of how the Rosenbrock functional can be computed using either CUDA Thrust or a customized version of the reduction routine offered by the CUDA samples.

In the latter case, if the `EXTERNAL` keyword is defined, a pointer to a `__device__` transformation function is passed to the customized `transform_reduce` function; otherwise, the transformation function is defined and compiled in the same compilation unit as the customized `transform_reduce` routine.
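The trade-off behind the `EXTERNAL` switch is indirection versus inlining: a function pointer lets one compiled reduction routine serve any transformation, while a transformation known in the same compilation unit can be inlined into the reduction loop. A host-side analogue of the two variants (function names here are hypothetical, not the repository's):

```cpp
#include <cstddef>

// Hypothetical per-term transformation (on the device this would be a
// __device__ function).
double square(double x) { return x * x; }

// EXTERNAL-style: the transformation arrives as a function pointer, so
// the reduction routine is compiled once and reused for any functional.
double transform_reduce_ext(const double* d, size_t n, double (*f)(double)) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += f(d[i]);
    return s;
}

// Non-EXTERNAL-style: the transformation is visible in this compilation
// unit, so the compiler is free to inline it into the loop.
double transform_reduce_int(const double* d, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += square(d[i]);
    return s;
}
```

On the GPU the same distinction applies, with the extra caveat that calling through a `__device__` function pointer can inhibit optimizations that the compiled-in version enjoys.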

Some performance results on a Kepler K20c card for the non-`EXTERNAL` case:

| N       | Thrust   | Customized |
|---------|----------|------------|
| 90000   | 0.055 ms | 0.059 ms   |
| 900000  | 0.67 ms  | 0.14 ms    |
| 9000000 | 0.85 ms  | 0.87 ms    |

On the other hand, problems with a small number of unknowns are typically handled by *global optimization*. In this case, reduction in shared memory can be an interesting option. Fortunately, CUB offers primitives for reduction in shared memory.
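CUB's `cub::BlockReduce` encapsulates a block-wide reduction in shared memory; the classic stride-halving tree it is built around can be sketched on the host as follows, with a plain array standing in for shared memory and the inner loop standing in for the threads of one block (a sketch of the pattern, not of CUB's actual implementation, which uses more refined strategies):

```cpp
// Host-side sketch of the stride-halving tree reduction that a CUDA
// thread block performs in shared memory. BLOCK_SIZE plays the role of
// the thread-block size and must be a power of two for this sketch.
const int BLOCK_SIZE = 8;

double block_reduce(double* sdata /* "shared" array of BLOCK_SIZE values */) {
    // Each iteration halves the number of "active threads".
    for (int stride = BLOCK_SIZE / 2; stride > 0; stride /= 2) {
        for (int tid = 0; tid < stride; ++tid)  // these run in parallel on the GPU
            sdata[tid] += sdata[tid + stride];
        // On the device, a __syncthreads() barrier would go here.
    }
    return sdata[0];  // thread 0 ends up holding the block's sum
}
```

For global optimization, one thread block per candidate point can evaluate one cost functional value this way, which is why the approach suits many evaluations of a functional with a moderate number of unknowns.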

On our GitHub website, we also report a worked example of how CUB can be used to calculate a large number of cost functional values for problems with a moderate number of unknowns.

The cost functional in this case is the Rastrigin function, but the example can be adapted to other cost functionals by simply changing the corresponding `__device__` function.
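For reference, the Rastrigin function in its standard form, sketched as host-side C++ (in the CUDA example, the per-term summand would live in the `__device__` function instead):

```cpp
#include <cmath>
#include <vector>

// Rastrigin function: f(x) = 10 n + sum_i (x_i^2 - 10 cos(2 pi x_i)),
// global minimum 0 at the origin, with many regularly spaced local
// minima -- the usual stress test for global optimization.
double rastrigin(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    double s = 10.0 * static_cast<double>(x.size());
    for (double xi : x)
        s += xi * xi - 10.0 * std::cos(2.0 * pi * xi);
    return s;
}
```

Swapping in another functional amounts to replacing the summand inside the loop, which mirrors the "change the `__device__` function" adaptation described above.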