Threads have long been used in programming to overlap operations that do not depend on each other. Nowadays, with multi-core processors, they are a basic requirement for any application that aims for good performance. This introduces some problems, such as several threads accessing shared variables at the same time, or the need to synchronize tasks in the code. I will try to cover how to deal with these topics in C++11, analyzing the executions with Helgrind, a Valgrind tool.
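
To give an idea of the kind of problem involved, here is a minimal producer-consumer sketch in C++11. It is only an illustration, not the code from the full article: the function name `produce_and_consume` and the choice of summing integers are assumptions made for this example. The shared queue is protected by a `std::mutex`, and a `std::condition_variable` lets the consumer sleep until there is work.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper for illustration: one producer pushes 1..count into a
// shared queue, one consumer pops the values and accumulates their sum.
long produce_and_consume(int count) {
    std::queue<int> queue;          // shared between both threads
    std::mutex mutex;               // protects queue and done
    std::condition_variable ready;  // signals "new item" or "producer finished"
    bool done = false;
    long sum = 0;                   // written only by the consumer

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            {
                std::lock_guard<std::mutex> lock(mutex);
                queue.push(i);
            }
            ready.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            done = true;
        }
        ready.notify_one();
    });

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(mutex);
        for (;;) {
            // The predicate guards against spurious wake-ups.
            ready.wait(lock, [&] { return !queue.empty() || done; });
            while (!queue.empty()) {
                sum += queue.front();
                queue.pop();
            }
            if (done) break;  // queue is empty and nothing more will arrive
        }
    });

    producer.join();
    consumer.join();
    return sum;  // safe to read: both threads have finished
}
```

Calling `produce_and_consume(100)` from a small `main` and running the program under `valgrind --tool=helgrind` should report no data races. Removing the locks around `queue` is an easy way to see what Helgrind complains about.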

Read more: Threads in C++11. Producer-consumer

LU factorization is a popular method to decompose a matrix as the product of a lower triangular matrix and an upper triangular matrix. Many simulations need to solve a system of linear equations. In general, solving these systems is quite hard, so a matrix decomposition method is frequently used to subdivide the problem. As it is easier to solve a triangular system than a general one, this is usually the first step in many simulations.
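
As a rough sketch of the idea, the following C++11 code factorizes a 3x3 matrix with the Doolittle scheme (L has ones on its diagonal) and without pivoting. This is an assumption made for brevity, not necessarily the variant shown in the full article, and the names `Matrix` and `lu_decompose` are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 3;
using Matrix = std::array<std::array<double, N>, N>;

// Doolittle LU factorization without pivoting: fills l (unit lower
// triangular) and u (upper triangular) so that a = l * u.
// Returns false when a zero pivot appears, since this simple variant
// cannot continue without row exchanges.
bool lu_decompose(const Matrix& a, Matrix& l, Matrix& u) {
    l = Matrix{};  // start from all zeros
    u = Matrix{};
    for (std::size_t i = 0; i < N; ++i) {
        // Row i of U.
        for (std::size_t j = i; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < i; ++k) sum += l[i][k] * u[k][j];
            u[i][j] = a[i][j] - sum;
        }
        if (u[i][i] == 0.0) return false;
        l[i][i] = 1.0;
        // Column i of L, below the diagonal.
        for (std::size_t j = i + 1; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < i; ++k) sum += l[j][k] * u[k][i];
            l[j][i] = (a[j][i] - sum) / u[i][i];
        }
    }
    return true;
}
```

Once L and U are known, solving A x = b reduces to a forward substitution with L followed by a backward substitution with U, which is why the factorization is worth doing first.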

Read more: Visual LU factorization

Sometimes matrix multiply operations are hard to see and understand. With this small application, written in C++ with Qt, I try to show how the multiplication of two matrices of order three is done.
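
For reference, the operation being visualized can be sketched in a few lines of plain C++11 (no Qt involved). The names `Mat3` and `multiply` are hypothetical and chosen only for this example: each element of the result is the dot product of a row of the first matrix and a column of the second.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t Order = 3;
using Mat3 = std::array<std::array<int, Order>, Order>;

// Classic triple loop: c[i][j] is row i of a times column j of b.
Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 c{};  // all elements start at zero
    for (std::size_t i = 0; i < Order; ++i)
        for (std::size_t j = 0; j < Order; ++j)
            for (std::size_t k = 0; k < Order; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```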

Read more: Visual Matrix Multiply


OpenCL can benefit from the hardware resources that some architectures have. The overlap between computations and transfers explained in the previous article can also be achieved in this language. This means managing three different flows at the same time: host-to-device transfers, kernel computations and device-to-host transfers. In this article I show how we can make the execution up to 3.5x faster (for this example) without modifying a single line of the OpenCL kernel code.

Read more: OpenCL. Overlapping computation and transfers.