Optimizing a nested for-loop using CUDA

Ask Time:2012-09-23T08:28:52         Author:user1161310

So I have a project I'm working on that uses OpenCV to detect motion in moving objects. I'm trying to speed up the detection and have a nested for-loop that I want to speed up using CUDA. I have CUDA integration all set up in Visual Basic. Here is the nested for-loop in my .cpp file.

    for (int i=0; i<NumberOfFeatures; i++)
    {
        // Compute integral image.
        cvIntegral(mFeatureImgs[i], mFirstOrderIIs[i]);

        for (int j=0; j<NumberOfFeatures; j++)
        {
            // Compute product feature image.
            cvMul(mFeatureImgs[i], mFeatureImgs[j], mWorker);

            // Compute integral image.
            cvIntegral(mWorker, mSecondOrderIIs[i][j]);
        }
    }

I'm relatively new to CUDA, so my question is, could someone show me an example of how exactly I would make this nested for-loop go faster using CUDA?

Author:user1161310, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/12548778/optimizing-a-nested-for-loop-using-cuda
Robert Crovella :

As sgar91 pointed out, OpenCV includes a GPU module, as described here:

http://opencv.willowgarage.com/wiki/OpenCV_GPU

That wiki also suggests how to ask GPU-related questions on the OpenCV help forum on Yahoo.

There is a GPU-accelerated image integral function. If you look around, you may find an equivalent for cvMul as well.

You can't use the exact same data types in the non-GPU code and the GPU version. Take a look at the "short sample" example given on the wiki page linked above. You will see that you need to do something like this to transfer your existing data to data structures that the GPU can operate on:

    cv::gpu::GpuMat dst, src;  // variables that can be accessed by the GPU
    src.upload(src_host);      // load src (the GPU variable) with the image data

    cv::gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);  // this causes the GPU to act

You will need to do something similar, such as:

    cv::gpu::GpuMat dst, src;
    src.upload(src_data);

    cv::gpu::integral(src, dst);
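Whichever GPU routines end up replacing cvMul and cvIntegral in the loop from the question, it helps to have a small CPU reference to compare the GPU output against. The following is only a minimal sketch in plain C++ (no OpenCV), assuming single-channel images stored row-major. The `Image` struct and the `multiply` and `integral` helpers are hypothetical names introduced for illustration; they are not part of OpenCV or of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-channel image, stored row-major.
struct Image {
    std::size_t rows, cols;
    std::vector<double> data;

    Image(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}

    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Element-wise product of two equally sized images
// (the role cvMul plays in the inner loop).
Image multiply(const Image& a, const Image& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Image out(a.rows, a.cols);
    for (std::size_t k = 0; k < a.data.size(); ++k) {
        out.data[k] = a.data[k] * b.data[k];
    }
    return out;
}

// Integral image (summed-area table). The result has one extra row and
// column of zeros, so sum.at(r, c) holds the sum of all src pixels with
// row < r and column < c. This matches the (rows+1) x (cols+1) layout
// that cvIntegral produces.
Image integral(const Image& src) {
    Image sum(src.rows + 1, src.cols + 1);
    for (std::size_t r = 0; r < src.rows; ++r) {
        double rowSum = 0.0;  // running sum of the current row
        for (std::size_t c = 0; c < src.cols; ++c) {
            rowSum += src.at(r, c);
            sum.at(r + 1, c + 1) = sum.at(r, c + 1) + rowSum;
        }
    }
    return sum;
}
```

For a pair of feature images `a` and `b`, `integral(multiply(a, b))` corresponds to one entry of `mSecondOrderIIs`. After downloading a GPU result back to the host, it can be compared element by element against this reference, allowing a small tolerance if the GPU path uses a different numeric type.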