Implementation and comparison of the Sobel operator on CPU and GPU using CUDA
Authors: Spiridonov K.A., Stulov I.S., Ferapontov I.A.
Journal: International Journal of Humanities and Natural Sciences
Section: Technical sciences
Published in issue: 10-5 (97), 2024.
Open access
This article examines the Sobel operator, which is used to detect edges in images. Two implementations are considered: one on the central processing unit (CPU) and one on the graphics processing unit (GPU). The paper discusses the technical aspects of implementing the Sobel operator on the GPU, including optimization and the distribution of computation across the GPU architecture. It also compares the performance of the CPU and GPU implementations, which makes it possible to evaluate how efficient the GPU is for such tasks. Finally, the article covers key aspects of algorithm development with CUDA, a platform designed for parallel computing on GPUs.
Keywords: CPU, GPU, CUDA, Sobel operator
Short URL: https://sciup.org/170207083
IDR: 170207083 | DOI: 10.24412/2500-1000-2024-10-5-66-69
One of the most important convolutions is the computation of derivatives. Derivatives play an important role in mathematics and physics, and the same is true of computer vision. The images we work with consist of pixels, each of which, for a grayscale image, holds a brightness value. In other words, an image is simply a two-dimensional matrix of numbers. The derivative of an image is therefore a finite difference: the change in pixel brightness divided by the corresponding change in pixel position.
When working with an image A, we are working with a function of two variables A(x, y), that is, with a scalar field. It is therefore more correct to speak of the gradient of the image rather than its derivative.
A gradient operator computes the brightness gradient of the image at each point: the direction of the greatest increase in brightness and the magnitude of the change in that direction. The result shows how sharply or smoothly the brightness changes at each point, and therefore how likely the point is to lie on an edge and how that edge is oriented. In practice, the magnitude of the brightness change (the likelihood of belonging to an edge) is more reliable and easier to interpret than the direction.
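The two quantities described above can be sketched in a few lines of C++. This is only an illustration: the names `Gradient` and `gradient_from_partials` are hypothetical, and the inputs are assumed to be the horizontal and vertical brightness derivatives at one pixel.

```cpp
#include <cmath>

// Magnitude and direction of a 2D brightness gradient (gx, gy).
struct Gradient {
    double magnitude;  // how sharply the brightness changes
    double direction;  // angle of steepest increase, in radians
};

// Hypothetical helper: combines the two partial derivatives at one pixel.
Gradient gradient_from_partials(double gx, double gy) {
    return {std::hypot(gx, gy), std::atan2(gy, gx)};
}
```

For example, `gradient_from_partials(3.0, 4.0).magnitude` is 5, and a purely vertical change such as `(0.0, 2.0)` gives a direction of a quarter turn.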
One such convolution is the Sobel operator. This operator is used in computer vision to highlight boundaries. To apply the Sobel operator, we use two matrices:
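$$
G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * A,
\qquad
G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A
$$
where * denotes the convolution operation and A is the source image.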
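As a small worked example, the sketch below applies the two kernels to a single 3x3 grayscale patch. It assumes the standard Sobel kernels; the names `Patch`, `KX`, `KY`, `weighted_sum` and `sobel_magnitude` are hypothetical. Like the listings in this article, it multiplies and accumulates without flipping the kernel, which for these kernels changes only the sign of each sum and not the gradient magnitude.

```cpp
#include <array>
#include <cmath>

using Patch = std::array<std::array<double, 3>, 3>;  // patch[row][col]

// Standard 3x3 Sobel kernels (assumed here), indexed [row][col].
constexpr int KX[3][3] = {{1, 0, -1}, {2, 0, -2}, {1, 0, -1}};
constexpr int KY[3][3] = {{1, 2, 1}, {0, 0, 0}, {-1, -2, -1}};

// Element-wise multiply-accumulate of a kernel with a patch (no kernel flip).
double weighted_sum(const int k[3][3], const Patch &p) {
    double s = 0;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            s += k[r][c] * p[r][c];
    return s;
}

// Gradient magnitude at the centre of the patch.
double sobel_magnitude(const Patch &p) {
    return std::hypot(weighted_sum(KX, p), weighted_sum(KY, p));
}
```

For a patch whose right column is 255 and whose other pixels are 0 (a vertical edge), the horizontal sum is -1020 and the vertical sum is 0, so the magnitude is 1020 before any clamping to the 0..255 range. A uniform patch gives 0.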
CPU Implementation:
#include <algorithm>
#include <cmath>
#include <cstdint>

// Luma conversion; assumed to use the same weights as the GPU kernel.
static double rgb_to_gray(uint8_t r, uint8_t g, uint8_t b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

// img and res are interleaved RGBA buffers (the alpha copy assumes channels == 4).
void apply_sobel_operator(uint8_t *img, int width, int height, int channels, uint8_t *res,
                          int8_t Wx[][3], int8_t Wy[][3]) {
    for (int i = 0; i < width; ++i) {
        for (int j = 0; j < height; ++j) {
            double Gx = 0, Gy = 0;
            for (int u = -1; u <= 1; ++u) {
                for (int v = -1; v <= 1; ++v) {
                    // Clamp neighbour coordinates to the image border.
                    int ip = std::max(std::min(i + u, width - 1), 0);
                    int jp = std::max(std::min(j + v, height - 1), 0);
                    const uint8_t *p = img + (jp * width + ip) * channels;
                    double pix = rgb_to_gray(p[0], p[1], p[2]);
                    Gx += Wx[u + 1][v + 1] * pix;
                    Gy += Wy[u + 1][v + 1] * pix;
                }
            }
            // Gradient magnitude, clamped to the 8-bit range.
            double grad = std::min(255., std::sqrt(Gx * Gx + Gy * Gy));
            uint8_t *out = res + (j * width + i) * channels;
            out[0] = out[1] = out[2] = static_cast<uint8_t>(grad);
            out[3] = img[(j * width + i) * channels + 3];  // keep the source alpha
        }
    }
}
The CPU implementation has no special or advanced features. One area that could be improved is memory access during the convolution. By optimizing how the image matrix is stored, we could reduce the number of cache misses and thereby improve performance. However, this would require preprocessing the image, which in turn requires additional memory.
GPU Implementation:
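__constant__ char Wx[3][3], Wy[3][3];  // Sobel kernels, filled from the host

__global__ void apply_sobel_operator(cudaTextureObject_t img, uchar4 *res, int width, int height) {
    // 2D grid-stride indexing (assumed; these definitions are not shown in the source listing).
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    int idy = blockDim.y * blockIdx.y + threadIdx.y;
    int off_x = blockDim.x * gridDim.x;
    int off_y = blockDim.y * gridDim.y;
    double Gx, Gy, grad, pix;
    uchar4 p;
    for (int y = idy; y < height; y += off_y)
        for (int x = idx; x < width; x += off_x) {
            Gx = 0; Gy = 0;
            for (int u = -1; u <= 1; ++u) {
                for (int v = -1; v <= 1; ++v) {
                    // Fetch arguments reconstructed from the CPU version;
                    // border handling is left to the texture's address mode.
                    p = tex2D<uchar4>(img, x + u, y + v);
                    pix = 0.299 * p.x + 0.587 * p.y + 0.114 * p.z;
                    Gx += Wx[u + 1][v + 1] * pix;
                    Gy += Wy[u + 1][v + 1] * pix;
                }
            }
            grad = min(255., sqrt(Gx * Gx + Gy * Gy));
            // Take alpha from the centre pixel, as the CPU version does.
            p = tex2D<uchar4>(img, x, y);
            res[y * width + x] = make_uchar4(grad, grad, grad, p.w);
        }
}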
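The kernel's two outer loops follow the grid-stride pattern: each thread starts at its own (idx, idy) and then jumps by (off_x, off_y). The host-side C++ sketch below emulates this on the CPU to show that every pixel is visited exactly once, whatever the launch configuration. It assumes that `off_x` and `off_y` equal the total number of threads along each axis; `grid_stride_visits`, `total_x` and `total_y` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// CPU emulation of the kernel's 2D grid-stride loops.
// total_x and total_y stand for the total thread count along each axis
// (the values the kernel calls off_x and off_y).
// Returns how many times each pixel of a width x height image is visited.
std::vector<int> grid_stride_visits(int width, int height, int total_x, int total_y) {
    std::vector<int> visits(static_cast<std::size_t>(width) * height, 0);
    for (int idy = 0; idy < total_y; ++idy)            // one pass per emulated thread
        for (int idx = 0; idx < total_x; ++idx)
            for (int y = idy; y < height; y += total_y)
                for (int x = idx; x < width; x += total_x)
                    ++visits[static_cast<std::size_t>(y) * width + x];
    return visits;
}
```

With few threads each thread processes many pixels; with more threads than pixels the surplus threads simply skip the loops. In both cases the result is the same, which is why the same kernel can be benchmarked under very different configurations.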
A little bit about constant memory
Constant memory is one of the fastest memory types available on the GPU: it is cached, and a read is served quickly when all threads of a warp access the same address. Its distinctive feature is that data can be written only from the host, while code running on the GPU can only read it, which is where the name comes from. The __constant__ specifier places data in constant memory.
If an array is to be placed in constant memory, its size must be specified in advance, because constant memory, unlike global memory, does not support dynamic allocation. The cudaMemcpyToSymbol function copies data from the host to constant memory, and cudaMemcpyFromSymbol copies from the device back to the host. This differs somewhat from the approach used with global memory.
To write to constant memory, use these calls:
cudaMemcpyToSymbol(Wx, host_Wx, 9);
cudaMemcpyToSymbol(Wy, host_Wy, 9);
Benchmarks and results:
Table 1. Benchmark (execution time, ms, by size of test)
| Configuration | 100x100 | 500x500 | 1000x1000 | 2000x2000 | 5000x5000 |
|---|---|---|---|---|---|
| CPU | 1.902 | 48.823 | 200.103 | 853.682 | 5218.232 |
| 1x1, 32x1 | 0.618 | 11.989 | 65.734 | 232.912 | 1308.420 |
| 1x1, 32x32 | 0.179 | 3.102 | 15.083 | 49.431 | 299.033 |
| 32x32, 32x8 | 0.111 | 0.732 | 2.682 | 10.001 | 58.932 |
| 32x32, 32x32 | 0.157 | 0.973 | 3.992 | 12.783 | 59.562 |
| 64x64, 32x8 | 0.204 | 1.291 | 4.712 | 11.421 | 60.058 |
| 64x64, 32x32 | 0.361 | 1.401 | 4.302 | 16.103 | 62.842 |
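To read the benchmark as speed-ups rather than raw times, the sketch below takes the 5000x5000 measurements from Table 1, picks the fastest GPU configuration and divides the CPU time by it. The timing values are copied from the table; the names `Row`, `kGpu5000`, `kCpu5000`, `fastest` and `speedup` are hypothetical.

```cpp
#include <string>
#include <vector>

struct Row {
    std::string config;  // configuration label as printed in Table 1
    double ms;           // execution time in milliseconds
};

// 5000x5000 column of Table 1 (GPU rows only) and the matching CPU time.
const std::vector<Row> kGpu5000 = {
    {"1x1, 32x1", 1308.420},  {"1x1, 32x32", 299.033},
    {"32x32, 32x8", 58.932},  {"32x32, 32x32", 59.562},
    {"64x64, 32x8", 60.058},  {"64x64, 32x32", 62.842}};
const double kCpu5000 = 5218.232;

// Row with the smallest execution time.
Row fastest(const std::vector<Row> &rows) {
    Row best = rows.front();
    for (const Row &r : rows)
        if (r.ms < best.ms) best = r;
    return best;
}

// How many times faster the GPU run is than the CPU run.
double speedup(double cpu_ms, double gpu_ms) {
    return cpu_ms / gpu_ms;
}
```

On these numbers the fastest configuration for the largest image is "32x32, 32x8", roughly 88 times faster than the CPU. For the 100x100 image the same configuration is about 17 times faster (1.902 ms against 0.111 ms), so the advantage of the GPU grows with image size.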
Results:

Figure 1. Original picture

Figure 2. The Sobel operator applied to that image