The analysis of bi-temporal images by a collaborative robot control system to determine new objects in the field of view of the technical vision subunit
- Konstantin A. Kalushev, National Research Nuclear University “MEPhI” (Moscow, Russia)
- Ilya A. Makarov, National Research Nuclear University “MEPhI” (Moscow, Russia)
One of the tasks in developing an interactive collaborative robotic manipulator is temporal analysis of the work scene, i.e., determining the order in which objects appear in (and disappear from) the field of view of the vision system. Traditionally, this problem has been studied in the context of satellite imagery and has received little attention in the literature for scenes located approximately 1 m from the camera. Nevertheless, work scene analysis based on bi-temporal images is a relevant research area for robotics in general and physical artificial intelligence in particular. High-quality temporal change masks of the work scene make it possible to determine the contours and geometric centers of new objects for subsequent grasping by the robotic manipulator. A high-quality temporal mask should contain no falsely detected change regions (change objects that do not actually exist) and should clearly outline the contours of genuine change objects in the work scene. The paper presents a mathematical formulation of the temporal analysis problem and, on its basis, proposes a method for generating temporal change masks by differencing the “before” and “after” images, combining classical computer vision techniques with the neural network segmentation model SAM (Segment Anything Model). The novelty of the proposed approach is that the difference image is not processed algebraically but is instead segmented into two regions (a change region and a no-change region) by a neural network segmentation model. The proposed approach was compared with algebraic methods for creating temporal masks, Change Vector Analysis (CVA) and Slow Feature Analysis (SFA), and with a multilayer perceptron (input layer of 12 neurons, hidden layer of 512 neurons, output layer of 1 neuron).
It is demonstrated that the proposed approach generates high-quality change masks for diverse objects against a large number of backgrounds (including cluttered ones), a result that is difficult to achieve with the comparison methods. However, the proposed approach can run “on the fly,” i.e., in real time while the operator works with the robot, only if a graphics processing unit (GPU) is available.
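To make the algebraic baseline concrete, the following is a minimal sketch of a CVA-style change mask, followed by the geometric center of the changed region. It is not the proposed SAM-based method and not the authors' implementation. The function names `cva_change_mask` and `mask_centroid`, the mean-plus-two-standard-deviations threshold, and the synthetic test images are all assumptions introduced here for illustration.

```python
import numpy as np


def cva_change_mask(before, after, threshold=None):
    """CVA-style baseline: threshold the per-pixel norm of the color difference.

    `before` and `after` are H x W x C arrays of the same shape.
    """
    diff = after.astype(np.float64) - before.astype(np.float64)
    # Magnitude of the change vector at every pixel.
    magnitude = np.sqrt((diff ** 2).sum(axis=-1))
    if threshold is None:
        # Assumed heuristic (not from the paper): mean + 2 standard deviations.
        threshold = magnitude.mean() + 2.0 * magnitude.std()
    return magnitude > threshold


def mask_centroid(mask):
    """Geometric center (row, col) of the True pixels, or None if the mask is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())


# Synthetic bi-temporal pair: an empty scene, then a bright 20 x 20 object.
before = np.zeros((100, 100, 3), dtype=np.uint8)
after = before.copy()
after[40:60, 30:50] = 255

mask = cva_change_mask(before, after)
centroid = mask_centroid(mask)
```

On real images with lighting changes or cluttered backgrounds, a fixed threshold of this kind tends to produce false change regions, which is the weakness the abstract attributes to the algebraic methods.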
collaborative robot, bi-temporal images, SAM, binary change masks
2026-06-05