This part of the project had two goals. First, we needed to find which patches represent the same spot in the different images, in order to reconstruct the spots in 3D. Second, we needed to follow the spots over time, so as to capture the actor's motion. Tracking a spot over time means preserving its identity from one frame to the next.
These two goals can be processed in either order.
The first idea is to track the patches over time in the images of each camera, and then, once the identities of the patches are known, reconstruct each spot in 3D. To find these identities, we take, for each camera, the best matching between the patches of the previous image and those of the current image. To handle occlusions of a spot in a given camera, the "previous image" is in fact the projection of the reconstructed 3D scene onto that camera. The "best" matching is the one that minimises the sum of the distances between matched patches.
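The "best matching" step above can be sketched as follows. This is only an illustration, not our actual implementation: the function name and the use of 2D patch centres are assumptions, and the brute-force search over permutations is only practical for the small number of spots a motion-capture suit carries.

```python
# Sketch: among all ways of pairing previous patches with current patches,
# keep the assignment whose summed distance is lowest. Brute force over
# permutations is exact but only viable for small patch counts.
from itertools import permutations
from math import hypot

def best_matching(prev, curr):
    """prev, curr: equal-length lists of (x, y) patch centres.
    Returns the list of index pairs (i, j) minimising the total distance."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(curr))):
        cost = sum(hypot(prev[i][0] - curr[j][0], prev[i][1] - curr[j][1])
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(enumerate(perm)), cost
    return best
```

For larger patch counts, the same minimisation would be done with an assignment algorithm such as the Hungarian method rather than brute force.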
The problem with this technique is that some patches that are not spots may survive the image processing and then be wrongly identified as spots. Moreover, a spot can be lost when such a spurious patch takes its place during the search for the "best" matching.
This approach seemed too unstable, so we chose the second solution.
Each patch in an image corresponds to a ray, i.e. a straight line in 3D. The spots lie where two or more of these lines intersect. So we first match the patches (or rays) that pass close enough to each other: because of noise, two rays never actually intersect exactly.
The distance between two rays is partly given by epipolar geometry (see Peter Sturm's course). If the epipolar constraint is satisfied, the two rays represent the same spot. The constraint corresponds to a determinant built from the two direction vectors and the vector joining two points, one chosen arbitrarily on each line. To turn this into a distance, we divide by the norm of the cross product of the two direction vectors.
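This determinant-over-cross-product formula is the standard distance between two 3D lines, and can be sketched as below; the function names are illustrative, not taken from our code.

```python
# Distance between two 3D lines, each given by a point p and a direction d:
#   dist = |det(d1, d2, p2 - p1)| / ||d1 x d2||
# The determinant equals (d1 x d2) . (p2 - p1), so we compute it that way.
from math import sqrt

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def ray_distance(p1, d1, p2, d2):
    """Shortest distance between two (non-parallel) 3D lines."""
    w = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    n = cross(d1, d2)
    norm = sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    det = n[0] * w[0] + n[1] * w[1] + n[2] * w[2]  # det(d1, d2, w)
    return abs(det) / norm  # norm == 0 for parallel rays (degenerate case)
```

For example, the line through the origin along the x-axis and the line through (0, 0, 1) along the y-axis are one unit apart, and `ray_distance` returns exactly that.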
After this process we may have more spots than actually exist in the scene, so we run a small post-processing step to remove some of them.
We then know which patches represent a spot, so we can build the 3D scene.
After that, we match this scene against the 3D scene of the previous frame. If the reconstruction produced too many spots, this matching keeps only the initial number of spots: the "best" ones are retained. If the reconstruction lost some spots, the old spots are kept instead.
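The frame-to-frame matching just described can be sketched as a nearest-neighbour assignment with a fallback to the old position. This is a hypothetical illustration: the function name and the `max_jump` threshold are assumptions, not part of our system.

```python
# Sketch: for each spot of the previous frame, claim the nearest unclaimed
# reconstructed spot of the current frame; if none is close enough (the
# spot was lost), carry the old position forward. Excess reconstructed
# spots are simply never claimed, so the spot count stays constant.
from math import dist

def track_spots(prev_spots, candidates, max_jump=0.2):
    """prev_spots, candidates: lists of (x, y, z) positions.
    max_jump: assumed bound on how far a spot may move between frames."""
    tracked, used = [], set()
    for p in prev_spots:
        best_j, best_d = None, max_jump
        for j, c in enumerate(candidates):
            if j not in used and dist(p, c) < best_d:
                best_j, best_d = j, dist(p, c)
        if best_j is None:
            tracked.append(p)  # spot lost this frame: keep the old one
        else:
            used.add(best_j)
            tracked.append(candidates[best_j])
    return tracked
```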
This system seems more reliable because it relies on epipolar geometry, which is also used in professional motion capture systems.