Beginning of the project: May 21, 2007
At the beginning of the first week, we did not know which libraries and tools to use, and we had big gaps in our knowledge of 3D geometry and stereo reconstruction. We read Peter Sturm's very well made course on Computer Vision, and we got acquainted with Visual C++ and OpenCV. Starting from nothing except examples from the internet, one of the hard parts was organizing our software and its development. We began by creating the software backbone: classes and methods.
We thought we could build our own calibration tool using the OpenCV methods (there is a method that detects the corners of a chessboard, which can be very useful for calibration). But this is a long process and we wanted to spend most of our time on the motion capture itself, so we decided to use MatLab to calibrate our cams. That is why we built a function that imports the results from a MatLab file into our software.
For several days we experimented with ways to control the video stream of our webcams from our laptops. We wrote a simple program that could open the configuration panel of the webcams and take shots. We used it to take shots of a calibration pattern, which was a chessboard.
In parallel, we began to think about image processing algorithms able to detect white spots in a low-quality picture. The algorithms and methods are detailed on the image processing page.
Originally we did not know whether we would be able to build real-time software, so we tried to assess the bandwidth of the USB ports and the capacity of our laptops. The results were disappointing: to get a real-time stream we had to plug one cam per USB controller, that is to say two cams maximum per laptop. What is more, we had a poor framerate using the OpenCV capture methods, and given that the image processing algorithms need a lot of time to process a picture, we concluded that real-time capture was impossible on a single computer.
Luckily we found another way to capture the video stream: it is called CvCam and it is part of OpenCV. We also tried to spread the load over several computers.
Thus, we came to the network. With one cam per client laptop (the eyes) and a server (the brain), we were able to get all the pictures we needed within 33 ms (the maximum cam frame rate was 30 FPS, that is one frame every 1/30 s, about 33 ms). The clients, after connecting to the server, wait for a capture request, then take a shot, find the white spots, and send the coordinates of these spots to the server. The network also allows the captures to be synchronised. More details on the network page.
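To give an idea of how little data has to travel for each capture, here is a minimal sketch of a message a client could send back to the server. It is an illustration only, not the project's actual wire format: the layout (a frame number and a spot count, followed by one x, y pair of 32-bit floats per spot) and the names pack_spots and unpack_spots are assumptions.

```python
import struct

# Hypothetical layout: frame id (uint32) and spot count (uint16), network byte order.
HEADER = struct.Struct("!IH")
# One detected spot: x and y as 32-bit floats.
POINT = struct.Struct("!ff")


def pack_spots(frame_id, spots):
    """Serialize the 2D spot coordinates found in one frame."""
    payload = HEADER.pack(frame_id, len(spots))
    for x, y in spots:
        payload += POINT.pack(x, y)
    return payload


def unpack_spots(payload):
    """Inverse of pack_spots: return (frame_id, list of (x, y))."""
    frame_id, count = HEADER.unpack_from(payload, 0)
    spots = [
        POINT.unpack_from(payload, HEADER.size + i * POINT.size)
        for i in range(count)
    ]
    return frame_id, spots


message = pack_spots(42, [(120.5, 80.0), (300.25, 200.75)])
frame_id, spots = unpack_spots(message)
print(len(message), frame_id, spots)
```

With this layout, a frame with two spots takes 22 bytes, so even at 30 captures per second a client sends well under one kilobyte per second.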
After setting up the network, we ran some tests on its speed and bandwidth capacity. Since very little data travels over it, we were finally ready for real-time capture.
The first thing we needed now was a way to test what we had done. We added an OpenGL rendering thread on the server side to visualize the scene (only the cameras at the moment).
We tried different methods to process the images: we took shots of models in black suits wearing white spots, and we tested our algorithms on them. In the end, the best way to process the images was to set the parameters when launching the program. We gave up on fully automatic configuration because the webcams differ from one another, the lighting changes, and so on.
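As an illustration of spot detection driven by a manually chosen parameter, here is a minimal sketch that thresholds a grayscale image and returns the centroid of each bright region. It is not the algorithm described on the image processing page: the flood-fill approach, the function name find_spots and the parameters threshold and min_pixels are assumptions chosen for the example.

```python
from collections import deque


def find_spots(image, threshold, min_pixels=1):
    """Return the centroids (x, y) of bright 4-connected regions.

    image is a list of rows of gray levels; threshold is set by hand.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or image[y0][x0] < threshold:
                continue
            # Flood-fill one bright region starting from this pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Ignore regions that are too small to be a real marker.
            if len(pixels) >= min_pixels:
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                centroids.append((cx, cy))
    return centroids


image = [
    [0,   0,   0, 0,   0],
    [0, 255, 255, 0,   0],
    [0, 255, 255, 0,   0],
    [0,   0,   0, 0, 200],
]
spots = find_spots(image, threshold=128)
print(spots)
```

Raising min_pixels is one simple way to reject isolated noisy pixels, which matters with low-quality webcam pictures.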
Due to bugs in OpenCV (CvCam), we separated the software into two parts: a calibration part and a capture part. Why? We need a first thread to run the network engine and another to get the video stream, yet the CvCam method that we invoke to start the capture creates its own thread, and we cannot put it inside another thread. The solution we found: we start the network in the main thread, then we start the capture, which creates its own thread that we can no longer control (stop) without exiting the program. Because we need a pause between calibration and capture, we decided to split the program into two parts.
Another main issue was the initialisation of the scene. We need to track the white spots and to identify them, because the server receives only unidentified 2D coordinates and therefore needs a starting point to track each spot. Our idea was to introduce an initialisation phase in which the model spreads his arms and stays still. Then we click on the spots, in a predefined order, on each client. We have set up a small tracking system so that the client keeps the spots identified. When the server sends the capture start request, every client sends its ordered data and the initialisation sequence is over.
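A minimal sketch of how a client could keep its spots identified from one frame to the next, assuming a greedy nearest-neighbour match. The text does not say which matching rule the project used, so the function track_spots, the max_distance parameter and the label names are hypothetical.

```python
import math


def track_spots(previous, detections, max_distance=30.0):
    """Carry labels from the previous frame over to the new detections.

    previous maps a label to its last known (x, y); detections is the
    unordered list of (x, y) found in the new frame.
    """
    # All (distance, label, detection index) candidates, closest first.
    pairs = sorted(
        (math.dist(position, detection), label, index)
        for label, position in previous.items()
        for index, detection in enumerate(detections)
    )
    updated, used = {}, set()
    for distance, label, index in pairs:
        if distance > max_distance:
            break  # every remaining candidate is even further away
        if label in updated or index in used:
            continue
        updated[label] = detections[index]
        used.add(index)
    # A label with no nearby detection keeps its last known position.
    for label, position in previous.items():
        updated.setdefault(label, position)
    return updated


# Labels as they would be assigned by clicking during initialisation.
previous = {"left_hand": (100.0, 100.0), "right_hand": (300.0, 100.0)}
detections = [(305.0, 98.0), (102.0, 103.0)]
tracked = track_spots(previous, detections)
print(tracked)
```

Keeping the last known position for an unmatched label is a simple choice for short dropouts; it does not handle long occlusions.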
We began the 3D reconstruction by coding the methods that reproject a 3D point onto a 2D cam picture and reconstruct a 3D point from several (at least 2) shots from different cams. We implemented a set of mathematical tools to work on matrices and vectors.
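The two operations can be sketched as follows, assuming an ideal pinhole camera without lens distortion and a midpoint triangulation between two viewing rays. The Camera class, its parameters and the choice of the midpoint method are assumptions made for the example, not a description of the project's code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


class Camera:
    """Ideal pinhole camera: x_cam = rotation * (X_world - center)."""

    def __init__(self, rotation, center, focal, principal_point):
        self.rotation = rotation
        self.center = center
        self.focal = focal
        self.cx, self.cy = principal_point

    def project(self, point):
        """Reproject a 3D world point onto the image, in pixels."""
        x, y, z = mat_vec(self.rotation, sub(point, self.center))
        return (self.focal * x / z + self.cx, self.focal * y / z + self.cy)

    def ray(self, pixel):
        """Return (origin, direction) of the viewing ray through a pixel."""
        u, v = pixel
        in_camera = [(u - self.cx) / self.focal, (v - self.cy) / self.focal, 1.0]
        return self.center, mat_vec(transpose(self.rotation), in_camera)


def triangulate(ray_a, ray_b):
    """Midpoint of the shortest segment joining two non-parallel rays."""
    (p1, d1), (p2, d2) = ray_a, ray_b
    w = sub(p1, p2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero only when the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = [p + s * k for p, k in zip(p1, d1)]
    q2 = [p + t * k for p, k in zip(p2, d2)]
    return [(x + y) / 2 for x, y in zip(q1, q2)]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
left = Camera(identity, [-1.0, 0.0, 0.0], 500.0, (320.0, 240.0))
right = Camera(identity, [1.0, 0.0, 0.0], 500.0, (320.0, 240.0))

point = [0.2, -0.1, 4.0]
pixel_left, pixel_right = left.project(point), right.project(point)
reconstructed = triangulate(left.ray(pixel_left), right.ray(pixel_right))
print(pixel_left, pixel_right, reconstructed)
```

With noisy detections the two rays no longer intersect exactly, which is why the midpoint of the shortest segment between them is used rather than a true intersection.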
A lot of testing and debugging was needed to get a working program. We recalculated the calibration data: MatLab actually returns the coordinates of the chessboard in the camera coordinate system, whereas we thought they were the camera coordinates in the chessboard system. After a day of brain-breaking reflection about changes of coordinate system, we managed to obtain accurate positions and orientations for the cameras.
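The change of coordinate system can be sketched as follows, assuming the calibration gives a rotation R and a translation T such that a chessboard point X maps to R X + T in the camera frame. That convention and the function name are assumptions. Under it, the camera centre expressed in the chessboard frame is -Rᵀ T and the camera orientation is Rᵀ.

```python
def camera_pose_in_board_frame(rotation, translation):
    """Invert X_cam = R * X_board + T to get the camera pose in the board frame.

    Returns (orientation, center), where orientation = R^T and center = -R^T * T.
    """
    orientation = [list(column) for column in zip(*rotation)]
    center = [
        -sum(r * t for r, t in zip(row, translation))
        for row in orientation
    ]
    return orientation, center


# Example: a camera rotated by 90 degrees about the Z axis.
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [1.0, 2.0, 3.0]
orientation, center = camera_pose_in_board_frame(rotation, translation)

# Sanity check: the camera centre must map to the origin of the camera frame.
check = [
    sum(r * c for r, c in zip(row, center)) + t
    for row, t in zip(rotation, translation)
]
print(center, check)
```

Mapping the computed centre back through R and T and checking that it lands on the origin of the camera frame is a quick way to catch a mix-up between the two conventions.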
We tried our system without tracking, with only 1 ball, and it seems to work fine: the moves of the model in black are well represented in the OpenGL scene. We had to deal with a difference between OpenGL and OpenCV: OpenCV puts pixel (0,0) in the top-left corner, so its Y-axis points down, whereas in OpenGL it points up. Now everything is all right!
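The conversion between the two conventions amounts to flipping the row index. A minimal sketch, where the function name and the 640x480 image size are assumptions for the example:

```python
def opencv_to_opengl(x, y, image_height):
    """Convert a pixel from a top-left origin (Y down) to a bottom-left origin (Y up)."""
    return x, image_height - 1 - y


top_left = opencv_to_opengl(0, 0, 480)
bottom_row = opencv_to_opengl(10, 479, 480)
print(top_left, bottom_row)
```

The same function converts back, since applying the flip twice returns the original pixel.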
We are now working on the 3D tracking, using epipolar lines, in order to use our system with several balls.
We have made several versions of the tracking system: the first uses the epipolar constraint to evaluate whether 2 rays intersect, and the other uses the actual distance between the rays.
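The distance-based test can be sketched as follows. Each ray is treated as an infinite line given by an origin and a direction, and two rays are considered to see the same ball when their shortest distance is below a tolerance. The function names and the tolerance value are assumptions made for the example.

```python
import math


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def ray_distance(p1, d1, p2, d2):
    """Shortest distance between the lines p1 + s*d1 and p2 + t*d2."""
    w = [b - a for a, b in zip(p1, p2)]
    normal = cross(d1, d2)
    if norm(normal) < 1e-12:
        # Parallel lines: distance from p2 to the first line.
        return norm(cross(w, d1)) / norm(d1)
    return abs(dot(w, normal)) / norm(normal)


def rays_match(p1, d1, p2, d2, tolerance=0.05):
    """True when the two rays pass close enough to share a 3D spot."""
    return ray_distance(p1, d1, p2, d2) <= tolerance


# Two rays that meet at (0, 0, 5), and two skew rays 2 units apart.
meeting = ray_distance([-1.0, 0.0, 0.0], [1.0, 0.0, 5.0],
                       [1.0, 0.0, 0.0], [-1.0, 0.0, 5.0])
skew = ray_distance([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                    [0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(meeting, skew)
```

The tolerance has to absorb calibration and detection errors, so a value that is too loose accepts pairs of rays that do not correspond to any real ball.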
We have conducted some experiments using 2, 3 and 4 cams, with 1 to 5 balls. It worked fine with 2 cams, but that setup is not effective when we have to deal with occlusions. With more than 2 cams, the number of "ghost" spots increases dramatically, because many rays intersect even where there is no spot. As a result the motion is not smooth: the tracking system sees a lot of balls and has to choose which of them match the spots in their previous positions. This seems to be unstable, and we are now trying to find a way to improve it.
We have a whole system that works fine with 2 cams, that is to say: the network, the calibration, the image processing, the 3D reconstruction, the tracking and the visualization.
- Problems we encountered:
Weird behavior of VC++ with the network in debug mode: the accept() function doesn't work in debug mode, and we had some problems with it in release mode too, depending on the nature (local or global) of some variables, for instance our Scene, which sometimes caused the network to crash.
The keyboard management that we tried to include in the select() call, using the STDIN stream, does not actually work on Windows.
The USB bandwidth is too low to use several cams at the same time.
The HighGui capture methods provide too low a framerate, so we needed to use CvCam.
We still need MatLab for the calibration. We chose to focus on the motion capture itself, but OpenCV has some tools that could be used to write our own calibration in a further development.
We also had a problem with the matrices returned by MatLab because we misunderstood their meaning.
3D Reconstruction & Tracking:
We have some problems when we use 3 cams or more, due to the instability of the geometry that we use, as explained previously.
Image Processing:
The histogram equalization of the pictures didn't work well enough, so we switched to a manual configuration.
We barely managed to control the CvCam capture thread. The exit of the program is quite "dirty" because of that.
During the filming:
We had some lighting issues because of the "cheap" aspect of the project: we had to use sunlight because the lighting of the black room where we worked came from the ceiling and caused inconvenient shadows.
During this month we had to build a whole system from scratch, covering all the aspects described in detail on this website. We also built our own algorithms for every phase of the process. We wanted to make a cheap mocap system, which is therefore unstable when using a lot of cameras, because the errors and imperfections of our cameras, which are cheap webcams, add up.
Nevertheless, the system is quite satisfying with a few cams, considering that we had only 1 month to do all the work.