To accomplish this, the robot is usually equipped with two cameras that take pictures of the environment it is in. These two cameras are commonly placed some distance apart, similar to the distance between our eyes. Because of this, the pictures from one camera are slightly shifted with respect to the ones taken by the other camera. This shift is usually called disparity, and it is what the computer uses to tell whether an object is close by or far away.
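As a rough numerical sketch of how disparity becomes depth (the focal length, baseline, and disparity values below are made up purely for illustration), the classic pinhole-stereo relation is Z = f * B / d:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole-stereo relation: Z = f * B / d.
    A larger disparity means the object is closer to the cameras."""
    return focal_px * baseline_m / disparity_px

# e.g. a 700-pixel focal length, a 6.5 cm baseline (roughly eye distance),
# and a 20-pixel disparity give a depth of about 2.3 m
print(depth_from_disparity(700, 0.065, 20))  # -> 2.275
```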
One major problem the machine encounters while carrying out this task is: how does it detect that the red cup in picture 1 has moved, let's say, 4 cm with respect to where the red cup is in picture 2?
These vision tasks require finding corresponding features across two or more views.
Therefore, the first necessary step is to find the features of a scene. But how do we do this?
What we can start doing is making image patches. The elements to be matched are image patches of a fixed size. The task is therefore to find the best (most similar) patch in the second image. Clearly, the chosen patch should be very distinctive: there should be only one patch in the second picture that looks similar to it. A good patch is one that presents large variation in the neighborhood of a point, in all directions.
For example, take the following two images:
A good image patch could be:
while a bad one, because it has many possible matches, is:
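To make the matching step concrete, here is a minimal brute-force sketch (the function name and scoring are my own choices; real systems use faster search and more robust measures) that slides a patch over the second image and keeps the position with the smallest sum of squared differences:

```python
import numpy as np

def best_match(patch, image):
    """Return the (row, col) in `image` where `patch` fits best,
    scored by the sum of squared differences (SSD)."""
    ph, pw = patch.shape
    best_ssd, best_pos = np.inf, None
    for r in range(image.shape[0] - ph + 1):
        for c in range(image.shape[1] - pw + 1):
            window = image[r:r + ph, c:c + pw].astype(float)
            ssd = np.sum((window - patch.astype(float)) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos
```

A distinctive patch produces one clear SSD minimum; a bad patch like the one above produces many near-identical scores, so the match is ambiguous.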
We are looking for features that remain stable over changes of viewpoint. One type of feature that has this characteristic is the corner.
The Harris Corner Detector provides a mathematical tool for finding them.
With an image patch, we can have the following cases:
a) The patch represents a 'flat' zone.
b) The patch represents an edge.
c) The patch represents a corner.
A flat region, as we can see from the image above, presents no change in any direction; an edge presents no change along the edge direction; and a corner presents significant change in all directions. This means that if we shift the window from which we are gathering the image patch over a corner, we should perceive a large change in appearance.
The Harris Corner Detector gives us a way to determine which of the above cases holds.
But how does it do it exactly???!
The Harris Corner Detector utilizes the following expression:
E(u, v) = ∑ w(xi, yi) [I1(xi + u, yi + v) - I0(xi, yi)]^2
w(xi, yi) is a window function: typically either a box that is 1 inside the window and 0 outside, or a Gaussian that gives more weight to pixels near the center.
I0(xi, yi) is the intensity of the pixel located at (xi, yi), and I1(xi + u, yi + v) is the intensity of the pixel at (xi + u, yi + v); the latter is called the shifted or displaced intensity.
It is easy to see that to detect corners we want points where E(u, v) is large for every shift (u, v).
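A direct (and deliberately naive) way to evaluate E(u, v) for a single window, assuming w is simply 1 inside the window and 0 outside:

```python
import numpy as np

def E(img, x, y, u, v, half=4):
    """E(u, v) for the (2*half+1)-pixel window centred at (x, y),
    with a uniform window function w. Assumes both the window and
    its shifted copy stay inside the image bounds."""
    img = img.astype(float)
    window  = img[y - half:y + half + 1, x - half:x + half + 1]
    shifted = img[y - half + v:y + half + 1 + v,
                  x - half + u:x + half + 1 + u]
    return np.sum((shifted - window) ** 2)
```

At a corner, E stays large no matter which direction (u, v) points; on an edge, it collapses toward zero for shifts along the edge.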
Using Taylor's first-order approximation and matrix algebra, the above expression can be rewritten as:
E(u, v) ≈ [u v] M [u v]^T
where M is a 2x2 matrix computed from the image derivatives Ix and Iy:
M = ∑ w(xi, yi) [ Ix^2   IxIy ]
                [ IxIy   Iy^2 ]
The classification can be carried out by analyzing the eigenvalues λ1 and λ2 of the matrix M.
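A small sketch of that analysis (using numpy's finite-difference gradients as a stand-in for proper Sobel derivatives, and a uniform window):

```python
import numpy as np

def structure_matrix(patch):
    """Build M = sum over the patch of [Ix^2, IxIy; IxIy, Iy^2]."""
    Iy, Ix = np.gradient(patch.astype(float))  # derivative images
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

patch = np.random.rand(9, 9)   # stand-in for a real image patch
lam1, lam2 = np.linalg.eigvalsh(structure_matrix(patch))
# lam1 and lam2 both small  -> flat region
# one large, one small      -> edge
# both large                -> corner
```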
In practice, the corner response is measured by:
R = det(M) - k (trace(M))^2
where
det(M) = λ1 λ2
trace(M) = λ1 + λ2
and k is an empirical constant that typically ranges from 0.04 to 0.06.
For corners, R tends to be a large positive number.
For edges, R tends to be a large negative number.
For flat areas, R tends to zero.
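In practice you rarely compute R by hand; OpenCV ships a ready-made implementation. A minimal usage sketch (the file names are placeholders, and the 1% threshold is just a common heuristic):

```python
import cv2
import numpy as np

img  = cv2.imread('scene.png')                        # placeholder input
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))

# args: image, window (block) size, Sobel aperture, the constant k
R = cv2.cornerHarris(gray, 2, 3, 0.04)

# mark pixels whose response is within 1% of the maximum in red
img[R > 0.01 * R.max()] = [0, 0, 255]
cv2.imwrite('corners.png', img)
```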
2 comments:
Whoa!
It's great that you explained it in simple terms, because Wikipedia makes it really complicated with so many equations.
thanks for the explanation :)
update:
Then I went to Google and found more illustrative information about computer vision, with explanatory demos.
And to think I used to say Picasa was doing magic with that face recognition stuff, wow.
ahahahhaha
But anyway, how ingenious :)
Thanks for sharing; now I have an idea of how the magic happens, haha.
Cheers :P
Hey, that link you sent me is super cool! Thanks =D!
I want to translate it into Spanish... Thanks, Sammy, for visiting and reading my blog =D