Real-Time Tracking

Mean-Shift




Introduction:  The subject of real-time tracking is of particular interest to the vision community and has practical applications ranging from surveillance to sporting events to more experimental settings such as the notion of 'smart rooms' and increasingly meaningful human-computer interaction.  Mean-Shift tracking is a real-time algorithm that endeavors to maximize the correlation between two statistical distributions.  The correlation, or similarity, between two distributions is expressed as a measurement derived from the Bhattacharyya coefficient.  Statistical distributions can be built from any characteristic that discriminates a particular object of interest.  A general model might use color, texture, or both.  A very simple form of the image gradient, i.e., pixel intensity differences between nearest neighbors in both the x and y directions, will be used as the random variables in the tracking distributions implemented here.
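As a concrete illustration only (this is not the code listed at the end), a minimal Python sketch of the nearest-neighbor difference maps that serve as the random variables might look like this:

    import numpy as np

    def difference_maps(gray):
        """Nearest-neighbor intensity differences in the x and y directions."""
        g = gray.astype(float)
        dx = np.zeros_like(g)
        dy = np.zeros_like(g)
        dx[:, :-1] = g[:, 1:] - g[:, :-1]   # difference with the right neighbor
        dy[:-1, :] = g[1:, :] - g[:-1, :]   # difference with the lower neighbor
        return dx, dy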
 
 

Part I

Scale Edge Hierarchy




The efficacy of this algorithm relies on how well the chosen random variables discriminate an object.  Since only derivative information will be used, this particular Mean-Shift implementation will be in the business of matching structure, in particular, edge-like structure.  Edges are neat little discontinuities that persist across different scales (particularly silhouette edges).  This property can be utilized to increase the discriminating power of this edge-based matching algorithm.  Below is a pyramid showing images and their x and y derivative-maps at three different scales, for each of the two sequences.  Note that silhouette edges are preserved, while some interior reflectance edges are lost.  However, this multiscale edge information will not be used in this experiment.  The loss of the ability to use this distinct silhouette information will have profound effects on the algorithm.  Most notably, the tracker doesn't concern itself with this distinct 'outline', and a window that encompasses Robby performs worse than a smaller interior window that is concerned only with maintaining that interior distribution (near-homogeneous, as it turns out).
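For reference, a scale hierarchy like the one pictured below can be sketched roughly as follows (illustrative Python; the 2x2 block-average downsampling is an assumption, since the text does not specify the pyramid scheme):

    import numpy as np

    def downsample(g):
        """Halve the image by averaging 2x2 blocks (an assumed pyramid scheme)."""
        h, w = g.shape
        g = g[:h - h % 2, :w - w % 2]
        return 0.25 * (g[0::2, 0::2] + g[1::2, 0::2] + g[0::2, 1::2] + g[1::2, 1::2])

    def edge_pyramid(gray, levels=3):
        """Return (image, dx, dy) triples for each scale, finest first."""
        out, g = [], gray.astype(float)
        for _ in range(levels):
            dx = np.zeros_like(g); dx[:, :-1] = np.diff(g, axis=1)
            dy = np.zeros_like(g); dy[:-1, :] = np.diff(g, axis=0)
            out.append((g, dx, dy))
            g = downsample(g)
        return out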
 


Sequence One: Pyramid



 
 
 


 

 
 

Sequence Two: Pyramid


The two pyramids above show a scale hierarchy and the corresponding 'difference-maps'.  For each scale, the difference in the x-direction is on the left and the difference in the y-direction is on the right.  One rather frightening thing about these 'difference-maps' is that Robby appears less salient than in the greyscale image.  This is purely a subjective statement of percept, and hopefully the actual numbers involved are more discriminating than they 'appear'.
 


Part II
Learning the Distributions





The method for building the statistical distribution is given in the Comaniciu, Ramesh, and Meer paper.  The model is nothing more than a weighted histogram of quantized difference values in the x and y directions.  The weighting is given by a kernel function chosen to fit the needs of the designer.  The Epanechnikov kernel is used for this implementation.  The paper seemed to suggest that the window dimensions were the same as h.  In this implementation, however, h is half of the window dimensions.  This actually constrains the distribution window to be elliptical rather than rectangular.  Figure 1 and Figure 2 show the two rectangular regions that will be shown during tracking, paired with the actual shape of the distribution that survives the Epanechnikov kernel with a non-zero value.
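A rough sketch of how such a kernel-weighted histogram might be built (illustrative Python, not the learnDistribution()/epanech() code listed at the end; it assumes difference maps dx and dy, a window centre (cx, cy) with half-dimensions (hx, hy) lying inside the image, and the quantized-difference binning described in the next part):

    import numpy as np

    def epanechnikov(r2):
        """Epanechnikov profile: 1 - r^2 inside the unit ellipse, zero outside."""
        return max(1.0 - r2, 0.0)

    def learn_distribution(dx, dy, cx, cy, hx, hy, quant=8):
        """Kernel-weighted histogram of quantized x/y differences around (cx, cy)."""
        nbins = 256 // quant                      # e.g. 32 bins per axis for quant = 8
        hist = np.zeros((nbins, nbins))
        for y in range(int(cy - hy), int(cy + hy) + 1):
            for x in range(int(cx - hx), int(cx + hx) + 1):
                r2 = ((x - cx) / hx) ** 2 + ((y - cy) / hy) ** 2
                w = epanechnikov(r2)
                if w == 0.0:                      # outside the elliptical support
                    continue
                # map differences (-255..255) into [0, 255] so that a zero
                # difference lands in the middle of the histogram, then quantize
                bx = int((dx[y, x] + 255.0) / 2.0) // quant
                by = int((dy[y, x] + 255.0) / 2.0) // quant
                hist[bx, by] += w
        return hist / hist.sum()                  # normalized model distribution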
 
 
 

Figure 1



 

Figure 2



Below is a series of figures showing the structure distributions for various frames of Robby.  Also included are several non-Robby examples.  It is instructive to see exactly what the algorithm is up against by noting the difference between Robby and non-Robby distributions.  Each plot is separated into four pieces of information.  The first simply illustrates the region of the image that the distribution is derived from (pun semi-intentional); the second is the weighted histogram, with the quantized differences in the x and y directions as the indices.  Note that I scaled the difference information between 0 and 255/(quantization value), so a zero difference between two pixels will accumulate in the middle of the distribution.  The difference information does form stalagmites around zero-difference.  The third and fourth plots within each figure show the marginal probability distributions of the x-difference and y-difference, respectively.
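The marginals in the third and fourth panels are just the row and column sums of the joint histogram; with the illustrative learn_distribution() above, for example:

    def marginals(hist):
        """Marginal distributions of the quantized x- and y-differences."""
        return hist.sum(axis=1), hist.sum(axis=0)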
 
 


Quantization Value 8:


Figure 3


Figure 4


Figure 5


Figure 6


Figure 7
 


Figure 8
 


Figure 9
 


Figure 10
 


Figure 11
 


Figure 12
 


Figure 13
 
 


Figure 14
 


Figure 15
 


Figure 16
 


Figure 17
 
 

Quantization Value 4:


Figure 18
 


Figure 19
 


Figure 20
 


Figure 21
 


Figure 22
 

Quantization Value 16:


Figure 23
 


Figure 24
 


Figure 25
 


Figure 26
 


Figure 27




A quantization value of eight gives 32 bins in each direction and a Bhattacharyya coefficient based on the correlation of two 1024-element distributions.  The figures above show representative distributions from both Robby and non-Robby regions using quantization values of 4, 8 and 16.  All trials were run using a quantization value of 8.  The choice was based mainly on the fact that it seemed a happy medium between Q16, which clearly lacks information, and Q4, which I feared would be a bit too picky.
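For reference, the Bhattacharyya coefficient between two such normalized distributions is the sum of the element-wise geometric means (a sketch of the standard definition, not necessarily the bhattarcharyya() routine listed at the end):

    import numpy as np

    def bhattacharyya(p, q):
        """rho(p, q) = sum of sqrt(p * q) over all bins; 1 for identical distributions."""
        return float(np.sum(np.sqrt(p * q)))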

The plots make it increasingly worrisome that edge information may not be the most useful feature for tracking Robby.  In particular, Figure 6 shows the distribution of the grill of the planet cruiser.  The horizontal bars give a very distinct texture: the texture repeats, and significant, meaningful tails appear in the distribution.  In this case, edge information is useful.  However, the distributions corresponding to Robby regions are very symmetric and peak near zero-difference with little or no tail.  Curiously, a lot of distributions in the regions around Robby have similar characteristics.

Part III

Tracking





Below is a series of tracking runs using the algorithm developed in the above-mentioned paper.  The first image shows where the box was selected for the target model distribution.  The table then proceeds through the sequence, with tracking results depicted at roughly every 10th frame.
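As an illustration of the core update (a simplified stand-in for the MeanShift() routine listed at the end, reusing the conventions of the illustrative learn_distribution() above): with the Epanechnikov kernel the derivative of the profile is constant, so each shift of the window centre reduces to a weighted mean of pixel positions, weighted by the square root of the model-to-candidate histogram ratio.

    import numpy as np

    def mean_shift_step(dx, dy, q, p, cx, cy, hx, hy, quant=8):
        """One shift of the window centre toward higher Bhattacharyya similarity."""
        num_x = num_y = den = 0.0
        for y in range(int(cy - hy), int(cy + hy) + 1):
            for x in range(int(cx - hx), int(cx + hx) + 1):
                r2 = ((x - cx) / hx) ** 2 + ((y - cy) / hy) ** 2
                if r2 > 1.0:                         # outside the elliptical window
                    continue
                bx = int((dx[y, x] + 255.0) / 2.0) // quant
                by = int((dy[y, x] + 255.0) / 2.0) // quant
                if p[bx, by] <= 0.0:
                    continue
                w = np.sqrt(q[bx, by] / p[bx, by])   # model-to-candidate weight
                num_x += w * x
                num_y += w * y
                den += w
        return (num_x / den, num_y / den) if den > 0.0 else (cx, cy)

The step is repeated, recomputing the candidate distribution p at each new centre, until the centre stops moving.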
 


Sequence One



 
 

 
 
 
 

Sequence Two



 
 

Not terrible.  However, it runs into some difficulties.  The most notable is that Robby doesn't so much translate as rotate and scale.  By about the 65th frame in sequence one, Robby has turned 90 degrees and the distribution is completely different.  The second sequence doesn't suffer as much from this problem because Robby doesn't rotate very much, though he scales a lot.  A possible solution is to update the target model distribution occasionally to account for new information that might arise from dis/occlusion or rotation.  Updating every N frames seems a bit arbitrary when there might be a more data-driven approach.  Since the objective of the Mean-Shift algorithm is to maximize the Bhattacharyya coefficient, the coefficient itself is a meaningful measurement of how close the target and candidate distributions actually are to one another.  I modified the algorithm so that if the Bhattacharyya coefficient is below a certain similarity threshold, the target distribution is updated once the current frame converges.  After watching the Bhattacharyya coefficient through the first two sequences, I chose an empirical similarity threshold of 0.9.  Unity is the coefficient for identical distributions, and zero indicates no correlation.  This relatively high threshold guards against updating the distribution on a 'bad' track.  I tracked both sequences again using this modified algorithm.  The results are shown below.
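In sketch form (reusing the illustrative helpers above; the threshold and the update rule are exactly as described in the previous paragraph), the modification amounts to:

    SIMILARITY_THRESHOLD = 0.9   # empirical value chosen from the first two sequences

    def maybe_update_target(q, dx, dy, cx, cy, hx, hy):
        """After a frame converges, refresh the target model if similarity has dropped."""
        p = learn_distribution(dx, dy, cx, cy, hx, hy)   # converged candidate
        if bhattacharyya(q, p) < SIMILARITY_THRESHOLD:
            return p      # similarity dipped below the threshold: adopt the new model
        return q          # otherwise keep the existing target distribution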
 
 

Sequence One: With Updated Distributions



 
 

 
 

Sequence Two: With Updated Distributions


 
 

Watching each frame go by, I think this is a big improvement over the static target distribution.  Objectively, however, the results look only moderately better.  Two problems immediately present themselves.  First, how best to penalize a window that wants to wander off the image.  In these sequences I don't penalize it at all, which accounts for its eventual desire to wander off to the Forbidden Planet's Underworld.  The next problem has to do with the fact that Robby scales significantly during the course of the sequence.  Using a static window forces one either to initialize a 'baggy' window at the beginning and have it tight at the end (which doesn't work, because 'baggy' windows are inclined to drift), or to begin with a tight window and have a rather ineffectual representation of where Robby actually is in the final frames.  The problem with a small window on a big Robby is that there are multiple positions within Robby that yield the same Bhattacharyya coefficient, so the window tends to wander around Robby's interior and only signifies that some part of Robby is within the window.  A possible solution is to enable a variable window, i.e., to dynamically modify h.  As suggested in the paper, for the next set of series I calculate three Bhattacharyya coefficients: the first with the current window, the second with a window reduced by 10%, and the third with a window inflated by 10%.  The window that wins the day is the one with the highest coefficient, and h and the window are updated accordingly.  This allows the window to take on any scale in increments of 10% per frame.  The first series below, which corresponds to sequence two, uses this variable window but does NOT update the distribution (i.e., the similarity threshold is set to zero).  The second and third series use both the variable window and the distribution update (with a similarity threshold of 0.9).  The dynamic window could also be implemented by letting hx and hy vary independently; however, the combinatorics reduce the algorithm to less than real-time in that case.
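A sketch of the window-scale selection (again reusing the illustrative helpers; only the +/-10% search over a single scale factor is shown, not the independent hx/hy variant):

    def adapt_window(q, dx, dy, cx, cy, hx, hy):
        """Keep whichever window scale (current, -10%, +10%) gives the best similarity."""
        best_rho, best_h = -1.0, (hx, hy)
        for s in (1.0, 0.9, 1.1):              # current, reduced, and inflated windows
            p = learn_distribution(dx, dy, cx, cy, s * hx, s * hy)
            rho = bhattacharyya(q, p)
            if rho > best_rho:
                best_rho, best_h = rho, (s * hx, s * hy)
        return best_h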
 
 

Sequence Two: With Variable Window



 
 

 

Sequence One: With Variable Window and Updated Distributions


 

Sequence One: With Variable Window and Updated Distributions

The results for the first series (sequence two) are encouraging.  The window scales itself in a meaningful fashion and the tracking results are much better.  The second and third series, however, have trouble.  Around the 65th frame, Robby has almost completed his turn and the tracking consistently wants to wander off Robby's backside.  Looking at Figure 22 and Figure 26, it's not surprising, as the distribution in that region looks very Robby-like.  In the first several runs the tracking managed to avoid this because the window region was so long that the grill of the cruiser prevented it from drifting in that direction.  My bag of tricks is empty.  I tried a number of things to coax it into tracking this series using both a variable window and a distribution update, because I think both are the right way to do things, but I couldn't get satisfactory results using both modifications with the first sequence.
 


Conclusions

It turns out that the algorithm is simple and reasonably effective.  One can imagine that color information is probably more discriminating than difference information, and I believe this to be the major drawback of this implementation.  Robby's difference information isn't distinctive enough to enable perfect tracking; many regions contain difference maps similar to Robby's.  Another drawback is that spatial information is lost, allowing very different 'looking' regions to appear similar to the tracking algorithm.
 

Code:

In order of importance:
trackRobby()
trackRobby2()
MeanShift()
bhattarcharyya()
learnDistribution()
defineModel()
epanech()
reduceGrad()