
Combating False Positives In Gesture Recognition

We crave effortless command of the digital world around us, but false positives (FPs) ruin the experience. Here is our approach to the problem.



Mudra Band for the Apple Watch


Controlling digital devices with finger movements is a holy grail. We all want to be able to command the growing digital world around us effortlessly and intuitively. One of the difficulties with such technology is that unintentional movements get recognized as gestures: we may wave a hand or scratch a surface, and that movement is (incorrectly) recognized as a gesture.


With the Mudra Band we aim to reach this holy grail. The band captures neural signals sent from your brain, through the wrist, to your fingers. Our patented SNC sensors capture the signals, while our deep learning AI algorithms decipher the signal pattern and classify which finger is being moved. We partner this tech with the Apple Watch, for a futuristic and premium experience.


Our approach to the FP problem

We introduce a practical “multi-layered approach” instead of a “holistic” approach. Each layer peels off a part of the problem, and the next layer deals with the errors that the previous layer did not catch. I’ll describe these “layers” from the simplest to the most complex:


1. Recognize positions of the hand which are irrelevant to gesture recognition. We use an IMU (Inertial Measurement Unit) to achieve this. An IMU contains two primary sensing components, a gyroscope and an accelerometer. We use the gyroscope to track the (approximate) orientation of the hand and the accelerometer to measure acceleration relative to the ground (the gravity vector g).


Hand orientation in which gesture recognition is not relevant
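As a rough illustration of this first layer, the sketch below gates gesture recognition on a single accelerometer reading. It is a minimal example under stated assumptions, not the band's actual logic: the function names, the reference direction, the 60 degree tilt limit and the 30% magnitude tolerance are all hypothetical, and the gyroscope is left out for brevity.

```python
import math

G = 9.81  # standard gravity in m/s^2


def tilt_angle_deg(accel, reference=(0.0, 0.0, -1.0)):
    """Angle in degrees between an accelerometer reading and a reference
    direction, both expressed in the sensor frame.

    The reference direction is a hypothetical "relevant" hand orientation.
    """
    ax, ay, az = accel
    rx, ry, rz = reference
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    ref_norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    if norm == 0.0 or ref_norm == 0.0:
        raise ValueError("vectors must be non-zero")
    cos_angle = (ax * rx + ay * ry + az * rz) / (norm * ref_norm)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def gestures_enabled(accel, max_tilt_deg=60.0, tolerance=0.3):
    """Return True when the reading suggests a relevant hand orientation.

    Both thresholds are assumed values chosen for illustration.
    """
    magnitude = math.sqrt(sum(c * c for c in accel))
    # A magnitude far from g means the hand is accelerating, so the reading
    # is a poor estimate of the gravity direction: reject it.
    if abs(magnitude - G) > tolerance * G:
        return False
    return tilt_angle_deg(accel) <= max_tilt_deg
```

With these assumed values, a reading aligned with the reference direction enables recognition, while a flipped hand or a fast swing disables it before the classifier is ever consulted.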


2. Applications provide specific parameters for the deep learning model, on the fly (no action is required from the user).


The first example is answering or dismissing a call. Such an application requires robustness (we do not want to unintentionally answer a call). This means that we do not need to recognize all possible gestures, only a specific tap gesture, with a high confidence threshold. An analogy for this is an autonomous car: we do not want to hit the brakes unless we are sure we’re going to collide.



Answering an incoming call


The second example is playing a game. This time, the application requires sensitivity. Our hand is positioned accordingly and we require effortless control. A low confidence threshold is sufficient, and FP robustness is achieved through the posture of the user. Reusing the analogy of the autonomous car, we would like to automatically turn on the headlights if the user forgot to. If the classifier is wrong, no harm is done. If it’s right, then we can potentially prevent an accident.



Playing snake
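The contrast between the call and the game can be sketched as per-application profiles applied to the classifier output. This is only an illustration: `AppProfile`, `decide`, the gesture names and the threshold values 0.95 and 0.40 are assumptions, not parameters taken from the product.

```python
from dataclasses import dataclass

NOISE = "noise"


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical parameters an application hands to the recognizer."""
    allowed_gestures: frozenset
    min_confidence: float


# Robustness first: only a tap, and only when the model is very sure.
CALL_PROFILE = AppProfile(frozenset({"tap"}), 0.95)
# Sensitivity first: several gestures, accepted at low confidence.
GAME_PROFILE = AppProfile(frozenset({"tap", "index", "thumb"}), 0.40)


def decide(probabilities, profile):
    """Return the accepted gesture, or NOISE when nothing qualifies.

    `probabilities` maps class names to classifier probabilities.
    """
    candidates = {
        gesture: p
        for gesture, p in probabilities.items()
        if gesture in profile.allowed_gestures
    }
    if not candidates:
        return NOISE
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= profile.min_confidence else NOISE
```

The same classifier output can therefore be rejected during an incoming call and accepted in a game, without retraining the model or asking the user for anything.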


The third example is using a “wake up” gesture to make sure the application only activates after a certain gesture. Such a gesture is “unique” in the sense that a user does not activate it during regular usage.



“Wake-up” gesture in action
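A wake-up gesture behaves like a small state machine in front of the application. The sketch below is one possible shape for it; the class name, the gesture label `"wake"` and the five-second awake window are assumptions made for the example (the post does not say how long the application stays active).

```python
class WakeUpGate:
    """Forward gestures only for a limited time after a wake-up gesture."""

    def __init__(self, wake_gesture="wake", timeout_s=5.0):
        self.wake_gesture = wake_gesture
        self.timeout_s = timeout_s      # assumed awake window, in seconds
        self._awake_until = None        # None means the gate is asleep

    def process(self, gesture, now_s):
        """Return the gesture if it should reach the app, otherwise None."""
        if gesture == self.wake_gesture:
            # The wake-up gesture opens the gate but is not itself forwarded.
            self._awake_until = now_s + self.timeout_s
            return None
        if self._awake_until is not None and now_s <= self._awake_until:
            return gesture
        return None
```

Any gesture recognized while the gate is asleep is dropped, so accidental taps during everyday movement never reach the application.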



3. Achieving robustness with deep learning. This approach requires a number of steps.


Index Gesture



Thumb Gesture SNC signal pattern


First, train on many more gestures than you need. Gestures which are irrelevant are added to a “noise class”, in addition to actual noise.
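One simple way to picture the noise class is as a relabeling step applied to the training data before fitting the model. The helper names and gesture labels below are hypothetical; the real pipeline may build its classes differently.

```python
NOISE = "noise"


def relabel(labels, target_gestures):
    """Keep labels the application needs; fold everything else into NOISE."""
    targets = set(target_gestures)
    return [label if label in targets else NOISE for label in labels]


def build_class_index(target_gestures):
    """Map class names to integer ids, with NOISE as the last class."""
    classes = sorted(target_gestures) + [NOISE]
    return {name: i for i, name in enumerate(classes)}
```

For example, with targets `{"tap", "index"}`, recorded labels such as `"pinch"` or `"fist"` become `"noise"`, so the model sees many explicit examples of what it should not fire on.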


Second, perform model personalization. This means that during the calibration procedure, we tune the model to a specific user’s physiology and movements. Note that this is not the same as fine-tuning a model, since data from the calibration procedure is very limited.


Third, use multitask learning to achieve state-of-the-art performance. This means that we can reuse ideas from deep learning object detection frameworks to recognize bio-potential patterns within a stream. This is closely related to recognizing objects within a video stream.
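The post does not say which object detection ideas are reused. As one plausible example, non-maximum suppression, a standard post-processing step in object detectors, carries over to a one-dimensional signal stream: overlapping candidate detections in time are collapsed into the highest-scoring one. The sketch below assumes detections are `(start_s, end_s, score, label)` tuples and uses an assumed overlap threshold of 0.5; it illustrates the analogy and is not a description of the actual model.

```python
def temporal_iou(a, b):
    """Intersection over union of two time intervals (start, end)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(detections, iou_threshold=0.5):
    """Suppress overlapping detections, keeping the highest-scoring ones.

    Each detection is a (start_s, end_s, score, label) tuple.
    Returns the kept detections sorted by start time.
    """
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        overlaps = any(
            temporal_iou(det[:2], other[:2]) > iou_threshold for other in kept
        )
        if not overlaps:
            kept.append(det)
    return sorted(kept, key=lambda d: d[0])
```

Two candidate windows that cover nearly the same tap are reported once rather than twice, which removes one common source of duplicate gestures.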

...


To summarize, rejecting false positives is a difficult problem. Deep learning does not inherently solve this problem, due to the “brittleness” of neural nets. Furthermore, all methods have limitations regarding tough-to-classify observations. Our approach introduces domain knowledge and practical expertise, and on our internal databases it has reduced the error dramatically.


Original blog post was published on Medium




