Deep learning for sensor-based human activity recognition

--

A detailed analysis of my deep learning approach to HAR.

The source code for this work is available in my GitHub repository below.

UPDATE: I am currently revamping my source code to adapt it to the latest TensorFlow releases; things have changed a lot since version 1.1. I will update the above repository once the new code is ready. The solution’s logic remains the same.

Very soon, I will also upload the Android app that I developed to test the model using smartphone sensor data.

Accelerometer time series analysis

With the advances in Machine Intelligence in recent years, our smartwatches and smartphones can now run apps powered by Artificial Intelligence that predict human activity from raw accelerometer and gyroscope sensor signals. This problem is commonly referred to as Sensor-based Human Activity Recognition (HAR). Its applications range from healthcare to security (gait analysis for human identification, for instance).

Unfortunately, most of the classical approaches to HAR rely heavily on heuristic, hand-crafted feature extraction, which dramatically hinders their generalization performance. Moreover, efficient end products that perform on-device activity recognition with high accuracy while optimizing resource utilization are rare. These are the realities that caught my attention as a computer scientist: I believe there is room for better overall performance by taking advantage of recent improvements in Deep Learning. But before describing my solution, let’s lay out the limitations of existing approaches to the HAR problem in general.

Limitations of existing approaches

Conventional HAR

Decision Trees, Support Vector Machines, Naive Bayes, Hidden Markov Models and K-Nearest Neighbors are the algorithms most commonly used to tackle the HAR problem. Models implementing these algorithms:

  • Require hand-crafted feature generation: Ax, Ay, Az, Gx, Gy and Gz are time series, and they generally serve as the input parameters to HAR algorithms. For classic classification tasks, the algorithms mentioned above are irreproachable. But classifying a time series (a window of accelerometer and gyroscope readings) requires another approach, because the classification is performed on a whole sample window. To achieve this with classical algorithms, one first has to generate time-domain and sometimes frequency-domain statistics for each training window (sampled at 50 Hz, for instance), so that the final model predicts on an OVERVIEW of the sample window instead of on each record individually (see the sketch after this list). Which features should be generated, then? That is precisely the problem: there is no magic formula, and this is where experience pays. It should not have to be that way!
  • Do not generalize: the accuracy of common algorithms tends to drop when activities are performed by people not included in the training phase. This is a big problem for an application that has TO SCALE. These algorithms also get confused when we change where on the body the sensor devices are worn (hand, wrist, pocket, etc.) or the orientation of the device once worn (the sensor axes change direction as well).
  • Use online APIs: in existing approaches, trained models are deployed as web APIs that final products (mobile apps, for instance) call periodically for real-time HAR. While there is nothing wrong with this approach, we believe on-device prediction deserves more attention, and the ability to deploy or embed trained models in mobile apps is precisely what existing implementations cannot provide.
  • Are not optimized for low resource consumption: most existing mobile apps for HAR are developed without resource optimization in mind, and computing time-domain and frequency-domain statistics for each sample (at a given rate) before serving a model for prediction is one of the reasons why HAR remains a NOT COMPLETELY SOLVED PROBLEM.
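
To make the feature-engineering burden concrete, here is a minimal sketch, in NumPy, of the kind of time-domain statistics a classical pipeline has to compute for every window before it can even call a classifier. The window shape and the particular statistics are illustrative, not a prescription:

# Minimal sketch of classical hand-crafted feature extraction.
# `window` is a hypothetical (n_samples, 6) array holding one window of
# Ax, Ay, Az, Gx, Gy, Gz readings (e.g. 2 s at a 50 Hz sample rate).
import numpy as np

def time_domain_features(window):
    """Collapse a raw sensor window into one summary feature vector."""
    feats = [
        window.mean(axis=0),                           # mean per axis
        window.std(axis=0),                            # standard deviation per axis
        window.min(axis=0),                            # minimum per axis
        window.max(axis=0),                            # maximum per axis
        np.abs(np.diff(window, axis=0)).mean(axis=0),  # mean absolute jerk per axis
    ]
    return np.concatenate(feats)                       # 5 stats x 6 axes = 30 features

window = np.random.randn(100, 6)                       # dummy 2 s window at 50 Hz
print(time_domain_features(window).shape)              # (30,)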

We believe we can take advantage of existing improvements in Artificial Intelligence to solve most of the problems mentioned above, with Deep Learning.

How we approach the problem

The training dataset

The choice of dataset depends heavily on your application. Typically, the activities we are interested in recognizing are Sitting, Standing, Walking, Running, Climbing Stairs Up, Climbing Stairs Down, etc., but I once applied HAR to predict different yoga positions (Bosch Hackathon 2017, Finalist). So, as I said before, it all depends on your application.

I have applied my approach to a dataset collected by Stisen et al. (1.6 GB). This dataset contains readings from two sensors (accelerometer and gyroscope), recorded while users executed activities in no specific order, carrying smartwatches and smartphones. The readings come from 9 users performing 6 activities (Sitting, Standing, Walking, Biking, ClimbStair-Up and ClimbStair-Down) with 6 types of mobile devices.
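
For reference, segmenting such raw readings into fixed-length windows could look like the sketch below; the 100-sample window (2 s at 50 Hz) and the 50% overlap are illustrative choices of mine, not values taken from the dataset:

# Sketch of sliding-window segmentation of raw sensor readings.
# `signal` is a (n_readings, 6) array of Ax, Ay, Az, Gx, Gy, Gz samples.
import numpy as np

def sliding_windows(signal, size=100, step=50):
    """Yield overlapping windows of shape (size, n_channels)."""
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]

signal = np.random.randn(1000, 6)                   # dummy recording
windows = np.stack(list(sliding_windows(signal)))   # (19, 100, 6)
print(windows.shape)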

Two common issues we face when applying HAR to a custom task are class imbalance in the dataset and the lack of sufficient training data. For the first, I usually apply the SMOTE oversampling technique, and it naturally adapts into a data augmentation solution for the second.
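
As an illustration, here is how SMOTE from the imbalanced-learn package could be applied to windowed sensor data. Flattening each window into a vector before resampling is my own simplification for the sketch, not necessarily the exact pipeline:

# Sketch of SMOTE oversampling on windowed sensor data (imbalanced-learn).
import numpy as np
from imblearn.over_sampling import SMOTE

# Dummy imbalanced data: 200 windows of class 0, 40 windows of class 1.
X = np.random.randn(240, 100, 6)                  # (windows, samples, channels)
y = np.array([0] * 200 + [1] * 40)

n, w, c = X.shape
X_flat = X.reshape(n, w * c)                      # SMOTE expects 2-D inputs

X_res, y_res = SMOTE(random_state=0).fit_resample(X_flat, y)
X_res = X_res.reshape(-1, w, c)                   # back to window shape
print(X_res.shape, np.bincount(y_res))            # balanced: 200 / 200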

Sample dataset

Our model architecture

HAR with Deep Learning

We propose a model combining a CNN (Convolutional Neural Network) and an RNN (Recurrent Neural Network). Input sensor measurements are split into a series of data intervals along time. The representation of each interval is fed into a CNN, which hierarchically learns intra-interval local interactions within each sensing modality and intra-interval global interactions among the different sensor inputs. The interval representations along time are then fed into an RNN to learn the inter-interval relationships.

The CNN automatically extracts local features within each sensor modality and hierarchically merges the local features of different sensor modalities into global features. This replaces the classical hand-crafted feature generation used in the existing approaches I mentioned previously.

The RNN extracts temporal dependencies. To understand why this matters, I tested existing implementations by performing an activity in various positions (changing the configuration of my body). My goal was to confuse the model with accelerometer and gyroscope values it was not used to for that specific activity. The prediction accuracy dropped as expected, and the explanation is quite simple: the model had learned to predict from raw values instead of considering how those values vary over time. Yet that is exactly how it should work, because different performances of the same activity, regardless of who performs it or the position used, produce approximately the same variation of Ax, Ay, Az, Gx, Gy and Gz over time. So why not take this into account in the training phase? That is exactly why our model features an RNN.

Thanks to this architecture, our final model generalizes quite well. The rest is up to fine-tuning.
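
A minimal tf.keras sketch of this CNN-plus-RNN layout is shown below. The interval split, the layer sizes and the choice of GRU are illustrative assumptions, not the exact architecture that was trained:

# Sketch of a CNN + RNN model for HAR in tf.keras (illustrative sizes).
# Each sample window is split into `intervals` sub-intervals of
# `interval_len` readings across 6 channels (Ax, Ay, Az, Gx, Gy, Gz).
import tensorflow as tf
from tensorflow.keras import layers

intervals, interval_len, channels, n_classes = 10, 25, 6, 6

inputs = tf.keras.Input(shape=(intervals, interval_len, channels))
# CNN applied to every interval: learns local intra-interval interactions.
x = layers.TimeDistributed(layers.Conv1D(64, 5, activation='relu'))(inputs)
x = layers.TimeDistributed(layers.Conv1D(64, 5, activation='relu'))(x)
x = layers.TimeDistributed(layers.GlobalMaxPooling1D())(x)
# RNN over the sequence of interval representations: inter-interval dynamics.
x = layers.GRU(128)(x)
outputs = layers.Dense(n_classes, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()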

Training the model

Because building a deep learning model from scratch requires high-performance machines and GPUs, it is preferable to train the model on a cloud platform. I used Google Cloud Platform for this purpose, and I implemented the algorithm with Google TensorFlow in Python 3.4.

On-device prediction

It is crucial today to be able to deploy machine learning models to mobile platforms. In our case, practical HAR solutions are increasingly integrated into mobile apps. Fortunately, Google offers libraries to run TensorFlow on Android, which lets us use models built with TensorFlow in Android apps. All we need to do is “freeze” the model meta-graph and export it as a file (in “.pb” format). Using Google’s inference library for Android, we can then feed the exported model with real-time sensor data.

from tensorflow.python.tools import freeze_graph

MODEL_NAME = 'har'

# Paths to the serialized graph definition and the trained weights.
input_graph_path = 'checkpoint/' + MODEL_NAME + '.pbtxt'
checkpoint_path = './checkpoint/' + MODEL_NAME + '.ckpt'
output_frozen_graph_name = 'frozen_' + MODEL_NAME + '.pb'

# Merge the graph definition and the checkpointed weights into a single
# self-contained .pb file, keeping only the nodes needed to compute "y_".
freeze_graph.freeze_graph(input_graph=input_graph_path,
                          input_saver="",
                          input_binary=False,            # the .pbtxt is in text format
                          input_checkpoint=checkpoint_path,
                          output_node_names="y_",        # the prediction op
                          restore_op_name="save/restore_all",
                          filename_tensor_name="save/Const:0",
                          output_graph=output_frozen_graph_name,
                          clear_devices=True,            # strip device placements
                          initializer_nodes="")
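
As a quick sanity check before shipping, the frozen file can be loaded back in Python to confirm the output node survived the freeze. This sketch uses the TF 1.x-compatible APIs, matching the version the code above targets:

# Sketch: reload the frozen graph and confirm the output node is present.
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('frozen_har.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    # Raises an error if the serialized graph is malformed.
    tf.compat.v1.import_graph_def(graph_def, name='')

# "y_" is the output node named in the freeze_graph call above.
print('y_' in {n.name for n in graph_def.node})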

More concretely:

  • Freeze the model and export it as a file (model.pb, for instance)
  • Add Google’s inference library for Android to the Android app
  • Create an Activity in the Android app that feeds the model with real-time accelerometer and gyroscope data at the sample rate (50 Hz)
  • Return the predicted activity, or all activities with their prediction confidence (probability), as in the method below:

public float[] predictProbabilities(float[] data) {
    float[] result = new float[OUTPUT_SIZE];

    // Feed one window of accelerometer/gyroscope readings to the input node,
    // run the frozen graph and fetch the class probabilities.
    inferenceInterface.feed(INPUT_NODE, data, INPUT_SIZE);
    inferenceInterface.run(OUTPUT_NODES);
    inferenceInterface.fetch(OUTPUT_NODE, result);

    // Order: Downstairs, Jogging, Sitting, Standing, Upstairs, Walking
    return result;
}

We see that we do not need to compute extra features before serving our model for prediction. We can therefore assume that the resource consumption of our technology is limited to periodically collecting sensor data and feeding it through our model graph.

Results

We were able to obtain 98% training accuracy and 92% testing accuracy using a customized cross-validation technique: leave one user out of the training set, then test on that held-out user.
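
Concretely, this protocol matches scikit-learn’s LeaveOneGroupOut splitter. Here is a sketch of how the folds could be generated (the array names and shapes are illustrative):

# Sketch of leave-one-user-out cross-validation with scikit-learn.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(90, 100, 6)          # dummy sensor windows
y = np.random.randint(0, 6, size=90)     # dummy activity labels
users = np.repeat(np.arange(9), 10)      # which of the 9 users made each window

for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=users):
    held_out = users[test_idx][0]
    # Train on 8 users, evaluate on the held-out user.
    print(f'fold: test on user {held_out}, '
          f'{len(train_idx)} train / {len(test_idx)} test windows')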

Challenges (TO DO)

I am considering:

  • Transfer Learning: research how to apply existing models to custom activities without having to train a new model from scratch.
  • Incremental learning: a kind of dynamic learning where incoming data is continuously used to extend the existing model’s knowledge without the cost of retraining the entire model from scratch. In HAR, it would be very helpful to be able to add a new user’s training data in production and see it reflected directly in the model’s knowledge.
  • On-device learning: I think TensorFlow Lite and TensorFlow Mobile already constitute major breakthroughs.

I hope reading this was helpful to you. Thanks for your time and feel free to share. I will also be happy to know what you think of this approach.
