MentalHealthMission/data-book

This book provides a pipeline for data analysis, data cleaning, and feature extraction that can be applied to a range of smartphone and wearable datasets. It is built from a GitHub repository that contains all the code for the pipeline. The repository is designed to be converted into a Jupyter Book once the pipeline is complete, with each Jupyter notebook becoming a chapter that records the data analysis results, the code used, and the decisions made for one specific type of data.

The repository includes a general template that gives a step-by-step method for processing the raw data, from data analysis through to feature extraction. The data analysis includes three main steps that are helpful for all data types, as well as additional steps that are useful for certain types of data. The cleaning and feature extraction stage includes a function to create minutely, hourly, or daily features from the raw data and to save a cleaned version of the raw data. For all data types, the features produced include metadata features that describe the quality and quantity of the data in each interval. These can be used either as direct inputs to a machine learning model trained on the data, or to decide whether each interval should be treated as missing data during subsequent processing.
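To make the idea of interval features with metadata concrete, here is a minimal sketch using only the Python standard library. It is not the repository's feature extraction function: the name `hourly_features`, the `min_coverage` threshold, and the choice of "fraction of minutes with at least one sample" as the coverage measure are all hypothetical, chosen purely for illustration. It bins timestamped samples into hourly intervals, computes one value feature (the mean), and adds metadata features (sample count, coverage, and a missing flag) for every hour in the requested range, including hours with no data at all.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_features(samples, start, end, min_coverage=0.5):
    """Hypothetical sketch: aggregate (timestamp, value) samples into hourly features.

    Returns one dict per hour in [start, end), with a value feature plus
    metadata features describing how much data the hour contains.
    """
    # Group samples by the hour they fall in, ignoring anything outside the range.
    buckets = defaultdict(list)
    for ts, value in samples:
        if start <= ts < end:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append((ts, value))

    features = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        rows = buckets.get(hour, [])
        values = [v for _, v in rows]
        # Assumed coverage measure: fraction of the 60 minutes with any sample.
        coverage = len({ts.minute for ts, _ in rows}) / 60
        features.append({
            "hour": hour,
            "mean_value": sum(values) / len(values) if values else None,
            "n_samples": len(rows),              # metadata: quantity
            "coverage": coverage,                # metadata: quality/completeness
            "missing": coverage < min_coverage,  # flag for later processing
        })
        hour += timedelta(hours=1)
    return features


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0)
    # Made-up example: 45 minutes of data in hour 0, 10 in hour 1, none in hour 2.
    samples = [(start + timedelta(minutes=m), 60 + m % 5) for m in range(45)]
    samples += [(start + timedelta(hours=1, minutes=m), 70) for m in range(10)]
    for row in hourly_features(samples, start, start + timedelta(hours=3)):
        print(row["hour"], row["n_samples"], round(row["coverage"], 2), row["missing"])
```

Keeping empty hours in the output, rather than dropping them, means the `missing` flag and `coverage` value can be passed straight to a model or used as a filter, which mirrors the two uses of metadata features described above.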

In addition to the general template, three other templates are provided for specific types of data: step count, sleep, and heart rate. These help to illustrate how the functions provided can be applied (and sometimes adjusted) to different data types, and suggest additional analyses that may be useful for each data type. We have tried to make these templates as general as possible within these data types, but further tailoring may be required for specific datasets.
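One way the same interval-level processing may need adjusting between data types is in how values within an interval are combined. The sketch below is illustrative only: the `AGGREGATORS` mapping and `aggregate_interval` function are hypothetical names, not part of the repository, and the choices (summing step counts, averaging heart rate, and counting "asleep" epochs for sleep, assuming one-minute epochs) are example assumptions that a specific dataset may not share.

```python
from statistics import mean

# Hypothetical per-data-type aggregation rules; illustrative choices only.
AGGREGATORS = {
    # Steps accumulate over time, so an interval's value is the total.
    "step_count": sum,
    # Heart rate is a level rather than a count, so average it.
    "heart_rate": mean,
    # Assumes one sleep-state label per minute; counts minutes labelled "asleep".
    "sleep": lambda states: sum(1 for s in states if s == "asleep"),
}


def aggregate_interval(values, data_type):
    """Combine the raw values in one interval according to the data type."""
    if not values:
        return None  # leave empty intervals for missing-data handling
    try:
        aggregate = AGGREGATORS[data_type]
    except KeyError:
        raise ValueError(f"no aggregation defined for {data_type!r}") from None
    return aggregate(values)


if __name__ == "__main__":
    print(aggregate_interval([100, 250, 0], "step_count"))
    print(aggregate_interval([60, 70, 80], "heart_rate"))
    print(aggregate_interval(["awake", "asleep", "asleep"], "sleep"))
```

Keeping the per-type rule in a small lookup table means a new data type can often be supported by adding one entry, while the surrounding interval and metadata logic stays shared.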

Each of the templates is given as a Jupyter notebook stored in the content folder. These notebooks call functions that are defined in the Python scripts (.py files in the src folder). The other files in the repository are all required for constructing the Jupyter Book.

💡 Tip: If you want to build and preview the Jupyter Book locally, you can run the command jupyter-book start and open the book in your local browser. For more information, including how to generate static HTML pages or PDFs, please check the Jupyter Book website: https://jupyterbook.org/

The general template gives details on the data analysis, data cleaning, and feature extraction functions used, and so should be read first. The data-type-specific templates can then be used as guides to build pipelines for those specific types of data. Although specific functions are currently given for only three data types, the general template can be tailored to a wide range of data types, possibly with some additional specialized analyses in some cases.
