What just happened? Enhancing video analysis for music therapy with the aid of machine learning
Music Therapy (MT) is the use of sounds and music within an evolving relationship between client and therapist to support and encourage physical, mental, social and emotional well-being. Video recording and analysis can capture subtle behaviours and interactions between therapist and client but is challenging and time-consuming. There is little evidence of the application of machine learning (ML) specifically to video analysis for MT but examples of the use of ML for tasks relevant to MT include person tracking, action recognition, facial analysis, object recognition and audio analysis. These considerations led to a research theme focusing on how ML might enhance video analysis in MT, with a prototype software tool serving as a means of illustration.
Using a methodology which I characterise as exploratory applied research undertaken from a positivistic viewpoint, I undertook three complementary activities to guide the software development: a literature review to identify relevant research and methods; an investigation into video analysis needs and priorities using a questionnaire and semi-structured interviews; and a set of feasibility studies with relevant software tools.
Since video annotation was identified as the most challenging aspect of video analysis, I developed a software toolkit incorporating ML to automate the annotation of 18 types of behaviour related to pose, movement, interaction, facial expression and audio events. Following testing on artificial examples, I evaluated the software under realistic conditions through two case studies: children on the autism spectrum and adults with disorders of movement. For each, I compared the information yielded by my models with the annotation scheme used by the therapist and applied selected models to video extracts. Both studies were viewed as credible by the therapists. I compared and contrasted my tool with related tools and have identified opportunities to build on this research.
My main finding is that ML can support video analysis for MT by (1) extracting from MT videos data for human body movement, facial expression and the location of objects of interest and (2) processing those data to detect specific types of client behaviour. This is, to my knowledge, the first time that the feasibility of automated annotation for videos featuring a range of MT client types has been demonstrated in a single integrated tool.
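The thesis does not publish its code here, so the following is only an illustrative sketch of the second stage of the pipeline described above: turning per-frame pose data (already extracted from video by stage one) into behaviour annotations. All names, the "hand raised above head" rule, and the thresholds are hypothetical, not taken from the toolkit itself.

```python
# Hypothetical stage-2 annotator: given per-frame pose keypoints, emit
# (start, end) frame spans where a wrist rises above the head. Each frame
# is a dict of keypoint name -> (x, y) in image coordinates, where a
# smaller y value means higher in the frame (the usual image convention).

def annotate_hand_raises(frames, min_frames=3):
    """Return frame spans in which either wrist is above the head.

    Spans shorter than `min_frames` are discarded as noise, a common
    smoothing step when pose estimates jitter frame to frame.
    """
    spans, start = [], None
    for i, kp in enumerate(frames):
        raised = (kp["left_wrist"][1] < kp["head"][1]
                  or kp["right_wrist"][1] < kp["head"][1])
        if raised and start is None:
            start = i                      # behaviour begins
        elif not raised and start is not None:
            if i - start >= min_frames:    # keep only sustained spans
                spans.append((start, i - 1))
            start = None
    if start is not None and len(frames) - start >= min_frames:
        spans.append((start, len(frames) - 1))  # span runs to final frame
    return spans
```

For example, eight frames with the left wrist above the head only in frames 2 through 5 would yield the single annotation span `(2, 5)`. Real toolkits of this kind typically feed such rules from a pose-estimation library rather than hand-built dictionaries.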
Institution
- Anglia Ruskin University
File version
- Published version
Thesis name
- PhD
Thesis type
- Doctoral