In this study, we utilized publicly available web-based, real-world voice data to develop an AI model capable of distinguishing between individuals who died by suicide and carefully matched controls. The most effective model, a feedforward neural network (Multilayer Perceptron), demonstrated high predictive accuracy, especially considering its reliance on incidental real-world data. Notably, this accuracy improved significantly when analyzing the subset of NearGroup data, which includes individuals who died by suicide within 12 months of the audio recording. Specifically, the model achieved higher AUC and accuracy for this subset, underscoring the critical role of temporal proximity in the identification of voice biomarkers linked to suicide risk.
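To make the subset comparison concrete, the following minimal sketch (not the authors' code; data, variable names, and thresholds are placeholders) illustrates how discrimination could be scored on the full cohort and then on a "NearGroup" restricted to recordings made within 12 months of death:

```python
# Illustrative sketch only: comparing ROC AUC and accuracy on the full cohort
# versus a hypothetical NearGroup subset (recordings within 12 months of death).
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate(y_true, y_score, threshold=0.5):
    """Return AUC and accuracy for one set of predicted probabilities."""
    y_pred = (y_score >= threshold).astype(int)
    return roc_auc_score(y_true, y_score), accuracy_score(y_true, y_pred)

# y_true: 1 = died by suicide, 0 = matched control (placeholder values)
# y_score: predicted probabilities from a trained classifier (placeholder values)
# months_to_event: months between recording and death; NaN for controls
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.81, 0.64, 0.92, 0.22, 0.47, 0.12, 0.88, 0.35])
months_to_event = np.array([4, 30, 8, np.nan, np.nan, np.nan, 11, np.nan])

full_auc, full_acc = evaluate(y_true, y_score)

# NearGroup: cases recorded within 12 months of death, plus all controls
near_mask = (months_to_event <= 12) | np.isnan(months_to_event)
near_auc, near_acc = evaluate(y_true[near_mask], y_score[near_mask])

print(f"Full cohort: AUC={full_auc:.2f}, accuracy={full_acc:.2f}")
print(f"NearGroup:   AUC={near_auc:.2f}, accuracy={near_acc:.2f}")
```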
These findings substantiate earlier work suggesting that paralinguistic features are promising biomarkers for mental health conditions, including suicidality. For instance, prior studies by Pestian et al.9,10 and Hashim et al.11,12 demonstrated that acoustic and linguistic features could differentiate individuals with suicidal ideation from controls with high accuracy. However, those studies relied on data collected in clinical or structured settings and predicted suicidal ideation or questionnaire outcomes rather than actual suicide.
Our work builds on and extends these efforts by demonstrating that AI models can predict completed suicides using publicly available, naturalistic voice data. This represents a significant advancement over studies such as those by Amiriparian et al.13 and Song et al.14, which, while highly accurate, relied on emergency or hotline data and still focused on risk indicators rather than confirmed outcomes. By leveraging data from the general population and focusing on verified suicide deaths, our model aligns more closely with real-world application potential, improving ecological validity as highlighted by Iyer et al.17 and Belouali et al.18.
Moreover, our findings echo observations by Walsh et al.34, who found that suicide risk prediction improves with temporal proximity to the event. We show a similar trend: predictive accuracy increased significantly in cases where the suicide occurred within 12 months of the recording, suggesting that the acoustic markers of suicide risk may become more pronounced as the event nears.
Furthermore, strong results were achieved not only with the feedforward neural network but also with multiple other classification algorithms, including Logistic Regression, Nearest Neighbors, XGBoost Linear, and XGBoost Tree, trained on various paralinguistic features. The robustness of the study is further evidenced by the model’s ability to maintain high performance despite perturbations in the analytical pipeline. These findings suggest the potential for significant advancements in suicide risk assessment through the use of ML models (specifically neural networks) and voice data.
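A minimal sketch of such a multi-classifier comparison is given below. It is an assumed setup rather than the published pipeline: the feature matrix is a placeholder for paralinguistic descriptors, and "XGBoost Linear" and "XGBoost Tree" are assumed to correspond to the gblinear and gbtree boosters.

```python
# Sketch: benchmarking several classifiers on a paralinguistic feature matrix X
# (rows = speakers, columns = acoustic descriptors) with binary labels y.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))      # placeholder paralinguistic features
y = rng.integers(0, 2, size=60)    # placeholder labels (1 = case, 0 = control)

models = {
    "Multilayer Perceptron": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    "Logistic Regression":   LogisticRegression(max_iter=1000),
    "Nearest Neighbors":     KNeighborsClassifier(n_neighbors=5),
    "XGBoost Linear":        XGBClassifier(booster="gblinear", eval_metric="logloss"),
    "XGBoost Tree":          XGBClassifier(booster="gbtree", eval_metric="logloss"),
}

for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)           # scale features, then classify
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name:22s} AUC = {auc.mean():.2f} ± {auc.std():.2f}")
```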
All in all, despite the inherent complexity and limitations of the data (noise and the absence of background clinical information that could shed light on the potential causes leading to completed suicide), as well as the difficulty of the task at hand, the model exhibits remarkable robustness. It is important, however, to note that our study lacked clinical information detailing the specific psychopathology experienced by the participants, both those who died by suicide and those in the control group. Although this required us to make assumptions about potential co-morbidities and external influences such as medications and drugs of abuse, the dataset remains invaluable for its unequivocal ‘hard outcome’. This inherent shortcoming can, however, also be seen as a strength of our investigation: we were able to discern distinct groups even in the absence of background clinical information.
Importantly, this is the first study to successfully predict actual suicidal behavior rather than relying on surrogate markers, such as self-reported measures from questionnaires. This marks a significant advancement in suicide prevention research, as it demonstrates the feasibility of using AI to analyze naturalistic voice data for the identification of suicide risk. These findings hold promise for improving early detection efforts and tailoring interventions to prevent suicide, particularly in the critical period leading up to the event.
Future research
Our future research efforts will focus on a rigorous validation of the results by acquiring data in a clinical study featuring well-diagnosed patients. Furthermore, the exploration of additional acoustic feature groups, such as tempo, rhythm, spectral features, and a nuanced analysis of pauses (both filled and unfilled), presents an avenue to further increase the model’s performance. Larger datasets based on automated analysis would also allow for the formation of a holdout set, improving upon the current cross-validation approach and enabling a more robust evaluation of the results (see the sketch below). Moreover, the incorporation of more extensive datasets could facilitate the application of alternative, end-to-end AI approaches, including transformer models. Additionally, incorporating textual information (the use of certain word classes or a particular ratio of word classes) would make the analysis more comprehensive. Furthermore, differentiating between the various psychopathologies that contribute to depression, suicidal ideation, and death by suicide within the general population will provide a deeper understanding of the complexities surrounding mental health. One way to achieve these goals would be to integrate voice biomarker analysis into national suicide prevention hotlines.
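The following sketch illustrates the evaluation change proposed above under assumed data and model choices (the feature matrix, labels, and classifier are placeholders): a holdout set is reserved once and scored only at the end, while cross-validation is confined to the development portion.

```python
# Sketch: holdout evaluation alongside cross-validation on a larger dataset.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))     # placeholder acoustic feature matrix
y = rng.integers(0, 2, size=500)   # placeholder labels (1 = case, 0 = control)

# Reserve 20% as a holdout set, stratified by outcome; it is used only once.
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1)

# Model selection and tuning rely on cross-validation within the development set.
cv_auc = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")

# The final model is refit on all development data and scored once on the holdout.
model.fit(X_dev, y_dev)
holdout_auc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])

print(f"Cross-validated AUC: {cv_auc.mean():.2f} ± {cv_auc.std():.2f}")
print(f"Holdout AUC:         {holdout_auc:.2f}")
```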
Ethical considerations
The integration of AI and ML in health data analysis poses numerous ethical issues that should be carefully considered. It is imperative to safeguard individual privacy while weighing this against the potential benefits to well-being. As highlighted by Lejeune et al.35, ‘the application of AI to health data will require robust cyber security as well as a clear legal framework’ (p.7). These researchers further emphasize that while AI holds great promise, there is a need for caution, as it could introduce a responsibility problem, i.e. an overreliance on technology in healthcare decision-making. They argue that such reliance may lead to diminished human oversight and accountability in clinical settings. Consequently, AI should be regarded as a tool to complement, rather than replace, human clinical judgment. Ensuring that healthcare professionals remain integral to the decision-making process is essential to maintaining the quality and ethical standards of patient care. Furthermore, while this study highlights the potential of repurposing publicly available data to address critical health challenges and gain valuable insights (a data collection procedure that has been validated by an ethics board), we strongly agree that the development and deployment of AI systems in healthcare must be guided by foundational ethical principles, including transparency, fairness, and the right to privacy, to build trust and ensure the responsible use of technology.