Research | 2026-06-24

Ensemble Feature Selection and Harris Hawks Optimization for Explainable Mental Health Risk Prediction in Female Sex Workers

arXiv:2606.24047v1. Abstract: One of the significant mental health issues affecting female sex workers (FSWs) is mental disorders, especially depression. Exposure to violence, stigma, and economic hardship further increases their psychological risk. Current machine learning (ML)...

What Happened

A new arXiv preprint (2606.24047v1) proposes a machine learning framework for predicting depression risk among female sex workers (FSWs). The approach combines ensemble feature selection with Harris Hawks Optimization (HHO), a nature-inspired metaheuristic algorithm, to build an explainable predictive model. The research targets a highly vulnerable population in which mental health disorders, particularly depression, are exacerbated by systemic factors including violence, stigma, and economic hardship. By pairing feature selection with optimization, the authors aim to improve both prediction accuracy and model interpretability, a critical requirement for deployment in clinical or social service settings.
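The idea behind ensemble feature selection can be sketched in a few lines: score every feature with several independent criteria, convert each set of scores to ranks, and keep the features with the best combined rank. The sketch below is illustrative only. The summary does not say which selectors or aggregation rule the preprint uses, so the three scorers (`abs_correlation`, `mean_gap`, `median_split_accuracy`), the rank-sum aggregation, and the synthetic data are all assumptions made for this example.

```python
import math
import random

def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((t - my) ** 2 for t in ys))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def mean_gap(xs, ys):
    """Gap between the two class means, scaled by the feature's range."""
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    span = max(xs) - min(xs)
    if not pos or not neg or span == 0:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / span

def median_split_accuracy(xs, ys):
    """Accuracy of a one-feature rule that splits at the median, in either direction."""
    cut = sorted(xs)[len(xs) // 2]
    acc = sum((x >= cut) == (t == 1) for x, t in zip(xs, ys)) / len(xs)
    return max(acc, 1.0 - acc)

def ranks(scores):
    """Map scores to ranks, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    out = [0] * len(scores)
    for position, j in enumerate(order):
        out[j] = position
    return out

def ensemble_select(X, y, k, scorers=(abs_correlation, mean_gap, median_split_accuracy)):
    """Sum each feature's rank across all scorers and keep the k best totals."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    totals = [0] * len(columns)
    for scorer in scorers:
        for j, r in enumerate(ranks([scorer(col, y) for col in columns])):
            totals[j] += r
    return sorted(range(len(columns)), key=lambda j: totals[j])[:k]

# Synthetic stand-in data (not from the preprint):
# features 0 and 1 carry signal, features 2-4 are pure noise.
random.seed(0)
y = [i % 2 for i in range(200)]
X = [[2.0 * t + random.gauss(0, 0.5), -1.5 * t + random.gauss(0, 0.5),
      random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)] for t in y]

selected = ensemble_select(X, y, k=2)
print(sorted(selected))
```

On this synthetic data the two signal-carrying features should come out on top. Rank aggregation is used here because it lets scorers with different scales vote on equal terms; a real pipeline would more likely combine established selectors and validate the chosen subset on held-out data.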

Why It Matters

This work addresses a persistent tension in applied machine learning: the trade-off between model performance and explainability. In high-stakes domains like mental health, black-box models are often unacceptable, because clinicians, social workers, and policymakers need to understand why a model flags an individual as high-risk. The use of Harris Hawks Optimization, a relatively recent swarm intelligence algorithm inspired by the cooperative hunting behavior of Harris’s hawks, is notable. It is a further example of bio-inspired algorithms being adapted for feature selection in specialized, resource-constrained contexts.
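To make the hunting metaphor concrete, here is a simplified, HHO-style search over feature subsets. It is a sketch under stated assumptions, not the preprint's implementation: it keeps the decaying escaping energy, the two exploration moves, and the soft and hard besiege moves of the published algorithm, but omits the Levy-flight "rapid dive" steps. The search bounds, the thresholding in `to_mask`, the nearest-centroid `fitness` with penalty weight `alpha`, and the synthetic data are all hypothetical choices made for illustration.

```python
import random

LB, UB = -1.0, 1.0  # assumed search bounds; a feature is kept when its coordinate is > 0

def to_mask(position):
    return [p > 0 for p in position]

def fitness(mask, X, y, alpha=0.05):
    """Hypothetical objective: nearest-centroid training error plus a subset-size penalty."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0 + alpha  # treat an empty subset as the worst case
    centroids = {}
    for label in (0, 1):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    errors = 0
    for r, t in zip(X, y):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(cols, mu)) for c, mu in centroids.items()}
        errors += min(dist, key=dist.get) != t
    return errors / len(X) + alpha * len(cols) / len(mask)

def hho_select(X, y, n_hawks=8, n_iter=30, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    hawks = [[rng.uniform(LB, UB) for _ in range(d)] for _ in range(n_hawks)]
    best_pos, best_fit = None, float("inf")

    def update_best():
        nonlocal best_pos, best_fit
        for h in hawks:
            f = fitness(to_mask(h), X, y)
            if f < best_fit:
                best_pos, best_fit = h[:], f

    for t in range(n_iter):
        update_best()  # the best position so far plays the "rabbit" (prey)
        mean_pos = [sum(h[j] for h in hawks) / n_hawks for j in range(d)]
        for i, h in enumerate(hawks):
            # Escaping energy shrinks over time, shifting from exploration to exploitation.
            energy = 2 * (2 * rng.random() - 1) * (1 - t / n_iter)
            if abs(energy) >= 1:  # exploration
                if rng.random() >= 0.5:  # perch relative to a randomly chosen hawk
                    other, r1, r2 = rng.choice(hawks), rng.random(), rng.random()
                    new = [o - r1 * abs(o - 2 * r2 * x) for o, x in zip(other, h)]
                else:  # perch relative to the rabbit and the flock mean
                    step = rng.random() * (LB + rng.random() * (UB - LB))
                    new = [(b - m) - step for b, m in zip(best_pos, mean_pos)]
            elif abs(energy) >= 0.5:  # soft besiege
                jump = 2 * (1 - rng.random())
                new = [(b - x) - energy * abs(jump * b - x) for b, x in zip(best_pos, h)]
            else:  # hard besiege
                new = [b - energy * abs(b - x) for b, x in zip(best_pos, h)]
            hawks[i] = [min(UB, max(LB, v)) for v in new]  # keep positions inside the bounds
    update_best()
    return to_mask(best_pos), best_fit

# Synthetic stand-in data (not from the preprint): features 0 and 1 carry signal.
random.seed(0)
y = [i % 2 for i in range(120)]
X = [[2.0 * t + random.gauss(0, 0.5), -1.5 * t + random.gauss(0, 0.5),
      random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)] for t in y]

mask, score = hho_select(X, y)
print(mask, round(score, 3))
```

The objective rewards low error and small subsets at the same time, which is the usual way a wrapper-style search trades accuracy against dimensionality. Scoring on training data keeps the sketch short; a realistic version would use cross-validation inside `fitness`.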

More importantly, this research highlights an underserved population. Female sex workers face unique, intersecting risk factors that generic mental health models may fail to capture. By tailoring the feature engineering process to this group, the study demonstrates how domain-specific ML can uncover patterns that broad population models miss. For AI practitioners, this is a reminder that off-the-shelf models trained on general datasets can perpetuate blind spots when applied to marginalized communities.

Implications for AI Practitioners

Explainability is not optional for deployment. The paper's explicit focus on explainable AI (XAI) reinforces that regulatory and ethical standards increasingly demand transparency. Practitioners should expect any model affecting human welfare, especially in healthcare or social services, to require interpretable outputs, not just high AUC scores.

Feature selection remains a bottleneck. Ensemble methods for feature selection, combined with optimization algorithms like HHO, can reduce dimensionality while preserving predictive power. This is especially valuable for high-dimensional survey or behavioral data, where irrelevant features can degrade performance and obscure the signals that matter.

Domain adaptation is a differentiator. Generic mental health risk models may not generalize to specific populations. The study’s focus on FSWs underscores the value of customizing feature sets to reflect population-specific stressors. AI teams working on social impact projects should invest in domain expertise or collaborate with subject-matter experts during feature engineering.

Bio-inspired optimization is gaining traction. Genetic algorithms and particle swarm optimization are well established, and HHO is a newer alternative. Practitioners should watch the literature for comparative studies that benchmark these algorithms against traditional methods in real-world, low-data environments.
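The call for interpretable outputs above can be illustrated with permutation importance, a model-agnostic check that asks how much accuracy is lost when one feature's values are shuffled. The summary does not name the explanation technique used in the preprint, so this is a stand-in chosen for simplicity. The nearest-centroid model, the feature names, and the synthetic data are assumptions made for this example.

```python
import random

def train_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    model = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def predict(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((v - m) ** 2 for v, m in zip(row, model[c])))

def accuracy(model, X, y):
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature's column is shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic stand-in data (not from the preprint): only the first feature carries signal.
random.seed(1)
y = [i % 2 for i in range(200)]
X = [[2.0 * t + random.gauss(0, 0.5), random.gauss(0, 1), random.gauss(0, 1)] for t in y]

model = train_centroids(X, y)
scores = permutation_importance(model, X, y)
for name, s in zip(["signal", "noise_a", "noise_b"], scores):
    print(f"{name}: {s:.3f}")
```

A large drop for a feature means the model leans on it heavily; values near zero suggest the feature could be removed without much loss. For brevity the importances are computed on the training data, whereas a deployment-grade analysis would use held-out data.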

Key Takeaways

Ensemble feature selection combined with Harris Hawks Optimization offers a pathway to both high accuracy and explainability in mental health risk prediction.
Domain-specific modeling for vulnerable populations can uncover risk factors that generic models miss, improving clinical relevance.
AI practitioners must prioritize interpretability when building models for high-stakes human welfare applications, not just raw performance metrics.
Bio-inspired optimization algorithms like HHO are emerging as viable tools for feature selection in specialized, data-constrained settings.

