STUDYING DEPRESSION USING LINGUISTIC FEATURES FROM MULTIPLE SOCIAL MEDIA SOURCES

Depression is one of the world's major mental health problems. Many cases remain undetected because of the restrictive nature of clinical studies and the personal or societal stigma associated with the condition. The objective of this thesis was to harness self-declared data from social media (specifically Twitter) and neuroticism data (N7) from a generalized personality test (MyPersonality Test) to build a model, trained on large-scale weakly labeled data sources, that predicts depression scores as measured by a clinically validated screening tool, the Centre for Epidemiological Studies Depression Scale (CES-D). The results show that weakly labeled data, which is available in abundance, helps improve machine learning models for diagnosing depression.