Publication Information


Title
Japanese: 
English:Speech-linguistic Multimodal Representation for Depression Severity Assessment 
Author
Japanese: Rodrigues Makiuchi Mariana, Warnita Tifani, 宇都 有昭, 篠田 浩一. 
English: Rodrigues Makiuchi Mariana, Warnita Tifani, Uto Kuniaki, Shinoda Koichi.  
Language English 
Journal/Book name
Japanese:情報処理学会研究報告. SLP, 音声言語情報処理 
English:IPSJ SIG Technical Report 
Volume, Number, Page Vol. 2019-SLP-130, No. 8, pp. 1-4
Published date Dec. 2019 
Publisher
Japanese:情報処理学会 
English:Information Processing Society of Japan 
Conference name
Japanese:第130回SLP研究発表会 
English:The 130th SLP Research Meeting 
Conference site
Japanese:東京都 
English:Tokyo 
Official URL https://ipsj.ixsq.nii.ac.jp/ej/?action=pages_view_main&active_action=repository_view_main_item_detail&item_id=200787&item_no=1&page_id=13&block_id=8
 
Abstract Depression is a common but serious mental disorder that affects people all over the world. An automatic depression assessment system is therefore in demand to make diagnosis of this disorder more widely accessible. We propose a multimodal fusion of speech and linguistic representations for depression detection. We train our model to infer the Patient Health Questionnaire (PHQ) score of subjects from the E-DAIC corpus. For the speech modality, we feed features extracted with VGG-16 to a Gated Convolutional Neural Network (GCNN) followed by an LSTM layer. For the linguistic representation, we extract BERT features from transcriptions and input them to a CNN followed by an LSTM layer. We evaluated the feature fusion model with the Concordance Correlation Coefficient (CCC), achieving a score of 0.696 on the development set and 0.403 on the test set. The inclusion of visual features is also discussed. The results of this work were submitted to the Audio/Visual Emotion Challenge and Workshop (AVEC 2019).
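The scores reported in the abstract use the Concordance Correlation Coefficient. For reference, the following is a minimal NumPy sketch of Lin's CCC, which penalizes both low correlation and systematic offset between predicted and true PHQ scores; the function name `concordance_cc` and the sample values are illustrative, not taken from the paper.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Lin's Concordance Correlation Coefficient:
    ccc = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    """
    x = np.asarray(y_true, dtype=float)
    y = np.asarray(y_pred, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Perfect agreement yields 1.0; a constant shift lowers the score
# even though the Pearson correlation stays at 1.
print(concordance_cc([0, 5, 10, 15], [0, 5, 10, 15]))  # → 1.0
print(concordance_cc([0, 5, 10, 15], [2, 7, 12, 17]))
```

Unlike Pearson correlation, CCC rewards predictions that match the targets in scale and mean as well as in trend, which is why it is the standard metric for the AVEC depression sub-challenge.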
