This paper describes a system submitted to the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC2020). Previously, Stafylakis et al. proposed a self-supervised learning method for x-vector-based speaker recognition. That method reconstructs the features of each input utterance from its phoneme recognition result and a single speaker feature computed over the whole utterance, but speaker features may differ from phoneme to phoneme. In this paper, we propose a speaker recognition method using phoneme-dependent speaker features. The method concatenates the phone group label obtained from phoneme recognition to the input features of each frame, then generates frame-by-frame speaker features and uses them for reconstruction. In the evaluation experiment, it achieved an EER of 3.25%.
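The input-side step of the proposed method, attaching a phone group label to every frame before speaker features are extracted, can be illustrated with a minimal sketch. The abstract does not say how the label is encoded or how many phone groups there are, so this sketch assumes a one-hot encoding and an arbitrary group count; the helper names `one_hot` and `append_phone_group_labels` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def one_hot(index: int, num_classes: int) -> List[float]:
    """Return a one-hot vector of length num_classes with 1.0 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"phone group index {index} out of range [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def append_phone_group_labels(
    frames: Sequence[Sequence[float]],
    phone_groups: Sequence[int],
    num_groups: int,
) -> List[List[float]]:
    """Concatenate a phone-group label to the acoustic features of each frame.

    frames:       per-frame feature vectors (e.g. filterbank coefficients)
    phone_groups: one phone-group index per frame, from a phoneme recognizer
    num_groups:   number of phone groups (assumed; not given in the abstract)

    The one-hot encoding is an assumption made for illustration only.
    """
    if len(frames) != len(phone_groups):
        raise ValueError("need exactly one phone-group label per frame")
    return [list(f) + one_hot(g, num_groups) for f, g in zip(frames, phone_groups)]


# Toy example: 3 frames of 2-dim features, 4 hypothetical phone groups.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
labels = [0, 2, 2]
augmented = append_phone_group_labels(frames, labels, num_groups=4)
for row in augmented:
    print(row)
```

Because the label is appended per frame rather than per utterance, a downstream network can, in principle, produce a different speaker feature for each frame depending on which phone group it belongs to, which is the property the proposed method relies on for reconstruction.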