This paper reports on a novel brain–computer interface (BCI) paradigm that reconstructs spoken sentences from invasive electrocorticogram (ECoG) signals using deep-neural-network-based encoders and a pre-trained neural vocoder. We recorded ECoG signals while 13 participants spoke short sentences. Our BCI mapped the ECoG recordings to the log-mel spectrograms of the spoken sentences using either a bidirectional long short-term memory (BLSTM) network or a Transformer. The estimated log-mel spectrograms were then fed to Parallel WaveGAN to synthesize speech waveforms. An evaluation of model performance showed that the Transformer significantly outperformed the BLSTM (Wilcoxon signed-rank test, p < 0.001) in terms of mean squared error loss and Pearson correlation coefficient.
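
The following is a minimal sketch, not the authors' implementation, of the ECoG-to-spectrogram mapping described above. It assumes PyTorch and illustrative dimensions (128 ECoG channels as input features, 80 log-mel bins per output frame); the hidden size, layer count, and feature rate are placeholders, and the predicted spectrogram would subsequently be passed to a pre-trained Parallel WaveGAN vocoder for waveform synthesis.

```python
# Hypothetical sketch of the BLSTM encoder mapping ECoG features to log-mel
# spectrogram frames; dimensions and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class ECoGToMelBLSTM(nn.Module):
    def __init__(self, n_ecog_channels=128, n_mel_bins=80,
                 hidden_size=256, num_layers=2):
        super().__init__()
        # Bidirectional LSTM over the ECoG time series.
        self.blstm = nn.LSTM(
            input_size=n_ecog_channels,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            bidirectional=True,
        )
        # Project concatenated forward/backward states to log-mel bins.
        self.proj = nn.Linear(2 * hidden_size, n_mel_bins)

    def forward(self, ecog):
        # ecog: (batch, time, channels) -> (batch, time, n_mel_bins)
        hidden, _ = self.blstm(ecog)
        return self.proj(hidden)

# Example: one 3-second window at an assumed 100 Hz feature rate.
model = ECoGToMelBLSTM()
dummy_ecog = torch.randn(1, 300, 128)       # simulated ECoG features
log_mel = model(dummy_ecog)                 # predicted log-mel spectrogram
target = torch.randn(1, 300, 80)            # stand-in ground-truth spectrogram
loss = nn.MSELoss()(log_mel, target)        # MSE objective, as evaluated in the paper
# log_mel would then be passed to a pre-trained Parallel WaveGAN vocoder
# to synthesize the speech waveform.
```

The Transformer variant would replace the BLSTM with self-attention layers over the same input and output representations, keeping the frame-wise projection to log-mel bins and the MSE training objective unchanged.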