Invasive brain–machine interfaces (BMIs) are a promising neurotechnology for achieving direct speech communication from the human brain, but they face many challenges. In this paper, we recorded invasive electrocorticogram (ECoG) signals from seven epilepsy patients as they spoke a sentence consisting of multiple phrases. A Transformer encoder was incorporated into a "sequence-to-sequence" model to decode the spoken sentences from the ECoG signals. In the decoding test, the Transformer model achieved a minimum phrase error rate (PER) of 16.4%, and the median (± standard deviation) across the seven participants was 31.3% (±10.0%). Moreover, the proposed Transformer model achieved significantly better decoding accuracy than a conventional long short-term memory model.
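For readers unfamiliar with the metric, the following is a minimal sketch of how a phrase error rate could be computed. It assumes that PER is defined analogously to word error rate, with whole phrases as the tokens: the phrase-level edit distance between the decoded and reference sentences, divided by the number of reference phrases. That definition, the function names `phrase_edit_distance` and `phrase_error_rate`, and the placeholder phrases are illustrative assumptions, not details taken from the study.

```python
from typing import Sequence


def phrase_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where each token is a whole phrase.

    Counts the minimum number of phrase substitutions, deletions, and
    insertions needed to turn the reference into the hypothesis.
    """
    # prev[j] is the distance between the reference prefix seen so far
    # and the first j phrases of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_phrase in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_phrase in enumerate(hyp, start=1):
            cost = 0 if ref_phrase == hyp_phrase else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference phrase missing from hypothesis
                    curr[j - 1] + 1,     # extra phrase in hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def phrase_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Assumed PER: total phrase-level edits / total reference phrases."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of sentences")
    total_phrases = sum(len(r) for r in refs)
    if total_phrases == 0:
        raise ValueError("reference sentences contain no phrases")
    total_errors = sum(phrase_edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phrases


# Hypothetical example: one four-phrase sentence with one phrase decoded wrongly.
reference = [["phrase_1", "phrase_2", "phrase_3", "phrase_4"]]
decoded = [["phrase_1", "phrase_2", "phrase_x", "phrase_4"]]
print(phrase_error_rate(reference, decoded))  # 0.25
```

Under this assumed definition, a PER of 16.4% would correspond to roughly one phrase-level error for every six reference phrases.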