Recently, several deep learning methods have been applied to decoding in task-related fMRI, and their advantages have been exploited in a variety of ways. However, this paradigm is sometimes problematic, due to the difficulty of applying deep learning to high-dimensional data and small sample size conditions. The difficulties in gathering a large amount of data to develop predictive machine learning models with multiple layers from fMRI experiments with complicated designs and tasks are well-recognized. Group-level, multi-voxel pattern analysis with small sample sizes results in low statistical power and large accuracy evaluation errors; failure in such instances is ascribed to the individual variability that risks information leakage, a particular issue when dealing with a limited number of subjects. In this study, using a small-size fMRI dataset evaluating bilingual language switch in a property generation task, we evaluated the relative fit of different deep learning models, incorporating moderate split methods to control the amount of information leakage. Our results indicated that using the session shuffle split as the data folding method, along with the multichannel 2D convolutional neural network (M2DCNN) classifier, recorded the best authentic classification accuracy, which outperformed the efficiency of 3D convolutional neural network (3DCNN). In this manuscript, we discuss the tolerability of within-subject or within-session information leakage, of which the impact is generally considered small but complex and essentially unknown; this requires clarification in future studies.