This study introduces a novel approach to unsupervised skeleton-based human action recognition that integrates generative and contrastive learning. We propose decomposing the learned representation into two parts: one preserves the detailed motion information needed for the generative objective, while the other extracts action features for the contrastive objective. By swapping the contrastive representations between positive pairs (hence the name SwapCLR), we ensure that the generative and contrastive representations are complementary and that both objectives contribute to learning a strong representation for downstream tasks such as action recognition. Additionally, we address the challenge of noisy data in skeleton-based action recognition with a new saturating reconstruction loss, which significantly reduces the impact of the noise common in key-point detections. Our method achieves state-of-the-art performance in unsupervised action recognition on the NTU and PKU-MMD datasets, while also enabling generative downstream tasks such as motion in-painting and motion generation. Overall, these experimental results confirm the method’s effectiveness and suggest its applicability to a variety of action analysis tasks.
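To make the two central ideas concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes that each encoder output is a flat vector whose first `n_contrastive` entries form the contrastive part and whose remaining entries form the generative part, and it uses a `tanh`-bounded per-joint error as one possible saturating loss. The function names, the vector layout, and the choice of `tanh` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def swap_contrastive(z_a, z_b, n_contrastive):
    """Exchange the contrastive parts of two views of the same sequence.

    Assumed layout (hypothetical): z = [contrastive | generative].
    A decoder fed the swapped vectors must reconstruct each view using
    the contrastive part of the other view.
    """
    c_a, g_a = z_a[..., :n_contrastive], z_a[..., n_contrastive:]
    c_b, g_b = z_b[..., :n_contrastive], z_b[..., n_contrastive:]
    swapped_a = np.concatenate([c_b, g_a], axis=-1)
    swapped_b = np.concatenate([c_a, g_b], axis=-1)
    return swapped_a, swapped_b


def saturating_error(pred, target, scale=1.0):
    """Bounded reconstruction penalty (one possible form, assumed here).

    The per-joint Euclidean error e is mapped to scale * tanh(e / scale),
    so it behaves like e for small errors and never exceeds `scale`.
    A single badly detected key point therefore contributes at most
    `scale` to the sum instead of dominating the loss.
    """
    err = np.linalg.norm(pred - target, axis=-1)
    return float(np.mean(scale * np.tanh(err / scale)))
```

In this sketch, `scale` sets the error magnitude beyond which a joint is effectively treated as an outlier; how such a threshold would be chosen in practice is left open.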