Traditional machine learning solutions to invariant pattern recognition (visual or other sensory) involve some form of supervision, because the raw input representations are unsuitable for simple unsupervised clustering -- two different objects in the same place have more similar input representations than the same object in two different places. But what is the origin of the supervisory signal in the brain? In my Ph.D. studies in Computer Science, I explored the idea that input from other sensory modalities can serve as a neural teaching signal. Because non-visual signals are sensitive to different environmental transformations than visual ones, they can help the visual modality discover visually invariant features for recognition. I developed a novel ``self-supervised'' learning algorithm that exploits input from other sensory modalities in an efficient and physiologically plausible way, achieving performance comparable to that of externally supervised algorithms.