Learning Visual Groupings and Representations With Minimal Human Labels