Towards Training-Free Controllable Text-To-Image Generation