The goal of this project is to use LLMs and text-to-image models to generate social stories, a tool that helps children with autism spectrum disorder (ASD) learn. Despite their usefulness, social stories are underused because creating them by hand imposes a significant time burden on therapists.
On the technical side, the project focuses on improving zero-shot image generation so that scenes and characters stay consistent across the images of a story. The approach combines Stable Diffusion, DreamBooth, and textual inversion with cross-attention control and ChatGPT prompting. It outperforms the existing state-of-the-art StoryDALL-E model both quantitatively and qualitatively in terms of consistency and interpretability, while remaining lightweight, expressive, and personalizable for therapists.
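To make "cross-attention control" concrete, here is a minimal pure-Python sketch of the attention-map injection idea used in Prompt-to-Prompt-style editing. It is an illustration under stated assumptions, not the project's implementation: the matrices are tiny hypothetical toy values rather than real U-Net activations, and it assumes the source and edited prompts tokenize to the same number of tokens (the word-swap case). In a real Stable Diffusion pipeline this substitution would happen inside the U-Net's cross-attention layers, typically only for some of the denoising steps. The key point is that the edited prompt's values are distributed using the source prompt's attention map, which preserves the spatial layout of the scene.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied independently to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention_map(q: Matrix, k: Matrix) -> Matrix:
    """M = softmax(Q K^T / sqrt(d)): how much each image query attends to each text token."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return softmax_rows([[x / math.sqrt(d) for x in row] for row in scores])


def cross_attention(q: Matrix, k: Matrix, v: Matrix,
                    injected_map: Optional[Matrix] = None) -> Tuple[Matrix, Matrix]:
    """Return (output, map). If injected_map is given, it replaces the freshly
    computed map, so the values v are placed using the source prompt's layout."""
    m = injected_map if injected_map is not None else attention_map(q, k)
    return matmul(m, v), m


# Hypothetical toy example: 2 image-patch queries, 3 text tokens, dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k_src = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # keys from the source prompt
v_src = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
k_tgt = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]   # keys from the edited prompt
v_tgt = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # values from the edited prompt

# Pass 1: generate with the source prompt and record its attention map.
_, src_map = cross_attention(q, k_src, v_src)
# Pass 2: generate with the edited prompt, but reuse (inject) the source map.
out_injected, used_map = cross_attention(q, k_tgt, v_tgt, injected_map=src_map)
```

Without injection, the edited prompt would produce its own map from `k_tgt`, and the layout could drift; with injection, only the content carried by the values changes.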
I contributed to all aspects of the project, including ideation, implementation, and the development of evaluation metrics.
Ensemble Workflow