Semantic-Driven Facial Image Synthesis Using Enhanced Prompt Engineering and CLIP Guidance

Category

Conference

Authors

Abhijit Patil, Kartik Deshmukh & Divya Ghanekar

Conference Name

IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS)

Conference From

30-Jan-2026

Conference To

31-Jan-2026

Conference Venue

Bhopal, India

Abstract

This paper presents a training-free baseline framework for text-to-face generation using pretrained Stable Diffusion and CLIP models. No additional model training or fine-tuning is performed. The approach combines facial-specific prompt-engineering strategies with CLIP-based semantic evaluation to study text–image alignment in facial synthesis. Experimental evaluation on a limited set of facial descriptions yields a CLIP score of 0.651, indicating consistent semantic similarity, with an average generation time of under 15 seconds per image. Rather than competing with state-of-the-art methods, this work is intended as a reproducible baseline to support further research in semantic-driven facial image synthesis.
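
The two ingredients named in the abstract, prompt engineering and CLIP-based scoring, can be illustrated with a minimal sketch. It is not the authors' implementation. The modifier strings, the function names (`build_face_prompt`, `cosine_similarity`, `mean_alignment`), and the use of plain cosine similarity as the alignment score are all assumptions made for illustration. The abstract does not state the exact prompt templates or scoring formula. In practice the embedding vectors would come from a pretrained CLIP text encoder and image encoder, and here they are ordinary lists of floats.

```python
import math
from typing import Iterable, Sequence, Tuple

# Hypothetical facial-specific modifiers (assumption: the paper's real
# prompt templates are not given in the abstract).
FACE_MODIFIERS = ("portrait photo", "detailed face", "sharp focus")


def build_face_prompt(description: str,
                      modifiers: Sequence[str] = FACE_MODIFIERS) -> str:
    """Append facial-specific modifiers to a plain text description."""
    base = description.strip().rstrip(".")
    return ", ".join([base, *[m for m in modifiers if m]])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_alignment(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average similarity over (image_embedding, text_embedding) pairs."""
    scores = [cosine_similarity(img, txt) for img, txt in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(build_face_prompt("A young woman with curly red hair."))
    # Toy 2-D vectors standing in for real CLIP embeddings.
    print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```

A training-free pipeline along these lines would pass the engineered prompt to a pretrained Stable Diffusion model, embed the generated image and the original description with CLIP, and report the averaged similarity over the test descriptions.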