
Semantic-Driven Facial Image Synthesis Using Enhanced Prompt Engineering and CLIP Guidance


Category: Conference
Authors: Abhijit Patil, Kartik Deshmukh & Divya Ghanekar
Conference Name: IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS)
Conference From: 30-Jan-2026
Conference To: 31-Jan-2026
Conference Venue: Bhopal, India
Abstract

This paper presents a training-free baseline framework for text-to-face generation using pretrained Stable Diffusion and CLIP models. No additional model training or fine-tuning is performed. The proposed approach focuses on facial-specific prompt-engineering strategies and CLIP-based semantic evaluation to study text–image alignment in facial synthesis. Experimental evaluation on a limited set of facial descriptions shows consistent semantic similarity, with a CLIP score of 0.651 and an average generation time of under 15 seconds per image. Rather than competing with state-of-the-art methods, this work is intended as a reproducible baseline to support further research in semantic-driven facial image synthesis.
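
As an illustration of the CLIP-based semantic evaluation mentioned in the abstract, the sketch below computes a cosine similarity between an image embedding and a text embedding, which is one common way a CLIP score is defined. The abstract does not state the exact formula the authors used, so this definition is an assumption. The function names `clip_style_score` and `mean_score` are hypothetical, and the vectors are toy stand-ins for the embeddings a pretrained CLIP model would produce.

```python
import math
from typing import Iterable, Sequence, Tuple


def clip_style_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and a text embedding.

    Assumption: the score is plain cosine similarity; the paper may scale
    or clip it differently.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_image = math.sqrt(sum(a * a for a in image_emb))
    norm_text = math.sqrt(sum(b * b for b in text_emb))
    if norm_image == 0.0 or norm_text == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_image * norm_text)


def mean_score(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the score over (image_embedding, text_embedding) pairs."""
    scores = [clip_style_score(image, text) for image, text in pairs]
    if not scores:
        raise ValueError("at least one pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real CLIP embeddings.
    toy_pairs = [([3.0, 4.0], [4.0, 3.0]), ([1.0, 0.0], [1.0, 0.0])]
    print(f"mean score: {mean_score(toy_pairs):.3f}")
```

In practice the two vectors would come from the image and text encoders of a pretrained CLIP model, applied to a generated face and its description. Averaging over all prompt and image pairs gives a single summary figure comparable in kind to the reported score.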
