I was honored to represent the International Society of Data Scientists (ISODS), presenting our latest research paper, “Facial Beauty Prediction with Vision Transformer,” at the Conference on Information Technology and its Applications (CITA) 2023. The event marked not only the first publication from ISODS but also an opportunity for us to reach out to data scientists, AI practitioners, and enthusiasts.
Behind this work is a collaborative effort by data scientists from different academic and industrial backgrounds. From our team, I have learned not just technical skills but also leadership and how to advocate for people who come from different backgrounds. Our team consists of academic scholars, PhDs and postdocs with a penchant for research and a knack for churning out insights from numbers; industry veterans and mentors whose years of practical experience keep our research relevant and applicable; and undergraduate students who challenge traditional thinking, infusing the team with novel approaches and ideas. This diverse mix ensures that our papers are not just theoretically sound but also grounded in reality and ready for practical deployment.
In my presentation, I walked through the key concepts, decisions, methods, and outcomes of our research.
Our motivation was to make this computer vision problem more practical for industrial use. We therefore required that our model not only reflect industry beauty standards, by focusing on the aspects of beauty that the industry defines, but also let users interpret the features it selects and highlight the areas they may want to improve.
We experimented with various methods (CNN-based, transformer-based, and combinations of both) before deciding to use the Vision Transformer (ViT), introduced by Google Research in the paper “An Image is Worth 16x16 Words.” By choosing ViT, we capture both local and global contextual information in images, and, as the ViT paper reports, it requires fewer computational resources to train than state-of-the-art CNNs. More importantly, by relying on a single transformer model for this task, we hope to enhance interpretability while keeping the potential to scale up to broader applications.
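To make the “16x16 words” idea concrete: ViT splits an image into fixed-size patches, flattens each patch into a vector, and treats the resulting sequence like the tokens of a sentence. The minimal NumPy sketch below shows only that patch-splitting step, assuming a 224x224 RGB input and 16x16 patches (the configuration of the common ViT-B/16 model). The learned linear projection, position embeddings, and transformer layers that follow are omitted, and this is an illustration rather than code from our paper.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    ordered row by row (left to right, then top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) so each patch's pixels are grouped
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


# A dummy 224x224 RGB "face" image with random pixel values.
image = np.random.default_rng(0).random((224, 224, 3))
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768): 14 x 14 patches, each holding 16 * 16 * 3 values
```

Each of the 196 rows is one “word” that the transformer attends over; because self-attention connects every patch to every other patch, the model can relate distant facial regions as well as nearby ones.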
Diving a bit deeper, we explained how a model pretrained on a different dataset lets us extract meaningful features from the last layer of the pretrained ViT. These features are then used to train the model on a new task: assessing facial beauty. This transfer-learning approach enhances the model's robustness and adaptability, and as a result our model achieved better accuracy and scores than most CNN models on this task.
Beyond our paper presentation, the conference offered incredible opportunities to connect with other professionals in the field. From casual conversations to formal meetings, these interactions allowed us to exchange ideas, explore collaboration possibilities, and gain insights into the latest trends and innovations in data science. I talked with professors and researchers who opened my eyes to multiple new approaches for leveraging the Transformer architecture further. But these technical ideas weren't the only things I gained as a presenter for ISODS.
One of CITA's key strengths was that it attracted collaborations and publications not just from VKU students but also from scholars and researchers at institutions across East Asia and Europe. One salient theme across multiple speakers and presenters (on data science, computer science, and AI topics) was a shared focus on developing models that solve current problems with an emphasis on human-centric modeling. We saw a range of research aimed at enhancing model interpretability or optimizing tailored systems for practical industrial applications. These initiatives not only offer immediate solutions across many topics but also extend to future work in emerging areas such as generative AI and self-driving cars.
So, what's next for us? For our model to be truly human-centric, it must be inclusive and adaptable for everyone. Before we roll out our model as a fully tailored system, it's essential that we make it user-friendly. That is, our model has to adapt to people of different races and ethnicities.
One potential avenue for future work is to explore alternative transformer variants, such as the Swin Transformer, which has shown promising results in other computer vision tasks. It would also be interesting to investigate whether DeiT (Data-efficient Image Transformers) could achieve even higher accuracy than ViT for facial beauty evaluation.
Another promising direction is ensembling, that is, combining multiple models. By combining ViT with established CNN-based architectures such as ResNet or VGG, we expect that the resulting ensemble could achieve even higher accuracy on similar tasks.
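One simple form of such an ensemble is a weighted average of the scores predicted by a ViT-based model and a CNN-based model (for example, ResNet or VGG), with the weight chosen on a held-out validation set. The sketch below assumes both models' predictions are already available as arrays; the prediction values are made up purely to illustrate the mechanics and are not results from our paper. Other schemes, such as stacking a small model on top of both outputs, are equally plausible choices.

```python
import numpy as np


def blend(vit_preds: np.ndarray, cnn_preds: np.ndarray, w: float) -> np.ndarray:
    """Weighted average of two models' predicted beauty scores (w weights the ViT)."""
    return w * vit_preds + (1.0 - w) * cnn_preds


def rmse(preds: np.ndarray, targets: np.ndarray) -> float:
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((preds - targets) ** 2)))


def choose_weight(vit_preds, cnn_preds, targets, steps: int = 11):
    """Grid-search the blend weight in [0, 1] that minimizes validation RMSE."""
    best_w, best_err = 0.0, float("inf")
    for w in np.linspace(0.0, 1.0, steps):
        err = rmse(blend(vit_preds, cnn_preds, w), targets)
        if err < best_err:
            best_w, best_err = float(w), err
    return best_w, best_err


# Illustrative validation data only (hypothetical scores, not results from our paper).
targets = np.array([2.8, 3.5, 4.1, 2.2, 3.9])
vit_val = targets + np.array([0.3, -0.1, 0.2, 0.25, -0.05])  # hypothetical ViT predictions
cnn_val = targets + np.array([-0.2, 0.15, -0.3, -0.1, 0.1])  # hypothetical CNN predictions

w, err = choose_weight(vit_val, cnn_val, targets)
print(f"best ViT weight: {w:.1f}, ensemble RMSE: {err:.3f}")
print(f"ViT alone: {rmse(vit_val, targets):.3f}, CNN alone: {rmse(cnn_val, targets):.3f}")
```

Because the grid includes w = 0 (CNN only) and w = 1 (ViT only), the chosen blend is never worse than either single model on the validation set, although it can still be worse on unseen data. Averaging helps most when the two models make errors in different directions, which is the intuition behind pairing a transformer with a CNN.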
Currently, our team is considering developing and publishing models pretrained exclusively on facial datasets. While this project would demand significant computational resources, it has the potential to open up new frontiers for more extensive research. We are saving each of these possibilities for future projects, with more to come.
Participating in the Conference on Information Technology and its Applications (CITA) 2023 has been an enriching and humbling experience for me. Presenting our paper to such a knowledgeable audience was both an honor and a learning opportunity. The dialogues, connections, and insights gained during the conference will undoubtedly fuel our research and inspire our work for years to come.
Moreover, our organization is uniquely positioned not only to contribute academically but also to bridge the gap between research and real-world applications. Given our blend of academic and industry professionals, we aim to deliver more solutions that anyone in the data science community can apply to real-world problems at work.
ISODS is a Massachusetts professional non-profit organization for Data Science and AI practitioners and researchers who apply Data Science and AI at work. We have held professional exams and competitions for several years, and this is our first publication at the conference.
Our CITA presentation is a testament to what diverse teams, dedication to the cause, and an unwavering commitment to excellence can achieve. As we share our findings with the global community, we are also setting the stage for further contributions that our organization can make in the realm of data science. The journey, with all its challenges and triumphs, has been incredibly rewarding. We are excited about the road ahead and deeply committed to pushing the boundaries of what's possible in data science.
(I’d love to invite everyone interested in our paper to download it here and engage with us for any questions, feedback, or collaboration.)