Last updated: 08 Apr 2026
For complete citation metrics, publications, and updated statistics, please visit my Google Scholar profile.

Selected Publications

AI Alignment

LLM Alignment should go beyond Harmlessness–Helpfulness and incorporate Human Agency
Usman Naseem, Tanmoy Chakraborty, Kai-Wei Chang, Mark Dras, Preslav Nakov, Nanyun Peng & Soujanya Poria
Cognitive Computation, 2026
Do Large Language Models Reflect Demographic Pluralism in Safety?
Usman Naseem, Gautam Siddharth Kashyap, Sushant Kumar Ray, Rafiq Ali, Ebad Shabbir, Abdullah Mohammad
EACL 2026
Are Aligned Large Language Models Still Misaligned?
Usman Naseem, Gautam Siddharth Kashyap, Rafiq Ali, Ebad Shabbir, Sushant Kumar Ray, Abdullah Mohammad
arXiv:2602.11305 (2026)
Can Large Language Models Make Everyone Happy?
Usman Naseem, Gautam Siddharth Kashyap, Ebad Shabbir, Sushant Kumar Ray, Abdullah Mohammad, Rafiq Ali
arXiv:2602.11091 (2026)
A Survey of Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem
IEEE Transactions on Artificial Intelligence, 2026

Mechanistic Interpretability

Mechanistic Interpretability for Large Language Model Alignment: Progress, Challenges, and Future Directions
Usman Naseem
Preprint, 2026
Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs
Pranav Bhandari, Nicolas Fay, Sanjeevan Selvaganapathy, Amitava Datta, Usman Naseem, Mehwish Nasim
EACL 2026
Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
Yiran Zhang, Ming Lin, Mark Dras, Usman Naseem
AAAI 2026 (Demo)
VISPA: Pluralistic Alignment via Automatic Value Selection and Activation
Shenyan Zheng, Jiayou Zhong, Anudeex Shetty, Heng Ji, Preslav Nakov, Usman Naseem
arXiv:2601.12758 (2026)
CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models
Yiran Zhang, J Hu, Mark Dras, Usman Naseem
arXiv:2512.14118 (2025)

NLP Applications & Social Good

From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection
Mo Wang, Kaixuan Ren, Pratik Jalan, Ahmed Ashraf, Tuong Vy Vu, Rahul Seetharaman, Shah Nawaz, Usman Naseem
The Web Conference (WebConf) 2026
Robust Harmful Meme Detection under Missing Modalities via Shared Representation Learning
Felix Breiteneder, Mohammad Belal, Muhammad Saad Saeed, Shahed Masoudian, Usman Naseem, Juhi Kulshrestha, Markus Schedl, Shah Nawaz
The Web Conference (WebConf) 2026
Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes
Gautam Siddharth Kashyap, H Joshi, N Jain, Ebad Shabbir, J Gao, N Joshi, Usman Naseem
EACL 2026
Health-ORSC-Bench: A Benchmark for Measuring Over-Refusal and Safety Completion in Health Context
Z Zhang, L Huang, G Wu, Preslav Nakov, H Ji, Usman Naseem
arXiv:2601.17642 (2026)