About Me

🎓 Ph.D. | M.Tech | CSE | IIT Guwahati 🎓
Senior AI Researcher and Practitioner specializing in NLP, Audio Intelligence, and Multimodal Learning - with proven impact across Telecom, Healthcare, Sports, and Energy sectors.
Focused on delivering efficient, foundation-model-driven AI systems, optimized for real-world deployment and scalability.
Hands-on expertise with LLMs, lightweight language models, efficient audio transformers, and MLOps pipelines for end-to-end AI delivery.
Collaborating with cross-functional teams to translate models into business value - delivering insights, automation, and trust in AI systems.
Research focus on speech processing, audio deepfake detection, anti-spoofing, and behavioral signal analysis - bridging cutting-edge innovation with real-world deployment.
Track record of high-impact publications (ACL, ICASSP, INTERSPEECH, ECML-PKDD, IJCNN) and award-winning research, including real-time cricket analytics using NLP.
News and Activities
- May 20, 2025: Seven papers accepted at INTERSPEECH 2025.
- May 16, 2025: One paper accepted at ACL 2025.
- May 20, 2025: Presented a paper at ICASSP 2025.
Areas of Expertise
- NLP: Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Seq-to-Seq Tasks, Cross-Lingual Tasks, NLU, NLG, Conversational AI, and Chatbots.
- Audio: Audio Language Models (ALMs), ASR, Audio Classification, Audio Captioning, Audio Retrieval, Audio Question Answering (QA), and Environmental Sound Analysis.
- Computer Vision: Vision Language Models (VLMs), Image Classification, Image Captioning, Image Retrieval, Visual QA, and Audio-Visual QA.
Experience
- Senior Data Scientist, ExxonMobil (EMSTPL), Bengaluru, India (Nov 2024 - present)
- Senior Research Scientist, Reliance Jio AICoE, Hyderabad, India (Sep 2021 - Oct 2024)
  - Projects: Call Audit Automation, Patient Notes Conversion, Indic LLM, Agriculture RAG and LLM, Art VLM, Agriculture Time Series Analysis, PDF Voicebot, Hospital Voicebot, Aspect-based Sentiment Analysis, Contract Review AI.
- Research Intern, Reliance Jio AICoE, Hyderabad, India (June 2021 - Aug 2021)
  - Projects: Cricket Analytics for Mumbai Indians.
- Teaching Assistant, IIT Guwahati, India (July 2013 - Dec 2020)
  - Courses: Software Engineering (Fall-2018, Fall-2019), Design and Analysis of Algorithms (Fall-2017), Computer Vision using Machine Learning (Fall-2016), Discrete Mathematics (Fall-2015), Probability and Linear Algebra (Spring-2014), Data Communication (Fall-2013).
  - Labs: Database (Spring-2015, Spring-2016, Spring-2020, Fall-2020), Computing (Spring-2017, Spring-2018, Spring-2019), Data Structures (Fall-2014).
Publications
-
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
O. C. Phukan, D. Singh, S. R. Behera*, A. B. Buduru, R. Sharma (*2nd Author)
ACL, 2025
[paper]
[code]
-
Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma (*2nd Author)
INTERSPEECH, 2025
[paper]
-
Towards Source Attribution of Singing Voice Deepfake with Multimodal Foundation Models
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Priyabrata Mallick, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma (*2nd Author)
INTERSPEECH, 2025
[paper]
-
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma (*2nd Author)
INTERSPEECH, 2025
[paper]
-
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Pailla Balakrishna Reddy, Arun Balaji Buduru, Rajesh Sharma (*2nd Author)
INTERSPEECH, 2025
[paper]
-
SNIFR: Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Abu Osama Siddiqui, Sarthak Jain, Priyabrata Mallick, Jaya Sai Kiran Patibandla, Pailla Balakrishna Reddy, Arun Balaji Buduru (*2nd Author)
INTERSPEECH, 2025
[paper]
-
Towards Machine Unlearning for Paralinguistic Speech Processing
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Shubham Singh, S. R. Behera*, Vandana Malayil, Muskaan Singh, Arun Balaji Buduru, Rajesh Sharma (*3rd Author)
INTERSPEECH, 2025
[paper]
-
Towards Fusion of Neural Audio Codec-based Representations with Spectral for Heart Murmur Classification via Bandit-based Cross-Attention Mechanism
Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, S. R. Behera*, Priyabrata Mallick, Santanu Roy, Arun Balaji Buduru, Rajesh Sharma (*2nd Author)
INTERSPEECH, 2025
[paper]
-
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
O. C. Phukan, M. M. Akhtar, Girish, S. R. Behera*, S. Kalita, A. B. Buduru, R. Sharma, S. R. M. Prasanna (*3rd Author)
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
[paper]
[code]
-
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
O. C. Phukan, D. Koshal, S. R. Behera*, A. B. Buduru, R. Sharma (*2nd Author)
arXiv
[paper]
[code]
-
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
S. Jain, O. C. Phukan, S. R. Behera*, A. B. Buduru, R. Sharma (*1st Author)
arXiv
[paper]
[code]
-
Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
O. C. Phukan, S. R. Behera*, M. M. Akhtar, A. B. Buduru, R. Sharma (*1st Author)
arXiv
[paper]
[code]
-
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
O. C. Phukan, Girish, M. M. Akhtar, S. R. Behera*, N. Choudhury, A. B. Buduru, R. Sharma, S. R. M. Prasanna (*authors contributed equally) (*1st Author)
arXiv
[paper]
[code]
-
Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
O. C. Phukan, S. R. Behera*, S. Singh, M. Singh, V. Rajan, A. B. Buduru, R. Sharma, S. R. M. Prasanna (*2nd Author)
arXiv
[paper]
[code]
-
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
O. C. Phukan, S. Jain, S. R. Behera*, A. B. Buduru, R. Sharma, S. R. M. Prasanna (*2nd Author)
arXiv
[paper]
[code]
-
Towards Multilingual Audio-Visual Question Answering
O. C. Phukan, P. Mallick, S. R. Behera*, A. S. Narayani, A. B. Buduru, R. Sharma (*1st Author)
Conference of the International Speech Communication Association (INTERSPEECH), 2024
[paper]
[code]
-
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
S. R. Behera, A. Dhiman, K. Gowda, A. S. Narayani
Conference of the International Speech Communication Association (INTERSPEECH), 2024
[paper]
[code]
-
Spectral Clustering in Convex and Constrained Settings
S. R. Behera and V. S. Vedula
arXiv
[paper]
[code]
-
Visualization of Unstructured Sports Data - An Example of Cricket Short Text Commentary
S. R. Behera and V. S. Vedula
arXiv
[paper]
[code]
-
Cricket Player Profiling: Unraveling Strengths and Weaknesses Using Text Commentary Data
S. R. Behera and V. S. Vedula
arXiv
[paper]
[code]
-
AQA-LLM: A Scalable Automated AQA Data Generation Framework Using Large Language Model
S. R. Behera, K. M. Injeti, J. S. K. Patibandla, P. K. Pokala, A. M. Tripathi, P. B. Reddy
arXiv
[paper]
[code]
-
Towards Multi-Lingual Audio Question Answering
S. R. Behera, P. B. Reddy, A. M. Tripathi, B. R. Megavath, and T. Karavadi
Conference of the International Speech Communication Association (INTERSPEECH), 2023
[paper]
[code]
-
Reverse Adversarial Attack To Enhance Environmental Sound Classification
A. M. Tripathi, S. R. Behera, and K. Paul
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
[paper]
[code]
-
K-Defensive Bit Planes: Defense Against Adversarial Attacks
A. M. Tripathi, S. R. Behera, and K. Paul
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
[paper]
[code]
-
Investigation of Performance of Visual Attention Mechanisms for Environmental Sound Classification: A Comparative Study
A. M. Tripathi, S. R. Behera, and K. Paul
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
[paper]
[code]
-
Adv-IFD: Adversarial Attack Datasets for An Intelligent Fault Diagnosis
A. M. Tripathi, S. R. Behera, and K. Paul
IEEE International Joint Conference on Neural Networks (IJCNN), 2022
[paper]
[code]
-
Learning Player-specific Strategies Using Cricket Text Commentary
S. R. Behera
PhD Thesis, 2021
[phd thesis]
-
Mining Temporal Changes in Strengths and Weaknesses of Cricket Players Using Tensor Decomposition
S. R. Behera and V. S. Vedula
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2020
[paper]
[code]
-
Learning Strength and Weakness Rules of Cricket Players using Association Rule Mining
S. R. Behera and V. S. Vedula
Machine Learning and Data Mining for Sports Analytics (MLSA), ECML-PKDD Workshop, 2021
[paper]
[code]
-
Performance Analysis of Batsman against Spin Bowling and Fast Bowling in Cricket
S. R. Behera
Ohio State Sports Analytics Association Conference (OSUSAAC), 2020
[paper]
[code]
*Best Research Award*
-
Stats Aren't Everything; Learning Strengths and Weaknesses of Cricket Players
S. R. Behera and V. S. Vedula
Machine Learning and Data Mining for Sports Analytics (MLSA), ECML-PKDD Workshop, 2020
[paper]
[code]
-
Video Data Do More. Tracking Data Do Much. Text Commentary Data Do Much More
S. R. Behera and V. S. Vedula
Carnegie Mellon Sports Analytics Conference (CMSAC), 2020
[paper]
[code]
-
Mining Strengths and Weaknesses of Cricket Players Using Short Text Commentary
S. R. Behera, P. Agrawal, A. Awekar and V. S. Vedula
IEEE International Conference On Machine Learning And Applications (ICMLA), 2019
[paper]
[code]
Education
- PhD in Computer Science and Engineering, IIT Guwahati, India, July 2015 - Sept 2021
  - Thesis: Learning Player-specific Strategies Using Cricket Text Commentary.
- M.Tech in Computer Science and Engineering, IIT Guwahati, India, July 2013 - June 2015
  - Thesis: Spectral Clustering Using Convex and Constrained Settings.
- B.Tech in Computer Science and Engineering, VSSUT, Burla, India, July 2008 - June 2012
  - Thesis: A Novel Ontology Based Entity Relationship Model.
Programming Skills
- Languages: Python, R, C, MATLAB, SQL.
- Others: PyTorch, FastText, spaCy, Flair, AllenNLP, TextBlob, CoreNLP, Gensim, NLTK, Hugging Face, Fairseq, Pandas, NumPy, SciPy, Scikit-learn, Seaborn, Matplotlib, Plotly, R Shiny.
Miscellaneous
- Best Research Award: Ohio State Sports Analytics Association Conference (OSUSAAC), 2020, Columbus, USA.
- GATE 2013: All India Rank 696 (99.68 percentile).
- Program Committee Member: ECML-PKDD 2020.
- Reviewer: ECML-PKDD, IEEE VIS, TASLP, ICASSP, WACV, ICME.
- Grants and Fellowships: MHRD Government of India Fellowship for MTech and PhD.
- Organizer: Advaya 2015, PG cultural festival at IIT Guwahati.
- Technical Officer: Students' Gymkhana Council, IIT Guwahati, 2014-2015.
- Email: swarupranjanbehera@gmail.com
- Address: Hitech City, Hyderabad, India