Main Conference
- LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
Yang Liu; Jiaye Yang; Weikang Li; Jiahui Liang; Yang Li; Lingyong Yan
- Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Yuxuan Zhu; Antony Kellermann; Akul Gupta; Philip Li; Richard Fang; Rohan Bindu; Daniel Kang
- Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?
Jingwei Ni; Yu Fan; Vilém Zouhar; Donya Rooein; Alexander Miserlis Hoyle; Mrinmaya Sachan; Markus Leippold; Dirk Hovy; Elliott Ash
- Early-Exit and Instant Confidence Translation Quality Estimation
Vilém Zouhar; Maike Züfle; Beni Egressy; Julius Cheng; Mrinmaya Sachan; Jan Niehues
- GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval
Justus-Jonas Erker; Nils Reimers; Iryna Gurevych
- SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
Gyubeum Lim; Yemo Koo; Vijay Krishna Madisetti
- Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis
Shuhaib Mehri; Xiusi Chen; Heng Ji; Dilek Hakkani-Tür
- Investigating the Multilingual Calibration Effects of Language Model Instruction Tuning
Jerry Huang; Peng Lu; Qiuhao Zeng; Yusuke Iwasawa; Yutaka Matsuo; Sarath Chandar; Edison Marrese-Taylor; Irene Li
- $T^2$-RAGBench: Text-and-Table Aware Retrieval-Augmented Generation
Jan Strich; Enes Kutay Isgorur; Maximilian Trescher; Chris Biemann; Martin Semmann
- The Pragmatic Mind of Machines: Tracing the Emergence of Pragmatic Competence in Large Language Models
Kefan Yu; Qingcheng Zeng; Weihao Xuan; Wanxin Li; Jingyi Wu; Rob Voigt
- Hierarchical Text Classification with LLM-Refined Taxonomies
Jonas Golde; Nicolaas Paul Jedema; RaviKiran Krishnan; Phong Le
- Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning
Chengsong Huang; Langlin Huang; Jiaxin Huang
- Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
Sarah Ball; Frauke Kreuter; Nina Panickssery
- Out of Style: RAG’s Fragility to Linguistic Variation
Tianyu Cao; Neel Bhandari; Akhila Yerukola; Akari Asai; Maarten Sap
- Do Political Opinions Transfer Between Western Languages? An Analysis of Unaligned and Aligned Multilingual LLMs
Franziska Weeber; Tanise Ceron; Sebastian Padó
- H-MEM: Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents
Haoran Sun; Shaoning Zeng; Bob Zhang
- MULSUM: A Multimodal Summarization System with Vis-Aligner and Diversity-Aware Image Selection
Abid Ali; Diego Molla; Usman Naseem
- How Quantization Shapes Bias in Large Language Models
Federico Marcuzzi; Xuefei Ning; Roy Schwartz; Iryna Gurevych
- If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Jasmin Orth; Philipp Mondorf; Barbara Plank
- The Dog the Cat Chased Stumped the Model: Measuring When Language Models Abandon Structure for Shortcuts
Sangmitra Madhusudan; Kaige Chen; Ali Emami
- Automated Screening of Antibacterial Nanoparticle Literature: Dataset Curation and Model Evaluation
Alperen Ozturk; Şaziye Betül Özateş; Sophia Bahar Root; Angela Violi; Nicholas Kotov; J. Scott VanEpps; Emine Sumeyra Turali Emre
- Intention Knowledge Graph Construction for User Intention Relation Modeling
Jiaxin Bai; Zhaobo Wang; Junfei Cheng; Dan Yu; Zerui Huang; Weiqi Wang; Xin Liu; Chen Luo; Yanming Zhu; Bo Li; Yangqiu Song
- Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction
Chunyang Jiang; Paola Merlo
- JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Risky Health Behavior Content in Jirai Community
Yunze Xiao; Tingyu He; Lionel Z. Wang; Yiming Ma; Xingyu Song; Xiaohang Xu; Mona T. Diab; Irene Li; Ka Chung Ng
- Chandomitra: Towards Generating Structured Sanskrit Poetry from Natural Language Inputs
Manoj Balaji Jagadeeshan; Samarth Bhatia; Pretam Ray; Harshul Raj Surana; Akhil Rajeev P; Priya Mishra; Annarao Kulkarni; Ganesh Ramakrishnan; Prathosh AP; Pawan Goyal
- Tailored Emotional LLM-Supporter: Enhancing Cultural Sensitivity
Chen Cecilia Liu; Hiba Arnaout; Nils Kovačić; Dana Atzil-Slonim; Iryna Gurevych
- Detecting Subtle Sense Shift with Polysemy-Aware Trends
Ondřej Herman; Pavel Rychlý
- Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
Hussein Abdallah; Ibrahim Abdelaziz; Panos Kalnis; Essam Mansour
- Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models
David Guzman Piedrahita; Irene Strauss; Rada Mihalcea; Zhijing Jin
- PromptFE: Automated Feature Engineering by Prompting
Yufeng Zou; Jean Utke; Diego Klabjan; Han Liu
- Detecting (Un)answerability in Large Language Models with Linear Directions
Maor Juliet Lavi; Tova Milo; Mor Geva
- Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning
Sanghwan Bae; Jiwoo Hong; Min Young Lee; Hanbyul Kim; Jeongyeon Nam; Donghyun Kwak
- BERT, are you paying attention? Attention regularization with human-annotated rationales
Elize Herrewijnen; Dong Nguyen; Floris Bex; Albert Gatt
- Humans and transformer LMs: Abstraction drives language learning
Jasper Jian; Christopher D Manning
- BigTokDetect: A Clinically‑Informed Vision–Language Model Framework for Detecting Pro‑Bigorexia Videos on TikTok
Minh Duc Chu; Kshitij Pawar; Zihao He; Roxanna Sharifi; Ross M. Sonnenblick; Magdalayna Curry; Laura D'Adamo; Lindsay Young; Stuart Murray; Kristina Lerman
- Do language models accommodate their users? A study of linguistic convergence
Terra Blevins; Susanne Schmalwieser; Benjamin Roth
- Auditing Language Model Unlearning via Information Decomposition
Anmol Goel; Alan Ritter; Iryna Gurevych
- Logic Haystacks: Probing LLMs’ Long-Context Logical Reasoning (Without Easily Identifiable Unrelated Padding)
Damien Sileo
- OD-Stega: LLM-Based Relatively Secure Steganography via Optimized Distributions
Yu-Shin Huang; Peter Just; Hanyun Yin; Krishna Narayanan; Ruihong Huang; Chao Tian
- When Does Auxiliary Modality Matter in Solving Geometric Problems? A Comprehensive Study of Textual, Formal, and Visual Modalities
Hyuk Namgoong; Jeesu Jung; Yerim Han; Sangkeun Jung
- IYKYK: Using language models to decode extremist cryptolects
Christine de Kock; Arij Riabi; Zeerak Talat; Michael Sejr Schlichtkrull; Pranava Madhyastha; Eduard Hovy
- Sparse Adapter Fusion for Continual Learning in NLP
Min Zeng; Xi Chen; Haiqin Yang; Yike Guo
- Rethinking Prompt Optimizers: From Prompt Merits to Optimization
Zixiao Zhu; Hanzhang Zhou; Zijian Feng; Tianjiao Li; Chua Jia Jim Deryl; Lee Onn Mak; Gee Wah Ng; Kezhi Mao
- A Survey on Multilingual Mental Disorders Detection from Social Media Data
Ana-Maria Bucur; Marcos Zampieri; Tharindu Ranasinghe; Fabio Crestani
- Identifying Fine-grained Forms of Populism in Political Discourse: A Case Study on Donald Trump's Presidential Campaigns
Ilias Chalkidis; Stephanie Brandl; Paris Aslanidis
- SCoNE: a Self-Correcting and Noise-Augmented Method for Complex Biological and Chemical Named Entity Recognition
Xingyu Zhu; Claire Nédellec; Balazs Nagy; Laszlo Vidacs; Robert Bossy
- A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models
Iwona Christop; Mateusz Czyżnikiewicz; Paweł Skórzewski; Łukasz Bondaruk; Jakub Kubiak; Marcin Lewandowski; Marek Kubis
- CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter; Shrimai Prabhumoye; Matvei Novikov; Seungju Han; Ying Lin; Evelina Bakhturina; Eric Nyberg; Yejin Choi; Mostofa Patwary; Mohammad Shoeybi; Bryan Catanzaro
- Safety of Large Language Models Beyond English: A Systematic Literature Review of Risks, Biases, and Safeguards
Aleksandra Krasnodębska; Katarzyna Dziewulska; Karolina Seweryn; Maciej Chrabaszcz; Wojciech Kusa
- InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Yuhang Liu; Pengxiang Li; Zishu Wei; Congkai Xie; Xueyu Hu; Xinchen Xu; Shengyu Zhang; Xiaotian Han; Hongxia Yang; Fei Wu
- Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish
Yakup Abrek Er; Ilker Kesen; Gözde Gül Şahin; Aykut Erdem
- CALE: Concept-Aligned Embeddings for Both Within-Lemma and Inter-Lemma Sense Differentiation
Bastien Liétard; Gabriel Loiseau
- Do NOT Classify and Count: Hybrid Attribute Control Success Evaluation
Felix Matthias Saaro; Pius von Däniken; Mark Cieliebak; Jan Milan Deriu
- Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim; Yang Li; Evangelia Spiliopoulou; Jie Ma; Miguel Ballesteros; William Yang Wang
- How effective are VLMs in assisting humans in inferring the quality of mental models from Multimodal short answers?
Pritam Sil; Durgaprasad Karnam; Vinay Reddy Venumuddala; Pushpak Bhattacharyya
- Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism
Simon Münker; Nils Schwager; Achim Rettinger
- Persona Prompting as a Lens on LLM Social Reasoning
Jing Yang; Moritz Hechtbauer; Elisabeth Khalilov; Evelyn Luise Brinkmann; Vera Schmitt; Nils Feldhus
- PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media
Michele Joshua Maggini; Paloma Piot; Anxo Pérez; Erik Bran Marino; Lúa Santamaría Montesinos; Ana Lisboa Cotovio; Marta Vázquez Abuín; Javier Parapar; Pablo Gamallo
- Progressive Visual Refinement for Multi-modal Summarization
Ye Xiong; Hidetaka Kamigaito; Soichiro Murakami; Peinan Zhang; Hiroya Takamura; Manabu Okumura
- Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
Zhenglin Wang; Jialong Wu; Pengfei Li; Yong Jiang; Deyu Zhou
- Adaptive LLM-Symbolic Reasoning via Dynamic Logical Solver Composition
Lei Xu; Pierre Beckmann; Marco Valentino; Andre Freitas
- Lexical Popularity: Quantifying the Impact of Pre-training for LLM Performance
Elena Sofia Ruzzetti; Fabio Massimo Zanzotto; Tommaso Caselli
- Training in Step-by-Step Formal Reasoning Improves Pronominal Reasoning in Language Models
Vagrant Gautam
- Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification
Paul He; Yinya Huang; Mrinmaya Sachan; Zhijing Jin
- When Words Wear Masks: Detecting Malicious Intents and Hostile Impacts of Online Hate Speech
Priyansh Singhal; Piyush Joshi
- CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
Punya Syon Pandey; Yongjin Yang; Jiarui Liu; Zhijing Jin
- Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns; Letitia Parcalabescu; Stephan Waeldchen; Michael Barlow; Gregor Ziegltrum; Volker Stampa; Bastian Harren; Björn Deiseroth
- Ultra-Low-Dimensional Prompt Tuning via Random Projection
Zijun Wu; Yongchang Hao; Lili Mou
- NP-Hard Lower Bound Complexity for Semantic Self-Verification
Robin Young
- STAMP: Selective Task-Aware Mechanism for Text Privacy
Fengwei Tian; Payel Bhattacharjee; Heidi Hanson; Geoffrey D Rubin; Joseph Y. Lo; Ravi Tandon
- Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities
Alberto Purpura; Li Wang; Sahil Badyal; Gene Beaufrand; Adam Faulkner
- Utterance-level Detection Framework for LLM-Involved Content Detection in Conversational Setting
Muyang Zhou; Huaxia Rui
- ClinSQL: A Challenging Benchmark for Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL
Yifei Shen; Yilun Zhao; Justice Ou; Tinglin Huang; Arman Cohan
- Lost in Activations: A Neuron-level Analysis of Encoders for Cross-Lingual Emotion Detection
Pranaydeep Singh; Orphee De Clercq; Els Lefever
- iBERT: Interpretable Embeddings via Sense Decomposition
Vishal Anand; Milad Alshomary; Kathleen McKeown
- Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World
Vinu Sankar Sadasivan; Soheil Feizi; Rajiv Mathews; Lun Wang
- Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework
Clea Chataigner; Rebecca Ma; Prakhar Ganesh; Yuhao Chen; Afaf Taik; Elliot Creager; Golnoosh Farnadi
- AutoBool: Reinforcement-Learned LLM for Effective Automatic Systematic Reviews Boolean Query Generation
Shuai Wang; Harrisen Scells; Bevan Koopman; Guido Zuccon
- McMining: Automated Discovery of Misconceptions in Student Code
Erfan Al-Hossami; Razvan Bunescu
- Improving LLM Domain Certification with Pretrained Guide Models
Jiaqian Zhang; Zhaozhi Qian; Faroq AL-Tam; Ignacio Iacobacci; Muhammad AL-Qurishi; Riad Souissi
- TDFlow: Agentic Workflows for Test Driven Development
Kevin Han; Siddharth Maddikayala; Tim Knappe; Om Patel; Austen Liao; Amir Barati Farimani
- Contrastive Learning with Narrative Twins for Modeling Story Salience
Igor Sterner; Alex Lascarides; Frank Keller
- ExAnte: A Benchmark for Ex-Ante Inference in Large Language Models
Yachuan Liu; Xiaochun Wei; Lin Shi; Xinnuo Li; Bohan Zhang; Paramveer Dhillon; Qiaozhu Mei
- CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection
Grace Byun; Rebecca Lipschutz; Sean T. Minton; Abigail Powers; Jinho D. Choi
- Surprisal from Larger Transformer-based Language Models Predicts fMRI Data More Poorly
Yi-Chien Lin; William Schuler
- Coordinates from Context: Using LLMs to Ground Complex Location References
Tessa Masis; Brendan O'Connor
- Discourse Graph Guided Document Translation with Large Language Models
Viet Thanh Pham; Minghan Wang; Hao-Han Liao; Thuy-Trang Vu
- StarFlow: Generating Structured Workflow Outputs From Sketch Images
Patrice Bechard; Chao Wang; Amirhossein Abaskohi; Juan A. Rodriguez; Christopher Pal; David Vazquez; Spandana Gella; Sai Rajeswar; Perouz Taslakian
- Adaptive Helpfulness–Harmlessness Alignment with Preference Vectors
Ren-Wei Liang; Chin Ting Hsu; Chan-Hung Yu; Saransh Agrawal; Shih-Cheng Huang; Chieh-Yen Lin; Shang-Tse Chen; Kuan-Hao Huang; Shao-Hua Sun
- How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains
Reza Khanmohammadi; Erfan Miahi; Simerjot Kaur; Charese Smiley; Ivan Brugere; Kundan S Thind; Mohammad M. Ghassemi
- WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
Zhisong Zhang; Tianqing Fang; Kaixin Ma; Wenhao Yu; Hongming Zhang; Haitao Mi; Dong Yu
- SearchLLM: Detecting LLM Paraphrased Text by Measuring the Similarity with Regeneration of the Candidate Source via Search Engine
Hoang-Quoc Nguyen-Son; Minh-Son Dao; Koji Zettsu
- RoZO: Geometry-Aware Zeroth-Order Fine-Tuning on Low-Rank Adapters for Black-Box Large Language Models
Zichen Song; Weijia Li
- Mitigating Degree Bias in Hypergraphs via Attribute-as-Structure Approach
Ryusei Nishide; Makoto Miwa
- Generative Personality Simulation via Theory-Informed Structured Interview
Pengda Wang; Huiqi Zou; Han Jiang; Hanjie Chen; Tianjun Sun; Xiaoyuan Yi; Ziang Xiao; Frederick L. Oswald
- Unraveling LLM Jailbreaks Through Safety Knowledge Neurons
Chongwen Zhao; Yutong Ke; Kaizhu Huang
- Hacking Neural Evaluation Metrics with a Single Text
Hiroyuki Deguchi; Katsuki Chousa; Yusuke Sakai
- ELLA: Efficient Lifelong Learning for Adapters in Large Language Models
Shristi Das Biswas; Yue Zhang; Anwesan Pal; Radhika Bhargava; Kaushik Roy
- To Paraphrase or Not: Efficient Comment Detoxification with Unsupervised Detoxifiability Discrimination
Jing Ke; Zheyong Xie; Shaosheng Cao; Tong Xu; Enhong Chen
- LingGen: Linguistic Fine-grained Controlled Generation
Mohamed Elgaar; Hadi Amiri
- Hey, wait a minute: on at-issue sensitivity in Language Models
Sanghee J. Kim; Kanishka Misra
- RECIPE-TKG: From Sparse History to Structured Reasoning for LLM-based Temporal Knowledge Graph Completion
Ömer Faruk Akgül; Feiyu Zhu; Yuxin Yang; Rajgopal Kannan; Viktor Prasanna
- Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth
Michelle Yuan; Weiyi Sun; Amir H. Rezaeian; Jyotika Singh; Sandip Ghoshal; Yao-Ting Wang; Miguel Ballesteros; Yassine Benajiba
- PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR
James Burgess; Jan N. Hansen; Duo Peng; Yuhui Zhang; Alejandro Lozano; Min Woo Sun; Emma Lundberg; Serena Yeung-Levy
- Exploring Speaker Anonymization Methods for Low-Resource Text-to-Speech
Shenran Wang; Aidan Pine; Mengzhe Geng
- Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation
Bolei Ma; Yong Cao; Indira Sen; Anna-Carolina Haensch; Frauke Kreuter; Barbara Plank; Daniel Hershcovich
- Respecting Temporal-Causal Consistency: Entity–Event Knowledge Graphs for Retrieval-Augmented Generation
Ze Yu Zhang; Zitao Li; Yaliang Li; Bolin Ding; Bryan Kian Hsiang Low
- Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?
Kai Sun; Yin Huang; Srishti Mehra; Mohammad Kachuee; Xilun Chen; Renjie Tao; Zhaojiang Lin; Andrea Jessee; Nirav Shah; Alex L Betty; Yue Liu; Anuj Kumar; Wen-tau Yih; Xin Luna Dong
- Inferring the Unseen: A Computational Approach to Visual Metonymy
Saptarshi Ghosh; Linfeng Liu; Tianyu Jiang
- A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic
Juan Moreno Gonzalez; Bashar Alhafni; Nizar Habash
- Multimodal Evaluation of Russian-language Architectures
Artem Chervyakov; Ulyana Isaeva; Anton Emelyanov; Artem Safin; Maria Tikhonova; Alexander Kharitonov; Yulia Lyakh; Petr Surovtsev; Denis Shevelev; Vildan Saburov; Vasily Konovalov; Elisei Rykov; Ivan Sviridov; Amina Miftakhova; Ilseyar Alimova; Alexander Panchenko; Alexander Kapitanov; Alena Fenogenova
- Don’t Judge a Book by its Cover: Testing LLMs’ Robustness Under Logical Obfuscation
Abhilekh Borah; Shubhra Ghosh; Kedar Joshi; Aditya Kumar Guru; Kripabandhu Ghosh
- I know you are different! Towards Persona Driven Knowledge-infused Dialogue Assistant
Shifali Agrahari; Moushumi Mahato; Abhisek Tiwari; Javaid Nabi
- Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities
Hongseok Oh; Wonseok Hwang; Kyoung-Woon On
- Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning
Qifan Yu; Zhenyu He; Sijie Li; Zhou Xun; Jun Zhang; Jingjing Xu; Di He
- Task-Level Instructions Induction for Audio Question Answering from Few Examples
Po-Chun Chen; Hen-Hsen Huang; Hsin-Hsi Chen
- Layer-wise Swapping for Generalizable Multilingual Safety
Hyunseo Shin; Wonseok Hwang
- Measuring Idiomaticity in Text Embedding Models with $\epsilon$-compositionality
Sondre Wold; Étienne Simon; Erik Velldal; Lilja Øvrelid
- Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao; Reshmi Ghosh; Vitor R. Carvalho; Emily Lawton; Keegan Hines; Gao Huang; Jack W. Stokes
- MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Qian Wang; Ziqi Huang; Ruoxi Jia; Paul Debevec; Ning Yu
- Computational Benchmarks for Egyptian Arabic Child Directed Speech
Salam Khalifa; Abed Qaddoumi; Nizar Habash; Owen Rambow
- K-LegalDeID: A Benchmark Dataset and KLUEBERT-CRF for De-identification in Korean Court Judgments
Wooseok Choi; Hyungbin Kim; Yon Dohn Chung
- Optical Character Recognition for the International Phonetic Alphabet
Shu Okabe; Dejvi Zelo; Alexander Fraser
- Specialization through Collaboration: Understanding Expert Interaction in Mixture-of-Expert Large Language Models
Yuanbo Tang; Naifan Zhang; Yan Tang; Meixuan Chen; Shuhan Huang; Tingyu Cao; Yang Li
- Compact Language Models with Iterative Text Refinement for Health Dialogue Summarization
Kellen Tan Cheng; Ganesh Ramesh; Nafiul Rashid; Geoffrey Jay Tso; Jilong Kuang
- Mind the Gap: Benchmarking LLM Uncertainty and Calibration with Specialty-Aware Clinical QA and Reasoning-Based Behavioural Features
Alberto Testoni; Iacer Calixto
- Controlling Reading Ease with Gaze-Guided Text Generation
Andreas Säuberli; Darja Jepifanova; Diego Frassinelli; Barbara Plank
- PictureStories: Predicting the Task Adherence of Language Learner Answers to a Picture Story-Based Writing Task
Marie Bexte; Andrew Caines; Diane Nicholls; Paula Buttery; Torsten Zesch
- Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models
Vitalii Hirak; Jaap Jumelet; Arianna Bisazza
- Large Language Models as Oracles for Ontology Alignment
Sviatoslav Lushnei; Dmytro Shumskyi; Severyn Shykula; Ernesto Jiménez-Ruiz; Artur d'Avila Garcez
- Disentangling Knowledge and Reasoning in Biomedical QA Benchmarks
Rahul Thapa; Qingyang Wu; Kevin Wu; Harrison G Zhang; Angela Zhang; Eric Wu; Haotian Ye; James Zou
- Effective QA-Driven Annotation of Predicate–Argument Relations Across Languages
Jonathan Davidov; Aviv Slobodkin; Shmuel Tomi Klein; Reut Tsarfaty; Ido Dagan; Ayal Klein
- Form and Meaning in Intrinsic Multilingual Evaluations
Wessel Poelman; Miryam de Lhoneux
- What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
Dongzhuoran Zhou; Yuqicheng Zhu; Xiaxia Wang; Hongkuan Zhou; Yuan He; Jiaoyan Chen; Steffen Staab; Evgeny Kharlamov
- Assessing Web Search Credibility and Response Groundedness in Chat Assistants
Ivan Vykopal; Matúš Pikuliak; Simon Ostermann; Marian Simko
- How DDAIR you? Disambiguated Data Augmentation for Intent Recognition
Galo Castillo-López; Gaël de Chalendar; Alexis Lombard; Nasredine Semmar
- When the Model Said ‘No Comment’, We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified
Gautam Siddharth Kashyap; Mark Dras; Usman Naseem
- NeuronMoE: Efficient Cross-Lingual Extension via Neuron-Guided Mixture-of-Experts
Rongzhi Li; Hitomi Yanaka
- From Emotion to Expression: Theoretical Foundations and Resources for Fear Speech
Vigneshwaran Shankaran; Gabriella Lapesa; Claudia Wagner
- AdaptBPE: From General Purpose to Specialized Tokenizers
Vijini Pilana Liyanage; François Yvon
- Measuring Linguistic Competence of LLMs on Indigenous Languages of the Americas
Justin Vasselli; Arturo MP; Frederikus Hudi; Haruki Sakajo; Taro Watanabe
- Reassessing Active Learning Adoption in Contemporary NLP: A Community Survey
Julia Romberg; Christopher Schröder; Julius Gonsior; Katrin Tomanek; Fredrik Olsson
- Beyond “Not Novel Enough”: Enriching Scholarly Critique with LLM-Assisted Feedback
Osama Mohammed Afzal; Preslav Nakov; Tom Hope; Iryna Gurevych
- AfriVox: Probing Multilingual and Accent Robustness of Speech LLMs
Busayo Awobade; Mardhiyah Sanni; Tassallah Abdullahi; Chibuzor Okocha; Kelechi Ezema; Devendra Deepak Kayande; Lukman Enegi Ismaila; Tobi Olatunji; Gloria Ashiya Katuka
- PortOldBERT: Portuguese Historical Language Models
Tomas Freitas Osorio; Henrique Lopes Cardoso
- ReMedQA: Are We Done With Medical Multiple-Choice Benchmarks?
Alessio Cocchieri; Luca Ragazzi; Giuseppe Tagliavini; Gianluca Moro
- Can activation steering support language-agnostic reasoning in language models? A study on syllogistic inferences
Gabriele Maraia; Leonardo Ranaldi; Marco Valentino; Fabio Massimo Zanzotto
- Morpheme Matters: Morpheme-Based Subword Tokenization for Korean Language Models
DongHyeok Lee; Jeongyeon Park; Kyungbeen Cho; Jae Sung Lee
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space
Viktoriia Zinkovich; Anton Antonov; Andrei Spiridonov; Denis Shepelev; Andrey Moskalenko; Daria Pugacheva; Elena Tutubalina; Andrey Kuznetsov; Vlad Shakhuro
- Knowledge Augmentation Enhances Token Classification for Recipe Understanding
Nuhu Ibrahim; Robert Stevens; Riza Batista-Navarro
- Communication Enables Cooperation in LLM Agents: A Comparison with Curriculum-Based Approaches
Hachem Madmoun; Salem Lahlou
- Argumentation and Judgement Factors: LLM-based Discovery and Application in Insurance Disputes
Basit Ali; Anubhav Sinha; Nitin Ramrakhiyani; Sachin Pawar; Girish Keshav Palshikar; Manoj Apte
- ViGoEmotions: A Benchmark Dataset using LLM Annotation For Fine-grained Emotion Detection on Vietnamese Texts
Tran Quang Hung; Pham Tien Nam; Son T. Luu; Kiet Van Nguyen
- PTEB: Towards Robust Text Embedding Evaluation via Stochastic Paraphrasing at Evaluation Time with LLMs
Manuel Frank; Haithem Afli
- DETECT: Determining Ease and Textual Clarity of German Text Simplifications
Maria Korobeynikova; Alessia Battisti; Lukas Fischer; Yingqiang Gao
- MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
Wei-Ling Hsu; Yu-Chien Tang; An-Zi Yen
- Test-Time Scaling of Reasoning Models for Machine Translation
Zihao Li; Shaoxiong Ji; Jörg Tiedemann
- How Good Are LLMs at Processing Tool Outputs?
Kiran Kate; Yara Rizk; Poulami Ghosh; Ashu Gulati; Tathagata Chakraborti; Zidane Wright; Mayank Agarwal
- Tug-of-war between idioms' figurative and literal interpretations in LLMs
Soyoung Oh; Xinting Huang; Mathis Pink; Michael Hahn; Vera Demberg
- Do LLM hallucination detectors suffer from low-resource effect?
Debtanu Datta; Mohan Kishore Chilukuri; Yash Kumar; Saptarshi Ghosh; Muhammad Bilal Zafar
- Mind Your Special Tokens! On the Importance of Dedicated Sequence-End Tokens in Vision-Language Embedding Models
Elio Musacchio; Giovanni Semeraro; Goran Glavaš
- Coupling Local Context and Global Semantic Prototypes via a Hierarchical Architecture for Rhetorical Roles Labeling
Anas Belfathi; Nicolas Hernandez; Monceaux Laura; Warren Bonnard; Mary Catherine Lavissière; Christine Jacquin; Richard Dufour
- Guided by the Plan: Enhancing Faithful Autoregressive Text-to-Audio Generation with Guided Decoding
Juncheng Wang; Zhe Hu; Chao Xu; Siyue Ren; Yuxiang Feng; Yang Liu; Baigui Sun; Shujun Wang
- Safe-Unsafe Concept Separation Emerges from a Single Direction in Language Models Activation Space
Antonio Serino; Andrea Ermellino; Lorenzo Malandri; Fabio Mercorio
- PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark
Robert Belanec; Branislav Pecher; Ivan Srba; Maria Bielikova
- Decoding the Market's Pulse: Context-Enriched Agentic Retrieval Augmented Generation for Predicting Post-Earnings Price Shocks
Chenhui Li; Weihai Lu
- LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring
May Bashendy; Walid Massoud; Sohaila Eltanbouly; Salam Albatarni; Marwan Sayed; Abrar Abir; Houda Bouamor; Tamer Elsayed
- Live API-Bench: 2500+ Live APIs for Testing Multi-Step Tool Calling
Benjamin Elder; Anupama Murthi; Jungkoo Kang; Ankita Naik; Kinjal Basu; Kiran Kate; Danish Contractor
- MALicious INTent Dataset and Inoculating LLMs for Enhanced Disinformation Detection
Arkadiusz Modzelewski; Witold Sosnowski; Eleni Papadopulos; Elisa Sartori; Tiziano Labruna; Adam Wierzbicki; Giovanni Da San Martino
- When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training
Felicia Körner; Max Müller-Eberstein; Anna Korhonen; Barbara Plank
- Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang; Yanjiang Liu; Weixiang Zhou; Ben He; Yaojie Lu; Hongyu Lin; Jia Zheng; Xianpei Han; Le Sun; Yingfei Sun
- The Reasoning Lingua Franca: A Double-Edged Sword for Multilingual AI
Alan Saji; Raj Dabre; Anoop Kunchukuttan; Ratish Puduppully
- Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems
Kin Kwan Leung; Mouloud Belbahri; Yi Sui; Alex Labach; Xueying Zhang; Stephen Anthony Rose; Jesse C. Cresswell
- Helios: A Foundational Language Model for Smart Energy Knowledge Reasoning and Application
Haoyu Jiang; Boan Qu; Fanjie Zeng; Xiaojie Lin; Wei Zhong
- AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
Georgii Aparin; Tasnima Sadekova; Alexey Rukhovich; Assel Yermekova; Laida Kushnareva; Vadim Popov; Kristian Kuznetsov; Irina Piontkovskaya
- Vision-Language Models Align with Human Neural Representations in Concept Processing
Anna Bavaresco; Marianne de Heer Kloots; Sandro Pezzelle; Raquel Fernández
- FAID: Fine-grained AI-generated Text Detection using Multi-task Auxiliary and Multi-level Contrastive Learning
Minh Ngoc Ta; Dong Cao Van; Duc-Anh Hoang; Minh Le-Anh; Truong Nguyen; My Anh Tran Nguyen; Yuxia Wang; Preslav Nakov; Dinh Viet Sang
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
Jaap Jumelet; Abdellah Fourtassi; Akari Haga; Bastian Bunzeck; Bhargav Shandilya; Diana Galvan-Sosa; Faiz Ghifari Haznitrama; Francesca Padovani; Francois Meyer; Hai Hu; Julen Etxaniz; Laurent Prevot; Linyang He; María Grandury; Mila Marcheva; Negar Foroutan; Nikitas Theodoropoulos; Pouya Sadeghi; Siyuan Song; Suchir Salhan; Susana Zhou; Yurii Paniv; Ziyin Zhang; Arianna Bisazza; Alex Warstadt; Leshem Choshen
- Personality Editing for Language Models through Adjusting Self-Referential Queries
Seojin Hwang; Yumin Kim; Byeongjeong Kim; Donghoon Shin; Hwanhee Lee
- How Much Pretraining Does Structured Data Need?
Daniel Fadlon; Kfir Bar
- Finding Culture-Sensitive Neurons in Vision-Language Models
Xiutian Zhao; Rochelle Choenni; Rohit Saxena; Ivan Titov
- Polyglots or Multitudes? Multilingual LLM Answers to Value-laden Multiple-Choice Questions
Léo Labat; Etienne Ollion; François Yvon
- ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links
Serwar Basch; Ilia Kuznetsov; Tom Hope; Iryna Gurevych
- When Flores Bloomz Wrong: An Analysis of Cross-Lingual Contamination in Machine Translation Evaluation
David Tan; Pinzhen Chen; Josef van Genabith; Koel Dutta Chowdhury
- Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue
Sukannya Purkayastha; Nils Dycke; Anne Lauscher; Iryna Gurevych
- HalluZig: Hallucination Detection using Zigzag Persistence
Shreyas N. Samaga; Gilberto Gonzalez Arroyo; Tamal K. Dey
- Mapping the Course for Prompt-based Structured Prediction
Matt Pauk; Maria Leonor Pacheco
- Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
Runpeng Dai; Run Yang; Fan Zhou; Hongtu Zhu
- Martingale Foresight Sampling: A Principled Approach to Inference-Time LLM Decoding
Huayu Li; ZhengXiao He; Siyuan Tian; Jinghao Wen; Ao Li
- Is This LLM Library Learning? Evaluation Must Account For Compute and Behaviour
Ian Berlot-Attwell; Tobias Sesterhenn; Frank Rudzicz; Xujie Si
- Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park; Kanchana Ranasinghe; Kumara Kahatapitiya; Wonjeong Ryu; Donghyun Kim; Michael S Ryoo
- A Unified View on Emotion Representation in Large Language Models
Aishwarya Maheswaran; Maunendra Sankar Desarkar
- TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models
Shima Imani; Seungwhan Moon; Lambert Mathias; Lu Zhang; Babak Damavandi
- ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
Mohamed Elaraby; Diane Litman
- AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation
Potsawee Manakul; Haosheng Gan; Michael J Ryan; Ali Sartaz Khan; Warit Sirichotedumrong; Kunat Pipatanakul; William Barr Held; Diyi Yang
- x-SAL: Leading Symbolic Reasoning across Languages via Cross-lingual Symbolic-Aided Language Model
Leonardo Ranaldi; Giulia Pucci
- ToxiPrompt: A Two-Stage Red-Teaming Approach for Balancing Adversarial Prompt Diversity and Response Toxicity
Seungho Lee; Kyumin Lee
- AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages
Kosei Uemura; Miaoran Zhang; David Ifeoluwa Adelani
- SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context
Aishwarya Verma; Laud Ammah; Olivia Nercy Ndlovu Lucas; Andrew Zaldivar; Vinodkumar Prabhakaran; Sunipa Dev
- Better Generalizing to Unseen Concepts: An Evaluation Framework and An LLM-Based Auto-Labeled Pipeline for Biomedical Concept Recognition
Shanshan Liu; Noriki Nishida; Fei Cheng; Narumi Tokunaga; Rumana Ferdous Munne; Yuki Yamagata; Kouji Kozaki; Takehito Utsuro; Yuji Matsumoto
- A Representation Sharpening Framework for Zero Shot Dense Retrieval
Dhananjay Ashok; Suraj Nair; Mutasem Al-Darabsah; Choon Hui Teo; Tarun Agarwal; Jonathan May
- Spotlight Your Instructions: Instruction-following with Dynamic Attention Steering
Praveen Venkateswaran; Danish Contractor
- STREAM-ZH: Simplified Topic Retrieval Exploration and Analysis Module for Chinese Language
Hongyi Li; Jianjun Lian; Anton Frederik Thielmann; Andre Python
- FormGym: Doing Paperwork with Agents
Matthew Toles; Isaac Song; RATTANDEEP SINGH; Zhou Yu
- NarraBench: A Comprehensive Framework for Narrative Benchmarking
Sil Hamilton; Matthew Wilkens; Andrew Piper
- From Plausible to Faithful: Optimizing the Faithfulness of LLM Explanations
Yu-Neng Chuang; Guanchu Wang; Chia-Yuan Chang; Ruixiang Tang; Shaochen Zhong; Fan Yang; Andrew Wen; Mengnan Du; Xuanting Cai; Vladimir Braverman; Xia Hu
- MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
Sagarika Banerjee; Tangatar Madi; Advait Swaminathan; Nguyen Dao Minh Anh; Shivank Garg; Kevin Zhu; Vasu Sharma
- Is Information Density Uniform when Utterances are Grounded on Perception and Discourse?
Matteo Gay; Coleman Haley; Mario Giulianelli; Edoardo Ponti
- KAD: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral
Ayoub Hammal; Pierre Zweigenbaum; Caio Corro
- When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
Abeer Badawi; Elahe Rahimi; Md Tahmid Rahman Laskar; Sheri Grach; Lindsay Bertrand; Lames Danok; Prathiba Dhanesh; Jimmy Huang; Frank Rudzicz; Elham Dolatabadi
- DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar; Antoine LY; Franck Vermet; Caio Corro
- IDEAlign: Comparing Ideas of Large Language Models to Domain Experts
HyunJi Nam; Lucía Langlois; Jim Malamut; Mei Tan; Dorottya Demszky
- Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning
Yue Zhou; Xiaobo Guo; Belhassen Bayar; Srinivasan H. Sengamedu
- It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models
Cristian Santini; Marieke van Erp; Mehwish Alam
- SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation
Carolin Holtermann; Florian Schneider; Anne Lauscher
- Gender and Politeness Perception: A Novel Approach for Exploring Annotations Disagreement
Ahmad Aljanaideh
- TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
Carolin Holtermann; Nina Krebs; Anne Lauscher
- ToxiGAN: Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
Peiran Li; Jan Fillies; Adrian Paschke
- Text Classification Under Class Distribution Shift: A Survey
Adriana Valentina Costache; Silviu-Florin Gheorghe; Eduard Poesina; Paul Irofti; Radu Tudor Ionescu
- Reasoning's Razor: Reasoning Improves Accuracy but Hurts Recall at Critical Operating Points in Safety and Hallucination Detection
Atoosa Chegini; Hamid Kazemi; Garrett Souza; Maria Safi; Yang Song; Samy Bengio; Sinead Williamson; Mehrdad Farajtabar
- Exploring Generative Process Reward Modeling for Semi-Structured Data: A Case Study of Table Question Answering
Lei Tang; Wei Zhou; Mohsen Mesgar
- Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models
Alexey Dontsov; Anton Korznikov; Andrey V. Galichin; Elena Tutubalina
- Learning to Ideate for Machine Learning Engineering Agents
Yunxiang Zhang; Kang Zhou; Zhichao Xu; Kiran Ramnath; Yun Zhou; Sangmin Woo; Haibo Ding; Lin Lee Cheong
- Instructional Agents: Reducing Teaching Faculty Workload through Multi-Agent Instructional Design
Huaiyuan Yao; Wanpeng Xu; Justin Turnau; Nadia Kellam; Hua Wei
- Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling
Weishi Wang; Hengchang Hu; Daniel Dahlmeier
- Validating Automatic Evaluation of Controllable Counterspeech Generation: Rankings Matter More Than Scores
Yi Zheng; Björn Ross; Walid Magdy
- Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking
Rheeya Uppaal; Phu Mon Htut; Min Bai; Nikolaos Pappas; Zheng Qi; Sandesh Swamy
- Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools
Ha Min Son; Huan Ren; Xin Liu; Zhe Zhao
- MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments
Roelien C. Timmer; Necva Bölücü; Stephen Wan
- Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Zhiyu Xue; Reza Abbasi-Asl; Ramtin Pedarsani
- HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations
Yujia Hu; Roy Ka-Wei Lee
- Measuring Mechanistic Independence: Can Bias Be Removed Without Erasing Demographics?
Zhengyang Shan; Aaron Mueller
- A Survey on LLM-based Conversational User Simulation
Bo Ni; Yu Wang; Leyao Wang; Branislav Kveton; Franck Dernoncourt; Yu Xia; Hongjie Chen; Reuben Luera; Samyadeep Basu; Subhojyoti Mukherjee; Puneet Mathur; Nesreen K. Ahmed; Junda Wu; Li Li; Huixin Zhang; Ruiyi Zhang; Tong Yu; Sungchul Kim; Jiuxiang Gu; Zhengzhong Tu; Alexa Siu; Zichao Wang; Seunghyun Yoon; Nedim Lipka; Namyong Park; Zihao Lin; Trung Bui; Yue Zhao; Tyler Derr; Ryan A. Rossi
- Prompt-driven Detection of Offensive Urdu Language using Large Language Models
Iffat Maab; Usman Haider; Junichi Yamagishi
- Zer0-Jack: A memory-efficient gradient-based jailbreaking method for black box Multi-modal Large Language Models
Tiejin Chen; Kaishen Wang; Hua Wei
- RAGPPI: Retrieval-Augmented Generation Benchmark for Protein–Protein Interactions in Drug Discovery
Youngseung Jeon; Ziwen Li; Thomas Li; JiaSyuan Chang; Morteza Ziyadi; Xiang Anthony Chen
- Don't Generate, Classify! Low-Latency Prompt Optimization with Structured Complementary Prompt
Hee-Soo Kim; Junyoung Kim; Jeonghwan Lee; Seong-Jin Park; Kang-Min Kim
- CHROMIC: Chronological Reasoning Across Multi-Panel Comics
Bingxuan Hou; Jiayi Lin; Chenyang Zhang; Dapeng Yin; Shuyue Zhu; Qingqing Hong; Mengna Gao
- GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection
Kai Yao; Penglei Gao; Zhaorui Tan; Kaixin Wu; Danzhao Cheng; Yixin Ji; Zhenghan Song; mingjie zhong
- BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models
Chuyuan Li; Giuseppe Carenini
- Confidence-Calibrated Small-Large Language Model Collaboration for Cost-Efficient Reasoning
Chuang Zhang; Zizhen Zhu; Yihao Wei; Bing Tian; Junyi Liu; Henan Wang; Wang Xavier; Yaxiao Liu
- Chat-Ghosting: Methods for Auto-Completion in Dialog Systems
Anubhab Mandal; Sandeep Mishra; Bishal Santra; Tushar Abhishek; Pawan Goyal; Manish Gupta
- Attribution-Guided Multi-Object Hallucination and Bias Detection in Vision-Language Models
Sirat Samyoun; Yingtai Xiao; Jian Du
- Word Surprisal Correlates with Sentential Contradiction in LLMs
Ning Shi; Bradley Hauer; David Basil; John Zhang; Grzegorz Kondrak
- ARREST: Adversarial Resilient Regulation Enhancing Safety and Truth in Large Language Models
Sharanya Dasgupta; ARKAPRABHA BASU; Sujoy Nath; Swagatam Das
- $\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
Shota Takashiro; Takeshi Kojima; Shohei Taniguchi; Yusuke Iwasawa; Yutaka Matsuo
- Beyond Tokens: Concept-Level Training Objectives for LLMs
Laya Iyer; Pranav Somani; Alice Guo; Dan Jurafsky; Chen Shani
- Re$^2$-DocRED: Revisiting Revisited-DocRED for Joint Entity and Relation Extraction
Chen Kim Heng; Shao Wen Tong; Julian Wong Wei Sheng
- Where Do LLMs Compose Meaning? A Layerwise Analysis of Compositional Robustness
Nura Aljaafari; Danilo Carvalho; Andre Freitas
- BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
Bryan Chen Zhengyu Tan; Weihua Zheng; Zhengyuan Liu; Nancy F. Chen; Hwaran Lee; Kenny Tsu Wei Choo; Roy Ka-Wei Lee
- Document-Level Zero-Shot Relation Extraction with Entity Side Information
Mohan Raj; Lay-Ki Soon; Huey Fang Ong; Bhawani Selvaretnam
- Steering Large Language Models for Machine Translation Personalization
Daniel Scalena; Gabriele Sarti; Arianna Bisazza; Elisabetta Fersini; Malvina Nissim
- Taxation Perspectives from Large Language Models: A Case Study on Additional Tax Penalties
Eunkyung Choi; Young Jin Suh; Siun Lee; Hongseok Oh; Juheon Kang; Won Hur; HUN PARK; Wonseok Hwang
- Beyond Memorization: A Rigorous Evaluation Framework for Medical Knowledge Editing
Shigeng Chen; Linhao Luo; Zhangchi Qiu; Yanan Cao; Carl Yang; Shirui Pan
- Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding
Wafaa Mohammed; Vlad Niculae; Chrysoula Zerva
- Cross-lingual and Word-Independent Methods for Quantifying Degree of Grammaticalization
Ryo Nagata; Daichi Mochihashi; Misato Ido; Yusuke Kubota; Naoki Otani; Yoshifumi Kawasaki; Hiroya Takamura
- Knowing the Facts but Choosing the Shortcut: Understanding How Large Language Models Compare Entities
Hans Hergen Lehmann; Jae Hee Lee; Steven Schockaert; Stefan Wermter
- Calibrating Beyond English: Language Diversity for Better Quantized Multilingual LLMs
Everlyn Asiko Chimoto; Mostafa Elhoushi; Bruce Bassett
- LaCoMSA: Language-Consistency Multilingual Self-Alignment with Latent Representation Rewarding
Khanh-Tung Tran; Barry O'Sullivan; Hoang D. Nguyen
- Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
Kartik Ravisankar; HyoJung Han; Sarah Wiegreffe; Marine Carpuat
- Recursive numeral systems are highly regular and easy to process
Ponrawee Prasertsom; Andrea Silvi; Jennifer Culbertson; Devdatt Dubhashi; Moa Johansson; Kenny Smith
- Bringing Emerging Architectures to Sequence Labeling in NLP
Ana Ezquerro; Carlos Gómez-Rodríguez; David Vilares
- SEMIROUTER: Sparse-Data Enhanced Routing for Adaptive Multi-LLM System
Zijie Wang; Xinyu Yan; CHE WANG; Zeng Zihao; Lei Xiao; Wei Yang Bryan Lim
- DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
Hyeseon Ahn; Shinwoo Park; Suyeon Woo; Yo-Sub Han
- Boundary-Aware LLM Augmentation for Low-Resource Event Argument Extraction
ZHAOYUE SUN; Gabriele Pergola; Yulan He
- CASE – Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement
Gaifan Zhang; Yi Zhou; Danushka Bollegala
- Evaluation and LLM-Guided Learning of ICD Coding Rationales
Mingyang Li; Viktor Schlegel; Tingting Mu; Wuraola Oyewusi; Kai Kang; Goran Nenadic
- Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang; Yi Zhou; Danushka Bollegala
- Persuasion at Play: Understanding Misinformation Dynamics in Demographic-Aware Human-LLM Interactions
Angana Borah; Rada Mihalcea; Veronica Perez-Rosas
- Entropy-Gated Branching for Efficient Test-Time Reasoning
Xianzhi Li; Ethan Callanan; Abdellah Ghassel; Xiaodan Zhu
- Decomposition-Enhanced Training for Post-Hoc Attributions in Language Models
Sriram Balasubramanian; Samyadeep Basu; Koustava Goswami; Ryan A. Rossi; Varun Manjunatha; Roshan Santhosh; Ruiyi Zhang; Soheil Feizi; Nedim Lipka
- INSURE-Dial: A Phase-Aware Conversational Dataset & Benchmark for Compliance Verification and Phase Detection
Shubham Kulkarni; Alexander Lyzhov; Preetam Joshi; Shiva Chaitanya
- Persuasion Tokens for Editing Factual Knowledge in LLMs
Paul Youssef; Jörg Schlötterer; Christin Seifert
- NLP for Social Good: A Survey and Outlook of Challenges, Opportunities and Responsible Deployment
Antonia Karamolegkou; Angana Borah; Eunjung Cho; Sagnik Ray Choudhury; Martina Galletti; Pranav Gupta; Oana Ignat; Priyanka Kargupta; Neema Kotonya; Hemank Lamba; Sun-Joo Lee; Arushi Mangla; Ishani Mondal; Fatima Zahra Moudakir; Deniz Nazarova; Poli Nemkova; Dina Pisarevskaya; Naquee Rizwan; Nazanin Sabri; Keenan Samway; Dominik Stammbach; Anna Steinberg; David Tomás; Steven R Wilson; Jessica H Zhu; Arkaitz Zubiaga; Anders Søgaard; Alexander Fraser; Zhijing Jin; Rada Mihalcea; Joel R. Tetreault; Daryna Dementieva
- From Delegates to Trustees: How Optimizing for Long-Term Interests Shapes Bias and Alignment in LLMs
Suyash Fulay; Jocelyn Zhu; Michiel A. Bakker
- Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection
Ivan Vykopal; Antonia Karamolegkou; Jaroslav Kopčan; Qiwei Peng; Tomáš Javůrek; Michal Gregor; Marian Simko
- FFE-Hallu: Hallucinations in Fixed Figurative Expressions: A Benchmark of Idioms and Proverbs in the Persian Language
Faezeh Hosseini; Mohammadali Yousefzadeh; Yadollah Yaghoobzadeh
- MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval
Delvin Ce Zhang; Suhan Cui; Zhelin Chu; Xianren Zhang; Dongwon Lee
- DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding
Shubham Patle; Sara Ghaboura; Hania Tariq; Mohammad Usman Khan; Omkar Thawakar; Rao Muhammad Anwer; Salman Khan
- ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational Recommenders
Ofer Meshi; Krisztian Balog; Sally Goldman; Avi Caciularu; Guy Tennenholtz; Jihwan Jeong; Amir Globerson; Craig Boutilier
- Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Yu Wu; Ke Shu; Jonas Fischer; Lidia Pivovarova; David Rosson; Eetu Mäkelä; Mikko Tolonen
- Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
Pedro Henrique Luz de Araujo; Michael A. Hedderich; Ali Modarressi; Hinrich Schuetze; Benjamin Roth
- CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models
Paul Grundmann; Dennis Fast; Jan Frick; Thomas Steffek; Felix Gers; Wolfgang Nejdl; Alexander Löser
- Stop Taking Tokenizers for Granted: They Are Core Design Decisions in Large Language Models
Sawsan Alqahtani; Mir Tafseer Nayeem; Md Tahmid Rahman Laskar; Tasnim Mohiuddin; M Saiful Bari
- DIVINE: Coordinating Multimodal Disentangled Representations for Oro-Facial Neurological Disorder Assessment
Mohd Mujtaba Akhtar; Girish; Muskaan Singh
- Biasless Language Models Learn Unnaturally: How LLMs Fail to Distinguish the Possible from the Impossible
Imry Ziv; Nur Lan; Emmanuel Chemla
- Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
Mohd Mujtaba Akhtar; Girish; Farhan Sheth; Muskaan Singh
- Detecting Non-Membership in LLM Training Data via Rank Correlations
Pranav Shetty; Mirazul Haque; Zhiqiang Ma; Xiaomo Liu
- Taming Object Hallucinations with Verified Atomic Confidence Estimation
Jiarui Liu; Weihao Xuan; Zhijing Jin; Mona T. Diab
- DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning
Nithin Sivakumaran; Justin Chen; David Wan; Yue Zhang; Jaehong Yoon; Elias Stengel-Eskin; Mohit Bansal
- ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers
Saptarshi Sengupta; Zhengyu Zhou; Jun Araki; Xingbo Wang; Bingqing Wang; Suhang Wang; Zhe Feng
- An Empirical Study of Speculative Decoding for Small Language Models
Luca Mainardi; Selcuk Sandikci; Joaquin Vanschoren
- Lost in Formatting: How Output Formats Skew LLM Performance on Information Extraction
Rishi Ravikumar; Nuhu Ibrahim; Riza Batista-Navarro
- Pseudo-Likelihood Training for Reasoning Diffusion Language Models
Shiv Shankar
- RoSE: Round-robin Synthetic Data Evaluation for Selecting LLM Generators without Human Test Sets
Jan Cegin; Branislav Pecher; Ivan Srba; Jakub Simko
- RotBench: Evaluating Multi-modal Large Language Models on Identifying Image Rotation
Tianyi Niu; Jaemin Cho; Elias Stengel-Eskin; Mohit Bansal
- Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
Alireza Dehghanpour Farashah; Aditi Khandelwal; Marylou Fauchard; Zhuan Shi; Negar Rostamzadeh; Golnoosh Farnadi
- Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs
Yuxuan Jiang; Francis Ferraro
- Neural Breadcrumbs: Membership Inference Attacks on LLMs Through Hidden State and Attention Pattern Analysis
Disha Makhija; Manoj Ghuhan Arivazhagan; Vinayshekhar Bannihatti Kumar; Rashmi Gangadharaiah
- Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data
Paul Quinlan; Qingguo Li; Xiaodan Zhu
- Language Family Matters: Evaluating SpeechLLMs Across Linguistic Boundaries
Yuchen Zhang; Ravi Shekhar; Haralambos Mouratidis
- Beyond Names: How Grammatical Gender Markers Bias LLM-based Educational Recommendations
Luca Benedetto; Antonia Donvito; Alberto Lucchetti; Andrea Cappelli; Paula Buttery
- ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images
Mathieu Sibue; Andrés Muñoz Garza; Samuel Mensah; Pranav Shetty; Zhiqiang Ma; Xiaomo Liu; Manuela Veloso
- What’s Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning
Zhaotian Weng; Haoxuan Li; Xin Eric Wang; Kuan-Hao Huang; Jieyu Zhao
- When Benchmarks Age: Temporal Misalignment through Large Language Model Factuality Evaluation
Xunyi Jiang; Dingyi Chang; Julian McAuley; Xin Xu
- On the Additive Compositionality of Task Vectors in Vision–Language Models
Yuting SHI; Houjing WEI; Naoya Inoue
- KidsArtBench: Multi-Dimensional Children’s Art Evaluation with Attribute-Aware MLLMs
Mingrui Ye; Chanjin Zheng; Zengyi Yu; Chenyu Xiang; Zhixue Zhao; Zheng Yuan; Helen Yannakoudakis
- Steering Safely or Off a Cliff? Rethinking Specificity and Robustness in Inference-Time Interventions
Navita Goyal; Hal Daumé III
- Tracing Multilingual Knowledge Acquisition Dynamics in Domain Adaptation: A Case Study of English-Japanese Biomedical Adaptation
Xin Zhao; Naoki Yoshinaga; Yuma Tsuta; Akiko Aizawa
- Contextual morphologically-guided tokenization for Latin encoder models
Marisa Hudspeth; Patrick J. Burns; Brendan O'Connor
- Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning
Yang ZHANG; Amr Mohamed; Hadi Abdine; Guokan Shang; Michalis Vazirgiannis
- ObjChangeVR: Object State Change Reasoning from Continuous Egocentric Views in VR Environments
Shiyi Ding; SHAOEN WU; Ying Chen
- Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge
Yiyang Feng; Zeming Chen; Haotian Wu; Jiawei Zhou; Antoine Bosselut
- Funny or Persuasive, but Not Both: Evaluating Fine-Grained Multi-Concept Control in LLMs
Arya Labroo; Ivaxi Sheth; Vyas Raina; Amaani Ahmed; Mario Fritz
- Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
Jingyi Chen; Zhimeng Guo; Jiyun Chun; Pichao WANG; Andrew Perrault; Micha Elsner
- CSPB: Conversational Speech Processing Benchmark for Self-supervised Speech Models
Zili Huang; Matthew Maciejewski; Leibny Paola Garcia Perera; Shinji Watanabe; Sanjeev Khudanpur
- Multi-Token Completion for Text Anonymization
Pulkit Madaan; Krithika Ramesh; Lisa Bauer; Charith Peris; Anjalie Field
- MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder-LLM Integration in Cross-Lingual Reasoning
Kosei Uemura; David Guzmán; Quang Phuoc Nguyen; Jesujoba Oluwadara Alabi; En-Shiun Annie Lee; David Ifeoluwa Adelani
- Now You Hear Me: Audio Narrative Attacks Against Large Audio–Language Models
Ye Yu; Haibo Jin; Yaoning Yu; Jun Zhuang; Haohan Wang
- Evaluating Adversarial Robustness of Concept Representations in Sparse Autoencoders
Aaron J. Li; Suraj Srinivas; Usha Bhalla; Himabindu Lakkaraju
- Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives?
Karin De Langis; Püren Öncel; Ryan Peters; Andrew Elfenbein; Laura Kristen Allen; Andreas Schramm; Dongyeop Kang
- Strong Memory, Weak Control: An Empirical Study of Executive Functioning in LLMs
Karin De Langis; Jong Inn Park; Khanh Chi Le; Andreas Schramm; Andrew Elfenbein; Michael C. Mensink; Dongyeop Kang
- How Do Language Models Acquire Character-Level Information?
Soma Sato; Ryohei Sasano
- Analysing the role of lexical and temporal information in turn-taking through predictability
Sean Leishman; Sarenne Wallbridge; Peter Bell
- Beyond Length: Context-Aware Expansion and Independence as Developmentally Sensitive Evaluation in Child Utterances
Jiyun Chun; Eric Fosler-Lussier; Michael White; Andrew Perrault
- Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
Zilong Li; Jie Cao
- Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana; Pittawat Taveekitworachai; Warit Sirichotedumrong; Potsawee Manakul; Kunat Pipatanakul
- Common Sense or Ableism? Rethinking Commonsense Reasoning Through the Lens of Disability
Karina H Halevy; Kimi Wenzel; Seyun Kim; Kyle Dean Bauer; Bruno Neira; Mona T. Diab; Maarten Sap
- Detecting Hallucinations in Vision-Language Models without Generating a Single Token
Sai Akhil Kogilathota; Sripadha Vallabha E G; Luzhe Sun; Jiawei Zhou
- Nanda Family: Open-Weights Generative Large Language Models for Hindi
Aaryamonvikram Singh; Debopriyo Banerjee; Dhruv Sahnan; Monojit Choudhury; Shivam Chauhan; Rocktim Jyoti Das; Xudong Han; Haonan Li; Alok Anil Jadhav; Utkarsh Agarwal; Mukund Choudhary; Fajri Koto; Junaid Hamid Bhat; Awantika Shukla; Samujjwal Ghosh; Samta Kamboj; Onkar Pandit; Lalit Pradhan; Rahul Pal; Sunil Kumar Sahu; Parvez Mullah; Ali El Filali; Zainul Abedien Ahmed Quraishi; Neha Sengupta; Gokulakrishnan Ramakrishnan; Rituraj Joshi; Gurpreet Gosal; Avraham Sheinin; Natalia Vassilieva; Preslav Nakov
- Wugnectives: Novel Entity Inferences of Language Models from Discourse Connectives
Daniel Brubaker; William Sheffield; Junyi Jessy Li; Kanishka Misra
- Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval over haystacks
Amey Hengle; Prasoon Bajpai; Soham Dan; Tanmoy Chakraborty
- Beyond Accuracy: Benchmarking Abstention and Uncertainty in Large Language Models for Medical Question Answering
Sravanthi Machcha; Sushrita Yerra; Sahil Gupta; Aishwarya Sahoo; Sharmin Sultana; hong yu; Zonghai Yao
- MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework
Zonghai Yao; Zihao Zhang; Chaolong Tang; Xingyu Bian; Youxia Zhao; Zhichao Yang; Junda Wang; Huixue Zhou; Won Seok Jang; Feiyun Ouyang; hong yu
- Machine translation Evaluation Eng-Thai MQM Ranking dataset
Phichet Phuangrot; Natdanai Trintawat; Kanawat Vilasri; Yanapat Patcharawiwatpong; Pachara Boonsarngsuk; Nat Pavasant; Ekapol Chuangsuwanich
- Continual-learning for Modelling Low-Resource Languages from Large Language Models
Santosh Srinath K; Mudit Somani; Varun Reddy Padala; Prajna Upadhyay; Abhijit Das
- Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance
jongwon ryu; Joonhyung Park; Jaeho Han; Yeong-Seok Kim; Hye-Rin Kim; Sunjae Yoon; Junyeong Kim
- Evaluation of Deontic Conditional Reasoning in Large Language Models: The Case of Wason's Selection Task
Hirohiko Abe; Kentaro Ozeki; Risako Ando; Takanobu Morishita; Koji Mineshima; Mitsuhiro Okada
- LLMs as Cultural Archives: Cultural Commonsense Knowledge Graph Extraction
Junior Cedric Tonga; Chen Cecilia Liu; Iryna Gurevych; Fajri Koto
- Nahw: A Comprehensive Benchmark of Arabic Grammar Understanding, Error Detection, Correction, and Explanation
Hamdy Mubarak; Majd Hawasly; Abubakr Mohamed
- Confidence Leaps in LLM Reasoning: Early Stopping and Cross-Model Transfer
Pavel Tikhonov; Ivan Oseledets; Elena Tutubalina
- Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations
Li-Chun Lu; Miri Liu; Pin Chun Lu; Yufei Tian; Shao-Hua Sun; Nanyun Peng
- TReX: Tokenizer Regression for Optimal Data Mixture
Inho Won; HanGyeol Yoo; Minkyung Cho; Jungyeul Park; Hoyun Song; KyungTae Lim
- CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li; Thuy-Trang Vu; Christian Herold; Amirhossein Tebbifakhr; Shahram Khadivi; Gholamreza Haffari
- Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs
Pranav Bhandari; Nicolas Fay; Sanjeevan Selvaganapathy; Amitava Datta; Usman Naseem; Mehwish Nasim
- Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks
Sergey Pankratov; Dan Alistarh
- KG-CRAFT: Knowledge Graph-based Contrastive Reasoning with LLMs for Enhancing Automated Fact-checking
Vítor Lourenço; Aline Paes; Tillman Weyde; Audrey Depeige; Mohnish Dubey
- SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature
Hang Ding; Yilun Zhao; Tiansheng Hu; Manasi Patwardhan; Arman Cohan
- Unintended Token-Level Memorization in Language Model Fine-Tuning
Marton Szep; Jorge Marin Ruiz; Georgios Kaissis; Paulina Seidl; Rüdiger von Eisenhart-Rothe; Florian Hinterwimmer; Daniel Rueckert
- Exploring Fine-Tuning for In-Context Retrieval and Efficient KV-Caching in Long-Context Language Models
Francesco Maria Molfese; Momchil Hardalov; Rexhina Blloshmi; Bill Byrne; Adrià de Gispert
- The Pluralistic Moral Gap: Understanding Moral Judgment and Value Differences between Humans and Large Language Models
Giuseppe Russo; Debora Nozza; Paul Röttger; Dirk Hovy
- CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning
Van-Quang Nguyen; Takayuki Okatani
- Explaining Generalization of AI-Generated Text Detectors Through Linguistic Analysis
Yuxi Xia; Kinga Stańczak; Benjamin Roth
- Elections go bananas: a first large-scale multilingual study of pluralia tantum using LLMs
Elena Spaziani; Kamyar Zeinalipour; Pierluigi Cassotti; Nina Tahmasebi
- Post-ASR Correction in Hindi: Comparing Language Models and Large Language Models in Low-Resource Scenarios
Rishabh Kumar; Amrith Krishna; Ganesh Ramakrishnan; Preethi Jyothi
- CacheNotes: Task-Aware Key-Value Cache Compression for Reasoning-Intensive Knowledge Tasks
Giulio Corallo; Orion Weller; Fabio Petroni; Paolo Papotti
- Beyond Blind Following: Evaluating Robustness of LLM Agents under Imperfect Guidance
Yao Fu; Ran Qiu; Xinhe Wang; Jacob Sansom; Sathvika Ayyappa Prabhu; Huijie Tang; Jaekyeom Kim; Sungryull Sohn; Honglak Lee
- How Do LLMs Generate Contrastive Sentiments? A Mechanistic Perspective
Van Bach Nguyen; Christin Seifert; Jörg Schlötterer
- Continual Neural Topic Model
Charu Karakkaparambil James; Waleed Mustafa; Marcio Monteiro; Marius Kloft; Sophie Fellenz
- MAQuA: Multi-outcome Adaptive Question-Asking for Mental Health using Item Response Theory
Vasudha Varadarajan; Hui Xu; Rebecca Astrid Böhme; Mariam Marlen Mirström; Sverker Sikström; H. Schwartz
- Principled Self-Correction in Discrete Diffusion: A UCB-Guided Framework for Text Generation
Masaki Asada; Makoto Miwa
- ConLID: Supervised Contrastive Learning for Low-Resource Language Identification
Negar Foroutan; Jakhongir Saydaliev; Grace Kim; Antoine Bosselut
- CHiRPE: A Step Towards Real-World Clinical NLP with Clinician-Oriented Model Explanations
Stephanie Fong; Guilherme C Oliveira; Xiangyu Zhao; Yiwen Jiang; Zimu Wang; Jiahe Liu; Beau-Luke Colton; Scott W. Woods; Martha Shenton; Barnaby Nelson; Zongyuan Ge; Dominic Dwyer
- Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection
Yanran Chen; Lynn Greschner; Roman Klinger; Michael Klenk; Steffen Eger
- Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
Mizanur Rahman; Mohammed Saidul Islam; Md Tahmid Rahman Laskar; Shafiq Joty; Enamul Hoque
- Offline Preference Optimization via Maximum Marginal Likelihood Estimation
Saeed Najafi; Alona Fyshe
- The Relevance of Value Systems for Offensive Language Detection
Michael Wiegand; Elisabeth Eder; Josef Ruppenhofer
- Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
Hyunji Lee; Seunghyun Yoon; Yunjae Won; Hanseok Oh; Geewook Kim; Trung Bui; Franck Dernoncourt; Elias Stengel-Eskin; Mohit Bansal; Minjoon Seo
- RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
Aashiq Muhamed; Leonardo F. R. Ribeiro; Markus Dreyer; Virginia Smith; Mona T. Diab
- Query Decomposition for RAG: Balancing Exploration-Exploitation
Roxana Petcu; Kenton Murray; Daniel Khashabi; Evangelos Kanoulas; Maarten de Rijke; Dawn Lawrie; Kevin Duh
- Do Images Speak Louder than Words? Investigating the Effect of Textual Misinformation in VLMs
Chi Zhang; Wenxuan Ding; Jiale Liu; Mingrui Wu; Qingyun Wu; Ray Mooney
- Sycophancy Hides Linearly in the Attention Heads
Rifo Ahmad Genadi; Munachiso Samuel Nwadike; Nurdaulet Mukhituly; Tatsuya Hiraoka; Hilal AlQuabeh; Kentaro Inui
- AICD Bench: A Challenging Benchmark for AI-Generated Code Detection
Daniil Orel; Dilshod Azizov; Indraneil Paul; Yuxia Wang; Iryna Gurevych; Preslav Nakov
- Safeguarding Language Models via Self-Destruct Trapdoor
Shahar Katz; Bar Alon; Ariel Shaulov; Lior Wolf; Mahmood Sharif
- Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity
Prakhar Ganesh; Reza Shokri; Golnoosh Farnadi
- Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
Bojan Batalo; Erica K. Shimomoto; Dipesh Satav; Neil Millar
- H3Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs
Selim Furkan Tekin; Fatih Ilhan; Sihao Hu; Tiansheng Huang; Yichang Xu; Zachary Yahn; Ling Liu
- Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Yeganeh Kordi; Nihal V. Nayak; Max Zuo; Ilana Nguyen; Stephen Bach
- BLUR: A Bi-Level Optimization Approach for LLM Unlearning
Hadi Reisizadeh; Jinghan Jia; Zhiqi Bu; Bhanukiran Vinzamuri; Anil Ramakrishna; Kai-Wei Chang; Volkan Cevher; Sijia Liu; Mingyi Hong
- DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding
Moulik Choraria; Xinbo Wu; Akhil Bhimaraju; Nitesh Sekhar; Yue Wu; Xu Zhang; Prateek Singhal; Lav R. Varshney
- Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Mirac Suzgun; Mert Yuksekgonul; Federico Bianchi; Dan Jurafsky; James Zou
- Evidential Semantic Entropy for LLM Uncertainty Quantification
Lucie Kunitomo-Jacquin; Edison Marrese-Taylor; Ken Fukuda; Masahiro Hamasaki
- LLMs Know More About Numbers than They Can Say
Fengting Yuchi; Li Du; Jason Eisner
- SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases
Laya Iyer; Angelina Wang; Sanmi Koyejo
- Incentivizing Strong Reasoning from Weak Supervision
Yige Yuan; Teng Xiao; Shuchang Tao; Xue Wang; Jinyang Gao; Bolin Ding; Bingbing Xu
- DivMerge: A divergence-based model merging method for multi-tasking
Brahim Touayouch; Loïc Fosse; Géraldine Damnati; Gwénolé Lecorvé
- A Reinforcement Learning Framework for Robust and Secure LLM Watermarking
Li An; Yujian Liu; Yepeng Liu; Yuheng Bu; Yang Zhang; Shiyu Chang
- Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents
Sameer Komoravolu; Khalil Mrini
- User-Centric Evidence Ranking for Attribution and Fact Verification
Guy Alt; Eran Hirsch; Serwar Basch; Ido Dagan; Oren Glickman
- Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language
Mena Attia; Aashiq Muhamed; Mai Alkhamissi; Thamar Solorio; Mona T. Diab
- VietMix: A Naturally-Occurring Parallel Corpus and Augmentation Framework for Vietnamese-English Code-Mixed Machine Translation
Hieu Tran; Phuong-Anh Nguyen-Le; Huy Nghiem; Quang-Nhan Nguyen; Wei Ai; Marine Carpuat
- Do You See Me?: A Diagnostic Benchmark for Evaluating Visual Perception in Multimodal Language Models
Aditya Sanjiv Kanade; Tanuja Ganu
- An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents
Farnoosh Hashemi; Michael Macy
- Detecting Subtle Biases: An Ethical Lens on Underexplored Areas in AI Language Models Biases
Shayan Bali; Farhan Farsi; Mohammad Hosseini; Adel Khorramrouz; Ehsaneddin Asgari
- HarfoSokhan: A Comprehensive Parallel Dataset for Transitions between Persian Colloquial and Formal Variations
Hamid Jahad Sarvestani; Vida Ramezanian; Saee Saadat; Neda Taghizadeh Serajeh; Maryam Sadat Razavi Taheri; Shohreh Kasaei; MohammadAmin Fazli; Ehsaneddin Asgari
- JointCal: Efficient and Effective Domain-adapted Compression
Miles Williams; George Chrysostomou; Vitor Amancio Jeronymo; Nikolaos Aletras
- GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic Preferences
Priyanka Dey; Daniele Rosa; Wenqing Zheng; Daniel Barcklow; Jieyu Zhao; Emilio Ferrara
- On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions
Felix Stollenwerk
- Multimodal Conversation Structure Understanding
Kent K. Chang; Mackenzie Hanh Cramer; Anna Ho; Ti Ti Nguyen; Yilin Yuan; David Bamman
- A Review of Incorporating Psychological Theories in LLMs
Zizhou Liu; Ziwei Gong; Lin Ai; Zheng Hui; Run Chen; Colin Wayne Leach; Michelle R. Greene; Julia Hirschberg
- How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities
Aly M. Kassem; Bernhard Schölkopf; Zhijing Jin
- NG-Router: Graph-Supervised Multi-Agent Collaboration for Nutrition Question Answering
Kaiwen Shi; Zheyuan Zhang; Zhengqing Yuan; Keerthiram Murugesan; Vincent Galassi; Chuxu Zhang; Yanfang Ye
- Verification-Aware Planning for Multi-Agent Systems
Tianyang Xu; Dan Zhang; Kushan Mitra; Estevam Hruschka
- Zero-Shot Open-Schema Entity Structure Discovery
Xueqiang Xu; Jinfeng Xiao; James Barry; Mohab Elkaref; Jiaru Zou; Pengcheng Jiang; Yunyi Zhang; Maxwell J Giammona; Geeth De Mel; Jiawei Han
- Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models
Anooshka Bajaj; Deven Mahesh Mistry; Sahaj Singh Maini; Yash Aggarwal; Zoran Tiganj
- Diagnosing Vision Language Models' Perception by Leveraging Human Methods for Color Vision Deficiencies
Kazuki Hayashi; Shintaro Ozaki; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe
- Tokenizer-Aware Cross-Lingual Adaptation of Decoder-Only LLMs through Embedding Relearning and Swapping
Fan Jiang; Honglin Yu; Grace Y Chung; Trevor Cohn
- Active Generalized Category Discovery with Diverse LLM Feedback
Henry Peng Zou; Siffi Singh; Yi Nian; Jianfeng He; Jason Cai; Saab Mansour; Hang Su
- RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Chenyang Zhu; Spencer Hong; Jingyu Wu; Kushal Chawla; Yuhui Tang; Youbing Yin; Nathan Wolfe; Erin Babinsky; Daben Liu
- Jailbreaks as Inference-Time Alignment: A Framework for Understanding Safety Failures in LLMs
James Beetham; Souradip Chakraborty; Mengdi Wang; Furong Huang; Amrit Singh Bedi; Mubarak Shah
- Over-Searching in Retrieval-Augmented Large Language Models
Roy Xie; Deepak Gopinath; David Qiu; Dong Lin; Haitian Sun; Saloni Potdar; Bhuwan Dhingra
- LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing
Daniel Fein; Sebastian Russo; Violet Xiang; Kabir Jolly; Rafael Rafailov; Nick Haber
- H-Mem: Hybrid Multi-Dimensional Memory Management for Long-Context Conversational Agents
Zihe Ye; Jingyuan Huang; Weixin Chen; Yongfeng Zhang
- "Yuki Gets Sushi, David Gets Steak?": Uncovering Gender and Racial Biases in LLM-Based Meal Recommendations
Xuefeng Wei; Xuan Zhou; Yusuke Sakai; Taro Watanabe
- Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
Haeji Jung; Jinju Kim; Kyungjin Kim; Youjeong Roh; David R. Mortensen
- SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning
Renxi Wang; Honglin Mu; Liqun Ma; Lizhi Lin; Yunlong Feng; Timothy Baldwin; Xudong Han; Haonan Li
- Look Before You Leap: A Lookahead Reasoning Quality Gate for Speculative Decoding
Hiroaki Kingetsu; Kaoru Yokoo; Kenji Fukumizu; Manohar Kaul
- FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models
Masoomali Fatehkia; Enes Altinisik; Husrev Taha Sencar
- BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
Tsung-Min Pai; Jui-I Wang; Li-Chun Lu; Shao-Hua Sun; Hung-yi Lee; Kai-Wei Chang
- Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story
Pedashenko Vladislav; Laida Kushnareva; Yana Khassan Nibal; Eduard Tulchinskii; Kristian Kuznetsov; Vladislav Zharchinskii; Yury Maximov; Irina Piontkovskaya
- Efficient Uncertainty Quantification of Language Models through Token Clustering
Qi Cao; Andrew Gambardella; Takeshi Kojima; Yutaka Matsuo; Yusuke Iwasawa
- Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models
Zongyu Wu; Minhua Lin; Zhiwei Zhang; Fali Wang; Xianren Zhang; Xiang Zhang; Suhang Wang
- Becoming Experienced Judges: Selective Test-Time Learning for Evaluators
Seungyeon Jwa; Daechul Ahn; Reokyoung Kim; Dongyeop Kang; Jonghyun Choi
- Language Lives in Sparse Dimensions: Toward Interpretable and Efficient Multilingual Control for Large Language Models
Chengzhi Zhong; Fei Cheng; Qianying Liu; Yugo Murawaki; Chenhui Chu; Sadao Kurohashi
- Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models
Atharvan Dogra; Soumya Suvra Ghosal; Ameet Deshpande; Ashwin Kalyan; Dinesh Manocha
- Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in LLMs
Yujia Zheng; Tianhao Li; Haotian Huang; Tianyu Zeng; Jingyu Lu; Chuangxin Chu; Yuekai Huang; Ziyou Jiang; Qian Xiong; Yuyao Ge; Mingyang Li
- A Regex Minimization Benchmark: A PSPACE-Complete Challenge for Language Models
Hyundong Jin; Joonghyuk Hahn; Yo-Sub Han
- Teaching Small Language Models to Learn Logic through Meta-Learning
Leonardo Bertolazzi; Manuel Vargas Guzmán; Raffaella Bernardi; Maciej Malicki; Jakub Szymanik
- COMPACT: Building Compliance Paralegals via Clause Graph Reasoning over Contracts
Ayush Singh; Dishank Aggarwal; Pranav Bhagat; Ainulla Khan; Sameer Malik; Amar Prakash Azad
- Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets
Omar Momen; Emilie Sitter; Berenike Herrmann; Sina Zarrieß
- Repairing Regex Vulnerabilities via Localization-Guided Instructions
Sicheol Sung; Joonghyuk Hahn; Yo-Sub Han
- Statistical Foundations of DIME: Risk Estimation for Practical Index Selection
Giulio D'Erasmo; Cesare Campagnano; Antonio Mallia; Pierpaolo Brutti; Nicola Tonellotto; Fabrizio Silvestri
- Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality
Jana Jung; Marlene Lutz; Indira Sen; Markus Strohmaier
- ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
Yindong Wang; Martin Preiß; Margarita Bugueño; Jan Vincent Hoffbauer; Abdullatif Ghajar; Tolga Buz; Gerard de Melo
- Cosine Similarity as Logits?: A Scalable Knowledge Probe Using Embedding Vectors from Generative Language Models
Tomoyuki Jinno; Kazuki Hayashi; Yusuke Sakai; Hidetaka Kamigaito; Taro Watanabe
- Generating Multi-Aspect Queries for Conversational Search
Zahra Abbasiantaeb; Simon Lupart; Mohammad Aliannejadi
- Navigating the Infinite Dynamic Web Space: Effective In-Context Exploration via Cognitive Multi-Agent Collaboration
Guozhao Mo; Yanjiang Liu; Yafei Shi; Jiawei Chen; Yang Li; Yaojie Lu; Hongyu Lin; Ben He; Le Sun; Bo Zheng; Xianpei Han
- TimeMachine-bench: A Benchmark on Evaluating Model's Capability on Repository-level Migration Tasks
Ryo Fujii; Makoto Morishita; Kazuki Yano; Jun Suzuki
- Tandem Training for Language Models
Robert West; Ashton Anderson; Ece Kamar; Eric Horvitz
- Can MLLM Find Their Way in a City? Exploring Emergent Navigation from Web-Scale Knowledge
Dwip Dalal; Utkarsh Mishra; Narendra Ahuja; Nebojsa Jojic
- Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
Alla Chepurova; Aydar Bulatov; Mikhail Burtsev; Yuri Kuratov
- CAIRE: Cultural Attribution of Images with Retrieval
Arnav Yayavaram; Siddharth Yayavaram; Simran Khanuja; Michael Saxon; Graham Neubig
- What Does Infect Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs
Xinlan Yan; Di Wu; Yibin Lei; Christof Monz; Iacer Calixto
- Redefining Retrieval Evaluation in the Era of LLMs
Giovanni Trappolini; Florin Cuconasu; Simone Filice; Yoelle Maarek; Fabrizio Silvestri
- Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
Abir Harrasse; Chaithanya Bandi; Hari Bandi