Accepted Findings Papers

  • Unveiling the Deficiencies of Pre-trained Text-and-Layout Models in Real-world Visually-rich Document Information Extraction Chong Zhang; Yixi Zhao; Yulu Xie; Chenshu Yuan; Yi Tu; Ya Guo; Mingxu Chai; Ziyu Shen; Yue Zhang; Qi Zhang
  • Entity-aware Cross-lingual Claim Detection for Automated Fact-checking Rrubaa Panchendrarajan; Arkaitz Zubiaga
  • WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning Yuchen Zhuang; Di Jin; Jiaao Chen; Wenqi Shi; Hanrui Wang; Chao Zhang
  • Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs Sewon Kim; Jiwon Kim; SeungWoo Shin; Hyejin Chung; Daeun Moon; Yejin Kwon; Hyunsoo Yoon
  • Aligning Large Vision-Language Models via Joint Multimodal Preference Optimization Jiwon Kim; Hyunsoo Yoon
  • Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefixing Improves Theory of Mind in Large Language Models Kazutoshi Shinoda; Nobukatsu Hojo; Kyosuke Nishida; Yoshihiro Yamazaki; Keita Suzuki; Hiroaki Sugiyama; Kuniko Saito
  • Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey Seunghyuk Cho; Zhenyue Qin; Yang Liu; Youngbin Choi; Seungbeom Lee; Dongwoo Kim
  • Examining the Utility of Self-disclosure Types for Modeling Annotators of Social Norms Kieran Henderson; Kian Omoomi; Vasudha Varadarajan; Allison Lahnala; Charles Welch
  • Position Paper: How Should We Responsibly Adopt LLMs in the Peer Review Process? Juhwan Choi; JungMin Yun; Changhun Kim; YoungBin Kim
  • Rad-Flamingo: A Multimodal Prompt driven Radiology Report Generation Framework with Patient-Centric Explanations Md. Tousin Akhter; Devansh Lalwani; Kshitij Sharad Jadhav; Pushpak Bhattacharyya
  • I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search Zujie Liang; Feng Wei; Wujiang Xu; Yuxi Qian; Lin Chen; Xinhui Wu
  • ThinkNote: Enhancing Knowledge Integration and Utilization of Large Language Models via Constructivist Cognition Modeling Zhipeng Xu; Zhenghao Liu; Yukun Yan; Shuo Wang; Shi Yu; Zheni Zeng; Chaojun Xiao; Zhiyuan Liu; Ge Yu; Chenyan Xiong
  • Mitigating Copy Bias in In-Context Learning through Neuron Pruning Ameen Ali Ali; Lior Wolf; Ivan Titov
  • How to Make LMs Strong Node Classifiers? Zhe Xu; Kaveh Hassani; Si Zhang; Hanqing Zeng; Michihiro Yasunaga; Limei Wang; Dongqi Fu; Ning Yao; Bo Long; Hanghang Tong
  • Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives Yajiao Liu; Congliang Chen; Junchi Yang; Ruoyu Sun
  • The Mediomatix Corpus: Parallel Data for Romansh Idioms via Comparable Schoolbooks Zachary William Hopton; Jannis Vamvas; Andrin Büchler; Anna Rutkiewicz; Rico Cathomas; Rico Sennrich
  • Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models Wei Zhao; Zhe Li; Yige Li; Jun Sun
  • JEEM: Vision-Language Understanding in Four Arabic Dialects Karima Kadaoui; Hanin Atwany; Hamdan Al-Ali; Abdelrahman Mohamed; Ali Mekky; Sergei Tilga; Natalia Fedorova; Ekaterina Artemova; Hanan Aldarmaki; Yova Kementchedjhieva
  • Detecting Primary Progressive Aphasia (PPA) from Text: A Benchmarking Study Ghofrane Merhbene; Fabian Lecron; Philippe Fortemps; Bradford C. Dickerson; Mascha Kurpicz-Briki; Neguine Rezaii
  • LayerNorm vs RMSNorm: Geometric Perspective and a Case Against Mean Subtraction Akshat Gupta; Atahan Ozdemir; Caoqinwei Gong; Gopala Anumanchipalli
  • Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs Honghao Liu; Xuhui Jiang; Chengjin Xu; Cehao Yang; Yiran Cheng; Lionel Ni; Jian Guo
  • Do Diacritics Matter? Evaluating the Impact of Arabic Diacritics on Tokenization and LLM Benchmarks Go Inoue; Bashar Alhafni; Nizar Habash; Timothy Baldwin
  • Pelican Soup Framework: A Theoretical Framework for Language Model Capabilities Ting-Rui Chiang; Dani Yogatama
  • I Know, but I Don't Know! How Persona Conflict Undermines Instruction Adherence in Large Language Models Seonmin Koo; Jinsung Kim; Heuiseok Lim
  • Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models Maximilian Kreutner; Marlene Lutz; Markus Strohmaier
  • Exploring Iterative Controllable Summarization with Large Language Models Sangwon Ryu; Heejin Do; Daehui Kim; Hwanjo Yu; Dongwoo Kim; Yunsu Kim; Gary Lee; Jungseul Ok
  • The Price of Thought: A Multilingual Analysis of Reasoning, Performance, and Cost of Negotiation in Large Language Models Sherzod Hakimov; Roland Bernard; Tim Leiber; Karl Osswald; Kristina Richert; Ruilin Yang; Raffaella Bernardi; David Schlangen
  • ART: Adaptive Reasoning Trees for Explainable Claim Verification Sahil Wadhwa; Himanshu Kumar; Guanqun Yang; Abbaas Alif Mohamed Nishar; Pranab Mohanty; Swapnil Shinde; Yue Wu
  • VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy Yu Cui; Sicheng Pan; Yifei Liu; Haibin Zhang; Cong Zuo
  • VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought Eunsoo Lee; Jeongwoo Lee; Minki Hong; Jangho Choi; Jihie Kim
  • KNN-SSD: Enabling Dynamic Self-Speculative Decoding via Nearest Neighbor Layer Set Optimization Mingbo Song; Heming Xia; Jun Zhang; Chak Tou Leong; Qiancheng Xu; Wenjie Li; Sujian Li
  • Seeing Between the Verbs: Resolving Ambiguities with Multimodal Sense Clustering Louie Hong Yao; Nicholas Jarvis; Tianyu Jiang
  • HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition Gio Paik; Yongbeom Kim; Soungmin Lee; Sangmin Ahn; Chan Woo Kim
  • Complexity-aware fine-tuning Andrey Goncharov; Daniil Vyazhev; Petr Sychev; Edvard Khalafyan; Alexey Zaytsev
  • Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Question Answering Task Leonardo Ranaldi
  • SpatialMath: Spatial Comprehension-Infused Symbolic Reasoning for Mathematical Problem-Solving Ashutosh Bajpai; Akshat Bhandari; Akshay Nambi; Tanmoy Chakraborty
  • Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs Paiheng Xu; Gang Wu; Xiang Chen; Tong Yu; Chang Xiao; Franck Dernoncourt; Tianyi Zhou; Wei Ai; Viswanathan Swaminathan
  • How Important is ‘Perfect’ English for Machine Translation Prompts? Patrícia Schmidtová; Niyati Bafna; Seth Aycock; Gianluca Vico; Wiktor Kamzela; Kathy Hämmerl; Vilém Zouhar
  • KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation Jiabin Fan; Guoqing Luo; Michael Bowling; Lili Mou
  • Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering Haiyan Zhao; Xuansheng Wu; Fan Yang; Bo Shen; Ninghao Liu; Mengnan Du
  • Shifting Perspectives: Steering Vectors for Robust Bias Mitigation in LLMs Zara Siddique; Irtaza Khalid; Liam Turner; Luis Espinosa-Anke
  • MAPS: A Multilingual Benchmark for Agent Performance and Security Omer Hofman; Jonathan Brokman; Oren Rachmil; Shamik Bose; Vikas Pahuja; Toshiya Shimizu; Trisha Starostina; Kelly Marchisio; Seraphina Goldfarb-Tarrant; Roman Vainshtein
  • Linking Knowledge to Care: Knowledge Graph-Augmented Medical Follow-Up Question Generation Liwen Sun; Xiang Yu; Ming Tan; Zhuohao Chen; Anqi Cheng; Ashutosh Joshi; Chenyan Xiong
  • DebateQA: Evaluating Question Answering on Debatable Knowledge Rongwu Xu; Xuan Qi; Zehan Qi; Wei Xu; Zhijiang Guo
  • Personal Information Parroting in Language Models Nishant Subramani; Kshitish Ghate; Mona T. Diab
  • Harmful Factuality: LLMs Correcting What They Shouldn't Mingchen Li; Hanzhi Zhang; Heng Fan; Junhua Ding; Yunhe Feng
  • Toward Beginner-Friendly LLMs for Language Learning: Controlling Difficulty in Conversation Meiqing Jin; Liam Dugan; Chris Callison-Burch
  • CodeGuard: Improving LLM Guardrails in CS Education Nishat Raihan; Noah Erdachew; FNU Jayoti Devi; Joanna C. S. Santos; Marcos Zampieri
  • ATOM: AdapTive and OptiMized dynamic temporal knowledge graph construction using LLMs Yassir Lairgi; Ludovic Moncla; Khalid Benabdeslem; Rémy Cazabet; Pierre Cléau
  • On the Interplay between Human Label Variation and Model Fairness Kemal Kurniawan; Meladel Mistica; Timothy Baldwin; Jey Han Lau
  • Where do LLMs currently stand on biomedical NER in both clean and noisy settings? Christophe Ye; Cassie S. Mitchell
  • Scaling Data-Constrained Language Models with Synthetic Data Hirokazu Kiyomaru; Yusuke Oda; Takashi Kodama; Chaoran Liu; Daisuke Kawahara
  • The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs Omar Mahmoud; Ali Khalil; Thommen George Karimpanal; Buddhika Laknath Semage; Santu Rana
  • The Model’s Language Matters: A Comparative Privacy Analysis of LLMs Abhishek Kumar Mishra; Antoine Boutet; Lucas Magnana
  • Towards the First NLP Benchmark for Ladin - an Extremely Low-Resource Language Ulin Nuha; Adam Jatowt
  • DRAGON: Domain-specific Robust Automatic Data Generation for RAG Optimization Haiyang Shen; Hang Yan; Zhongshi Xing; Mugeng Liu; Yue Li; Zhiyang Chen; Yuxiang Wang; Jiuzheng Wang; Yun Ma
  • Causal Activation Steering via Sparse Mediation Toan Doan; Uyen Le; Thin Nguyen
  • Causal Direct Preference Optimization for Language Model Alignment Uyen Le; Thin Nguyen; Toan Nguyen; Toan Doan; Trung Le; Bac Le
  • LLMs Faithfully and Iteratively Compute Answers During CoT: A Systematic Analysis With Multi-hop Arithmetics Keito Kudo; Yoichi Aoki; Tatsuki Kuribayashi; Shusaku Sone; Masaya Taniguchi; Ana Brassard; Keisuke Sakaguchi; Kentaro Inui
  • VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery Jinchao Ge; Tengfei Cheng; Biao Wu; Zeyu Zhang; Shiya Huang; Judith Bishop; Gillian Shepherd; Meng Fang; Ling Chen; Yang Zhao
  • PromptPrism: A Linguistically-Inspired Taxonomy for Prompts Sullam Jeoung; Yueyan Chen; Yi Zhang; Shuai Wang; Haibo Ding; Lin Lee Cheong
  • HiGraAgent: Dual-Agent Adaptive Reasoning over Hierarchical Knowledge Graph for Open Domain Multi-hop Question Answering Hung Luu; Long Nguyen; Trung Pham; Hieu Pham; Tho Quan
  • Hire Your Anthropologist! Rethinking Culture Benchmarks Through an Anthropological Lens Mai Alkhamissi; Yunze Xiao; Badr AlKhamissi; Mona T. Diab
  • Suppressing Final Layer Hidden State Jumps in Transformer Pretraining Keigo Shibata; Kazuki Yano; Ryosuke Takahashi; Jaesung Lee; Wataru Ikeda; Jun Suzuki
  • Intention-Adaptive LLM Fine-Tuning for Text Revision Generation Zhexiong Liu; Diane Litman
  • ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting Yuxing Tian; Fengran Mo; Weixu Zhang; Yiyan Qi; Jian-Yun Nie
  • MapAgent: A Hierarchical Agent for Geospatial Reasoning with Dynamic Map Tool Integration Md Hasebul Hasan; Mahir Labib Dihan; Tanzima Hashem; Mohammed Eunus Ali; Md Rizwan Parvez
  • Comprehensive Study of Bilingual and Multi-category Instruction Pre-training Takashi Kodama; Yusuke Oda
  • Reflect, Rewrite, Repeat: How Simple Arithmetic Enables Advanced Reasoning in Small Language Models Mengdie Flora Wang; Haochen Xie; Mun Young Kim; Baishali Chaudhury; Meghana Ashok; Suren Gunturu; Sungmin Hong; Jae Oh Woo
  • Don’t Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation Jiwon Moon; Yerin Hwang; Dongryeol Lee; Taegwan Kang; Yongil Kim; Kyomin Jung
  • COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations Rui Xing; Preslav Nakov; Timothy Baldwin; Jey Han Lau
  • Deterministic Personality Editing of Large Language Models Using Adversarial Conversational History Jivnesh Sandhan; Fei Cheng; Tushar Sandhan; Yugo Murawaki
  • ParsTranslit: Truly Versatile Tajik-Farsi Transliteration Rayyan Merchant; Kevin Tang
  • One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations Kohei Oda; Po-Min Chuang; Kiyoaki Shirai; Natthawut Kertkeidkachorn
  • MSC-Bench: A Rigorous Benchmark for Multi-Server Tool Orchestration Jia-Kai Dong; I-Wei Huang; Chun-Tin Wu; Yi-Tien Tsai
  • SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation Sina Bagheri Nezhad; Yao Li; Ameeta Agrawal
  • Unsupervised Detection of LLM-Generated Text in Korean Using Syntactic and Semantic Cues Heejeong Jeon; MinSu Park; YunSeok Choi; Eunil Park
  • NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey Dhiman Goswami; Jai Kruthunz Naveen Kumar; Sanchari Das
  • CROWDSELECT: Synthetic Instruction Data Selection with Multi-LLM Wisdom Yisen Li; Lingfeng Yang; Wenxuan Shen; Pan Zhou; Yao Wan; Weiwei Lin; Dongping Chen
  • Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations Sheng-Lun Wei; Yu-Ling Liao; Yen-Hua Chang; Hen-Hsen Huang; Hsin-Hsi Chen
  • Pushing the Frontiers of Scientific Fact-Checking: The SCINLP Dataset Iffat Maab; Junichi Yamagishi
  • SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation Nobuhiro Ueda; Yuyang Dong; Krisztián Boros; Daiki Ito; Takuya Sera; Masafumi Oyamada
  • Unified Multimodal Interleaved Document Representation for Retrieval Jaewoo Lee; Joonho Ko; Jinheon Baek; Soyeong Jeong; Sung Ju Hwang
  • TELLME: Test-Enhanced Learning for Language Model Enrichment Minjun Kim; Inho Won; HyeonSeok Lim; MinKyu Kim; Junghun Yuk; Wooyoung Go; Jongyoul Park; Jungyeul Park; KyungTae Lim
  • Beyond Accuracy: Alignment and Error Detection across Languages in the Bi-GSM8K Math-Teaching Benchmark Jieun Park; KyungTae Lim; Joon-Ho Lim
  • VN-MTEB: Vietnamese Massive Text Embedding Benchmark Loc Pham; Tung Luu; Thu Vo; Minh Nguyen; Viet Hoang
  • See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval Mingyu Jeon; Sungjin Han; Jinkwon Hwang; Minchol Kwon; Jonghee Kim; Junyeong Kim
  • RB-LoRA: Rank-Balanced Aggregation for Low-Rank Adaptation with Federated Fine-Tuning Sihyeon Ha; Yongjeong Oh; Yo-Seb Jeon
  • Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It Seyed Mahed Mousavi; Edoardo Cecchinato; Lucia Horníková; Giuseppe Riccardi
  • Confidence-Driven Multi-Scale Model Selection for Cost-Effective NLU Bo-Wei Chen; Chung-Chi Chen; An-Zi Yen
  • Navigating the Impact of Structured Output Format on Large Language Models through the Compass of Causal Inference Han Yuan; Yue Zhao; Li Zhang; Wuqiong Luo; Zheng Ma
  • Breaking the Illusion of Reasoning in Polish LLMs: Quality over Quantity of Thought Dzmitry Pihulski; Mikołaj Langner; Jan Eliasz; Przemyslaw Kazienko; Jan Kocon; Teddy Ferdinan
  • RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library Jiapeng Wang; Jinhao Jiang; Zhiqiang Zhang; Jun Zhou; Xin Zhao
  • WebNovelBench: Placing LLM Novelists on the Web Novel Distribution Leon Lin; Jun Zheng; Haidong Wang
  • From Semantics to Style: A Cross-Dataset Comparative Framework for Sentence Similarity Predictions Yusuke Yamauchi; Akiko Aizawa
  • Feature Drift: How Fine-Tuning Repurposes Representations in LLMs Andrey V. Galichin; Anton Korznikov; Alexey Dontsov; Oleg Rogov; Elena Tutubalina; Ivan Oseledets
  • Detecting Winning Arguments with Large Language Models and Persuasion Strategies Tiziano Labruna; Arkadiusz Modzelewski; Giorgio Satta; Giovanni Da San Martino
  • The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning Mingkai Tian; Guorong Li; Yuankai Qi; Anton van den Hengel; Qingming Huang
  • Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning Sara Rajaee; Rochelle Choenni; Ekaterina Shutova; Christof Monz
  • Nuanced Toxicity Detection in Spanish: A New Corpus and Benchmark Study Alba María Mármol-Romero; Robiert Sepúlveda-Torres; Estela Saquete; María-Teresa Martín-Valdivia; Alfonso Ureña
  • Persona Switch: Mixing Distinct Perspectives in Decoding Time Junseok Kim; Nakyeong Yang; Kyomin Jung
  • Revealing the Truth with ConLLM for Detecting Multi-Modal Deepfakes Gautam Siddharth Kashyap; Harsh Joshi; Niharika Jain; Ebad Shabbir; Jiechao Gao; Nipun Joshi; Usman Naseem
  • Detection of Adversarial Prompts with Model Predictive Entropy Franziska Rubenbauer; Sebastian Steindl; Patrick Levi; Daniel Loebenberger; Ulrich Schäfer
  • Actors, Frames and Arguments: A Multi-Decade Computational Analysis of Climate Discourse in Financial News using Large Language Models Ruiran Su; Markus Leippold; Janet B. Pierrehumbert
  • RECAP: REwriting Conversations for Intent Understanding in Agentic Planning Kushan Mitra; Dan Zhang; Hannah Kim; Estevam Hruschka
  • Modeling Turn-Taking with Semantically Informed Gestures Varsha Suresh; M. Hamza Mughal; Christian Theobalt; Vera Demberg
  • Do Large Language Models Reflect Demographic Pluralism in Safety? Usman Naseem; Gautam Siddharth Kashyap; Sushant Kumar Ray; Rafiq Ali; Ebad Shabbir; Abdullah Mohammad
  • Adversarial Decoding: Generating Readable Documents for Adversarial Objectives Collin Zhang; Tingwei Zhang; Vitaly Shmatikov
  • MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators John Mendonça; Alon Lavie; Isabel Trancoso
  • Which Works Best for Vietnamese? A Practical Study of Information Retrieval Methods across Domains Long Nguyen; Tho Quan
  • MemeWeaver: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection Paolo Italiani; David Gimeno-Gómez; Luca Ragazzi; Gianluca Moro; Paolo Rosso
  • SEAM: Bridging the Temporal-Semantic Granularity Gap for LLM-based Speech Recognition Junseok Oh; Ji-Hwan Kim
  • Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness Luca Giordano; Simon Razniewski
  • Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health Trung Hieu Ngo; Adrien Bazoge; Solen Quiniou; Pierre-Antoine Gourraud; Emmanuel Morin
  • FOL-Traces: Verified First-Order Logic Reasoning Traces at Scale Isabelle Lee; Sarah Liaw; Dani Yogatama
  • Uncertainty Quantification for Evaluating Gender Bias in Machine Translation Ieva Staliunaite; Julius Cheng; Andreas Vlachos
  • PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation Yongfu Xue
  • The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models Konrad Löhr; Shuzhou Yuan; Michael Färber
  • TIPA: Typologically Informed Parameter Aggregation Stef Accou; Wessel Poelman
  • Can Calibration of Positional Encodings Enhance RAG Performance? Tom Zehle; Matthias Aßenmacher
  • CrisiText: A dataset of warning messages for LLM training in emergency communication Giacomo Gonella; Gian Maria Campedelli; Stefano Menini; Marco Guerini
  • FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition Jonas Golde; Patrick Haller; Alan Akbik
  • Bias in the East, Bias in the West: A Bilingual Analysis of LLM Political Bias on U.S.- and China-Related Issues Ying Ying Lim; Paul Röttger
  • Ask Me Again Differently: GRAS for Measuring Bias in Vision Language Models on Gender, Race, Age, and Skin Tone Shaivi Malik; Hasnat Md Abdullah; Sriparna Saha; Amit Sheth
  • A Simple and Efficient Learning-Style Prompting for LLM Jailbreaking Xuan Luo; Yue Wang; Zefeng He; Geng Tu; Jing Li; Ruifeng Xu
  • Aggregating Crowd of LLMs for Cost-Effective Data Annotation Jiacheng Liu; Xiaofeng Hou
  • Representation Collapse in Machine Translation Through the Lens of Angular Dispersion Evgeniia Tokarchuk; Maya K. Nachesa; Sergey Troshin; Vlad Niculae
  • Training-Free Text Emotion Tagging via LLM-Based Best-Worst Scaling Lukas Christ; Shahin Amiriparian
  • Can LLMs Reason Like Doctors? Exploring the Limits of Large Language Models in Complex Medical Reasoning Flavio Merenda; Jose Manuel Gomez-Perez; German Rigau
  • Testing Low-Resource Language Support in LLMs Using Language Proficiency Exams: the Case of Luxembourgish Cedric Lothritz; Jordi Cabot; Laura Bernardy
  • Unveiling Decision-Making in LLMs for Text Classification: Extraction of influential and interpretable concepts with Sparse Autoencoders Mathis Le Bail; Jérémie Dentan; Davide Buscaldi; Sonia Vanier
  • TextMine: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action Chenyue Zhou; Gürkan Solmaz; Flavio Cirillo; Kiril Gashteovski; Jonathan Fürst
  • MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning Mahbub E Sobhani; Md. Faiyaz Abdullah Sayeedi; Tasnim Mohiuddin; Md Mofijul Islam; Swakkhar Shatabda
  • Enhancing Reliability in Community Question Answering with an Expert-Oriented RAG System Seyyede Zahra Aftabi; Saeed Farzi
  • Unsupervised Text Style Transfer for Controllable Intensity Shuhuan Gu; Wenbiao Tao; Xinchen Ma; Kangkang He; Ye Guo; Xiang Li; Yunshi Lan
  • SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases AmirHossein Safdarian; Milad Mohammadi; Ehsan Jahanbakhsh Bashirloo; Mona Shahamat Naderi; Heshaam Faili
  • Binary Token-Level Classification with DeBERTa for All-Type MWE Identification: A Lightweight Approach with Linguistic Enhancement Diego Rossini; Lonneke van der Plas
  • Emotion Alignment Between Text and Speech is Limited: A Cross-Modal Study David Lindevelt; Suzan Verberne; Joost Broekens
  • Seeing All Sides: Multi-Perspective In-Context Learning for Subjective NLP Benedetta Muscato; Yue Li; Gizem Gezici; Zhixue Zhao; Fosca Giannotti
  • Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models Kumiko Nakajima; Jan Zuiderveld; Sandro Pezzelle
  • Are Multimodal LLMs Movie Buffs? Carlo Bretti; Pascal Mettes; Nanne Van Noord
  • Process Evaluation for Agentic Systems Milan Gritta; Debjit Paul; Gerasimos Lampouras; Jun Wang; Xiaoguang Li; Lifeng Shang
  • MIMIC: Multi-party Dialogue Augmentation via Speaker Stylistic Transfer Gaetano Cimino; Giuseppe Carenini; Vincenzo Deufemia
  • TechING: Towards Real World Technical Image Understanding via VLMs Tafazzul Nadeem; Bhavik Shangari; Manish Rai; Gagan Raj Gupta; Ashutosh Modi
  • Multilingual-To-Multimodal (M2M): Unlocking New Languages with Monolingual Text Piyush Singh Pasi
  • Do GUI Grounders Truly Understand UI Elements? Surgan Jandial; Yinheng Li; Justin Wagle; Kazuhito Koishida
  • Scaling Cultural Resources for Improving Generative Models Hayk Stepanyan; Aishwarya Verma; Andrew Zaldivar; Rutledge Chin Feman; Erin MacMurray van Liemt; Charu Kalia; Vinodkumar Prabhakaran; Sunipa Dev
  • Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks Haowei Fu; Bo Ni; Han Xu; Kunpeng Liu; Dan Lin; Tyler Derr
  • SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents Qiusi Zhan; Angeline Budiman-Chan; Abdelrahman Zayed; Xingzhi Guo; Daniel Kang; Joo-Kyung Kim
  • SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn Agent Evaluation Ryan Shea; Yunan Lu; Liang Qiu; Zhou Yu
  • Better as Generators Than Classifiers: Leveraging LLMs and Synthetic Data for Low-Resource Multilingual Classification Branislav Pecher; Jan Cegin; Robert Belanec; Ivan Srba; Jakub Simko; Maria Bielikova
  • Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategic Conversations Zijie Liu; Xinyu Zhao; Jie Peng; Jinhao Duan; Zhuangdi Zhu; Qingyu Chen; Kaidi Xu; Xia Hu; Tianlong Chen
  • DF-RAG: Query-Aware Diversity for Retrieval-Augmented Generation Saadat Hasan Khan; Spencer Hong; Jingyu Wu; Kevin Lybarger; Youbing Yin; Erin Babinsky; Daben Liu
  • Dimension-First Evaluation of Voice Assistants: Human Chain-of-Thought and Structured Judges Arjun Chandra; Kevin Miller; Venkatesh Ravichandran; Constantinos Papayiannis; Venkatesh Saligrama
  • Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention Phuong Minh Nguyen; Dang Huu-Tien; Naoya Inoue
  • Thinking Long, but Short: Stable Sequential Test-Time Scaling for Large Reasoning Models Michael R. Metel; Yufei Cui; Boxing Chen; Prasanna Parthasarathi
  • Defeating Cerberus: Privacy-Leakage Mitigation in Vision Language Models Boyang Zhang; Istemi Ekin Akkus; Ruichuan Chen; Alice Dethise; Klaus Satzke; Ivica Rimac; Yang Zhang
  • TruthTrap: A Bilingual Benchmark for Evaluating Factually Correct Yet Misleading Information in Question Answering Mohammadamin Shafiei; Hamidreza Saffari; Mohammad Taher Pilehvar; Alessandro Raganato
  • FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression Jiayi Tian; Ryan Solgi; Jinming Lu; Yifan Yang; Hai Li; Zheng Zhang
  • Negative Sampling Techniques in Dense Retrieval: A Survey Laurin Wischounig; Abdelrahman Abdallah; Adam Jatowt
  • Multi-Agent Procedural Graph Extraction with Structural and Logical Refinement Wangyang Ying; Yanchi Liu; Xujiang Zhao; Wei Cheng; Zhengzhang Chen; Wenchao Yu; Yanjie Fu; Haifeng Chen
  • MADIAVE: Multi-Agent Debate for Implicit Attribute Value Extraction Wei-Chieh Huang; Cornelia Caragea
  • ElectoralCheck: Benchmarking LLM Political Stances on Election Topics Prince Jha; Konika Mandal; Arkadeep Acharya; Sriparna Saha; Sandipan Dandapat
  • DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router Minghao Guo; Qingcheng Zeng; Xujiang Zhao; Yanchi Liu; Wenchao Yu; Mengnan Du; Haifeng Chen; Wei Cheng
  • Analyzing Instruction Optimization in LLM-based Pipelines for Tabular Fact Verification Xiaotang Du; Giwon Hong; Wai-Chung Kwan; Rohit Saxena; Ivan Titov; Pasquale Minervini; Emily Allaway
  • XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark Ioan-Paul Ciobanu; Andrei-Iulian Hîji; Nicolae Catalin Ristea; Paul Irofti; Cristian Rusu; Radu Tudor Ionescu
  • CLEAR-3K: Assessing Causal Explanatory Capabilities in Language Models Naiming Liu; Richard Baraniuk; Shashank Sonkar
  • Imbalanced Gradients in RL Post-Training of Multi-Task LLMs Runzhe Wu; Ankur Samanta; Ayush Jain; Scott Fujimoto; Jeongyeol Kwon; Ben Kretzu; Youliang Yu; Kaveh Hassani; Boris Vidolov; Yonathan Efroni
  • BayesFlow: A Probability Inference Framework for Meta-Agent Assisted Workflow Generation Bo Yuan; Yun Zhou; Zhichao Xu; Kiran Ramnath; Aosong Feng; Balasubramaniam Srinivasan
  • HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning Guimin Hu; Daniel Hershcovich; Hasti Seifi
  • Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation Zizhong Li; Haopeng Zhang; Jiawei Zhang
  • PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference Krishna Teja Chitty-Venkata; Jie Ye; Siddhisanket Raskar; Anthony Kougkas; Xian Sun; Murali Emani; Venkatram Vishwanath; Bogdan Nicolae
  • SMART-Editor: A Multi-Agent Framework for Human-Like Design Editing with Structural Integrity Ishani Mondal; Meera Bharadwaj; Ayush Roy; Aparna Garimella; Jordan Lee Boyd-Graber
  • ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations Amr Gomaa; Ahmed Salem; Sahar Abdelnabi
  • SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning Kaiwen Zhou; Ahmed Elgohary; A S M Iftekhar; Amin Saied
  • Who You Are, What You Say: Intra- and Inter- Context Personality for Emotion Recognition in Conversation Tazeek Bin Abdur Rakib; Lay-Ki Soon; Wern Han Lim
  • DRIVINGVQA: A Dataset for Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios Charles Corbière; Simon Roburin; Syrielle Montariol; Antoine Bosselut; Alexandre Alahi
  • Steerable Agentic Data Generation for Deep Search with Execution Feedback Fangyuan Xu; Rujun Han; Yanfei Chen; Zifeng Wang; I-Hung Hsu; Jun Yan; Vishy Tirumalashetty; Eunsol Choi; Tomas Pfister; Chen-Yu Lee
  • Negative-Aware Diffusion Process for Temporal Knowledge Graph Extrapolation Yanglei Gan; Peng He; Yuxiang Cai; Run Lin; Guanyu Zhou; Qiao Liu
  • DS²-Instruct: Domain-Specific Data Synthesis for Large Language Models Instruction Tuning Ruiyao Xu; Noelle I. Samia; Han Liu
  • DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards Aaryaman Kartha; Ahmed Masry; Mohammed Saidul Islam; Thinh Lang; Shadikur Rahman; Ridwan Mahbub; Mizanur Rahman; Mahir Ahmed; Md Rizwan Parvez; Enamul Hoque; Shafiq Joty
  • Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? Zhiting Mei; Christina Zhang; Tenny Yin; Justin Lidard; Ola Sho; Anirudha Majumdar
  • AfriMMT-EA: Multi-domain Machine Translation for Low-Resource East African Languages Naome A Etori; Kelechi Ezema; Nathaniel Romney Robinson; Davis David; Alfred Malengo Kondoro; Elisha Ondieki Makori; Michael Samwel Mollel; Maria Gini
  • Diffusion Language Model Inference with Monte Carlo Tree Search Zheng Huang; Kiran Ramnath; Yueyan Chen; Aosong Feng; Sangmin Woo; Balasubramaniam Srinivasan; Zhichao Xu; Kang Zhou; Shuai Wang; Haibo Ding; Lin Lee Cheong
  • DWA-KD: Dual-Space Weighting and Time-Warped Alignment for Cross-Tokenizer Knowledge Distillation Duc Trung Vu; Pham Khanh Chi; Dat Phi Van; Linh Ngo Van; Dinh Viet Sang; Trung Le
  • Harnessing Consistency for Robust Test-Time LLM Ensemble Zhichen Zeng; Qi Yu; Xiao Lin; Ruizhong Qiu; Xuying Ning; Tianxin Wei; Yuchen Yan; Jingrui He; Hanghang Tong
  • AutoAnoEval: Semantic-Aware Model Selection via Tree-Guided LLM Reasoning for Tabular Anomaly Detection Suhee Yoon; Sanghyu Yoon; Ye Seul Sim; Seungdong Yoa; Dongmin Kim; Soonyoung Lee; Hankook Lee; Woohyung Lim
  • Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment Tiejin Chen; Xiaoou Liu; Vishnu Nandam; Kuan-Ru Liou; Hua Wei
  • ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization Sunzhu Li; Zhiyu Lin; Jiale Zhao; Shuling Yang; Chen Wei
  • LitE-SQL: A Lightweight and Efficient Text-to-SQL Framework with Vector-based Schema Linking and Execution-Guided Self-Correction Shengmin Piao; Jieun Lee; Sanghyun Park
  • Beyond Coherence: Improving Temporal Consistency and Interpretability in Dynamic Topic Models Thanh Vinh Nguyen; Ngo Van Dong; Chu Xuan Minh; Tung Nguyen; Linh Ngo Van; Dinh Viet Sang; Trung Le
  • Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use Yiyang Li; Zehong Wang; Zhengqing Yuan; Zheyuan Zhang; Keerthiram Murugesan; Chuxu Zhang; Yanfang Ye
  • Tailoring Memory Granularity for Multi-Hop Reasoning over Long Contexts Peijun Qing; Xingjian Diao; Chiyu Ma; Saeed Hassanpour; Soroush Vosoughi
  • Unlocking Large Audio-Language Models for Interactive Language Learning Hongfu Liu; Zhouying Cui; Xiangming Gu; Ye Wang
  • Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer Jinghan Zhang; Fengran Mo; Tharindu Cyril Weerasooriya; Xinyue Ye; Dongjie Wang; Yanjie Fu; Kunpeng Liu
  • StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs Haohan Yuan; Sukhwa Hong; Haopeng Zhang
  • Logits-Based Block Pruning with Affine Transformations for Large Language Models Zekun Hu; Yichu Xu; De-Chuan Zhan
  • MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models Martin Hyben; Sebastian Kula; Jan Cegin; Jakub Simko; Ivan Srba; Robert Moro
  • What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects Naihao Deng; Sheng Zhang; Henghui Zhu; Shuaichen Chang; Jiani Zhang; Alexander Hanbo Li; Chung-Wei Hang; Hideo Kobayashi; Yiqun Hu; Patrick Ng
  • Evaluating Morphological Plausibility of Subword Tokenization via Statistical Alignment with Morpho-Syntactic Features Abishek Stephen; Jindřich Libovický
  • MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning Vera Pavlova; Mohammed Makhlouf
  • BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage Kalyan Nakka; Nitesh Saxena
  • The Problem of Ambiguity in Table Question Answering Jorge Osés Grijalba; L. Alfonso Ureña López; Eugenio Martínez Cámara; Jose Camacho-Collados
  • Beyond Multiple Choice: Evaluating Steering Vectors for Summarization Joschka Braun; Carsten Eickhoff; Seyed Ali Bahrainian
  • Similar Region Search using LLMs on Spatial Feature Space Al-Amin Sany; Mohaiminul Islam; Tanzima Hashem; Md. Ashraful Islam; Mohammed Eunus Ali
  • Learning to Ask: Multi-Decoder Fine-Tuning for Multi-Hop Visual Question Generation with External Knowledge Arpan Phukan; Manish Gupta; Asif Ekbal
  • SLANG-GraphRAG: Multi-Layered Retrieval with Domain-Specific Knowledge for Low Resource Social Media Conversations Ifeoluwa Wuraola; Daniel Marciniak; Nina Dethlefs
  • Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers Hannah Calzi Kleidermacher; James Zou
  • TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs Minjae Lee; Wonjun Kang; Byeongkeun Ahn; Christian Classen; Kevin Galim; Seunghyuk Oh; Minghao Yan; Hyung Il Koo; Kangwook Lee
  • KGHaluBench: A Knowledge Graph-Based Hallucination Benchmark for Evaluating the Breadth and Depth of LLM Knowledge Alex Robertson; Huizhi Liang; Mahbub Gani; Rohit Kumar; Srijith Rajamohan
  • Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code Jungin Kim; Shinwoo Park; Yo-Sub Han
  • VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval Diogo Glória-Silva; David Semedo; Joao Magalhaes
  • Attribute-Controlled Translation with Preference Optimization Inigo Jauregi Unanue; Najmeh Sadoughi; Vimal Bhat; Zhu Liu; Massimo Piccardi
  • ReciFine: Finely Annotated Recipe Dataset for Controllable Recipe Generation Nuhu Ibrahim; Rishi Ravikumar; Robert Stevens; Riza Batista-Navarro
  • ReBPE: Iteratively Improving the Internal Structure of a Structured Tokeniser by Mining its Internal Structure Thomas Bauwens; Miryam de Lhoneux
  • Emotion Recognition in Multi-Speaker Conversations through Speaker Identification, Knowledge Distillation, and Hierarchical Fusion LI XIAO; Kotaro Funakoshi; Manabu Okumura
  • Demystifying Mixed Outcomes of Self-Training: Pre-training Analyses on Non-Toy LLMs Yusuke Nakamura; Hirokazu Kiyomaru; Chaoran Liu; Shuhei Kurita; Daisuke Kawahara
  • Revealing Redundant Syntax in Large Language Models through Multi-Hop Dependency Paths Masaki Sashida; Takeshi Kojima; Yusuke Iwasawa; Yutaka Matsuo
  • A Scalable Framework for Automated NER Annotation Correction in Low-Resource Languages Toqeer Ehsan; Thamar Solorio
  • Can ChatGPT Really Understand Modern Chinese Poetry? Shanshan Wang; Derek F. Wong; Jingming Yao; Lidia S. Chao
  • Knowing What's Missing: Assessing Information Sufficiency in Question Answering Akriti Jain; Aparna Garimella
  • The Curse of Verbalization: How Presentation Order Constrains LLM Reasoning Yue Zhou; Henry Peng Zou; Barbara Di Eugenio; Yang Zhang
  • PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors Donya Rooein; Sankalan Pal Chowdhury; Mariia Eremeeva; Yuan Qin; Debora Nozza; Mrinmaya Sachan; Dirk Hovy
  • Mitigating Causal Bias in LLMs via Potential Outcomes Framework and Actual Causality Theory Yiheng Zhao; Yuanliang Li; Shreya Savant; Jun Yan
  • JuriFindIT: an Italian legal retrieval dataset Niko Dalla Noce; Davide Colla; Sina Farhang Doust; Lorenzo De Mattei; Davide Bacciu
  • Towards Fair and Efficient De-identification: Quantifying the Efficiency and Generalizability of De-identification Approaches Noopur Zambare; Kiana Aghakasiri; Carissa Lin; Carrie Ye; J Ross Mitchell; Mohamed Abdalla
  • How Many Ratings per Item are Necessary for Reliable Significance Testing? Christopher M Homan; Flip Korn; Deepak Pandita; Chris Welty
  • QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs David Beauchemin; Pier-Luc Veilleux; Johanna-Pascale Roy; Richard Khoury
  • QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Task Mae Sosto; Delfina S. Martinez Pandiani; Laura Hollink
  • Efficient Table Retrieval and Understanding with Multimodal Large Language Models Zhuoyan Xu; Haoyang Fang; Boran Han; Bonan Min; Bernie Wang; Cuixiong Hu; Shuai Zhang
  • FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation Fatema Siddika; Md Anwar Hossen; Juan Pablo Munoz; Tanya G. Roosta; Anuj Sharma; Ali Jannesari
  • RiddleBench: A New Generative Reasoning Benchmark for LLMs Deepon Halder; Alan Saji; Thanmay Jayakumar; Anoop Kunchukuttan; Ratish Puduppully; Raj Dabre
  • Language Model-Driven Data Pruning Enables Efficient Active Learning Abdul Hameed Azeemi; Ihsan Ayyub Qazi; Agha Ali Raza
  • HARM: Learning Hate-Aware Reward Model for Evaluating Natural Language Explanations of Offensive Content Lorenzo Puppi Vecchi; Alceu de Souza Britto Jr.; Emerson Cabrera Paraiso; Rafael M. O. Cruz
  • MATH-IDN: A Multilingual Mathematical Problem Solving Dataset Featuring Local Languages in Indonesia Xiao Xiao; Iftitahu Ni'mah; Yuyun Wabula; Mykola Pechenizkiy; Meng Fang
  • Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules Yilun Liu; Yunpu Ma; Yuetian Lu; Shuo Chen; Zifeng Ding; Volker Tresp
  • MAPRO: Recasting Multi-Agent Prompt Optimization as Maximum a Posteriori Inference Zheyuan Zhang; Lin Ge; Hongjiang Li; Weicheng Zhu; Chuxu Zhang; Yanfang Ye
  • Debiasing Large Language Models via Adaptive Causal Prompting with Sketch-of-Thought Bowen Li; Ziqi Xu; Jing Ren; Renqiang Luo; Xikun Zhang; Xiuzhen Zhang; Yongli Ren; Feng Xia
  • ExpressivityBench: Can LLMs Communicate Implicitly? Joshua Tint; Som Sagar; Aditya Taparia; Kelly Raines; Bimsara Pathiraja; Caleb Liu; Ransalu Senanayake
  • Rethinking Schema Linking: A Context-Aware Bidirectional Retrieval Approach for Text-to-SQL Md Mahadi Hasan Nahid; Davood Rafiei; Weiwei Zhang; Yong Zhang
  • PEAR: Planner-Executor Agent Robustness Benchmark Shen Dong; Mingxuan Zhang; Pengfei He; Li Ma; Bhavani Thuraisingham; Hui Liu; Yue Xing
  • Toward Safe and Human-Aligned Game Conversational Recommendation via Multi-Agent Decomposition Zheng Hui; Xiaokai Wei; Yexi Jiang; Kevin Gao; Chen Wang; Se-eun Yoon; Rachit Pareek; Michelle Gong
  • Linguistic Cues for LLM-based Implicit Discourse Relation Classification Yi Fan; Michael Strube; Wei Liu
  • SpARK: An Embarrassingly Simple Sparse Watermarking in LLMs with Enhanced Text Quality Duy Cao Hoang; Thanh Quoc Hung Le; Rui Chu; Ping Li; Weijie Zhao; Yingjie Lao; Khoa D Doan
  • Pretraining Language Models for Diachronic Linguistic Change Discovery Elisabeth Fittschen; Sabrina Xin Li; Tom Lippincott; Leshem Choshen; Craig Messner
  • Improving Language Identification for Code-Switched Speech: The Pivotal Role of Accented English Adyasha Patra; Dhiraj Kumar Sah; Preethi Jyothi
  • Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval Aditya Sharma; Christopher Pal; Amal Zouaq
  • Jailbreaking Safeguarded Text-to-Image Models via Large Language Models Zhengyuan Jiang; Yuepeng Hu; Yuchen Yang; Yinzhi Cao; Neil Zhenqiang Gong
  • BSCodec: A Band-Split Neural Codec for High-Quality Universal Audio Reconstruction Haoran Wang; Jiatong Shi; Jinchuan Tian; Bohan Li; Kai Yu; Shinji Watanabe
  • Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization Jiwei Guan; Haibo Jin; Haohan Wang
  • SALT: Step-level Advantage Assignment for Long-horizon Agents via Trajectory Graph Jiazheng Li; Yawei Wang; Qiaojing Yan; Yijun Tian; Zhichao Xu; Huan Song; Panpan Xu; Lin Lee Cheong
  • UniToolBench: A Benchmark for Tool-Augmented LLMs in Cross-Domain, Universal Task Automation Xiaojie Guo; Yang Zhang; Bing Zhang; Ryo Kawahara; Mikio Takeuchi; Yada Zhu
  • Benchmarking the Energy Savings with Speculative Decoding Strategies Rohit Dutta; Paramita Koley; Soham Poddar; Janardan Misra; Sanjay Podder; Naveen Balani; Saptarshi Ghosh; Niloy Ganguly
  • NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding Yeonkyoung So; Gyuseong Lee; Sungmok Jung; Joonhak Lee; JiA Kang; Sangho Kim; Jaejin Lee
  • What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance William Watson; Nicole Cho; Sumitra Ganesh; Manuela Veloso
  • Completely Modular Fine-tuning for Dynamic Language Adaptation Zhe Cao; Yusuke Oda; Qianying Liu; Akiko Aizawa; Taro Watanabe
  • A Multi-Task Learning Framework for Modeling Engagement and Topic-Sensitive Responses in Arabic Women’s Discourse Mabrouka bessghaier; Md. Rafiul Biswas; Shimaa Ibrahim; Wajdi Zaghouani
  • We Are What We Repeatedly Do: Improving Long Context Instruction Following Preston K Robinette; Andrew Hard; Swaroop Ramaswamy; Ehsan Amid; Rajiv Mathews; Taylor T Johnson
  • ConRAS: Contrastive In-context Learning Framework for Retrieval-Augmented Summarization Juseon Do; Sungwoo Han; Jingun Kwon; Hidetaka Kamigaito; Manabu Okumura
  • Beyond Sampling: Self-Sorting for Long-Context Ranking Juseon Do; Sungwoo Han; Jingun Kwon; Hidetaka Kamigaito; Katsuhiko Hayashi; Taro Watanabe
  • Program-of-Thought Reveals LLM Abstraction Ceilings Mike Zhou; Fenil Bardoliya; Vivek Gupta; Dan Roth
  • From Numbers to Narratives: Efficient Language Model-Based Detection for Safety-Critical Minority Classes Ahatsham Hayat; Hunter Tridle; Mohammad Rashedul Hasan
  • R-GDA: Reflective Guidance Data Augmentation with Multi-Agent Feedback for Domain-Specific Named Entity Recognition Hyeonseok Kang; Hyuk Namgoong; Goun pyeon; Sangkeun Jung
  • Enabling Autoregressive Models to Fill In Masked Tokens Daniel Mingyi Israel; Aditya Grover; Guy Van den Broeck
  • Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers Atsushi Shimizu; Shohei Taniguchi; Yutaka Matsuo
  • Open-Domain Safety Policy Construction Di Wu; Siyue Liu; Zixiang Ji; Ya-Liang Chang; Zhe-Yu Liu; Andrew Pleffer; Kai-Wei Chang
  • Think Just Enough: Leveraging Self-Assessed Confidence for Adaptive Reasoning in Language Models Junyeob Kim; Sang-goo Lee; Taeuk Kim
  • CLICKER: Cross-Lingual Knowledge Editing via In-Context Learning with Adaptive Stepwise Reasoning Zehui Jiang; Xin Zhao; Yuta Kumadaki; Naoki Yoshinaga
  • Show or Tell? Modeling the evolution of request-making in Human-LLM conversations Shengqi Zhu; Jeffrey Rzeszotarski; David Mimno
  • Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning Sultan AlRashed; Jianghui Wang; Francesco Orabona
  • Multilingual Self-Taught Faithfulness Evaluators Carlo Alfano; Aymen Al Marjani; Zeno Jonke; Amin Mantrach; Saab Mansour; Marcello Federico
  • Benchmarking Direct Preference Optimization for Medical Large Vision–Language Models Dain Kim; Jiwoo Lee; Jaehoon Yun; Yong Hoe Koo; Qingyu Chen; Hyunjae Kim; Jaewoo Kang
  • Stay Focused: Problem Drift in Multi-Agent Debate Jonas Becker; Lars Benedikt Kaesberg; Andreas Stephan; Jan Philip Wahle; Terry Ruas; Bela Gipp
  • FLUKE: A Linguistically-Driven and Task-Agnostic Framework for Robustness Evaluation Yulia Otmakhova; Thinh Hung Truong; Rahmad Mahendra; Zenan Zhai; Rongxin Zhu; Daniel Beck; Jey Han Lau
  • Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs Yiheng Yang; Yujie Wang; Chi Ma; Lei Yu; Emmanuele Chersoni; Chu-Ren Huang
  • PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing Anthony Hughes; Vasisht Duddu; N. Asokan; Nikolaos Aletras; Ning Ma
  • Argument Component Segmentation with Fine-Tuned Large Language Models Ettore Caputo; Sergio Greco; Lucio La Cava
  • DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection Yuliang Yan; Haochun Tang; Shuo Yan; Enyan Dai
  • The Art of Saying "Maybe": A Conformal Lens for Uncertainty Benchmarking in VLMs Asif Azad; Mohammad Sadat Hossain; MD Sadik Hossain Shanto; M Saifur Rahman; Md Rizwan Parvez
  • Diagnosis of Dysarthria Severity and Explanation Generation Using XAI-Enhanced CLINIC-GENIE on Diadochokinetic Tasks Jihyeon Kim; Insung Lee; Myoung-Wan Koo
  • A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages Raoyuan Zhao; Yihong Liu; Hinrich Schuetze; Michael A. Hedderich
  • ORSO QGen: Odds-Ratio Steerable Optimization for Controlling Question Generation Andreea Dutulescu; Stefan Ruseti; Mihai Dascalu; Danielle S McNamara
  • Leveraging Digitized Newspapers to Collect Summarization Data in Low-Resource Languages Noam Dahan; Omer Kidron; Gabriel Stanovsky
  • Let’s Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification Jingshen Zhang; Xin Ying Qiu; Lifang Lu; Zhuhua Huang; Yutao Hu; Yuechang Wu; JunYu Lu
  • LogToP: Logic Tree-of-Program with Table Instruction-tuned LLMs for Controlled Logical Table-to-Text Generation Yupian Lin; Guangya Yu; Cheng Yuan; Huan Du; Hui Luo; Yuang Bian; Jingping Liu; Zhidong He; Wen Du; Tong Ruan
  • IRPO: Implicit Policy Regularized Preference Optimization Youngsoo Jang; Yu Jin Kim; Geon-Hyeong Kim; Honglak Lee; Moontae Lee
  • DFPE: A Diverse Fingerprint Ensemble for Enhancing LLM Performance Seffi Cohen; Nurit Cohen Inger; Niv Goldshlager; Bracha Shapira; Lior Rokach
  • Ranking Human and LLM Texts Using Locality Statistics Yiyang Wang; Chen Ding; Hangfeng He
  • MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding Jeonghun Baek; Kazuki Egashira; Shota Onohara; Atsuyuki Miyai; Yuki Imajuku; Hikaru Ikuta; Kiyoharu Aizawa
  • Hierarchical User Intent Inference with Knowledge Graph Grounding Tzu-Cheng Peng; Chien Chin Chen; Yung-Chun Chang
  • Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection Joe Stacey; Lisa Alazraki; Aran Ubhi; Beyza Ermis; Aaron Mueller; Marek Rei
  • MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models Siwei Wu; King Zhu; Yiming Liang; Yu Bai; Yizhi LI; Haoning Wu; Jiaheng Liu; Ruibo Liu; Xingwei Qu; Xuxin Cheng; Ge Zhang; Wenhao Huang; Chenghua Lin
  • COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Siwei Wu; JinCheng Ren; Xeron Du; Shuyue Guo; Xingwei Qu; Yiming Liang; Jie Liu; Yunwen Li; Tyler Loakman; Tianyu Zheng; Boyu Feng; Huaqing Yuan; Zili Wang; Jiaheng Liu; Wenhao Huang; chenglin cai; Haoran Que; Jian Yang; Yuelin Bai; Zekun Moore Wang; Zhouliang Yu; Qunshu Lin; Ding Pan; Yuchen Eleanor Jiang; Tiannan Wang; Wangchunshu Zhou; Shenzhi Wang; Xingyuan Bu; Minghao Liu; Guoyin Wang; Ge Zhang; Chenghua Lin
  • Revealing the Numeracy Gap: An Empirical Investigation of Text Embedding Models Ningyuan Deng; Hanyu Duan; Yixuan Tang; Yi Yang
  • code_transformed: The Influence of Large Language Models on Code Yuliang Xu; Siming Huang; Mingmeng Geng; Yao Wan; Xuanhua Shi; Dongping Chen
  • Do LLMs model human linguistic variation? A case study in Hindi-English do-verb code-mixing Mukund Choudhary; Madhur Jindal; Gaurja Aeron; Monojit Choudhury
  • ART: Attention-Regularized Transformers for Multi-Modal Robustness Mohammed Bouri; Mohammed Erradi; Adnane Saoud
  • GRAFF: GRaph-Augmented Fine-grained Fusion for Large Language Models Himanshu Chaudhary; Ruida WANG; Gowtham Ramesh; Junjie Hu
  • Tackling Distractor Documents in Multi-Hop QA with Reinforcement and Curriculum Learning Jerry Huang; Siddarth Madala; Risham Sidhu; Cheng Niu; Hao Peng; Julia Hockenmaier; Tong Zhang
  • RoD-TAL: A Benchmark for Answering Questions in Romanian Driving License Exams Andrei Vlad Man; Răzvan-Alexandru Smădu; Cristian-George Craciun; Dumitru-Clementin Cercel; Florin Pop; Mihaela-Claudia Cercel
  • FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs Albert Sawczyn; Jakub Binkowski; Denis Janiak; Bogdan Gabrys; Tomasz Jan Kajdanowicz
  • Punctuations and Predicates in Language Models Sonakshi Chauhan; Maheep Chaudhary; Choy Kwan Kiu; Samuel Nellessen; Nandi Schoots
  • Test-time Corpus Feedback: From Retrieval to RAG Mandeep Rathee; Venktesh V; Sean MacAvaney; Avishek Anand
  • RADAR: A Reasoning-Guided Attribution Framework for Explainable Visual Data Analysis Anku Rani; Aparna Garimella; Apoorv Saxena; Balaji Vasan Srinivasan; Paul Pu Liang
  • MaskLoRA: Low‑Rank Subspace–Induced Token Masking for Efficient and Faithful Language Models S M Rafiuddin
  • A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction Marco Martinelli; Stefano Marchesin; Vanessa Bonato; Giorgio Di Nunzio; Nicola Ferro; Ornella Irrera; Laura Menotti; Federica Vezzani; Gianmaria Silvello
  • What Matters to an LLM? Behavioral and Computational Evidences from Summarization Yongxin Zhou; Changshun Wu; Philippe Mulhem; Didier Schwab; Maxime Peyrard
  • Neural network embeddings recover value dimensions from psychometric survey items on par with human data Max Pellert; Clemens M Lechner; Indira Sen; Markus Strohmaier
  • Compositional Reasoning via Joint Image and Language Decomposition Dwip Dalal; Madhav Kanda; Zhenhailong Wang; Heng Ji; Unnat Jain
  • Better Call CLAUSE: A Discrepancy Benchmark for Auditing LLMs Legal Reasoning Capabilities Manan Roy Choudhury; Adithya Chandramouli; Mannan Anand; Vivek Gupta
  • Token-Wise Kernels (TWiKers) for Vicinity-Aware Attention in Transformers Kuangdai Leng; Jia Bi; Samuel Pinilla; Jaehoon Cha
  • Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pre-training Jeffrey Li; Joshua P Gardner; Doug Kang; Fangping Shi; Karanjeet Singh; Chun-Liang Li; Herumb Shandilya; David Leo Wright Hall; Oncel Tuzel; Percy Liang; Ludwig Schmidt; Hadi Pouransari; Fartash Faghri
  • Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists Michał Pietruszka; Łukasz Borchmann; Aleksander Jędrosz; Paweł Morawiecki
  • Distill and Align Decomposition for Enhanced Claim Verification Jabez Magomere; Elena Kochkina; Samuel Mensah; Simerjot Kaur; Fernando Acero; Arturo Oncevay; Charese Smiley; Xiaomo Liu; Manuela Veloso
  • Human-Aligned Faithfulness in Toxicity Explanations of LLMs Ramaravind Kommiya Mothilal; Joanna Roy; Syed Ishtiaque Ahmed; Shion Guha
  • Reasoning Beyond Literal: Cross-style Multimodal Reasoning for Figurative Language Understanding Seyyed Saeid Cheshmi; Hahnemann Ortiz; James Mooney; Dongyeop Kang
  • QueStER: Query Specification for Generative Keyword-Based Retrieval Arthur SATOUF; Yuxuan ZONG; Habiboulaye Amadou Boubacar; Pablo Piantanida; Benjamin Piwowarski
  • Evaluating Sparse Autoencoders for Monosemantic Representation Moghis Fereidouni; Muhammad Umair Haider; Peizhong Ju; A.B. Siddique
  • Event Detection with a Context-Aware Encoder and LoRA for Improved Performance on Long-Tailed Classes Abdullah Al Monsur; Nitesh Vamshi Bommisetty; Gene Louis Kim
  • Think Hard Only When Needed: A Hybrid Best-of-N and Beam Search for Efficient Test-Time Compute Hyewon Suh; Chaojian Li; Cheng-Jhih Shih; Zheng Wang; Kejing Xia; Yonggan Fu; Yingyan Celine Lin
  • Role-Conditioned Refusals: Evaluating Access Control Reasoning in Large Language Models Đorđe Klisura; Joseph Khoury; Ashish Kundu; RAM KRISHNAN; Anthony Rios
  • NL2Logic: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models Rizky Ramadhana Putra; Raihan Sultan Pasha Basuki; Yutong Cheng; Peng Gao
  • Coding Agents with Multimodal Browsing are General-Purpose Problem Solvers Aditya Bharat Soni; Boxuan Li; Xingyao Wang; Valerie Chen; Graham Neubig
  • Quantifying Data Contamination in Psychometric Evaluations of LLMs Jongwook Han; Woojung Song; Jonggeun Lee; Yohan Jo
  • Task-aware Block Pruning with Output Distribution Signals for Large Language Models Song-ha Jo; Youngrok Ko; Sang-goo Lee; Jinseok Seol
  • LARA: LLM-based Agile Power Distribution Network Restoration from Disastrous Events Jishnu Warrier; Heqing Huang; Yuzhang Lin; Sai Qian Zhang
  • Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Benchmark Mohammad Khodadad; Ali Shiraee Kasmaee; Mahdi Astaraki; Nicholas Sherck; Hamidreza Mahyar; Soheila Samiee
  • SD-E2: Semantic Exploration for Reasoning Under Token Budgets Kshitij Mishra; Nils Lukas; Salem Lahlou
  • Risk Assessment of Power Outages as Rare Events with Learning Models and LLMs Haiyun Huang; Yukun Li; Marco A Pretell; Jacob Naroian; Ebadah Khan; Liping Liu
  • Thinking Beyond the Local: Multi-View Instructed Adaptive Reasoning in KG-Enhanced LLMs Minghan Zhang; Shu Zhao; Zhen Yang; Hongsheng Wu; Yongxing Lin; Haodong Zou; Jie Chen; Zhen Duan
  • DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution L D M S Sai Teja; N Siva Gopala Krishna; Ufaq Khan; Muhammad Haris Khan; Partha Pakray; Atul Mishra
  • Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation Juhyun Oh; Nayeon Lee; Chani Jung; Jiho Jin; Junho Myung; Jongwon Lee; Taieui Song; Alice Oh
  • Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Junbo Li; Peng Zhou; Rui Meng; Meet P. Vadera; Lihong Li; Yang Li
  • Decoding Time Series with LLMs: A Multi-Agent Framework for Cross-Domain Annotation Minhua Lin; Zhengzhang Chen; Yanchi Liu; Xujiang Zhao; Zongyu Wu; Junxiang Wang; Xiang Zhang; Suhang Wang; Haifeng Chen
  • Multi-Hall-SA: A Cross-lingual Benchmark for Multi-Type Hallucination Detection in Low-Resource South African Languages Sello Ralethe; Jan Buys
  • ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laymen Shounak Paul; Raghav Dogra; Pawan Goyal; Saptarshi Ghosh
  • Query4Regex: Verifiable Regex Transformation through Formal Operations from NL and DSL Queries Joonghyuk Hahn; Yo-Sub Han
  • SrcMix: Mixing of Related Source Languages Benefits Extremely Low-resource Machine Translation Sanjeev Kumar; Preethi Jyothi; Pushpak Bhattacharyya
  • IMRNNs: An Efficient Method for Interpretable Dense Retrieval via Embedding Modulation Yash Saxena; Ankur Padia; Kalpa Gunaratna; Manas Gaur
  • MMUIE: Massive Multi-Domain Universal Information Extraction for Long Documents Shuyi Zhang; Zhenbin Chen; Shuting Li; Kewei Tu; Li Jing; Zixia Jia; Zilong Zheng
  • Learning to Judge: LLMs Designing and Applying Evaluation Rubrics Clemencia Siro; Pourya Aliannejadi; Mohammad Aliannejadi
  • PsyProbe: Proactive and Interpretable Dialogue through User State Modeling for Exploratory Counseling Sohhyung Park; Hyunji Kang; Sungzoon Cho; Dongil Kim
  • Learning from Child-directed Speech in Two-language Scenarios: A French-English Case-Study Liel Binyamin; Elior Sulem
  • DeVisE: Towards the Behavioral Testing of Medical Large Language Models Camila Zurdo Tagliabue; Heloisa Oss Boll; Aykut Erdem; Erkut Erdem; Iacer Calixto
  • Improving Decoder-only Language Models for Sequence Labeling through Sequence Repetition Matija Luka Kukić; Marko Čuljak; David Dukić; Martin Tutek; Jan Šnajder
  • MEENA (PersianMMMU): Multimodal-Multilingual Educational Exams for N-level Assessment Omid Ghahroodi; Arshia Hemmat; Marzia Nouri; Seyed Mohammad Hadi Hosseini; Doratossadat Dastgheib; Mohammad Vali Sanian; Alireza Sahebi; Reihaneh Zohrabi; Mohammad Hossein Rohban; Ehsaneddin Asgari; Mahdieh Soleymani Baghshah
  • Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pretrained Models Taido Purason; Pavel Chizhov; Ivan P. Yamshchikov; Mark Fishel
  • AGIC: Attention-Guided Image Captioning to Improve Caption Relevance L D M S Sai Teja; Ashok Urlana; Pruthwik Mishra
  • Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering Jieun Kim; Yujin Jeong; Sung-Bae Cho
  • FactAppeal: Identifying Epistemic Factual Appeals in News Media Guy Mor-Lan; Tamir Sheafer; Shaul R. Shenhav
  • Vietnamese Automatic Speech Recognition: A Revisit Thi Vu; Linh The Nguyen; Dat Quoc Nguyen
  • MapCoder-Lite: Distilling Multi-Agent Coding into a Single Small LLM Woongkyu Lee; Junhee Cho; Jungwook Choi
  • When Do Language Models Endorse Limitations on Human Rights Principles? Keenan Samway; Miu Nicole Takagi; Rada Mihalcea; Bernhard Schölkopf; Ilias Chalkidis; Daniel Hershcovich; Zhijing Jin
  • Abstractive Summarization of Bengali Academic Videos Based on Audio Subtitles Lamisa Bintee Mizan Deya; Farhatun Shama; Abdul Aziz; Md Kaykobad Reza; Md Shahidul Salim
  • Active Learning with Non-Uniform Costs for African Natural Language Processing Bonaventure F. P. Dossou; Ines Arous; Audrey Durand; Jackie Chi Kit Cheung