2025 - Alina Wróblewska - Integrating gender inclusivity into large language models

youtube.com, 2 months ago

Imagine a language that has masculine, feminine, and neuter grammatical genders, yet in which, due to historical and political conventions, masculine forms are predominantly used to refer to men, women, and mixed-gender groups. This is the reality of contemporary Polish. A social consequence of this unfair linguistic system is that large language models (LLMs) trained on Polish texts inherit and reinforce this masculine bias, generating gender-imbalanced outputs. This study addresses the issue by tuning LLMs on the IPIS dataset, a collection of human-crafted, gender-inclusive instructions for proofreading in Polish and for Polish-to-English translation. Grounded in a theoretical linguistic framework, we design a system prompt with explicit gender-inclusive guidelines for Polish. In our experiments, we IPIS-tune multilingual LLMs (Llama-8B, Mistral-7B, and Mistral-Nemo) and Polish-specific LLMs (Bielik and PLLuM). Our approach aims to make gender inclusivity an inherent feature of these models, offering a systematic way to mitigate gender bias in Polish language generation.
