BeClaude
Research · 2026-05-05

NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Source: arXiv cs.AI

arXiv:2605.00086v1

Abstract: High-quality corpora are essential for advancing Natural Language Processing (NLP) in Portuguese. Building on previous encoder-only models such as BERTimbau and Albertina PT-BR, we introduce NorBERTo, a modern encoder based on the ModernBERT...

arxiv, papers