Sultan Alrashed

Research engineer at KAUST (in Prof. Francesco Orabona's lab) working on pretraining bilingual Arabic-English LLMs. I've spent the last few years getting my hands dirty with large-scale distributed training!

Previously at SDAIA as a founding member of the ALLaM team, where I helped build Saudi Arabia's flagship Arabic language model.

Experience

Research Engineer
King Abdullah University of Science & Technology (KAUST) Thuwal, SA

Research engineer in the OPTIMAL lab under Prof. Francesco Orabona. Leading the development of a bilingual Arabic-English LLM at the 3B-parameter, 5T-token scale.

Artificial Intelligence Engineer
Saudi Data & Artificial Intelligence Authority (SDAIA) Riyadh, SA

Founding member of the ALLaM team; because the team was initially understaffed, I worked across the full LLM pipeline.

Research Engineering Fellowship
KAUST & SDAIA Partnership Thuwal, SA

Selected for a fellowship program focused on AI for education. Built an AI-based learning management system for the Ministry of Education and presented it to the Minister at GAIN 2024; the system has since begun piloting in public schools.

Publications & Preprints

KAUST Rising Stars in AI Symposium 2026 We’re Scraping the Bottom: Curation as a Path Forward For Training LLMs Sultan Alrashed, Chadi Helwe, Francesco Orabona
A poster summarizing the findings of "Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets" and "SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data" into a cohesive narrative on the redundancies produced by web scraping.
arXiv 2026 Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets Sultan Alrashed, Francesco Orabona
We show that redundancy across web pretraining corpora is itself a quality signal: documents retained by multiple independent pipelines are more likely to be high-quality. Our method outperforms the best baselines by 4-11% on Arabic, Turkish, and Hindi and beats model-based approaches. All datasets are released openly.
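As a toy aside (not the paper's actual pipeline; the function names and parameters here are made up for illustration), the cross-source agreement idea can be sketched as: compute a MinHash-style signature per document, then keep only documents whose signature shows up in at least two independently curated corpora.

```python
import hashlib

def minhash_signature(text, num_hashes=8):
    """Toy MinHash over word 3-grams (illustrative only)."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))}
    # One min-hash value per seeded hash function.
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles)
        for seed in range(num_hashes)
    )

def cross_source_keep(corpora, min_sources=2):
    """Keep each unique document whose signature appears in >= min_sources corpora."""
    seen = {}  # signature -> (document, set of corpus indices that retained it)
    for idx, corpus in enumerate(corpora):
        for doc in corpus:
            entry = seen.setdefault(minhash_signature(doc), (doc, set()))
            entry[1].add(idx)
    return [doc for doc, sources in seen.values() if len(sources) >= min_sources]
```

A real pipeline would use banded LSH over the signatures rather than exact signature matching, but the retention logic (agreement across independent pipelines as a quality signal) is the same shape.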
EACL Findings 2026 Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning Sultan Alrashed, Jianghui Wang, Francesco Orabona
A contamination-resistant English/Arabic text-game suite with scalable difficulty.
AISTATS 2026 | CPAL 2026 Spotlight Track Beyond the Ideal: Analyzing the Inexact Muon Update Egor Shulgin, Sultan Alrashed, Francesco Orabona, Peter Richtárik
Analysis of Muon's inexact orthogonalized update with performance bounds under an additive-error LMO framework.
arXiv 2025 SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data Sultan Alrashed, Chadi Helwe, Francesco Orabona
Introduces the largest multi-turn, tool-calling, reasoning-inclusive Arabic post-training dataset, comprising around 2B tokens.
arXiv 2025 Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Tyler A. Chang et al. (335 authors incl. Sultan Alrashed)
Wrote the Saudi Arabic dialect portion of this multilingual version of the PIQA benchmark.
ICML 2025 World Models Workshop ReviseQA: A Benchmark for Belief Revision in Multi-Turn Logical Reasoning Chadi Helwe, Sultan Alrashed, Francesco Orabona
Benchmark testing logical consistency under iterative context updates.
arXiv 2024 Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models Sultan Alrashed, Dmitrii Khizbullin, David R. Pugh
Introduces a machine-translated Arabic corpus (202B tokens) derived from FineWeb-Edu for training and evaluating Arabic language models; the largest such corpus at the time.
arXiv 2024 SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs Sultan Alrashed
Shows that higher LR:batch-size ratios can boost reasoning in small LMs. Achieved the highest IFEval score of any sub-3B model at release.
ICLR 2025 ALLaM: A Series of Large Language Models for Arabic and English M Saiful Bari, Yazeed Alnumay, Norah A. Alzahrani, Nouf M. Alotaibi, Hisham A. Alyahya, Sultan Alrashed, et al.
SDAIA's bilingual Arabic/English pretrained LLM series. I initially worked on pretraining, then focused on finetuning and alignment as the team scaled.
ACL Main 2024 When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards Norah Alzahrani, Hisham A. Alyahya, Yazeed Alnumay, Sultan Alrashed, Shaykhah Alsubaie, Yusef Almushaykeh, Faisal Mirza, Nouf Alotaibi, Nora Altwairesh, Areeb Alowisheq, M Saiful Bari, Haidar Khan
A deep dive into the sensitivity of LLM benchmarks and evaluations to minor structural perturbations; many of the results were contributed to LM-Harness.

Projects & Volunteering

PyTorch Captum Contributions Improved LLM support in PyTorch's Captum repository, broadening the range of supported models and tasks.
Megatron-Deepspeed Contributions Fixed a backwards-compatibility bug.
Lighteval Contributions Added quantization support for vLLM models.
Nanotron Contributions Fixed bugs so that the pretraining example in the docs runs correctly.
Next-Token Agent A project focused on pretraining and finetuning tiny language models to solve ASCII games by predicting each successive frame; able to perfectly solve mazes from FrozenLake.
Environment Encoder A proposal and implementation of an idea to train a reinforcement learning agent to play games from a vision-language model's embeddings instead of raw frames.
Reinforcement Learning Roguelike Solver For my university honours project, I wrote PPO, DQN, and A2C agents to compare the effects of perfect and imperfect information in my CLI game.
Cheatsheet An unfinished project implementing a new augmentation and training objective for image-classification models.