Sultan Alrashed

Research engineer at KAUST (in Prof. Francesco Orabona's lab) working on pretraining bilingual Arabic-English LLMs. I've spent the last few years getting my hands dirty with large-scale distributed training!

Previously at SDAIA as a founding member of the ALLaM team, where I helped build Saudi Arabia's flagship Arabic language model.

Experience

Research Engineer
King Abdullah University of Science & Technology (KAUST) Thuwal, SA

Research engineer in the OPTIMAL lab under Prof. Francesco Orabona. Leading the development of a bilingual Arabic-English LLM at the 3B-parameter, 5T-token scale.

Artificial Intelligence Engineer
Saudi Data & Artificial Intelligence Authority (SDAIA) Riyadh, SA

Founding member of the ALLaM team; because the team was initially understaffed, I worked across the full LLM pipeline.

Research Engineering Fellowship
KAUST & SDAIA Partnership Thuwal, SA

Selected for a fellowship program focused on AI for education. Built an AI-based learning management system for the Ministry of Education and presented it to the Minister at GAIN 2024; the system has since begun piloting in public schools.

Publications & Preprints

KAUST Rising Stars in AI Symposium 2026 We’re Scraping the Bottom: Curation as a Path Forward For Training LLMs Sultan Alrashed, Chadi Helwe, Francesco Orabona
A poster summarizing the findings of "Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets" and "SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data" into a cohesive narrative on the redundancies produced by web scraping.
arXiv 2026 Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets Sultan Alrashed, Francesco Orabona
We show that redundancy across web pretraining corpora is itself a quality signal: documents retained by multiple independent pipelines are more likely to be high-quality. Our method outperforms the best baselines by 4-11% on Arabic, Turkish, and Hindi and beats model-based approaches. All datasets are released openly.
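As a toy aside (not the paper's actual pipeline; the function names and parameters here are made up for illustration), the cross-source agreement idea can be sketched as: compute a MinHash-style signature per document, then keep only documents whose signature shows up in at least two independently curated corpora.

```python
import hashlib

def minhash_signature(text, num_hashes=8):
    """Toy MinHash over word 3-grams (illustrative only)."""
    words = text.lower().split()
    shingles = {" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))}
    # One min-hash value per seeded hash function.
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingles)
        for seed in range(num_hashes)
    )

def cross_source_keep(corpora, min_sources=2):
    """Keep each unique document whose signature appears in >= min_sources corpora."""
    seen = {}  # signature -> (document, set of corpus indices that retained it)
    for idx, corpus in enumerate(corpora):
        for doc in corpus:
            entry = seen.setdefault(minhash_signature(doc), (doc, set()))
            entry[1].add(idx)
    return [doc for doc, sources in seen.values() if len(sources) >= min_sources]
```

A real pipeline would use banded LSH over the signatures rather than exact signature matching, but the retention logic (agreement across independent pipelines as a quality signal) is the same shape.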
EACL Findings 2026 Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning Sultan Alrashed, Jianghui Wang, Francesco Orabona
A contamination-resistant English/Arabic text-game suite with scalable difficulty.
AISTATS 2026 | CPAL 2026 Spotlight Track Beyond the Ideal: Analyzing the Inexact Muon Update Egor Shulgin, Sultan Alrashed, Francesco Orabona, Peter Richtárik
Analysis of Muon's inexact orthogonalized update with performance bounds under an additive-error LMO framework.
arXiv 2025 SmolKalam: Ensemble Quality-Filtered Translation at Scale for High Quality Arabic Post-Training Data Sultan Alrashed, Chadi Helwe, Francesco Orabona
Introduces the largest multi-turn, tool-calling, reasoning-inclusive Arabic post-training dataset, comprising around 2B tokens.
arXiv 2025 Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Tyler A. Chang et al. (335 authors incl. Sultan Alrashed)
Wrote the Saudi Arabic dialect portion of this multilingual version of the PIQA benchmark.
ICML 2025 World Models Workshop ReviseQA: A Benchmark for Belief Revision in Multi-Turn Logical Reasoning Chadi Helwe, Sultan Alrashed, Francesco Orabona
Benchmark testing logical consistency under iterative context updates.
arXiv 2024 Fineweb-Edu-Ar: Machine-translated Corpus to Support Arabic Small Language Models Sultan Alrashed, Dmitrii Khizbullin, David R. Pugh
Introduces a machine-translated Arabic corpus (202B tokens) derived from FineWeb-Edu for training and evaluating Arabic language models; the largest such corpus at the time.
arXiv 2024 SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs Sultan Alrashed
Shows that higher LR:batch-size ratios can boost reasoning in small LMs. Achieved the highest IFEval score of any sub-3B model at release.
ICLR 2025 ALLaM: A Series of Large Language Models for Arabic and English M Saiful Bari, Yazeed Alnumay, Norah A. Alzahrani, Nouf M. Alotaibi, Hisham A. Alyahya, Sultan Alrashed, et al.
SDAIA's bilingual Arabic/English pretrained LLM series. I initially worked on pretraining, then focused on finetuning and alignment as the team scaled.
ACL Main 2024 When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards Norah Alzahrani, Hisham A. Alyahya, Yazeed Alnumay, Sultan Alrashed, Shaykhah Alsubaie, Yusef Almushaykeh, Faisal Mirza, Nouf Alotaibi, Nora Altwairesh, Areeb Alowisheq, M Saiful Bari, Haidar Khan
A deep dive into the sensitivity of LLM benchmarks and evaluations to minor structural perturbations; many of the results were contributed to LM-Harness.

Projects & Volunteering

PyTorch Captum Contributions Improved LLM support in PyTorch's Captum repository, broadening the range of supported models and tasks.
Megatron-Deepspeed Contributions Fixed a backwards-compatibility bug.
Lighteval Contributions Added quantization support for vLLM models.
Nanotron Contributions Fixed bugs so that the pretraining example in the docs runs correctly.
Next-Token Agent A project focused on pretraining and finetuning tiny language models to solve ASCII games by predicting each successive frame; able to perfectly solve mazes from FrozenLake.
Environment Encoder A proposal and implementation of an idea to train a reinforcement learning agent to play games from a vision-language model's embeddings instead of raw frames.
Reinforcement Learning Roguelike Solver For my university honours project, I wrote PPO, DQN, and A2C agents to compare the effects of perfect and imperfect information in my CLI game.
Cheatsheet An unfinished project implementing a new augmentation and training objective for image-classification models.