Aaditya Paudel - mathematics, AI systems, benchmarks

Instructor of Mathematics and founder/creator of Neohm Labs.

I build mathematical reasoning systems, benchmark datasets, classroom AI interfaces, and reproducible research artifacts. My current work asks how much better fixed-weight models become when the runtime around them is engineered carefully: what they read, how memory is certified, how verification roles apply pressure, and how final answers are emitted as auditable objects.

Role: Instructor of Mathematics
Affiliation: Miami University
Studio: Neohm Labs

Professional snapshot

This page is the canonical public profile for research collaboration, portfolio review, homepage requests, and professional context around Neohm Labs.

Role

Instructor of Mathematics

Miami University, Department of Mathematical and Physical Sciences. Teaching mathematics while building classroom-first AI tools and public mathematical reasoning artifacts.

Founder

Neohm Labs

Creator and founder of Neohm Labs, the public studio behind AthenaV5, AEN, Vault of Echoes, Canon DSL, and related reasoning infrastructure.

Focus

What I specialize in

Mathematical reasoning systems; exact-answer evaluation; long-context model serving; multi-role solver/verifier protocols; Runtime-at-Boot and certified context loading; benchmark curation; synthetic mathematical data; and classroom AI interfaces that make reasoning work inspectable.

AEN · AthenaV5 · VoE · Canon DSL · AIMO/Kaggle runtime engineering · Qwen-family long context · Controller algorithms

Publications, datasets, and public records

Public work is listed with artifact boundaries where relevant. AEN results distinguish blind diagnostics from answer-aware replay and context-recall evidence.

Paper

Artificial Evaluation Network (AEN): Runtime-at-Boot, Certified Context Loading, and Triadic Controller Algorithms for Mathematical Reasoning

Preprint introducing AEN as a runtime architecture for exact-answer mathematical reasoning with fixed-weight language models. Covers Runtime-at-Boot certification, role-specific memory, the Athena-Aria-Artemis triad, controller-owned finalization, Canon v2.1 distillation, and artifact-based evaluation.

Paper

Canon DSL v2.1

Metadata-First Distillation for Synthetic Mathematical Data. A YAML-style schema for converting solved mathematical problems into structured records with objects, givens, asks, invariants, theorem roles, answer normalization, and generation lineage.
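To make the record shape concrete, here is a minimal Python sketch of one distilled record. The fields mirror the categories listed above, but the class name `CanonRecord`, the exact key names, the sample problem, and the `normalize_answer` helper are illustrative assumptions, not the published Canon DSL v2.1 schema.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class CanonRecord:
    """Hypothetical in-memory form of one distilled record (not the official schema)."""
    objects: list[str]              # mathematical objects in play
    givens: list[str]               # stated facts
    asks: str                       # what the problem wants
    invariants: list[str]           # relations that must keep holding
    theorem_roles: dict[str, str]   # theorem name -> role in the solution
    answer: str                     # normalized final answer
    lineage: dict[str, str] = field(default_factory=dict)  # generation provenance


def normalize_answer(raw: str) -> str:
    """Assumed normalization: drop spaces and reduce rationals to lowest terms."""
    text = raw.strip().replace(" ", "")
    try:
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        # Not a plain rational; keep the cleaned text as-is.
        return text


# A hand-written toy record, used only to show the shape.
record = CanonRecord(
    objects=["right triangle ABC"],
    givens=["AB = 3", "BC = 4", "angle B = 90 degrees"],
    asks="length of AC",
    invariants=["AB^2 + BC^2 = AC^2"],
    theorem_roles={"Pythagorean theorem": "main step"},
    answer=normalize_answer(" 10/2 "),
    lineage={"source": "hand-written example"},
)
```

Normalizing the answer at record-creation time means two records with equivalent answers compare equal as strings, which is one simple way to support exact-answer evaluation downstream.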

Dataset

Vault of Echoes 2026

Public-answer 25-problem mathematical reasoning benchmark with parquet/csv data, public key, scorer, sample submission, checksum ledger, and benchmark use policy. Hugging Face DOI: 10.57967/hf/8554.
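As a rough illustration of how a public key, a scorer, and a checksum ledger fit together, the sketch below scores a submission by exact string match and verifies a file's SHA-256 digest. It is not the scorer shipped with the dataset: the function names, the problem ids, and the toy answers are all hypothetical.

```python
import hashlib


def verify_checksum(data: bytes, expected_sha256: str) -> bool:
    """Return True when the SHA-256 digest of `data` matches the ledger entry."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


def score(key: dict[str, str], submission: dict[str, str]) -> tuple[int, int]:
    """Exact-match scoring: a missing or differing answer counts as wrong."""
    correct = sum(
        1
        for problem_id, answer in key.items()
        if submission.get(problem_id, "").strip() == answer.strip()
    )
    return correct, len(key)


# Toy data with made-up ids and answers, for shape only.
toy_key = {"p01": "42", "p02": "7", "p03": "128"}
toy_submission = {"p01": "42", "p02": "8", "p03": " 128 "}

correct, total = score(toy_key, toy_submission)
print(f"{correct}/{total}")
```

Checking the digest before scoring guards against silently evaluating against a modified key or data file.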

Record

UI-Native One-Shot Benchmarking for Mathematical Reasoning in Chat-Based LLM Systems

A Zenodo benchmark record associated with direct UI-native mathematical reasoning comparisons and the Vault of Echoes evaluation lineage. Coauthored with P. Acharya.

Book

Vault of Echoes: Volume I

Lore-infused puzzle codex and source archive. Public source archive DOI: 10.5281/zenodo.18207613. The book hub on this site also links to the PDF.

Systems and artifacts

The portfolio is not one demo. It is a connected set of research systems, datasets, papers, notebooks, and public deployment surfaces.

AEN

Artificial Evaluation Network

Triadic solver/verifier/agent protocol, Runtime-at-Boot, and controller-owned exact-answer finalization.

A5

AthenaV5

Live teaching and reasoning portal at portal.neohmlabs.com/AEN5.

VoE

Vault of Echoes

Puzzle codex and public benchmark family for lore-heavy exact-answer reasoning.

DSL

Canon DSL

Metadata-first mathematical data distillation and synthetic problem generation schema.

RAB

RuntimeAtBoot

Kaggle dataset and boot-memory/certification surface supporting AEN experiments.

2LLM

Two LLM Model Evaluator

Public two-body solver/verifier evaluator lineage that preceded the full AEN triad.

Selected evidence highlights

The most important habit in this work is labeling what a result is. A blind benchmark, an answer-aware replay, and a context-recall diagnostic are different scientific objects.

AIME/AEN artifact ledger

  • 15/30 frozen compressed AIME 2026 diagnostic.
  • 22/30 unrestricted one-loop AIME 2026 reference.
  • 21/30 April 27 compact benchmark-grade result, the strongest efficiency artifact.
  • 29/30 V34 answer-aware repair replay; useful for boot-memory recall and runtime-preservation evidence, not a blind benchmark score.
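The ledger above can be kept as labeled data so that a score never travels without its evidence type. The sketch below is one hypothetical way to do that. The `LedgerEntry` class and the `answer_aware` flag are assumptions, and marking the first three entries as not answer-aware is a reading of the wording above, not a separately documented fact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    label: str
    correct: int
    total: int
    answer_aware: bool  # True when the run had access to the answers

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


LEDGER = [
    LedgerEntry("frozen compressed AIME 2026 diagnostic", 15, 30, False),
    LedgerEntry("unrestricted one-loop AIME 2026 reference", 22, 30, False),
    LedgerEntry("April 27 compact benchmark-grade result", 21, 30, False),
    LedgerEntry("V34 answer-aware repair replay", 29, 30, True),
]


def report_line(entry: LedgerEntry) -> str:
    """Format one entry with its evidence type attached to the score."""
    kind = "answer-aware replay, not a blind score" if entry.answer_aware else "not answer-aware"
    return f"{entry.label}: {entry.correct}/{entry.total} ({entry.accuracy:.1%}) [{kind}]"


def non_answer_aware(ledger: list[LedgerEntry]) -> list[LedgerEntry]:
    """Entries that may be discussed alongside benchmark-style results."""
    return [entry for entry in ledger if not entry.answer_aware]


for entry in LEDGER:
    print(report_line(entry))
```

Keeping the flag on the record itself makes it harder to quote the 29/30 replay next to the other three scores without its qualifier.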

VoE and public-key policy

VoE-2026 is a public-answer dataset. That makes it valuable for reproducible scoring and independent verification, but post-exposure scores should disclose that exposure and should not be presented as held-out benchmark performance.

Public answer key · Scorer · Manifest · Exposure disclosure

Figure: AEN five-run scoreboard
Figure: AEN five-run result grid

Contact and links

For research collaboration, classroom AI pilots, benchmark work, artifact review, or Neohm Labs partnerships, email me directly.

Canonical profile routes

/aadityapaudel/
/aadityapaudel/CV.md
/aadityapaudel/profile.json
gravatar.com/aadityapaudel