Senior AI/ML Engineer · Technical Lead · Meta AI

Hooman
Mohammadi.

Building agentic LLM systems at Meta that process 10M+ posts daily. Specializing in post-training optimization, inference at scale, and ML systems that ship to billions.

01 — About

Who I Am

Google Scholar Profile

Senior AI/ML engineer and technical lead at Meta AI, driving post-training and inference optimization for multi-modal LLM systems at massive scale. Former founder of Momento AI, and engineering alumnus of AWS, Coinbase, TikTok, and Magic Leap.

My work lives at the boundary of systems engineering and machine learning — designing agentic AI pipelines, hardening them against adversarial prompt injection and tool misuse, and squeezing maximum throughput out of GPU inference infrastructure for production workloads serving billions of users.

Pursuing an M.S. in CS (ML/AI) at Georgia Tech while shipping production systems — because the best engineers never stop learning.

7+ Years Eng.
10M+ Posts/Day
6+ Tier-1 Cos

02 — Experience

Where I've Worked

  1. 2025 — Present

    Meta AI

    Senior Software Engineer · Technical Lead

    Technical lead for agentic AI infrastructure processing 10M+ posts daily. Orchestrated post-training and inference optimization pipelines spanning semantic search, vector-based document retrieval, and multi-modal image and video compression. Architected defensive layers against prompt injection, tool misuse, and data exfiltration in agentic LLM workflows. Focused on memory efficiency, latency reduction, and real-time system-level performance analysis in high-throughput GPU environments.

    Post-Training · Inference Optimization · Multi-modal LLMs · Agentic AI · Prompt Defense · NVIDIA Triton · Vector Search · Python
  2. 2023 — 2025

    Momento AI

    Founder · Technical Lead · CAIO

    Founded and scaled a hybrid multi-cloud mapping and migration platform powered by open-source GenAI with a GPT-like interface. Built the full product from zero, managing a cross-functional team of PMs, UX designers, and engineers and growing the core team to 5. Owned the full architecture across Docker, Kubernetes, and Terraform, spanning AWS, GCP, and Azure, with SDKs in Python, Go, and Java.

    GenAI · LLM · Multi-Cloud · Kubernetes · Docker · Terraform · Python · Go
  3. 2022 — 2023

    Amazon Web Services

    SDE · Technical Lead · L5

    Led the design of fraud detection within in-house microservices as part of AI-driven payment initiatives. Achieved a 75% latency reduction for payment transaction pagination serving global customers. Mentored a new hire and a returning intern on technical skills and team best practices.

    Microservices · Fraud Detection · Latency Optimization · AI/ML · AWS · Java
  4. 2021 — 2022

    Coinbase

    Senior Software Engineer · Protocol Engineering

    Reduced the cost of customer nodes (Participation, Query & Transact) by 40–60% for on-chain validators. Researched blockchain infrastructure across zero-knowledge machine learning (ZKML), Bitcoin forks, and the Tendermint protocol.

    Blockchain · ZKML · Tendermint · Go · Protocol Eng.
  5. 2020 — 2021

    TikTok

    Software Engineer R&D

    Designed a data-driven system that increased external partnership throughput 5× to 50–100K QPS. Contributor and mentor on the identity graph and personalized ads recommendation engine at global scale.

    Recommendation Engine · Identity Graph · 50–100K QPS · Ads ML · C++
  6. 2019 — 2020

    Magic Leap

    Machine Learning Engineer

    Hardened backend services against OWASP Top 10 vulnerabilities. Designed a computer vision model using RCNN to detect and redact faces on Magic Leap One AR devices. Built secure shared cloud services to scale heavy spatial AR computations.

    RCNN · Computer Vision · AR/XR · Security · PyTorch

03 — Projects

Things I've Built

Image Query

Deep learning visual search engine using k-means clustering and visual vocabulary correlations. Built a bag-of-words representation over CNN feature embeddings to enable content-based image retrieval at scale.

Deep Learning · k-Means · CNN · Python

Pelpins

Personal web platform and full-stack UI project maintained as a git submodule within the portfolio codebase. Explores modular front-end architecture and component-driven design patterns.

JavaScript · CSS/SCSS · HTML

WLAN

Parallel M/M/1 queue server simulation of CSMA/CA. Models wireless network behavior under variable load to analyze throughput, latency, and collision rates using queuing theory.

Simulation · Queuing Theory · Networking · C++

H Language

Custom programming language with minimal syntax and functional programming semantics — built from scratch with its own lexer, parser, and interpreter. An exploration of PL theory and compiler design.

PL Theory · Interpreter · Compiler · Python

Credit Loan

Kaggle ML competition — trained an ensemble predictive model using gradient boosting and logistic regression to classify credit loan risk. Feature engineering, cross-validation, and hyperparameter tuning on real financial datasets.

XGBoost · Sklearn · Feature Eng. · Kaggle

PPoker

App interface for poker admin and cashier tasks including game scenarios and chip management. Full UI for real-time game state tracking and session management for live poker games.

JavaScript · React · Real-time UI

04 — Skills

Tech Stack

LLM & AI Systems

Pre-Training · Post-Training / RLHF · Inference Optimization · Agentic AI · Multi-modal LLMs · Prompt Injection Defense · RAG / Semantic Search · Fine-tuning / PEFT · LoRA / QLoRA · Tool Misuse Defense

ML Infrastructure

NVIDIA Triton · TensorRT · vLLM · Real-time Perf. Analysis · Model Scaling in Prod · GPU Memory Optimization · Latency Profiling · PyTorch · ONNX · SageMaker · Flash Attention

Languages

Python · Go · C / C++ · Java · JavaScript · SQL

Cloud & Infrastructure

AWS · GCP · Kubernetes · Docker · Terraform · Kafka · Redis · Microservices

ML / Data Science

Computer Vision · RCNN / CNNs · NLP / Transformers · Recommendation Systems · Vector Databases · XGBoost · Random Forest · Brain Tumor Segmentation

Security & Reliability

Prompt Injection Defense · Data Exfiltration Prevention · OWASP Top 10 · Fraud Detection · Zero Knowledge (ZKML) · Blockchain Protocol

05 — Research & Advising

Beyond Engineering

📄 Publication · UW Kurtlab · 2024

Ensemble Approach for Brain Tumor Segmentation and Synthesis

Deep learning ensemble model for accurate volumetric segmentation of brain tumors using the BraTS dataset — combining multiple architectures to improve precision across tumor sub-regions.

University of Washington · BraTS · PyTorch · Medical Imaging

🏥 AI Advisor · March Health · 2024–2025

AI for Early Endometriosis Diagnosis via Digital Twins

Founding Partner and AI Advisor. Directed technical strategy for a platform using digital twins and health agents for early endometriosis diagnosis. Advised leadership on applied LLM, RL, and medical ML.

March Health · Los Angeles, CA · LLM + RL · Health Agents

🧬 Founding Engineer · 310.ai · 2020–2021

Generative AI Infrastructure for Biotech

Built generative AI systems and infrastructure for biotech companies using Transformers, NLP, and early LLM architectures, one of the earliest production applications of large language models in life sciences.

310.ai · Transformers · NLP · LLMs · Biotech

06 — Contact

Let's Build
Something
Remarkable.

Open to senior engineering and staff roles, AI infrastructure opportunities, research collaborations, and advisory conversations. Bay Area based — let's connect.