MLOps

Machine learning operations: training, inference, evaluation, and GPU cloud platforms

Modal Serverless GPU

Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.

#Infrastructure#Serverless#GPU

lm-evaluation-harness - LLM Benchmarking

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

#Evaluation#LM Evaluation Harness#Benchmarking

Weights & Biases: ML Experiment Tracking & MLOps

Track ML experiments with automatic logging, visualize training in real time, optimize hyperparameters with sweeps, and manage a model registry with W&B - a collaborative MLOps platform

#MLOps#Weights And Biases#WandB

Hugging Face CLI (`hf`) Reference Guide

Hugging Face Hub CLI (hf) — search, download, and upload models and datasets, manage repos, query datasets with SQL, deploy inference endpoints, manage Spaces and buckets.

#huggingface#hf#models

GGUF - Quantization Format for llama.cpp

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.

#GGUF#Quantization#llama.cpp
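The usual two-step pipeline, assuming a llama.cpp checkout with its conversion script and built binaries; the paths and quantization type are illustrative:

```shell
# 1. Convert an HF checkpoint to an unquantized GGUF file
python convert_hf_to_gguf.py ./my-model --outfile model-f16.gguf

# 2. Quantize it; Q4_K_M is a common quality/size trade-off
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```

Lower-bit types (e.g. Q2_K) shrink the file further at a quality cost; Q8_0 is near-lossless.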

Guidance: Constrained LLM Generation

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained generation framework

#Prompt Engineering#Guidance#Constrained Generation

llama.cpp + GGUF

Run local GGUF inference with llama.cpp and discover and download models directly from the Hugging Face Hub.

#llama.cpp#GGUF#Quantization
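A sketch of Hub-integrated inference, assuming built llama.cpp binaries; the repo id is illustrative:

```shell
# -hf pulls a GGUF file straight from the Hugging Face Hub, caching it locally
llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF -p "Hello"

# Same flag works for the OpenAI-compatible local server
llama-server -hf bartowski/Llama-3.2-1B-Instruct-GGUF --port 8080
```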

OBLITERATUS Skill

Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 9 CLI methods, 28 analysis modules, 116 model presets across 5 compute tiers, tournament evaluation, and telemetry-driven recommendations. Use when a user wants to uncensor, abliterate, or remove refusal from an LLM.

#Abliteration#Uncensoring#Refusal-Removal

Outlines: Structured Text Generation

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines - dottxt.ai's structured generation library

#Prompt Engineering#Outlines#Structured Generation

vLLM - High-Performance LLM Serving

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

#vLLM#Inference Serving#PagedAttention

AudioCraft: Audio Generation

PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation.

#Multimodal#Audio Generation#Text-to-Music

CLIP - Contrastive Language-Image Pre-Training

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

#Multimodal#CLIP#Vision-Language

Segment Anything Model (SAM)

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

#Multimodal#Image Segmentation#Computer Vision

Stable Diffusion Image Generation

State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.

#Image Generation#Stable Diffusion#Diffusers

Whisper - Robust Speech Recognition

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

#Whisper#Speech Recognition#ASR

DSPy: Declarative Language Model Programming

Build complex AI systems with declarative programming, optimize prompts automatically, create modular RAG systems and agents with DSPy - Stanford NLP's framework for systematic LM programming

#Prompt Engineering#DSPy#Declarative Programming

Axolotl Skill

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

#Fine-Tuning#Axolotl#LLM
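A minimal QLoRA config sketch in Axolotl's YAML format; the base model, dataset, and hyperparameters are illustrative defaults, not recommendations:

```yaml
# config.yml - minimal QLoRA fine-tune (illustrative values)
base_model: meta-llama/Llama-3.2-1B
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: tatsu-lab/alpaca
    type: alpaca

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 2e-4
output_dir: ./outputs/qlora-llama
```

Launch with `axolotl train config.yml`; the same file format covers DPO/KTO/ORPO/GRPO via additional keys.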

GRPO/RL Training with TRL

Expert guidance for GRPO/RL fine-tuning with TRL, covering reasoning and task-specific model training

#Post-Training#Reinforcement Learning#GRPO

PEFT (Parameter-Efficient Fine-Tuning)

Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.

#Fine-Tuning#PEFT#LoRA

PyTorch FSDP Skill

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2

#Distributed Training#PyTorch#FSDP

TRL - Transformer Reinforcement Learning

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.

#Post-Training#TRL#Reinforcement Learning

Unsloth Skill

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

#Fine-Tuning#Unsloth#Fast Training