Benchmarking and Improving Large Language Model Serving Performance
Bob Chen
Large Language Model (LLM) serving frameworks like vLLM must deliver high throughput and low latency to justify the substantial cost of GPU and TPU...
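A minimal sketch of the kind of measurement this project is about, using vLLM's offline `LLM`/`SamplingParams` API; the model checkpoint, prompt set, and sampling settings are placeholder choices, not the project's actual benchmark configuration:

```python
# Rough offline throughput measurement with vLLM (illustrative only).
import time
from vllm import LLM, SamplingParams

prompts = ["Explain KV-cache paging in one paragraph."] * 64
params = SamplingParams(temperature=0.8, max_tokens=128)

# Small checkpoint chosen purely so the sketch runs on modest hardware.
llm = LLM(model="facebook/opt-125m")

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report tokens/second.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s ({generated / elapsed:.1f} tok/s)")
```

Dividing total generated tokens by wall-clock time gives a single throughput number; latency would be tracked per request rather than over the whole batch.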
Integrating VBLoRA into Keras
Dhiraj BM
VB-LoRA (Vector Bank Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that extends LoRA (Low-Rank Adaptation) by introducing a...
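A hedged sketch of the core VB-LoRA idea expressed as a custom Keras layer: the low-rank update is assembled from a shared vector bank via softmax-weighted mixtures. The class and argument names are illustrative, not an existing Keras API, and the real method sparsifies the mixtures with a top-k selection that is omitted here for brevity; the sketch also assumes the input and output widths are divisible by the bank vector length.

```python
import keras
from keras import ops


class VectorBankLoRADense(keras.layers.Layer):
    """Dense layer whose LoRA factors are mixtures over a shared vector bank."""

    def __init__(self, units, rank=4, bank_size=64, bank_dim=32, **kwargs):
        super().__init__(**kwargs)
        self.units, self.rank = units, rank
        self.bank_size, self.bank_dim = bank_size, bank_dim

    def build(self, input_shape):
        self.in_dim = input_shape[-1]
        # Frozen base weight standing in for the pretrained matrix.
        self.w = self.add_weight(shape=(self.in_dim, self.units),
                                 trainable=False, name="w")
        # Shared vector bank: the main trainable storage.
        self.bank = self.add_weight(shape=(self.bank_size, self.bank_dim),
                                    name="bank")
        # Mixture logits that select bank vectors for the A and B factors.
        n_a = self.rank * (self.in_dim // self.bank_dim)
        n_b = self.rank * (self.units // self.bank_dim)
        self.logits_a = self.add_weight(shape=(n_a, self.bank_size), name="logits_a")
        self.logits_b = self.add_weight(shape=(n_b, self.bank_size), name="logits_b")

    def call(self, x):
        # Compose sub-vectors as softmax mixtures over the bank, then reshape
        # them into the low-rank factors A (rank x in_dim) and B (units x rank).
        a = ops.matmul(ops.softmax(self.logits_a), self.bank)
        b = ops.matmul(ops.softmax(self.logits_b), self.bank)
        a = ops.reshape(a, (self.rank, self.in_dim))
        b = ops.reshape(b, (self.units, self.rank))
        delta_w = ops.transpose(ops.matmul(b, a))  # (in_dim, units)
        return ops.matmul(x, self.w + delta_w)
```

The point of the construction is that only the bank and the per-layer mixture logits are trained, which is far smaller than storing full A and B matrices per adapted layer.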
Implement open LLM models with JAX and Flax
Megan Andrews
As interest in large language models continues to grow, many open-source implementations exist across various frameworks—but accessible, modular, and...
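To give a flavour of the building blocks such an implementation rests on, here is a minimal Flax sketch of a single pre-norm transformer decoder block; the module name, hyperparameters, and dummy shapes are illustrative assumptions rather than any particular model's configuration:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class DecoderBlock(nn.Module):
    d_model: int = 256
    num_heads: int = 4

    @nn.compact
    def __call__(self, x):
        # Causal self-attention with a residual connection.
        h = nn.LayerNorm()(x)
        mask = nn.make_causal_mask(jnp.ones(x.shape[:2]))
        h = nn.MultiHeadDotProductAttention(num_heads=self.num_heads)(h, h, mask=mask)
        x = x + h
        # Position-wise feed-forward network with a residual connection.
        h = nn.LayerNorm()(x)
        h = nn.Dense(4 * self.d_model)(h)
        h = nn.gelu(h)
        h = nn.Dense(self.d_model)(h)
        return x + h


# Initialise and apply on dummy activations of shape (batch, seq_len, d_model).
block = DecoderBlock()
dummy = jnp.zeros((2, 16, 256))
params = block.init(jax.random.PRNGKey(0), dummy)
out = block.apply(params, dummy)
print(out.shape)  # (2, 16, 256)
```

A full model would stack blocks like this under an embedding layer and an output projection, with weights loaded from a released checkpoint.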
"Pomni" - JAX Code Assistant
neel04
Current frontier LLMs struggle to assist with writing JAX code because the relevant resources are sparse and spread across multiple forums and...