BeClaude

awq-quantization

19.9k | Smithery | General | by davila7

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
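As a rough illustration of why 4-bit weights matter on limited GPU memory, the sketch below estimates weight storage for the model sizes mentioned above. It is a back-of-the-envelope calculation, not part of this listing. The helper name `weight_memory_gb` is hypothetical, and the estimate assumes weights only: it ignores per-group scales and zero points, activations, and the KV cache, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumption: counts weights only; quantization metadata (scales,
    zero points) and runtime buffers are not included.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for num_params in (7e9, 70e9):
        fp16 = weight_memory_gb(num_params, 16)
        int4 = weight_memory_gb(num_params, 4)
        print(f"{num_params / 1e9:.0f}B params: FP16 ~{fp16:.1f} GB, 4-bit ~{int4:.1f} GB")
```

Under these assumptions, 4-bit storage is one quarter of the FP16 figure, which is what makes the larger models plausible on a single GPU.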

First seen 5/29/2026

Install & Usage

1. Open your MCP config: ~/.claude.json

2. Add the server config to "mcpServers": { "awq-quantization": { "command": "...", "args": [] } }

3. Restart Claude Code, then run /mcp
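The config step above can also be scripted. The following is a minimal sketch, assuming the config file is plain JSON with a top-level "mcpServers" object as shown in step 2. The helper name `add_mcp_server` is hypothetical, and the "..." command is left elided exactly as in the listing; replace it with the real command from the project's source before use. The demo writes to a temporary file rather than to ~/.claude.json.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command, args=None):
    """Add or overwrite one entry under "mcpServers" and return the updated config.

    Hypothetical helper: existing top-level keys and other servers are kept.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args or [])}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file, not the real ~/.claude.json.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / "claude.json"
        # "..." is the elided placeholder from the listing, not a real command.
        add_mcp_server(demo_path, "awq-quantization", "...")
        print(demo_path.read_text())
```

Back up the real config file before pointing a script like this at it, since the file is rewritten in full.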
Tags: deployment, ai-&-ml, data-&-analytics

Security Audits

License: Unknown; Source: Warn; Repository: Pass

Frequently Asked Questions

What is awq-quantization?

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.

How to install awq-quantization?

To install awq-quantization, open your MCP config (~/.claude.json) and add the server config to "mcpServers": { "awq-quantization": { "command": "...", "args": [] } }. Then restart Claude Code and run /mcp.

What is awq-quantization best for?

awq-quantization is an MCP server categorized under General. It is tagged for deployment, ai-&-ml, and data-&-analytics. Created by davila7.