
A Beginner's Guide to Quantization in Deep Learning with PyTorch

Apple could face a $38B fine

Welcome to the learning edition of the Data Pragmatist, your dose of all things data science and AI.

📖 Estimated Reading Time: 5 minutes. Missed our previous editions?

💰 Apple could face a $38B fine LINK

  • The European Union has charged Apple with violating the Digital Markets Act over App Store policies that it says hinder competition, making Apple the first company to be charged under the new rules.

  • Apple faces fines up to 10 percent of its annual global revenue, or $38 billion, if found guilty, with potential penalties increasing to 20 percent for repeat offenses.

  • The European Commission is also investigating Apple for its support of alternative iOS app stores, focusing on the contentious Core Technology Fee and the complex process required for installing third-party marketplaces.

🀝 Apple in talks with Meta for potential AI integration LINK

  • Apple is reportedly negotiating with Meta to integrate Meta's generative AI model into Apple's new AI system, Apple Intelligence, according to The Wall Street Journal.

  • Apple is seeking partnerships with multiple AI companies, including Meta, to enhance its AI capabilities and catch up in the competitive AI race.

  • A potential collaboration between Apple and Meta would be significant due to their history of disagreements, and it could greatly impact the AI industry if successful.

🧠 A Beginner's Guide to Quantization in Deep Learning with PyTorch

Highlights

  • Understanding Quantization: Learn what quantization is and why it's necessary.

  • Mathematical Derivations: Dive into the mathematical aspects of quantization.

  • Coding in PyTorch: Perform quantization and de-quantization of LLM weight parameters in PyTorch.

What is Quantization and Why Do You Need It?

Quantization compresses large models by reducing the precision of weight parameters and activations, significantly decreasing model size. For instance, the Llama 3 8B model reduces from 32GB to 8GB with INT8 quantization, and further to 4GB with INT4 quantization. This enables model fine-tuning and inference on devices with limited memory and processing power, reducing the need for expensive cloud resources while maintaining accuracy.
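
A quick back-of-the-envelope sketch shows where those numbers come from; it counts only the weight parameters and ignores activations, the KV cache, and other serving overhead.

# Approximate weight memory of an 8-billion-parameter model at different precisions.
params = 8e9  # Llama 3 8B

for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gigabytes = params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name}: ~{gigabytes:.0f} GB")  # FP32 ~32 GB, INT8 ~8 GB, INT4 ~4 GB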

How Does Quantization Work?

Quantization maps higher precision weights (e.g., FP32) to lower precision (e.g., INT8) using linear quantization methods. There are two modes:

  • Asymmetric Quantization: Maps original tensor values to a quantized range, using a scale value (S) and a zero point (Z).

  • Symmetric Quantization: Maps the zero point of the original tensor range directly to zero in the quantized range, eliminating the need for a separate zero point.

Asymmetric Quantization: Mathematical Derivation
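
The derivation below follows the PyTorch implementation later in this post. Given original weights in the range [Wmin, Wmax] and a target integer range [Qmin, Qmax] (for INT8, -128 to 127):

  • Scale: S = (Wmax - Wmin) / (Qmax - Qmin)

  • Zero point: Z = Qmin - Wmin / S, rounded to the nearest integer and clamped to [Qmin, Qmax]

  • Quantize: q = clamp(round(w / S + Z), Qmin, Qmax)

  • De-quantize: w ≈ S × (q - Z)

The recovered value only approximates the original weight; the difference is the quantization error measured in the code below.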

Symmetric Quantization

Symmetric quantization simplifies calculations by mapping zero to zero directly, using the same principles without needing a zero point.
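
For comparison with the asymmetric code in the next section, here is a minimal sketch of the symmetric scheme, assuming signed INT8 and a scale taken from the largest absolute weight; the function names are illustrative rather than part of the original post.

import torch

def symmetric_quantization(original_weight):
    # Symmetric scheme: scale the largest absolute weight to the edge of the
    # signed INT8 range; the zero point is implicitly 0, so none is returned.
    Qmax = torch.iinfo(torch.int8).max  # 127
    S = original_weight.abs().max().item() / Qmax
    quantized_weight = torch.clamp(torch.round(original_weight / S), -Qmax, Qmax).to(torch.int8)
    return quantized_weight, S

def symmetric_dequantization(quantized_weight, scale):
    # With no zero point to subtract, de-quantization is just a rescale.
    return scale * quantized_weight.to(torch.float32)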

Coding in PyTorch

  1. Initialize Weights: Create a random tensor.

  2. Define Functions: Write functions for asymmetric quantization and de-quantization.

  3. Perform Quantization: Calculate quantized weights, scale, and zero point.

  4. De-quantize: Recover original weights from quantized values.

  5. Evaluate Accuracy: Compute quantization error to ensure accuracy.

import torch

def asymmetric_quantization(original_weight):
    # Quantize an FP32 tensor to INT8, returning the scale (S) and zero point (Z).
    quantized_data_type = torch.int8
    # Range of the original weights and of the target integer type.
    Wmax, Wmin = original_weight.max().item(), original_weight.min().item()
    Qmax, Qmin = torch.iinfo(quantized_data_type).max, torch.iinfo(quantized_data_type).min
    # Scale maps the weight range onto the integer range.
    S = (Wmax - Wmin) / (Qmax - Qmin)
    # Zero point is the integer that represents 0.0; round it and clamp it into range.
    Z = Qmin - (Wmin / S)
    Z = int(round(max(min(Z, Qmax), Qmin)))
    # Scale, shift, round, and clamp the weights, then cast to INT8.
    quantized_weight = torch.clamp(torch.round((original_weight / S) + Z), Qmin, Qmax).to(quantized_data_type)
    return quantized_weight, S, Z

def asymmetric_dequantization(quantized_weight, scale, zero_point):
    # Recover an FP32 approximation of the original weights.
    return scale * (quantized_weight.to(torch.float32) - zero_point)

# Example: quantize a random 4x4 weight tensor, de-quantize it,
# and measure the mean squared quantization error.
original_weight = torch.randn((4, 4))
quantized_weight, scale, zero_point = asymmetric_quantization(original_weight)
dequantized_weight = asymmetric_dequantization(quantized_weight, scale, zero_point)
quantization_error = (dequantized_weight - original_weight).square().mean()

Conclusion

This guide covers essential quantization concepts and methods, providing a strong foundation for implementing quantization in LLMs and other deep learning models. For advanced techniques like per-channel and per-group quantization, stay tuned for future posts.

Top 3 AI Tools for Video Editing

1. Adobe Premiere Pro

Best for: Professional video creators

Key Features:

  • Morph Cut: Smooth transitions between clips.

  • Text-Based Editing: AI-generated transcripts for faster editing.

  • Auto Color: AI-driven color correction and grading.

  • Speech to Text: Automated transcript and caption creation.

  • Remix: Syncs video and audio, adjusts soundtracks and dialogue volumes.

Pricing: $31.49/month or $239.88/year.

2. Wondershare Filmora

Best for: Bloggers, YouTubers, and social media influencers

Key Features:

  • AI Audio Stretch: Matches audio to video length.

  • AI Smart Cutout: Removes unwanted objects and backgrounds.

  • AI Audio Denoise: Cleans background noises from audio.

  • Auto Frame: Keeps the focal point in sight.

  • Silence Detection: Removes unnecessary pauses.

Pricing: $49.99/year or a perpetual license for $79.99.

3. Runway

Best for: Web-based video editing

Key Features:

  • Text to Color: Color grading via text prompts.

  • Blur Faces: Blurs faces in videos.

  • Inpainting: Removes unwanted objects.

  • Super-Slow Motion: Adds slow motion effects.

  • Scene Detection: Detects and splits scene changes.

Pricing: Free basic plan (with watermarks), Standard plan $15/month, Pro plan $35/month, Unlimited plan $95/month.

How did you like today's email?


If you are interested in contributing to the newsletter, reply to this email. We are looking for contributions from you, our readers, to keep the community alive and growing.