Fairness Metrics in Machine Learning

Welcome to the learning edition of the Data Pragmatist, your dose of all things data science and AI.
📖 Estimated Reading Time: 5 minutes. Missed our previous editions?
🚀 Meta's Next Llama AI Models Are Training on a GPU Cluster 'Bigger Than Anything' Else. Link
Meta is developing its next-generation Llama 4 AI model and training it on a massive infrastructure of over 100,000 Nvidia H100 GPUs, a major step up in processing power.
The Llama models continue to be open-source, which sets Meta apart from other companies.
The immense computational resources will enable Meta to push the boundaries of AI capabilities, particularly in natural language processing, making it possible to handle complex tasks more efficiently.
CEO Mark Zuckerberg emphasized that Meta’s commitment to open-source AI development is crucial for encouraging innovation and collaboration in the AI community.
📈 AI Improvements Are Slowing Down. Companies Have a Plan to Break Through the Wall. Link
AI development has encountered a slowdown as scaling laws reach their limits, meaning that adding more computing power no longer yields proportional improvements in performance.
Companies are exploring alternatives such as multimodal data integration (combining text, images, and video) to enhance models' understanding and outputs, as well as high-quality private datasets for better model training.
Synthetic data generation is gaining traction, allowing companies to create realistic training data where real-world data is unavailable or too costly, making AI models more robust.
There is an increasing focus on improving AI models' reasoning abilities rather than just increasing their scale.
There’s a reason 400,000 professionals read this daily.
Join The AI Report, trusted by 400,000+ professionals at Google, Microsoft, and OpenAI. Get daily insights, tools, and strategies to master practical AI skills that drive results.
🧠 Fairness Metrics in Machine Learning
Fairness metrics help ensure that machine learning models do not discriminate against certain groups based on sensitive attributes such as race, gender, or age. Two commonly used fairness metrics are Demographic Parity and Equalized Odds.

1. Demographic Parity
Definition:
Demographic Parity (also called Statistical Parity) requires that the proportion of positive outcomes (e.g., loan approval, hiring) be the same across different demographic groups.
Example:
If a loan approval model approves loans for 70% of applicants in one group (e.g., men), it should approve loans for 70% of applicants in another group (e.g., women).
Limitations:
It does not consider differences in qualifications or actual risk factors.
It can be unfair if groups have different underlying distributions of relevant features.
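To make the definition concrete, here is a minimal sketch of a demographic parity check in Python. It assumes a pandas DataFrame with hypothetical "group" and "prediction" columns holding the sensitive attribute and the model's binary predictions; the gap between group-level positive-prediction rates is the quantity you want close to zero.

```python
# Minimal sketch of a demographic parity check (column names are illustrative).
import pandas as pd

def demographic_parity_gap(df: pd.DataFrame, group_col: str, pred_col: str) -> float:
    """Largest gap in positive-prediction rates across demographic groups."""
    rates = df.groupby(group_col)[pred_col].mean()  # P(prediction = 1) per group
    return float(rates.max() - rates.min())

# Toy example: group A is approved 75% of the time, group B only 25%.
data = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "prediction": [1,   1,   1,   0,   1,   0,   0,   0],
})
print(demographic_parity_gap(data, "group", "prediction"))  # 0.75 - 0.25 = 0.5
```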
2. Equalized Odds
Definition:
Equalized Odds requires that a model's error rates be equal across all demographic groups. Specifically, the true positive rate (TPR) and false positive rate (FPR) should be the same for every group.
Example:
If a medical diagnostic model predicts cancer, its true positive and false positive rates should be similar across all racial groups, ensuring no group is disproportionately over- or under-diagnosed.
Limitations:
It may be difficult to achieve when base rates differ across groups.
Ensuring equalized odds may reduce overall accuracy.
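An equalized-odds check follows the same pattern, but compares error rates conditioned on the true label. The sketch below, using NumPy with illustrative toy data, computes the true positive rate and false positive rate per group; a large gap in either rate signals a violation.

```python
# Minimal sketch of an equalized-odds check (data and names are illustrative).
import numpy as np

def tpr_fpr_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} from binary labels and binary predictions."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        tpr = float(yp[yt == 1].mean()) if (yt == 1).any() else float("nan")
        fpr = float(yp[yt == 0].mean()) if (yt == 0).any() else float("nan")
        rates[str(g)] = (tpr, fpr)
    return rates

# Toy example: TPR and FPR each differ by 0.5 between groups A and B.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(tpr_fpr_by_group(y_true, y_pred, groups))  # {'A': (0.5, 0.0), 'B': (1.0, 0.5)}
```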
Conclusion
Both Demographic Parity and Equalized Odds offer ways to assess fairness, but each has trade-offs. The best metric depends on the specific application and societal context.
AI Tools for Early Disease Detection and Diagnosis
1. IBM Watson Health
Overview: IBM Watson Health uses AI to analyze medical data for early disease detection and personalized treatment recommendations.
Key Features:
Watson for Oncology aids in early cancer detection.
Analyzes both structured and unstructured clinical data.
Provides personalized treatment recommendations.
Use Cases: Cancer detection, particularly breast, lung, and colon cancers.
2. PathAI
Overview: PathAI uses machine learning to assist in pathology, especially for early detection of cancer.
Key Features:
AI analyzes pathology slides to detect cancerous cells.
Improves diagnostic accuracy and speed.
Use Cases: Early detection of cancers like breast, lung, and prostate cancer.
3. Aidoc
Overview: Aidoc offers AI-powered tools for radiology, helping in the early detection of acute conditions from medical images.
Key Features:
AI detects conditions like brain bleeds and strokes in CT scans.
Provides real-time analysis to assist radiologists.
Use Cases: Early detection of brain hemorrhages, strokes, and other urgent conditions.
4. Google Health (AI for Retinopathy Screening)
Overview: Google Health uses AI to detect diabetic retinopathy, a major cause of blindness, from retinal images.
Key Features:
AI analyzes retinal images for early signs of diabetic retinopathy.
Provides automated, accurate screening.
Use Cases: Diabetic retinopathy detection and early eye disease diagnosis.
5. Tempus
Overview: Tempus combines AI with genomic data to provide early cancer detection and personalized treatment plans.
Key Features:
AI analyzes molecular data to identify early cancer markers.
Provides personalized treatment based on genetic data.
Use Cases: Early cancer detection and precision oncology treatment planning.
If you are interested in contributing to the newsletter, respond to this email. We are looking for contributions from you, our readers, to keep the community alive and growing.