Machine Learning methods
Machine Learning is a subset of AI that focuses on enabling machines to improve with experience using statistical methods.
Welcome to the learning edition of the Data Pragmatist, your dose of all things data science and AI.
Estimated Reading Time: 5 minutes. Missed our previous editions?
Today we delve into Machine Learning methods as part of our learning series, which previously covered Top AI Tools for Productivity.
- Arun Chinnachamy
What you need for better GenAI applications
Learn more from Pinecone Research on how Retrieval Augmented Generation (RAG) using Pinecone serverless increases relevant answers from GPT-4 by 50%.
- The larger the search data, the more "faithful" the results.
- RAG with massive on-demand data outperforms GPT-4 without RAG by 50%, even on data the model was explicitly trained on.
- RAG with a lot of data delivers state-of-the-art performance no matter which LLM you choose, making it easier to use different LLMs (e.g., open-source or private LLMs).
Data Science/Analysis Job Roles to Pursue Right Now
Data Mining Expert at Plant-A Insights Group
Data Engineer & Web Crawler at hush.ai
Quality Analyst (FinCrime) at Revolut
Data Collector - WFH - No experience needed at TransPerfect
Data Science - Technical Lead at One97 Communications Limited
Machine Learning methods
Machine Learning is a subset of AI that focuses on enabling machines to improve with experience using statistical methods. It involves designing algorithms that can learn and improve over time by observing new data. The goal is to derive meaning from data, which makes data the key to unlocking Machine Learning. With more qualified data, Machine Learning algorithms become more accurate at making decisions and carrying out tasks autonomously.
Machine Learning methods
Although a number of approaches are used in Machine Learning, the most popular ones are the following:
Supervised Learning
Supervised Learning uses labeled data to train machines, allowing them to learn from past experiences. With a larger dataset, machines can gain more insight into the subject. Once trained, machines can predict outcomes when given new, unseen data. This method is often applied in scenarios where historical data predicts future events, such as detecting fraudulent credit card transactions or assessing insurance claim likelihood.
An example of Supervised Learning based on topics, picture by Google Developer Tutorial
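As a concrete illustration (not part of the original article), here is a minimal supervised-learning sketch in Python using scikit-learn: a classifier is fitted to labeled examples and then asked to predict outcomes for data it has never seen.

```python
# Minimal supervised-learning sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Labeled data: feature vectors X and known labels y.
X, y = load_iris(return_X_y=True)

# Hold out some data to simulate "new, unseen" inputs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train on past (labeled) examples.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict outcomes for data the model has never seen.
print("Predicted labels:", model.predict(X_test[:5]))
print("Accuracy on unseen data:", model.score(X_test, y_test))
```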
Unsupervised Learning
Unsupervised Learning involves training machines without labeled data. Much like listening to podcasts in a foreign language you do not understand, the algorithms learn from the data's inherent structure without explicit guidance. One podcast may not provide much insight, but exposure to many podcasts lets the brain form a model of the language, recognize patterns, and anticipate certain sounds. Techniques like self-organizing maps, nearest-neighbor mapping, and k-means clustering are used to explore the data and identify underlying structures.
An example of clustering in k-means algorithm, picture by Google Developer Tutorial
K-means clustering is one of the most popular unsupervised machine learning algorithms. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, known as the cluster center or centroid.
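To make the idea concrete, below is a minimal k-means sketch using scikit-learn on synthetic, unlabeled 2-D points; the number of clusters (k = 3) and the generated data are assumptions for illustration only.

```python
# Minimal k-means sketch with scikit-learn (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points: no class labels are provided.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# Partition the n observations into k=3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print("Cluster centroids:\n", kmeans.cluster_centers_)
print("First 10 cluster assignments:", labels[:10])
```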
Reinforcement Learning
Reinforcement Learning, akin to unsupervised learning, lacks labeled data but receives graded outcomes based on actions taken. Through iterative processes, such as playing numerous games, the system learns to optimize rewards, eventually devising winning strategies. The objective is to learn the best policy for maximizing expected rewards over time. Reinforcement Learning finds applications in robotics, gaming, and navigation.
An example of Reinforcement Learning in dog training
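The reward-driven loop can be sketched with a toy example, assuming a hypothetical three-armed bandit whose payout probabilities are made up for illustration: the agent is never told the correct action, only the reward it receives, and it gradually learns which action pays off.

```python
# Toy reinforcement-learning sketch: epsilon-greedy action selection on a
# 3-armed bandit (hypothetical reward probabilities, illustrative only).
import random

reward_probs = [0.2, 0.5, 0.8]   # unknown to the agent
estimates = [0.0, 0.0, 0.0]      # agent's running estimate of each arm's value
counts = [0, 0, 0]
epsilon = 0.1                    # exploration rate

for step in range(5000):
    # Explore occasionally, otherwise exploit the best-looking arm.
    if random.random() < epsilon:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda a: estimates[a])

    # The environment returns a graded outcome (reward), not a label.
    reward = 1.0 if random.random() < reward_probs[arm] else 0.0

    # Update the running average estimate for the chosen arm.
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print("Learned value estimates:", [round(e, 2) for e in estimates])
print("Best arm according to the agent:", estimates.index(max(estimates)))
```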
Semi-supervised Learning
Semi-supervised Learning combines labeled and unlabeled data for training. It suits the same kinds of applications as Supervised Learning but reduces labeling costs by using a small amount of labeled data alongside a larger pool of unlabeled data, which is cheaper and easier to acquire. This approach can be applied to classification, regression, and prediction tasks; one example is identifying a person's face on a webcam.
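As a hedged illustration, here is a minimal semi-supervised sketch using scikit-learn's SelfTrainingClassifier: most labels in a standard dataset are deliberately hidden (marked as -1, scikit-learn's convention for unlabeled samples) and the model trains on the mix of labeled and unlabeled data.

```python
# Minimal semi-supervised sketch with scikit-learn's SelfTrainingClassifier
# (illustrative only): most labels are hidden (-1) and treated as unlabeled.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_iris(return_X_y=True)

# Keep labels for only a small fraction of the data; mark the rest as -1,
# which scikit-learn interprets as "unlabeled".
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled_mask = rng.random(len(y)) < 0.8
y_partial[unlabeled_mask] = -1

# Train on the mix of labeled and unlabeled data.
model = SelfTrainingClassifier(LogisticRegression(max_iter=200))
model.fit(X, y_partial)

print("Labels originally available:", int((~unlabeled_mask).sum()), "of", len(y))
print("Accuracy against the full ground truth:", model.score(X, y))
```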
OpenAI fires back, says Elon Musk demanded 'absolute control' of the company LINK
OpenAI has countered Elon Musk's lawsuit by revealing Musk's desire for "absolute control" over the company, including merging it with Tesla, holding majority equity, and becoming CEO.
In a blog post, OpenAI aims to dismiss Musk's claims and argues against his view that the company has deviated from its original nonprofit mission and has become too closely aligned with Microsoft.
OpenAI defends its stance on not open-sourcing its work, citing a 2016 email exchange with Musk that supports a less open approach as the development of artificial general intelligence advances.
iOS 17.4 is here: what you need to know LINK
Apple releases iOS 17.4 with EU-mandated alternative app stores and third-party payments, podcast transcripts, and new emojis.
Features include searchable podcast transcripts in the Apple Podcasts app and new emojis like mushrooms and phoenixes, with customizable direction for people emojis.
Updates add music recognition for Apple Music integration, multilingual Siri message announcements, enhanced security features, and detailed battery health for new iPhones.
How did you like today's email?
If you are interested in contributing to the newsletter, respond to this email. We are looking for contributions from you, our readers, to keep the community alive and going.