Digest #1 - Four Vs of Big data, State of Children, Question Framework for Analysts

Explainable AI, Four Vs and Framework for data questions

Arun Chinnachamy
August 02, 2023

Concept of the day

Explainable AI (XAI) is a critical facet of artificial intelligence that aims to demystify the decision-making processes of complex algorithms. As AI models become increasingly sophisticated, their inner workings often become opaque "black boxes." XAI methods seek to shed light on this black box, making AI more interpretable and transparent. By providing human-readable explanations for AI predictions and actions, XAI helps users understand the reasoning behind outcomes and builds trust in AI systems. This becomes especially crucial in high-stakes domains like healthcare, finance, and autonomous vehicles, where understanding the rationale behind AI-driven decisions is essential for accountability and ethical considerations.
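To make the idea concrete, here is a minimal sketch of one common XAI technique, permutation importance: shuffle one input column and see how much the model's accuracy drops. The `model` function, the feature names, and the toy rows are all hypothetical stand-ins for a real black-box model and dataset.

```python
import random

def model(row):
    # Hypothetical "black box": approves when income is above 50
    # and ignores shoe_size entirely.
    income, shoe_size = row
    return 1 if income > 50 else 0

def accuracy(rows, labels):
    # Fraction of rows where the model output matches the label.
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, col, repeats=50, seed=0):
    """Average drop in accuracy after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    drops = []
    for _ in range(repeats):
        values = [r[col] for r in rows]
        rng.shuffle(values)
        # Rebuild each row with the shuffled value in position `col`.
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, values)]
        drops.append(baseline - accuracy(permuted, labels))
    return sum(drops) / repeats

# Toy data: (income, shoe_size). Labels come from the model itself,
# so the baseline accuracy is 1.0.
rows = [(20, 40), (30, 44), (45, 38), (55, 42),
        (70, 39), (90, 45), (35, 41), (80, 43)]
labels = [model(r) for r in rows]

importances = {
    name: permutation_importance(rows, labels, col)
    for col, name in enumerate(["income", "shoe_size"])
}
print(importances)
```

A feature the model never uses scores zero, while a feature it relies on scores above zero. That ranking is the kind of human-readable explanation XAI aims for.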

Four Vs of Big data

Big data is often associated only with Volume, but there are three other Vs. The four Vs are Volume, Velocity, Variety, and Veracity.

Volume: The sheer amount of data at your disposal.

Velocity: The speed at which data comes flooding in, requiring real-time processing and streaming capabilities.

Variety: The diverse range of data types, like CSVs, PDFs, JSON logfiles, and more, that you need to analyze and join together for meaningful insights.

Veracity: The credibility and reliability of your data, which can be tricky, especially with IoT data and human-generated entries.
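As a rough illustration of Variety and Veracity working together, the sketch below joins a CSV export with JSON log lines and sets aside records that look unreliable. The device IDs, field names, and the plausible temperature range are all made up for the example.

```python
import csv
import io
import json

# Hypothetical inputs: a CSV export of devices and JSON-lines sensor logs.
DEVICES_CSV = """device_id,location
d1,warehouse
d2,office
"""

READINGS_JSONL = """{"device_id": "d1", "temp_c": 21.5}
{"device_id": "d2", "temp_c": -999}
{"device_id": "d3", "temp_c": 19.0}
"""

def load_devices(text):
    # Variety: parse the CSV side into a lookup table keyed by device_id.
    reader = csv.DictReader(io.StringIO(text))
    return {row["device_id"]: row["location"] for row in reader}

def join_and_check(devices, jsonl_text, low=-50.0, high=60.0):
    """Join JSON readings to devices; split into clean and suspect records."""
    clean, suspect = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        reading = json.loads(line)
        location = devices.get(reading["device_id"])
        joined = {**reading, "location": location}
        # Veracity: unknown devices and out-of-range values are suspect.
        in_range = low <= reading["temp_c"] <= high
        if location is not None and in_range:
            clean.append(joined)
        else:
            suspect.append(joined)
    return clean, suspect

clean, suspect = join_and_check(load_devices(DEVICES_CSV), READINGS_JSONL)
print("clean:", clean)
print("suspect:", suspect)
```

Keeping the suspect records in a separate list, rather than silently dropping them, makes it easier to see how much of the data failed the checks and why.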

Data visualised: State of the world's children

Recently I was exploring the impact of COVID on children across the world and came across UNICEF's State of the World's Children report. A few interesting trends from the data stood out:

Every $1 spent on vaccines delivers a return on investment of $27.
Vaccination saves 4.4M lives every year.
About 20% of children are under-vaccinated, leaving them vulnerable to preventable diseases.
Vaccination coverage dropped during the pandemic years for multiple reasons.

Using data to visualise and understand the world around us is fascinating and also very rewarding.

Understanding Questions - Framework for Analysts

As an analyst, you'll encounter various questions from different teams like sales, product, marketing, operations, strategy, or engineering. These questions can be categorized into six types:

Descriptive: These demand quick answers, like identifying the most active users.
Exploratory: They require exploring multiple plausible hypotheses, such as understanding why certain users became inactive.
Inferential: These deep-dive into one hypothesis and look for correlations between different datasets, like investigating whether the decrease in active users is linked to COVID.
Predictive: These use predictive models, such as regression, to forecast which customers will become active in the future.
Causal: These involve running controlled experiments on a sample of users and testing the results for statistical significance (e.g., with a chi-squared test) to determine whether in-app rewards cause more users to become active.
Mechanistic: Involving exploration and linking of multiple related datasets to understand how in-app rewards influence user activity.

By recognising the type of question at hand, you gain a framework for effectively addressing it.
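For the causal question above, a chi-squared test on a 2x2 table is one simple way to check whether the difference between two groups is likely due to chance. The sketch below uses only the standard library and made-up counts for a hypothetical in-app rewards experiment. It computes the Pearson statistic without a continuity correction, so treat it as an illustration rather than a replacement for a statistics package.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and
    no continuity correction.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical experiment counts (not real data):
#                 active  inactive
# rewards group     120      380
# control group      90      410
stat, p_value = chi_squared_2x2(120, 380, 90, 410)
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value only says the two groups differ by more than chance would explain. The causal reading depends on users having been assigned to the rewards and control groups at random.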

Connect with Us

🐦 Twitter: @DataPragmatist

💼 LinkedIn: https://www.linkedin.com/company/data-pragmatist/

That's it for our first issue! We hope you're as excited as we are to explore the fascinating world of data engineering, analysis, and AI together. Remember, being a data pragmatist is not just a journey—it's a mindset!

Stay curious, stay innovative, and keep embracing the magic of data alchemy!