Understanding Exploratory Data Analysis (EDA)

Worst telecom hack in US history


Welcome to the learning edition of the Data Pragmatist, your dose of all things data science and AI.

📖 Estimated Reading Time: 5 minutes. Missed our previous editions?

💥 OpenAI is planning its own browser to rival Google

  • OpenAI is reportedly exploring the development of a web browser designed to rival Google Chrome, incorporating its AI technology like ChatGPT, though the project is still in its early stages.

  • The company has recruited experts from the original Chrome development team, indicating serious intentions towards launching this AI-focused browsing solution.

  • OpenAI is also in discussions with technology and service providers, such as Samsung, to integrate its AI features into products that currently rely on Google's existing solutions.

🍎 Apple is working on 'LLM Siri'

  • Apple is testing a new "LLM Siri" expected to be announced as part of iOS 19, with a preview at WWDC 2025, but it won't be available before spring 2026.

  • The long wait for LLM Siri is due to Apple's strong commitment to privacy, ensuring most processing is done on-device rather than in the cloud, unlike Google’s approach.

  • Once LLM Siri is launched, it aims to offer powerful assistance comparable to other systems, while maintaining user privacy by storing and processing data locally on Apple devices.

Add file uploads instantly with Pinata’s developer-friendly File API

As a developer, your time is valuable. That’s why Pinata’s File API is built to simplify file management, letting you add uploads and retrieval quickly and effortlessly. Forget the headache of complex setups—our API integrates in minutes, so you can spend more time coding and less time on configurations. With secure, scalable storage and easy-to-use endpoints, Pinata takes the stress out of file handling, giving you a streamlined experience to focus on what really matters: building your app.

🧠 Understanding Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is a crucial step in the data science workflow. It involves analyzing datasets to summarize their main characteristics, often using visual methods. The primary goal of EDA is to uncover patterns, detect anomalies, test hypotheses, and check assumptions through statistical summaries and visualizations before applying more complex modeling techniques.

Key Steps in EDA

  1. Understanding the Dataset

    • Start by examining the structure of the dataset (e.g., rows, columns, and data types).

    • Identify the presence of categorical and numerical variables.

    • Check for missing values or duplicates.

  2. Descriptive Statistics

    • Calculate measures of central tendency (mean, median, mode) and variability (range, variance, standard deviation).

    • Summarize the distribution of variables to identify any outliers or irregular patterns.

  3. Univariate Analysis

    • Analyze each variable independently.

    • Use histograms, bar plots, and box plots to visualize distributions.

    • Assess the shape of each distribution, for example whether it is skewed or approximately normal.

  4. Bivariate Analysis

    • Examine relationships between pairs of variables.

    • Use scatter plots for continuous variables and cross-tabulation for categorical variables.

    • Analyze correlations using Pearson’s coefficient for linear relationships or Spearman’s coefficient for monotonic ones.

  5. Multivariate Analysis

    • Investigate interactions among multiple variables.

    • Use techniques like pair plots, heatmaps, or dimensionality reduction methods (e.g., PCA).

  6. Handling Missing Data

    • Evaluate the extent and pattern of missing values.

    • Decide whether to drop missing data, impute values, or use advanced methods to handle them.

  7. Identifying Outliers

    • Detect and analyze outliers using visualizations like box plots or statistical methods like the Z-score.

    • Decide whether to remove, transform, or retain outliers based on the context.
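Several of the steps above can be sketched in a few lines of code. The example below is a minimal illustration, not a definitive workflow: the toy `rows` dataset, its column names, and the helper functions are all hypothetical, and it uses only Python's standard library. In practice a dataframe library such as pandas is commonly used for this kind of work. Outliers are flagged with the 1.5 × IQR rule that box plots use for their whiskers, rather than the Z-score, because the sample is very small.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 25, "income": 40, "segment": "A"},
    {"age": 32, "income": 52, "segment": "B"},
    {"age": 47, "income": 75, "segment": "A"},
    {"age": 51, "income": None, "segment": "B"},
    {"age": 38, "income": 61, "segment": "A"},
    {"age": 29, "income": 45, "segment": "B"},
    {"age": 32, "income": 52, "segment": "B"},   # exact duplicate of the second row
    {"age": 41, "income": 300, "segment": "A"},  # unusually large income
]


def missing_counts(records):
    """Count missing (None) values per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR outside the quartiles (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Steps 1 and 6: structure, duplicates, and missing values.
clean = drop_duplicates(rows)
print("rows:", len(rows), "unique rows:", len(clean))
print("missing per column:", missing_counts(clean))
print("segment counts:", dict(Counter(r["segment"] for r in clean)))

# Step 2: descriptive statistics on the non-missing incomes.
complete = [r for r in clean if r["income"] is not None]
incomes = [r["income"] for r in complete]
print("income mean:", statistics.mean(incomes))
print("income median:", statistics.median(incomes))

# Step 7: outliers, and step 4: a bivariate relationship.
print("income outliers:", iqr_outliers(incomes))
print("age vs income (Pearson):", pearson([r["age"] for r in complete], incomes))
```

Here the mean income sits well above the median, a typical sign that a single extreme value is pulling the average upward. Whether to drop, transform, or keep that value depends on the context, as noted in step 7.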

Importance of EDA

EDA helps data scientists gain a deeper understanding of their data, which is critical for:

  • Identifying key patterns and trends.

  • Ensuring data quality and accuracy.

  • Building hypotheses for further analysis.

  • Selecting appropriate models and techniques for machine learning.

EDA sets the foundation for effective analysis and ensures better outcomes in subsequent steps of a data science project.

5 AI Tools for Social Media Managers

  1. Hootsuite

    • Allows scheduling posts across multiple platforms.

    • Uses AI to analyze audience behavior and suggest optimal posting times.

    • Provides real-time performance metrics for strategy refinement.

  2. Canva

    • Simplifies the creation of visually appealing social media content.

    • Offers AI-driven templates and design suggestions.

    • Suitable for users with no prior design expertise.

  3. Sprout Social

    • Combines scheduling, analytics, and audience engagement in one platform.

    • Uses AI for sentiment analysis and engagement strategy recommendations.

    • Helps measure and improve audience interactions.

  4. Lumen5

    • Converts blogs and articles into short, engaging videos.

    • Perfect for creating video content for Instagram, Facebook, and LinkedIn.

    • Uses AI to streamline content repurposing.

  5. RiteTag

    • Recommends hashtags using AI to boost content visibility.

    • Helps optimize hashtag strategy for greater reach and engagement.

If you are interested in contributing to the newsletter, respond to this email. We are looking for contributions from you, our readers, to keep the community alive and going.