Analytical Voyages #5 : Jacob Marks - Journey of Scientific Curiosity
From Quantum Physics to Machine Learning
Welcome to this edition of the Data Pragmatist, your dose of all things data science and AI.
📖 Estimated Reading Time: 6 minutes. Missed our previous editions of Analytical Voyages?
We are back with another episode of our "Analytical Voyages" series, where we look at the journeys of experienced data scientists and analysts who have carved out a niche for themselves in data science. In this episode, we are glad to introduce you to Jacob Marks, a Machine Learning Engineer and Developer Evangelist at Voxel51. At Voxel51, Jacob created VoxelGPT, an LLM-powered AI assistant for computer vision. In addition, he leads open-source efforts in vector search, semantic search, and generative AI. He is a Top Writer in AI on Medium, with 6,700+ followers, and a LinkedIn Top Voice in AI.
5 insane AI websites you cannot miss in 2023
Jenni AI: Jenni AI is a research and writing assistant that can help you to find and synthesize information from a variety of sources. It can also generate text, translate languages, and answer your questions in a comprehensive and informative way.
Usestyle.ai: Usestyle.ai is a business intelligence platform that uses AI to help businesses make better decisions. It can help you to track your performance, identify trends, and develop strategies for growth.
tldv.io: tldv.io is an AI-powered meeting notes tool that can help you to save time and stay focused during meetings. It can generate summaries of meeting notes, identify key takeaways, and assign action items.
Invideo.io: Invideo.io is an AI video editor that can help you to create professional-looking videos without any prior experience. It offers a variety of features, including video trimming, merging, text overlays, and music addition.
Durable.co: Durable.co is an AI website builder that can help you to create a custom website in minutes. It offers a variety of templates and design options, and you don't need any coding knowledge to get started.
🧠 Navigating Data's Frontiers with Jacob Marks
From Quantum Physics to Machine Learning: A Journey of Scientific Curiosity
Machine learning as a career offers a dynamic and promising path in the ever-evolving world of technology. It involves the development of algorithms and models that allow computers to learn from data and make intelligent decisions.
When we asked Jacob how he found his way into the field, he replied:
“I arrived at machine learning by way of physics. Before transitioning into ML, I spent my time researching quantum phases of matter — how collections of particles can together exhibit wildly different behaviour than they could individually. I was fascinated by big ideas like symmetry, order, and emergence, and I was obsessed with understanding why things worked.
Near the end of my Ph.D., I grew increasingly interested in machine learning as a powerful toolset for solving hard problems and creating change. The area of ML that especially drew my interest was interpretability and explainability. As we continue to integrate ML models into more aspects of our lives and our society, it is more important than ever that we understand why the models we deploy are generating specific predictions. Working in this space seemed like a great way to scratch the same scientific itch that had led me to physics while having a tangible impact! That's how I ended up at Voxel51, a company committed to bringing clarity and transparency to the world's data!”
The Art of Data Mastery: Jacob Marks' Blueprint for Success
When questioned on this topic, Jacob provided the following response:
“Data analytics, data science, and machine learning span a vast landscape of potential job requirements, responsibilities, and skills. These fields are only becoming more expansive as time goes on. One example is the recent emergence of the "AI Engineer". In terms of hard skills, Python, linear algebra, calculus, statistics, and basic machine learning are the only absolute must-haves across all of these disciplines. More importantly and more subtly, however, is the ability to think critically. Far too often, people are throwing massive AI models at simple problems, or worse — they are using "GenAI" as a substitute for elementary statistics, hypothesis testing, and evaluation. Focus on the problem!”
Building on this, every data analyst, data scientist, or ML engineer should have the following skills:
Data Analysis: Proficiency in data manipulation and analysis using tools like Python and SQL.
Statistics: Understanding statistical concepts for meaningful data insights.
Machine Learning: Knowledge of ML algorithms and model building.
Data Visualization: Skills in creating informative data visualizations.
Critical Thinking: Ability to think critically and solve complex problems systematically.
Emerging Trends or Technologies
Trends and technologies that will shape the future of machine learning include:
Multimodal Learning: The blurring of boundaries between traditionally distinct domains, allowing the same model architectures to be applied in various fields with modifications.
AutoML and AI Automation: Automated machine learning (AutoML) tools that make it easier for non-experts to create ML models, reducing the barrier to entry.
Explainable AI (XAI): Greater emphasis on interpretability and transparency in AI models to understand and trust the decision-making process.
Federated Learning: Decentralized training of machine learning models, preserving data privacy while allowing collaborative model training.
Edge Computing: ML models running on edge devices, enabling real-time processing and decision-making without relying on the cloud.
Jacob singled out two of these trends in his response:
“For data scientists, ML engineers, and other practitioners, a key trend to keep an eye out for is the increasingly blurred boundaries between traditionally distinct domains in machine learning. Tools and models are becoming multimodal, and the same model architectures that appear in one domain are appearing, perhaps with some modification, in other domains. This is making it easier than ever to draw insights from multiple data modalities.
For those on the strategy/investment side, the dominant trend is going to be the plummeting marginal cost of intelligence. This is going to change the composition of organizations and will force a rethinking of what constitutes a competitive advantage.”
The Confluence of Big Data and AI
The fusion of AI and Big Data is reshaping data analysis strategies, enabling the extraction of valuable insights from vast and complex datasets. This convergence empowers more sophisticated and efficient techniques to uncover hidden patterns and knowledge within Big Data.
“The term "Big Data" can be quite misleading. Humans and the systems we have created are generating more digital information than ever before, and harnessing these massive amounts of data is going to be essential to future data analytics and machine learning efforts. That being said, more data is not always better. The quality, diversity, and relevance of the data you use are as important as, if not more important than, its quantity. Even models like GPT-4, which in common parlance were trained on "the whole internet", were only trained on a meticulously crafted and curated high-quality subset of the internet's data.
At the convergence of "Big Data" and AI, we are going to see the proliferation of "data moats". Large language models (LLMs) enable the extraction of insights from unstructured data, and this data can then be used to retrain or enhance the effectiveness of the LLM. This will create positive feedback loops where the companies developing the ML pipelines will accumulate advantages over time.”
VoxelGPT
“A simple idea led to a crazy idea, and this journey brought that crazy idea to life. With prompt engineering, some genuine software engineering, a lot of elbow grease, and a healthy dose of black magic, our small team created an LLM-powered application that translates natural language queries into filtered views of computer vision datasets.”
VoxelGPT is a powerful AI application that integrates large language models (LLMs) to translate natural language queries into filtered views of computer vision datasets. It represents a cutting-edge solution that bridges the gap between language and vision, enabling users to interact with complex visual data through simple, natural language queries. Developed by Jacob Marks and his team, VoxelGPT showcases the transformative potential of AI in simplifying intricate computer vision tasks and enhancing data accessibility.
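To make the phrase "filtered views" concrete, here is a minimal sketch of the general idea in plain Python. It is not VoxelGPT's implementation and does not use Voxel51's libraries. The `Sample` class, `parse_query`, and `filtered_view` are hypothetical names invented for this illustration, and a simple keyword matcher stands in for the LLM step so that the example runs without any external service.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One image in a toy dataset: a file name plus one predicted label."""
    filename: str
    label: str
    confidence: float


def parse_query(query: str, known_labels: set[str]) -> dict:
    """Hypothetical stand-in for the LLM step: map text to a filter spec.

    A real system would prompt a language model here. This keyword
    matcher is an assumption made purely so the sketch is runnable.
    """
    words = query.lower().replace(",", " ").split()
    spec = {"label": None, "min_confidence": 0.0}
    for word in words:
        # Naive plural handling: "dogs" -> "dog".
        singular = word[:-1] if word.endswith("s") else word
        if singular in known_labels:
            spec["label"] = singular
    if "confident" in words:
        spec["min_confidence"] = 0.8  # arbitrary threshold for the demo
    return spec


def filtered_view(samples: list[Sample], spec: dict) -> list[Sample]:
    """Return only the samples that satisfy the filter spec (the 'view')."""
    return [
        s for s in samples
        if (spec["label"] is None or s.label == spec["label"])
        and s.confidence >= spec["min_confidence"]
    ]


dataset = [
    Sample("img_001.jpg", "dog", 0.95),
    Sample("img_002.jpg", "cat", 0.60),
    Sample("img_003.jpg", "dog", 0.40),
]

spec = parse_query("show me confident dogs", {"dog", "cat"})
view = filtered_view(dataset, spec)
print([s.filename for s in view])  # prints ['img_001.jpg']
```

The point of the split is that the language step only has to produce a small, structured filter description. Applying that description to the dataset is ordinary, testable code.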
Jacob Marks' journey from quantum physics to machine learning underscores the interdisciplinary nature of AI and data science, emphasizing critical thinking and essential technical skills. Emerging trends in AI, data quality, and the concept of "data moats" reshape the landscape, while VoxelGPT showcases AI's potential to simplify complex tasks. Jacob's insights guide us through the data-driven future, emphasizing curiosity, innovation, and technology's pivotal role in shaping AI and data science.
If you are interested in contributing to the newsletter, respond to this email. We are looking for contributions from you, our readers, to keep the community alive and going.