Sam Charrington
In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a highly regulated environment. Rashmi walks us through Chat Concierge, a multi-agent chat experience for auto dealerships that handles intent disambiguation, tool invocation, and human handoffs to deliver safer, more personalized customer journeys. We discuss Capital One’s platform-centric approach to AI agents and how it separates design from runtime governance, embedding policies, guardrails, and cyber controls across agent threat boundaries. Rashmi shares how the team approaches the developer experience for agent builders, observability, and evals for stochastic, multi-agent workflows; and strategies for model specialization, including fine-tuning and distillation. We also cover standards and abstraction, closed-loop learning from production telemetry, and key lessons for enterprises building agentic systems. The complete show notes for this episode can be found at https://twimlai.com/go/765.
Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs, to discuss diffusion language models. We dig into how diffusion approaches, traditionally used for images, are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving the way for latency-sensitive applications like voice interactions and fast agentic loops. We also cover the open research challenges in diffusion LLM training, serving infrastructure requirements, and post-training for diffusion-based systems. Finally, Stefano shares his perspective on whether diffusion models can rival or surpass autoregressive LLMs at scale, the advantages for highly controllable generation, and what the future of multimodal diffusion models might look like. The complete show notes for this episode can be found at https://twimlai.com/go/764.
In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI-assisted coding with end-to-end autonomy, arguing that “code is a commodity” and acceptance is the real metric—security, standards, tests, and maintainability included. We explore Blitzy’s hybrid graph-plus-vector approach, which grounds agents and combines semantic signals with keyword search to navigate large repositories efficiently. Sid breaks down context and agent engineering, how effective context windows have plateaued, and why dynamic agent personas, tool selection, and model-specific prompting matter at scale. He details their orchestration of large swarms of AI agents to collaboratively analyze codebases, plan tasks, and execute complex tasks in parallel. We also dig into why Agents.md and flat memories break down, storing feedback in the knowledge graph, and building real-world evals beyond leaderboards to choose the right model for each task. The complete show notes for this episode can be found at https://twimlai.com/go/763.
In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We discuss the shift from raw model scaling to reasoning-focused post-training, inference-time techniques, and better tool integration. Sebastian explains why methods like self-consistency, self-refinement, and verifiable-reward reinforcement learning have become central to progress in domains like math and coding, and where those approaches still fall short. We also explore agentic workflows in practice, including where multi-agent systems add real value and where reliability constraints still dominate system design. The conversation covers architecture trends such as mixture-of-experts, attention efficiency strategies, and the practical impact of long-context models, alongside persistent challenges like continual learning. We close with Sebastian’s perspective on maintaining strong coding fundamentals in the age of AI assistants and a preview of his new book, Build A Reasoning Model (From Scratch). The complete show notes for this episode can be found at https://twimlai.com/go/762.
Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large models, and how combining synthetic data generation, imitation learning, and reinforcement learning can unlock stronger reasoning capabilities in smaller models. Yejin explains the risks of homogeneity in model outputs and mode collapse highlighted in her “Artificial Hivemind” paper, and its impacts on human creativity and knowledge. We also discuss her team's novel approaches, including reinforcement learning as a pre-training objective, where models are incentivized to “think” before predicting the next token, and "Prismatic Synthesis," a gradient-based method for generating diverse synthetic math data while filtering overrepresented examples. Additionally, we cover the societal implications of AI and the concept of pluralistic alignment—ensuring AI reflects the diverse norms and values of humanity. Finally, Yejin shares her mission to democratize AI beyond large organizations and offers her predictions for the coming year. The complete show notes for this episode can be found at https://twimlai.com/go/761.
Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training, and discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots. He then describes Flexion's hierarchical approach, which uses pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares a behind-the-scenes look at humanoid robot demos, his take on reinforcement learning in simulation versus the real world, and the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today. The complete show notes for this episode can be found at https://twimlai.com/go/760.
Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought to move beyond static benchmarks. We explore the limitations of next-token prediction for multi-step workflows and examine how attention mechanisms, loss objectives, and training data must evolve to support long-form reasoning and planning. Aakanksha shares insights on the difference between context retrieval and actual reasoning, the importance of "trajectory" training data, and why scaling remains essential for discovering emergent agentic capabilities like error recovery and dynamic tool learning. The complete show notes for this episode can be found at https://twimlai.com/go/759.
In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guided alignment to enforce better visual grounding. We also explore a novel approach to generalized contrastive learning designed to solve complex, composed retrieval tasks—such as searching via combined text and image queries—without increasing inference costs. Finally, we cover the difficulties generative models face when rendering multiple human subjects, and the new "MultiHuman Testbench" his team created to measure and mitigate issues like identity leakage and attribute blending. Throughout the discussion, we examine how these innovations align with the need for efficient, on-device AI deployment. The complete show notes for this episode can be found at https://twimlai.com/go/758.
In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which involves disaggregating workloads across a mix of hardware, from H100s to older GPUs and CPUs, to optimize unit economics without sacrificing performance. We dive into their "three-layer cake" architecture: workload disaggregation, a compilation layer that maps models to specific hardware targets, and a novel system that uses LLMs to autonomously rewrite and optimize compute kernels. Finally, we discuss the complexities of networking in heterogeneous environments, the trade-offs between numerical precision and application accuracy, and the future of hardware-aware scheduling. The complete show notes for this episode can be found at https://twimlai.com/go/757.
Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why this counterintuitive choice has proven far more robust and generalizable for handling complex web interfaces. Devi also shares insights into Yutori’s training pipeline, which has evolved from supervised fine-tuning to include rejection sampling and reinforcement learning. Finally, we discuss how Yutori’s “Scouts” agents orchestrate multiple tools and sub-agents to handle complex queries, the importance of background, "ambient" operation for these systems, and what the path looks like from simple monitoring to full task automation on the web. The complete show notes for this episode can be found at https://twimlai.com/go/756.
Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex workflows and unlock value from legacy enterprise data. Robin and Luke detail high-impact use cases from HPE and Kamiwaza’s collaboration on an “Agentic Smart City” project for Vail, Colorado, including remediation and automation of website accessibility for 508 compliance, digitization and understanding of deed restrictions, and combining contextual information with camera feeds for fire detection and risk assessment. Additionally, we discuss the role of private cloud infrastructure in overcoming challenges like cost, data privacy, and compliance. Robin and Luke also share their lessons learned, including the importance of fresh data, and the value of a "mud puddle by mud puddle" approach in achieving practical AI wins. The complete show notes for this episode can be found at https://twimlai.com/go/755.
In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a convergence of three key areas: the advanced reasoning capabilities of modern LLMs, the rise of formal proof languages like Lean, and breakthroughs in code generation. We explore the core technical challenges, including the massive data gap between general-purpose code and formal math code, and the difficult problem of "autoformalization," or translating natural language proofs into a machine-verifiable format. Carina also shares Axiom's vision for a self-improving system that uses a self-play loop of conjecturing and proving to discover new mathematical knowledge. Finally, we discuss the broader applications of this technology in areas like formal verification for high-stakes software and hardware. The complete show notes for this episode can be found at https://twimlai.com/go/754.
In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and SwiftEdit, which enable high-quality text-to-image generation and editing in a single inference step. He explains their novel distillation framework, where a multi-step teacher model guides the training of an efficient, single-step student model. We explore the architecture and training, including the use of a secondary 'coach' network that aligns the student's denoising function with the teacher's, allowing the model to bypass the iterative process entirely. Finally, we discuss how these efficiency breakthroughs pave the way for personalized on-device agents and the challenges of running reasoning models with techniques like inference-time scaling under a fixed compute budget. The complete show notes for this episode can be found at https://twimlai.com/go/753.
Today, we're joined by Alexandre Pesant, AI lead at Lovable, to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explore the current capabilities and limitations of coding agents, the importance of context engineering, and the practices that separate successful vibe coders from frustrated ones. Alex also shares Lovable’s technical journey, from an early, complex agent architecture that failed, to a simpler workflow-based system, and back again to an agentic approach as foundation models improved. He also details the company's massive scaling challenges, like accidentally taking down GitHub, and makes the case for why robust evaluations and more expressive user interfaces are the most critical components for AI-native development tools to succeed in the near future. The complete show notes for this episode can be found at https://twimlai.com/go/752.
In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at SambaNova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the traditional instruction-fetch paradigm of CPUs and GPUs. We explore how this architecture is well-suited for LLM inference, reducing memory bandwidth bottlenecks and improving performance. Kunle reviews how this system also enables efficient multi-model serving and agentic workflows through its large, tiered memory and fast model-switching capabilities. Finally, we discuss his research into future dynamic reconfigurable architectures, and the use of AI agents to build compilers for new hardware. The complete show notes for this episode can be found at https://twimlai.com/go/751.
Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI, to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the weight-state FLOP ratio as a way of reasoning about the optimality of compute architectures, and we dig into the Power Retention architecture, which blends the parallelization of attention with the linear scaling of recurrence and promises speedups of >10x during training and >100x during inference. We also review two of Manifest AI’s recent open source projects: Vidrial, a custom CUDA framework for building highly optimized GPU kernels in Python, and PowerCoder, a 3B-parameter coding model fine-tuned from StarCoder to use power retention. Our chat also covers the use of metrics like in-context learning curves and negative log likelihood to measure context utility, the implications of scaling laws, and the future of long context lengths in AI applications. The complete show notes for this episode can be found at https://twimlai.com/go/750.
In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those decentralized principles back to AI. We explore how Near AI is creating a decentralized cloud that leverages confidential computing, secure enclaves, and the blockchain to protect both user data and proprietary model weights. Illia also shares his three-part approach to fostering trust: open model training to eliminate hidden biases and "sleeper agents," verifiability of inference to ensure the model runs as intended, and formal verification at the invocation layer to enforce composable guarantees on AI agent actions. Finally, Illia shares his perspective on the future of open research, the role of tokenized incentive models, and the need for formal verification in building compliance and user trust. The complete show notes for this episode can be found at https://twimlai.com/go/749.
Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and textual data for a variety of tasks. Oliver explains how Nano Banana can generate and iteratively edit images while maintaining consistency, and how its integration with Gemini’s world knowledge expands creative and practical use cases. We discuss the tension between aesthetics and accuracy, the relative maturity of image models compared to text-based LLMs, and scaling as a driver of progress. Oliver also shares surprising emergent behaviors, the challenges of evaluating vision-language models, and the risks of training on AI-generated data. Finally, we look ahead to interactive world models and VLMs that may one day “think” and “reason” in images. The complete show notes for this episode can be found at https://twimlai.com/go/748.
Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which examines why LLMs struggle with generating truly novel ideas. We dig into the "Roll the dice" approach, which encourages structured exploration by injecting randomness at the start of generation, and the "Look before you leap" concept, which trains models to take "leaps of thought" using alternative objectives to create more diverse and structured outputs. We also discuss Aditi’s papers exploring the counterintuitive phenomenon of "catastrophic overtraining," where training models on more data improves benchmark performance but degrades their ability to be fine-tuned for new tasks, and dig into her lab's work on creating more controllable and reliable models, including the concept of "memorization sinks," an architectural approach to isolate and enable the targeted unlearning of specific information. The complete show notes for this episode can be found at https://twimlai.com/go/747.
Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero, to discuss his team’s approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and support. We explore PlayerZero’s debugging and code verification platform, which uses code simulations to build a "memory bank" of past bugs and leverages an ensemble of LLMs and agents to proactively simulate and verify changes, predicting potential failures. Animesh also unpacks the underlying technology, including a semantic graph that analyzes code bases, ticketing systems, and telemetry to trace and reason through complex systems, test hypotheses, and apply reinforcement learning techniques to create an “immune system” for software. Finally, Animesh shares his perspective on the future of the software development lifecycle (SDLC), rethinking organizational workflows, and ensuring security as AI-driven tools continue to mature. The complete show notes for this episode can be found at https://twimlai.com/go/746.
In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process of translating mathematical concepts from their human-readable form into rigorously formal, machine-verifiable logic. We explore the critical distinction between the informal reasoning of current LLMs, which can be prone to errors and subversion, and the provably correct reasoning enabled by formal systems. Christian outlines how this approach provides a robust path toward AI safety and also creates the high-quality, verifiable data needed to train models capable of surpassing human scientists in specialized domains. We also delve into his predictions for achieving this superintelligence and his ultimate vision for AI as a tool that helps humanity understand itself. The complete show notes for this episode can be found at https://twimlai.com/go/745.
Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on Apple devices. We explore his workflow for adapting new models in MLX, the trade-offs between the GPU and Neural Engine, and how optimization methods like pruning and quantization enhance performance. We also cover his work on "Fusion," a weight-space method for combining model behaviors without retraining, and his popular packages—MLX-Audio, MLX-Embeddings, and MLX-VLM—which streamline the use of MLX across different modalities. Finally, Prince introduces Marvis, a real-time speech-to-speech voice agent, and shares his vision for the future of AI, emphasizing the move towards "media models" that can handle multiple modalities, and more. The complete show notes for this episode can be found at https://twimlai.com/go/744.
Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-resolution environments. Jack and Shlomi share their perspectives on what defines a world model, the model's architecture, and key technical challenges and breakthroughs, including Genie 3’s visual memory and ability to handle “promptable world events.” Jack, Shlomi, and Sam share their favorite Genie 3 demos, and discuss its potential as a dynamic training environment for embodied AI agents. Finally, we explore future directions for Genie research. The complete show notes for this episode can be found at https://twimlai.com/go/743.
In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing the friction that often stalls deployment. We explore the strategic shift from treating models as commodities to viewing them as core product assets. Lin details how post-training methods, like reinforcement fine-tuning (RFT), allow teams to leverage their own proprietary data to continuously improve these assets. Lin also breaks down the complex challenge of what she calls "3D optimization"—balancing cost, latency, and quality—and emphasizes the role of clear evaluation criteria to guide this process, moving beyond unreliable methods like "vibe checking." Finally, we discuss the path toward the future of AI development: designing a closed-loop system for automated model improvement, a vision made more attainable by the exciting convergence of open and closed-source model capabilities. The complete show notes for this episode can be found at https://twimlai.com/go/742.
In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current limitations of agent protocols like MCPs and how developers can extend them to handle the required context and authority. The conversation challenges the idea that more powerful models lead to more autonomous agents, arguing instead for "graceful recovery" systems that proactively bring humans into the loop when the agent "knows what it doesn't know." We also get into the "application layer" fight, exploring how SaaS platforms are creating data silos and what this means for the future of interoperable AI agents. Filip also shares his vision for the "word artisan"—the non-technical user who can now build and manage a fleet of AI agents, fundamentally changing the nature of knowledge work. The complete show notes for this episode can be found at https://twimlai.com/go/741.
In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneously faster, more accurate, and even cheaper than single-model approaches. Using examples like "laconic decoding," Jared explains the practical techniques for building these systems and the underlying principles of inference-time scaling. The conversation also delves into the critical role of co-design, where the evolution of AI algorithms and the underlying cloud infrastructure are deeply intertwined, shaping the future of agentic AI and the compute landscape. The complete show notes for this episode can be found at https://twimlai.com/go/740.
In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents, from the models and APIs to the critical orchestration layer that manages the complexities of multi-turn conversations. We explore why many production systems favor a modular, multi-model approach over the end-to-end models demonstrated by large AI labs, and how this impacts everything from latency and cost to observability and evaluation. Kwin also digs into the core challenges of interruption handling, turn-taking, and creating truly natural conversational dynamics, and how to overcome them. We discuss use cases, thoughts on where the technology is headed, the move toward hybrid edge-cloud pipelines, the exciting future of real-time video avatars, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/739.
Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research, for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that distills large language models for structured scene understanding and safe motion planning in critical "long-tail" scenarios. We explore how DiMA utilizes LLMs' world knowledge and efficient transformer-based models to significantly reduce collision rates and trajectory errors. We then discuss “SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation,” a diffusion-distilled approach that combines generative models with metric depth estimation to produce sharp, accurate monocular depth maps. Fatih also shares a look at Qualcomm’s on-device demos, including text-to-3D mesh generation, real-time image-to-video and video-to-video generation, and a multi-modal visual question-answering assistant. The complete show notes for this episode can be found at https://twimlai.com/go/738.
Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco, to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, probabilistic, and noisy environment, a stark contrast to the deterministic APIs of the past. Vijoy introduces Cisco's vision for an "Internet of Agents," a platform to manage this new reality, and its open-source implementation, AGNTCY. We explore the four phases of agent collaboration (discovery, composition, deployment, and evaluation) and dive deep into the communication stack, from syntactic protocols like A2A, ACP, and MCP to the deeper semantic challenges of creating a shared understanding between agents. Vijoy also unveils SLIM (Secure Low-Latency Interactive Messaging), a novel transport layer designed to make agent-to-agent communication quantum-safe, real-time, and efficient for multi-modal workloads. The complete show notes for this episode can be found at https://twimlai.com/go/737.
Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for trading and investment. We explore the firm's platform-centric approach to managing an extensive portfolio of features and models, the impact of multimodal LLMs on accelerating the process of extracting novel features, the importance of strict data timestamping to prevent temporal leakage, and the way they consider build vs. buy decisions in a rapidly evolving landscape. Lastly, Ben also shares insights on leveraging open-source models and the future of agentic AI in quantitative finance. The complete show notes for this episode can be found at https://twimlai.com/go/736.
Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling rivals human performance,” which demonstrates how zero-shot auto-labeling with foundation models can yield significant cost and time savings compared to traditional human annotation. Jason explains how auto-labels, despite being "noisier" at lower confidence thresholds, can lead to better downstream model performance. We also cover Voxel51's "verified auto-labeling" approach, which utilizes a "stoplight" QA workflow (green, yellow, red light) to minimize human review. Finally, we discuss the challenges of handling decision boundary uncertainty and out-of-domain classes, the differences between synthetic data generation in vision and language domains, and the potential of agentic labeling. The complete show notes for this episode can be found at https://twimlai.com/go/735.
Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases—underfitting, grokking, and generalization collapse—and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.
Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and creator of the Pipecat open source project. We cover all the highlights from the event, including enhancements to the Gemini models like thinking budgets and thought summaries, native audio output for expressive voice AI, and the new URL Context tool for research agents. The discussion also digs into the Gemini Live API, covering its architecture, the challenges of building real-time voice applications (such as latency and voice activity detection), and new features like proactive audio and asynchronous function calling. Finally, don’t miss our guests’ wish lists for next year’s I/O! The complete show notes for this episode can be found at https://twimlai.com/go/733.
Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples of unsafe outputs that can emerge from these systems, different approaches to evaluating these safety risks, and the potential reasons behind this counterintuitive behavior. Shifting to the application of generative AI in financial services, Sebastian outlines a domain-specific safety taxonomy designed for the industry's unique needs. We also explore the critical role of governance and regulatory frameworks in addressing these concerns, the role of prompt engineering in bolstering safety, Bloomberg’s multi-layered mitigation strategies, and vital areas for further work in improving AI safety within specialized domains. The complete show notes for this episode can be found at https://twimlai.com/go/732.
Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting, and how it can improve multi-step tool use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies they’ve used, and Bespoke Labs’ open-source libraries like Curator. We also touch on the models MiniCheck for hallucination detection and MiniChart for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731.
Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step tasks through reinforcement learning, and how that enables agents to more easily recover from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as with "vibe coding," the integration of tools through the Model Context Protocol (MCP), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous. The complete show notes for this episode can be found at https://twimlai.com/go/730.
Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world tasks of the cybersecurity analyst. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729.
In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727.
Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machine learning, including vision-language models and generative AI techniques to improve perception, planning, and simulation for its self-driving vehicles. The conversation explores the evolution of Waymo’s research stack, their custom “Waymo Foundation Model,” and how they’re incorporating multimodal sensor data like lidar, radar, and camera into advanced AI systems. Drago also discusses how Waymo ensures safety at scale with rigorous validation frameworks, predictive world models, and realistic simulation environments. Finally, we touch on the challenges of generalization across cities, freeway driving, end-to-end learning vs. modular architectures, and the future of AV testing through ML-powered simulation. The complete show notes for this episode can be found at https://twimlai.com/go/725.
Today, we're joined by Julie Kallini, PhD student at Stanford University to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models—including inefficient compression rates for under-resourced languages—and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its performance and efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, the definition and creation of impossible languages, the creation of impossible language training datasets, and explore the bias of language model architectures towards natural language. The complete show notes for this episode can be found at https://twimlai.com/go/724.
Today, we're joined by Jonas Geiping, research group leader at the ELLIS Institute and the Max Planck Institute for Intelligent Systems to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture which uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” versus “verbalized reasoning”—analogous to non-verbalized and verbalized thinking in humans, and discuss how the model searches in latent space to predict the next token and dynamically allocates more compute based on token difficulty. We also explore how the recurrent depth architecture simplifies LLMs, the parallels to diffusion models, the model's performance on reasoning tasks, the challenges of comparing models with varying compute budgets, and architectural advantages such as zero-shot adaptive exits and natural speculative decoding. The complete show notes for this episode can be found at https://twimlai.com/go/723.
Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task environments—maze, mini-behavior, and frozen lake. We explore token discrepancy loss, a technique designed to align language and visual embeddings, ensuring accurate and meaningful visual representations. Additionally, we cover the data collection and training process, reasoning over relative spatial relations between different entities, and dynamic spatial reasoning. Lastly, Chengzu shares insights from experiments with MVoT, focusing on the lessons learned and the potential for applying these models in real-world scenarios like robotics and architectural design. The complete show notes for this episode can be found at https://twimlai.com/go/722.
Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “S1: Simple Test-Time Scaling.” We explore the motivations behind S1, as well as how it compares to OpenAI's o1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as S1’s data curation process, its training recipe, and its use of model distillation from Google Gemini and DeepSeek R1. We explore the novel "budget forcing" technique developed in the paper, which allows the model to think longer on harder problems and optimizes test-time compute for better performance. Additionally, we cover the evaluation benchmarks used, the comparison between supervised fine-tuning and reinforcement learning, and similar projects like the Hugging Face Open R1 project. Finally, we discuss the open-sourcing of S1 and its future directions. The complete show notes for this episode can be found at https://twimlai.com/go/721.
Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We then dig into the various ways Trainium2 is offered, including Trn2 instances, UltraServers, and UltraClusters, and access through managed services like Amazon Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, support for Mixture of Experts (MoE) models, and what’s next for Trainium. The complete show notes for this episode can be found at https://twimlai.com/go/720.
Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diverse mixture of real-world data to ensure robust and intelligent robot learning. We review the data collection approach, which uses human operators and teleoperation rigs, the potential of synthetic data and reinforcement learning in enhancing robotic capabilities, and much more. We also introduce the team’s new FAST tokenizer, which opens the door to a fully Transformer-based model and significant improvements in learning and generalization. Finally, we cover the open-sourcing of π0 and future directions for their research. The complete show notes for this episode can be found at https://twimlai.com/go/719.
Today we’re joined by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems–reasoning, acting, communicating, and adapting. We also examine the rise of agentic foundation models, the emergence of interface agents like Claude with Computer Use and OpenAI Operator, the shift from simple task chains to complex workflows, and the growing range of enterprise use cases. Victor shares insights into emerging design patterns for autonomous multi-agent systems, including graph and message-driven architectures, the advantages of the “actor model” pattern as implemented in Microsoft’s AutoGen, and guidance on how users should approach the “build vs. buy” decision when working with AI agent frameworks. We also address the challenges of evaluating end-to-end agent performance, the complexities of benchmarking agentic systems, and the implications of our reliance on LLMs as judges. Finally, we look ahead to the future of AI agents in 2025 and beyond, discuss emerging HCI challenges, their potential for impact on the workforce, and how they are poised to reshape fields like software engineering. The complete show notes for this episode can be found at https://twimlai.com/go/718.
Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language model inference. We explore the challenges presented by LLM encoding and decoding (aka generation) and how these interact with various hardware constraints such as FLOPS, memory footprint, and memory bandwidth to limit key inference metrics such as time-to-first-token, tokens per second, and tokens per joule. We then dig into a variety of techniques that can be used to accelerate inference such as KV compression, quantization, pruning, speculative decoding, and leveraging small language models (SLMs). We also discuss future directions for enabling on-device agentic experiences such as parallel generation and software tools like Qualcomm AI Orchestrator. The complete show notes for this episode can be found at https://twimlai.com/go/717.
Today, we're joined by Patricia Thaine, co-founder and CEO of Private AI to discuss techniques for ensuring privacy, data minimization, and compliance when using 3rd-party large language models (LLMs) and other AI services. We explore the risks of data leakage from LLMs and embeddings, the complexities of identifying and redacting personal information across various data flows, and the approach Private AI has taken to mitigate these risks. We also dig into the challenges of entity recognition in multimodal systems including OCR files, documents, images, and audio, and the importance of data quality and model accuracy. Additionally, Patricia shares insights on the limitations of data anonymization, the benefits of balancing real-world and synthetic data in model training and development, and the relationship between privacy and bias in AI. Finally, we touch on the evolving landscape of AI regulations like GDPR, CPRA, and the EU AI Act, and the future of privacy in artificial intelligence. The complete show notes for this episode can be found at https://twimlai.com/go/716.
Today, we're joined by Chip Huyen, independent researcher and writer to discuss her new book, “AI Engineering.” We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines AI agents, their current limitations and capabilities, and the critical role of effective planning and tool utilization in these systems. Additionally, Chip shares insights on the importance of evaluation in AI systems, highlighting the need for systematic processes, human oversight, and rigorous metrics and benchmarks. Finally, we touch on the impact of open-source models, the potential of synthetic data, and Chip’s predictions for the year ahead. The complete show notes for this episode can be found at https://twimlai.com/go/715.
Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One to discuss the evolution of the company’s approach and insights on Generative AI and platform best practices. In this episode, we dig into the company’s platform-centric approach to AI, and how they’ve been evolving their existing MLOps and data platforms to support the new challenges and opportunities presented by generative AI workloads and AI agents. We explore their use of cloud-based infrastructure—in this case on AWS—to provide a foundation upon which they then layer open-source and proprietary services and tools. We cover their use of Llama 3 and open-weight models, their approach to fine-tuning, their observability tooling for Gen AI applications, their use of inference optimization techniques like quantization, and more. Finally, Abhijit shares the future of agentic workflows in the enterprise, the application of OpenAI o1-style reasoning in models, and the new roles and skillsets required in the evolving GenAI landscape. The complete show notes for this episode can be found at https://twimlai.com/go/714.
Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other similar uses of LLMs, explore various use cases for them, and dig into ways to create smarter agentic systems. Dan shares his “big brain, little brain, tool brain” approach to tackling real-world challenges in agents, the trade-offs in leveraging general-purpose vs. task-specific models, and his take on LLM reasoning. We also cover the way he thinks about model selection for agents, along with the need for new tools and platforms for deploying them. Finally, Dan emphasizes the importance of open source in advancing AI, shares the new products they’re working on, and explores the future directions in the agentic era. The complete show notes for this episode can be found at https://twimlai.com/go/713.
Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS to dig into the underlying technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore recent advancements in the field of automated reasoning, along with some of the ways it is applied, both broadly and across AWS, where it is used to enhance security, cryptography, virtualization, and more. We discuss how the new feature helps users to generate, refine, validate, and formalize policies, and how those policies can be deployed alongside LLM applications to ensure the accuracy of generated text. Finally, Byron shares the benchmarks they’ve applied, the use of techniques like ‘constrained coding’ and ‘backtracking,’ and the future co-evolution of automated reasoning and generative AI. The complete show notes for this episode can be found at https://twimlai.com/go/712.
Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research to discuss the papers and workshops Qualcomm will be presenting at this year’s NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that ties conformal prediction to information theory, yielding a novel approach to incorporating uncertainty quantification directly into machine learning models. Finally, we review several papers enabling the efficient use of LoRA (Low-Rank Adaptation) on mobile devices (Hollowed Net, ShiRA, FouRA). Arash also previews the demos Qualcomm will be hosting at NeurIPS, including new video editing diffusion and 3D content generation models running on-device, Qualcomm's AI Hub, and more! The complete show notes for this episode can be found at https://twimlai.com/go/711.
Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper’s customers, including diagnosing cable degradation, proactive monitoring for coverage gaps, and real-time fault detection. We also dig into the complexities of integrating data science into networking, the trade-offs between traditional methods and ML-based solutions, the role of feature engineering and data in networking, the applicability of large language models, and Juniper’s approach to using smaller, specialized ML models to optimize speed, latency, and cost. Finally, Shirley shares some future directions for Juniper Mist such as proactive network testing and end-user self-service. The complete show notes for this episode can be found at https://twimlai.com/go/710.
Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG systems, the different signs Jason looks for to identify looming problems, the issues he most commonly encounters, and the steps he takes to diagnose these issues. We also cover the significance of building out robust test datasets, data-driven experimentation, evaluation tools, and metrics for different use cases. We touch on fine-tuning strategies for RAG systems, the effectiveness of different chunking strategies, the use of collaboration tools like Braintrust, and how future models will change the game. Lastly, we cover Jason’s interest in teaching others how to capitalize on their own AI experience via his AI consulting course. The complete show notes for this episode can be found at https://twimlai.com/go/709.
Today we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel "CoMELT" observability dataset which combines traditional MELT data—metrics, events, logs, and traces—with code to efficiently identify root failure causes in complex software systems. We discuss the challenges of integrating time-series data with LLMs and their multi-decoder architecture designed for this purpose. Sunil describes their system's agent-based design, focusing on clear roles and boundaries to ensure reliability. We examine their "chaos gym," a reinforcement learning environment used for testing and improving the system's robustness. Finally, we discuss the practical considerations of deploying such a system at scale in diverse environments and much more. The complete show notes for this episode can be found at https://twimlai.com/go/708.
Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram to discuss voice AI agents. We explore the importance of perception, understanding, and interaction and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs as well as speech-to-text and text-to-speech models in building AI voice agents, and dig into the benefits and limitations of text-based approaches to voice interactions. We dig into what’s required to deliver real-time voice interactions and the promise of closed-loop, continuously improving, federated learning agents. Finally, Scott shares practical applications of AI voice agents at Deepgram and provides an overview of their newly released agent toolkit. The complete show notes for this episode can be found at https://twimlai.com/go/707.
Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book, “Artificial Intelligence: 10 Things You Should Know.” We dig into the attainability of artificial superintelligence and the path to achieving generalized superhuman capabilities across multiple domains. We discuss the importance of open-endedness in developing autonomous and self-improving systems, as well as the role of evolutionary approaches and algorithms. Additionally, we cover Tim’s recent research projects such as “Promptbreeder,” “Debating with More Persuasive LLMs Leads to More Truthful Answers,” and more. The complete show notes for this episode can be found at https://twimlai.com/go/706.
Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the “W” adaptation that’s been proposed for incorporating ML models. Next, we discuss the complexities of applying deep learning neural networks in safety-critical applications using the aviation industry as an example, and talk through the importance of factors such as data quality, model stability, robustness, interpretability, and accuracy. We also explore formal verification methods, abstract transformer layers, transformer-based architectures, and the application of various software testing techniques. Lucas also introduces the field of constrained deep learning and convex neural networks and its benefits and trade-offs. The complete show notes for this episode can be found at https://twimlai.com/go/705.
Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University, to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also discuss the importance of verifiers as a technique for safeguarding agent behavior. We then dig into the AI Snake Oil book, which uncovers examples of problematic and overhyped claims in AI. Arvind shares various use cases of failed applications of AI, outlines a taxonomy of AI risks, and shares his insights on AI’s catastrophic risks. We also touch on different approaches to LLM-based reasoning, his views on tech policy and regulation, and his work on CORE-Bench, a benchmark designed to measure AI agents' accuracy in computational reproducibility tasks. The complete show notes for this episode can be found at https://twimlai.com/go/704.
Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the current landscape of benchmarks for data processing tasks, how these differ from reasoning-based benchmarks, and the need for robust evaluation methods for human-in-the-loop LLM workflows. Additionally, Shreya shares real-world applications of DocETL, the importance of effective validation prompts, and building robust and fault-tolerant agentic systems. Lastly, we cover the need for benchmarks tailored to LLM-powered data processing tasks and the future directions for DocETL. The complete show notes for this episode can be found at https://twimlai.com/go/703.
Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language models including ChatGPT and PaLM-2. Nicholas shares the current landscape of AI security research in the age of LLMs, the implications of model stealing, ethical concerns surrounding model privacy, how the attack works, and the significance of the embedding layer in language models. We also discuss the remediation strategies implemented by OpenAI and Google, and the future directions in the field of AI security. Plus, we also cover his other ICML 2024 best paper, “Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining,” which questions the use and promotion of differential privacy in conjunction with pre-trained models. The complete show notes for this episode can be found at https://twimlai.com/go/702.
Today, we're joined by Simon Willison, independent researcher and creator of Datasette to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to boost their productivity. We dig into Simon’s own workflows and how he uses popular models like ChatGPT and Anthropic’s Claude to write and test hundreds of lines of code while out walking his dog. We review Simon’s favorite prompting and debugging techniques, his strategies for sidestepping the limitations of contemporary models, how he uses Claude’s Artifacts feature for rapid prototyping, his thoughts on the use and impact of vision models, the role he sees for open source models and local LLMs, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/701.
Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic system designs. We explore the spectrum of agentic behaviors, the motivation for learning all aspects of agentic system design, the key components of the ADAS approach, and how it uses LLMs to design novel agent architectures in code. We also cover the iterative process of ADAS, its potential to shed light on the behavior of foundation models, the higher-level meta-behaviors that emerge in agentic systems, and how ADAS uncovers novel design patterns through emergent behaviors, particularly in complex tasks like the ARC challenge. Finally, we touch on the practical applications of ADAS and its potential use in system optimization for real-world tasks. The complete show notes for this episode can be found at https://twimlai.com/go/700.
Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of applying academic fairness metrics in real-world AI applications. We dig into the key ethical principles behind the Act, its broad definition of AI, and how it categorizes various AI risks. We also discuss the practical challenges of implementing fairness and bias metrics in real-world scenarios, and the importance of a risk-based approach in regulating AI systems. Finally, we cover how the EU AI Act might influence global practices, similar to the GDPR's effect on data privacy, and explore strategies for closing bias gaps in real-world automated decision-making. The complete show notes for this episode can be found at https://twimlai.com/go/699.
Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, including the most productive developer experiences and appropriate levels of abstraction. We dive into agents and agentic systems as well, covering the “spectrum of agenticness,” cognitive architectures, and real-world applications. We explore key challenges in deploying agentic systems, and the importance of agentic architectures as a means of communication in system design and operation. Additionally, we review evolving use cases for RAG, and the role of observability, testing, and evaluation tools in moving LLM applications from prototype to production. Lastly, Harrison shares his hot takes on prompting, multi-modal models, and more! The complete show notes for this episode can be found at https://twimlai.com/go/698.
Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the motivations for AI engineers to move model inference from the cloud to local devices, and explore the challenges associated with on-device AI. We dig into the role of hardware solutions, from powerful system-on-chips (SoC) to neural processors, the importance of collaboration between community runtimes like ONNX and TFLite and chip manufacturers, the unique challenges of IoT and autonomous vehicles, and the key metrics developers should focus on to ensure optimal on-device performance. Finally, Siddhika introduces Qualcomm's AI Hub, a platform developed to simplify the process of testing and optimizing AI models across different devices. The complete show notes for this episode can be found at https://twimlai.com/go/697.
Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training deep reinforcement learning (RL) agents at scale in a completely unsupervised manner. We explore the motivations behind Genie, the challenges of data acquisition for RL, and Genie’s capability to learn world models from videos without explicit action data, enabling seamless interaction and frame prediction. Ashley walks us through Genie’s core components—the latent action model, video tokenizer, and dynamics model—and explains how these elements collaborate to predict future frames in video sequences. We discuss the model architecture, training strategies, benchmarks used, as well as the application of spatiotemporal transformers and the MaskGIT techniques used for efficient token prediction and representation. Finally, we touched on Genie’s practical implications, its comparison to other video generation models like “Sora,” and potential future directions in video generation and diffusion models. The complete show notes for this episode can be found at https://twimlai.com/go/696.
Today, we're joined by Marius Memmel, a PhD student at the University of Washington, to discuss his research on sim-to-real transfer approaches for developing autonomous robotic agents in unstructured environments. Our conversation focuses on his recent ASID and URDFormer papers. We explore the complexities presented by real-world settings like a cluttered kitchen, data acquisition challenges for training robust models, the importance of simulation, and the challenge of bridging the sim2real gap in robotics. Marius introduces ASID, a framework designed to enable robots to autonomously generate and refine simulation models to improve sim-to-real transfer. We discuss the role of Fisher information as a metric for trajectory sensitivity to physical parameters and the importance of exploration and exploitation phases in robot learning. Additionally, we cover URDFormer, a transformer-based model that generates URDF documents for scene and object reconstruction to create realistic simulation environments. The complete show notes for this episode can be found at https://twimlai.com/go/695.
Today, we're joined by Hamel Husain, founder of Parlance Labs, to discuss the ins and outs of building real-world products using large language models (LLMs). We kick things off discussing novel applications of LLMs and how to think about modern AI user experiences. We then dig into the key challenge faced by LLM developers: how to iterate from a snazzy demo or proof-of-concept to a working LLM-based application. We discuss the pros, cons, and role of fine-tuning LLMs and dig into when to use this technique. We cover the fine-tuning process, common pitfalls in evaluation (such as relying too heavily on generic tools and missing the nuances of specific use cases), open-source LLM fine-tuning tools like Axolotl, the use of LoRA adapters, and more. Hamel also shares insights on model optimization and inference frameworks and how developers should approach these tools. Finally, we dig into how to use systematic evaluation techniques to guide the improvement of your LLM application, the importance of data generation and curation, and the parallels to traditional software engineering practices. The complete show notes for this episode can be found at https://twimlai.com/go/694.
Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual modalities, and the strengths and weaknesses of transformer architectures relative to alternatives for various tasks. We dig into the role of tokenization and patching in transformer pipelines, emphasizing how abstraction and semantic relationships between tokens underpin the model's effectiveness, and explore how this relates to the debate between handcrafted pipelines versus end-to-end architectures in machine learning. Additionally, we touch on the evolving landscape of hybrid models which incorporate elements of attention and state, the significance of state update mechanisms in model adaptability and learning efficiency, and the contribution and adoption of state-space models like Mamba and Mamba-2 in academia and industry. Lastly, Albert shares his vision for advancing foundation models across diverse modalities and applications. The complete show notes for this episode can be found at https://twimlai.com/go/693.
Today, we're joined by Amir Bar, a PhD candidate at Tel Aviv University and UC Berkeley, to discuss his research on visual-based learning, including his recent paper, “EgoPet: Egomotion and Interaction Data from an Animal’s Perspective.” Amir shares his research projects focused on self-supervised object detection and analogy reasoning for general computer vision tasks. We also discuss the current limitations of caption-based datasets in model training, the ‘learning problem’ in robotics, and the gap between the capabilities of animals and AI systems. Amir introduces ‘EgoPet,’ a dataset and set of benchmark tasks that allow motion and interaction data from an animal's perspective to be incorporated into machine learning models for robotic planning and proprioception. We explore the dataset collection process, comparisons with existing datasets and benchmark tasks, findings on the performance of models trained on EgoPet, and the potential of directly training robot policies that mimic animal behavior. The complete show notes for this episode can be found at https://twimlai.com/go/692.
Today, we're joined by Sarah Bird, chief product officer of responsible AI at Microsoft. We discuss the testing and evaluation techniques Microsoft applies to ensure safe deployment and use of generative AI, large language models, and image generation. In our conversation, we explore the unique risks and challenges presented by generative AI, the balance between fairness and security concerns, the application of adaptive and layered defense strategies for rapid response to unforeseen AI behaviors, the importance of automated AI safety testing and evaluation alongside human judgment, and the implementation of red teaming and governance. Sarah also shares learnings from Microsoft's ‘Tay’ and ‘Bing Chat’ incidents along with her thoughts on the rapidly evolving GenAI landscape. The complete show notes for this episode can be found at https://twimlai.com/go/691.
Today, we're joined by Eric Nguyen, PhD student at Stanford University. In our conversation, we explore his research on long context foundation models and their application to biology, particularly Hyena and its evolution into the Hyena DNA and Evo models. We discuss Hyena, a convolution-based language model developed to tackle the challenges posed by long context lengths in language modeling. We dig into the limitations of transformers in dealing with longer sequences, the motivation for using convolutional models over transformers, its model training and architecture, the role of FFT in computational optimizations, and model explainability in long-sequence convolutions. We also talk about Hyena DNA, a genomic foundation model pre-trained on 1 million tokens, designed to capture long-range dependencies in DNA sequences. Finally, Eric introduces Evo, a 7 billion parameter hybrid model integrating attention layers with Hyena DNA's convolutional framework. We cover generating and designing DNA with language models, hallucinations in DNA models, evaluation benchmarks, the trade-offs between state-of-the-art models, zero-shot versus few-shot performance, and the exciting potential in areas like CRISPR-Cas gene editing. The complete show notes for this episode can be found at https://twimlai.com/go/690.
Today, we're joined by Andres Ravinet, sustainability global black belt at Microsoft, to discuss the role of AI in sustainability. We explore real-world use cases where AI-driven solutions are leveraged to help tackle environmental and societal challenges, from early warning systems for extreme weather events to reducing food waste along the supply chain to conserving the Amazon rainforest. We cover the major threats that sustainability aims to address, the complexities in standardized sustainability compliance reporting, and the factors driving businesses to take a step toward sustainable practices. Lastly, Andres addresses the ways LLMs and generative AI can be applied towards the challenges of sustainability. The complete show notes for this episode can be found at https://twimlai.com/go/689.
Today we’re joined by Fatih Porikli, senior director of technology at Qualcomm AI Research. In our conversation, we covered several of the Qualcomm team’s 16 accepted main track and workshop papers at this year’s CVPR conference. The papers span a variety of generative AI and traditional computer vision topics, with an emphasis on increased training and inference efficiency for mobile and edge deployment. We explore efficient diffusion models for text-to-image generation, grounded reasoning in videos using language models, real-time on-device 360° image generation for video portrait relighting, a unique video-language model for situated interactions like fitness coaching, a visual reasoning model and benchmark for interpreting complex mathematical plots, and more! We also touched on several of the demos the team will be presenting at the conference, including multi-modal vision-language models (LLaVA) and parameter-efficient fine tuning (LoRA) on mobile phones. The complete show notes for this episode can be found at https://twimlai.com/go/688.
Today, we're joined by Sasha Luccioni, AI and Climate lead at Hugging Face, to discuss the environmental impact of AI models. We dig into her recent research into the relative energy consumption of general purpose pre-trained models vs. task-specific, non-generative models for common AI tasks. We discuss the implications of the significant difference in efficiency and power consumption between the two types of models. Finally, we explore the complexities of energy efficiency and performance benchmarking, and talk through Sasha’s recent initiative, Energy Star Ratings for AI Models, a rating system designed to help AI users select and deploy models based on their energy efficiency. The complete show notes for this episode can be found at http://twimlai.com/go/687.
Today, we're joined by Christopher Manning, the Thomas M. Siebel professor in Machine Learning at Stanford University and a recent recipient of the 2024 IEEE John von Neumann medal. In our conversation with Chris, we discuss his contributions to foundational research areas in NLP, including word embeddings and attention. We explore his perspectives on the intersection of linguistics and large language models, their ability to learn human language structures, and their potential to teach us about human language acquisition. We also dig into the concept of “intelligence” in language models, as well as the reasoning capabilities of LLMs. Finally, Chris shares his current research interests, alternative architectures he anticipates emerging beyond the LLM, and opportunities ahead in AI research. The complete show notes for this episode can be found at https://twimlai.com/go/686.
Today we're joined by Abdul Fatir Ansari, a machine learning scientist at AWS AI Labs in Berlin, to discuss his paper, "Chronos: Learning the Language of Time Series." Fatir explains the challenges of leveraging pre-trained language models for time series forecasting. We explore the advantages of Chronos over statistical models, as well as its promising results in zero-shot forecasting benchmarks. Finally, we address critiques of Chronos, the ongoing research to improve synthetic data quality, and the potential for integrating Chronos into production systems. The complete show notes for this episode can be found at twimlai.com/go/685.
Today we're joined by Joel Hestness, principal research scientist and lead of the core machine learning team at Cerebras. We discuss Cerebras’ custom silicon for machine learning, Wafer Scale Engine 3, and how the latest version of the company’s single-chip platform for ML has evolved to support large language models. Joel shares how WSE3 differs from other AI hardware solutions, such as GPUs, TPUs, and AWS’ Inferentia, and talks through the homogeneous design of the WSE chip and its memory architecture. We discuss software support for the platform, including support by open source ML frameworks like PyTorch, and support for different types of transformer-based models. Finally, Joel shares some of the research his team is pursuing to take advantage of the hardware's unique characteristics, including weight-sparse training, optimizers that leverage higher-order statistics, and more. The complete show notes for this episode can be found at twimlai.com/go/684.
Today we're joined by Laurent Boinot, power and utilities lead for the Americas at Microsoft, to discuss the intersection of AI and energy infrastructure. We discuss the many challenges faced by current power systems in North America and the role AI is beginning to play in driving efficiencies in areas like demand forecasting and grid optimization. Laurent shares a variety of examples along the way, including some of the ways utility companies are using AI to ensure secure systems, interact with customers, navigate internal knowledge bases, and design electrical transmission systems. We also discuss the future of nuclear power, and why electric vehicles might play a critical role in American energy management. The complete show notes for this episode can be found at twimlai.com/go/683.
Today we're joined by Azarakhsh (Aza) Jalalvand, a research scholar at Princeton University, to discuss his work using deep reinforcement learning to control plasma instabilities in nuclear fusion reactors. Aza explains how his team developed a model to detect and avoid a fatal plasma instability called ‘tearing mode’. Aza walks us through the process of collecting and pre-processing the complex diagnostic data from fusion experiments, training the models, and deploying the controller algorithm on the DIII-D fusion research reactor. He shares insights from developing the controller and discusses the future challenges and opportunities for AI in enabling stable and efficient fusion energy production. The complete show notes for this episode can be found at twimlai.com/go/682.
Today we're joined by Kirk Marple, CEO and founder of Graphlit, to explore the emerging paradigm of "GraphRAG," or Graph Retrieval Augmented Generation. In our conversation, Kirk digs into the GraphRAG architecture and how Graphlit uses it to offer a multi-stage workflow for ingesting, processing, retrieving, and generating content using LLMs (like GPT-4) and other Generative AI tech. He shares how the system performs entity extraction to build a knowledge graph and how graph, vector, and object storage are integrated in the system. We dive into how the system uses “prompt compilation” to improve the results it gets from Large Language Models during generation. We conclude by discussing several use cases the approach supports, as well as future agent-based applications it enables. The complete show notes for this episode can be found at twimlai.com/go/681.
Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss "Teaching Large Language Models to Reason with Reinforcement Learning." Alex discusses the role of creativity and exploration in problem solving and explores the opportunities presented by applying reinforcement learning algorithms to the challenge of improving reasoning in large language models. Alex also shares his research on the effect of noise on language model training, highlighting the robustness of LLM architecture. Finally, we delve into the future of RL, and the potential of combining language models with traditional methods to achieve more robust AI reasoning. The complete show notes for this episode can be found at twimlai.com/go/680.
Today we're joined by Peter Hase, a fifth-year PhD student at the University of North Carolina NLP lab. We discuss "scalable oversight", and the importance of developing a deeper understanding of how large neural networks make decisions. We learn how matrices are probed by interpretability researchers, and explore the two schools of thought regarding how LLMs store knowledge. Finally, we discuss the importance of deleting sensitive information from model weights, and how "easy-to-hard generalization" could increase the risk of releasing open-source foundation models. The complete show notes for this episode can be found at twimlai.com/go/679.
Today we're joined by Jonas Geiping, a research group leader at the ELLIS Institute, to explore his paper: "Coercing LLMs to Do and Reveal (Almost) Anything". Jonas explains how neural networks can be exploited, highlighting the risk of deploying LLM agents that interact with the real world. We discuss the role of open models in enabling security research, the challenges of optimizing over certain constraints, and the ongoing difficulties in achieving robustness in neural networks. Finally, we delve into the future of AI security, and the need for a better approach to mitigate the risks posed by optimized adversarial attacks. The complete show notes for this episode can be found at twimlai.com/go/678.
Today we’re joined by Mido Assran, a research scientist at Meta’s Fundamental AI Research (FAIR). In this conversation, we discuss V-JEPA, a new model being billed as “the next step in Yann LeCun's vision” for true artificial reasoning. V-JEPA, the video version of Meta’s Joint Embedding Predictive Architecture, aims to bridge the gap between human and machine intelligence by training models to learn abstract concepts in a more efficient predictive manner than generative models. V-JEPA uses a novel self-supervised training approach that allows it to learn from unlabeled video data without being distracted by pixel-level detail. Mido walks us through the process of developing the architecture and explains why it has the potential to revolutionize AI. The complete show notes for this episode can be found at twimlai.com/go/677.
Today we’re joined by Sherry Yang, senior research scientist at Google DeepMind and a PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to language models as a way to solve tasks in the real world. Sherry draws the analogy between natural language as a unified representation of information and text prediction as a common task interface and demonstrates how video as a medium and generative video as a task exhibit similar properties. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry's work and a preview of her vision for interacting with AI-generated environments. The complete show notes for this episode can be found at twimlai.com/go/676.
Today we’re joined by Sayash Kapoor, a Ph.D. student in the Department of Computer Science at Princeton University. Sayash walks us through his paper, “On the Societal Impact of Open Foundation Models.” We dig into the controversy around AI safety, the risks and benefits of releasing open model weights, and how we can establish common ground for assessing the threats posed by AI. We discuss the application of the framework presented in the paper to specific risks, such as the biosecurity risk of open LLMs, as well as the growing problem of "Non Consensual Intimate Imagery" using open diffusion models. The complete show notes for this episode can be found at twimlai.com/go/675.
Today we’re joined by Akshita Bhagia, a senior research engineer at the Allen Institute for AI. Akshita joins us to discuss OLMo, a new open source language model with 7 billion and 1 billion variants, but with a key difference compared to similar models offered by Meta, Mistral, and others: AI2 has also published the dataset and key tools used to train the model. In our chat with Akshita, we dig into the OLMo models and the various projects falling under the OLMo umbrella, including Dolma, an open three-trillion-token corpus for language model pretraining, and Paloma, a benchmark and tooling for evaluating language model performance across a variety of domains. The complete show notes for this episode can be found at twimlai.com/go/674.
Today we’re joined by Ben Prystawski, a PhD student in the Department of Psychology at Stanford University working at the intersection of cognitive science and machine learning. Our conversation centers on Ben’s recent paper, “Why think step by step? Reasoning emerges from the locality of experience,” which he recently presented at NeurIPS 2023. In this conversation, we start out exploring basic questions about LLM reasoning, including whether it exists, how we can define it, and how techniques like chain-of-thought reasoning appear to strengthen it. We then dig into the details of Ben’s paper, which aims to understand why thinking step-by-step is effective and demonstrates that local structure is the key property of LLM training data that enables it. The complete show notes for this episode can be found at twimlai.com/go/673.
Today we're joined by Armineh Nourbakhsh of JP Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh provides a historical overview of the challenges of document AI and an introduction to the DocLLM model. Armineh explains how this model, distinct from both traditional LLMs and document AI models, incorporates both textual semantics and spatial layout in processing enterprise documents like reports and complex contracts. We dig into her team’s approach to training DocLLM, their choice of a generative model as opposed to an encoder-based approach, the datasets they used to build the model, their approach to incorporating layout information, and the various ways they evaluated the model’s performance. The complete show notes for this episode can be found at twimlai.com/go/672.
Today we’re joined by Sanmi Koyejo, assistant professor at Stanford University, to continue our NeurIPS 2024 series. In our conversation, Sanmi discusses his two recent award-winning papers. First, we dive into his paper, “Are Emergent Abilities of Large Language Models a Mirage?”. We discuss the different ways LLMs are evaluated and the excitement surrounding their “emergent abilities” such as the ability to perform arithmetic. Sanmi describes how evaluating model performance using nonlinear metrics can lead to the illusion that the model is rapidly gaining new capabilities, whereas linear metrics show smooth improvement as expected, casting doubt on the significance of emergence. We continue on to his next paper, “DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models,” discussing the methodology it describes for evaluating concerns such as the toxicity, privacy, fairness, and robustness of LLMs. The complete show notes for this episode can be found at twimlai.com/go/671.
Today we’re joined by Kamyar Azizzadenesheli, a staff researcher at Nvidia, to continue our AI Trends 2024 series. In our conversation, Kamyar updates us on the latest developments in reinforcement learning (RL), and how the RL community is taking advantage of the abstract reasoning abilities of large language models (LLMs). Kamyar shares his insights on how LLMs are pushing RL performance forward in a variety of applications, such as ALOHA, a robot that can learn to fold clothes, and Voyager, an RL agent that uses GPT-4 to outperform prior systems at playing Minecraft. We also explore the progress being made in assessing and addressing the risks of RL-based decision-making in domains such as finance, healthcare, and agriculture. Finally, we discuss the future of deep reinforcement learning, Kamyar’s top predictions for the field, and how greater compute capabilities will be critical in achieving general intelligence. The complete show notes for this episode can be found at twimlai.com/go/670.
Today we’re joined by Ram Sriharsha, VP of engineering at Pinecone. In our conversation, we dive into the topic of vector databases and retrieval augmented generation (RAG). We explore the trade-offs between relying solely on LLMs for retrieval tasks versus combining retrieval in vector databases and LLMs, the advantages and complexities of RAG with vector databases, the key considerations for building and deploying real-world RAG-based applications, and an in-depth look at Pinecone's new serverless offering. Currently in public preview, Pinecone Serverless is a vector database that enables on-demand data loading, flexible scaling, and cost-effective query processing. Ram discusses how the serverless paradigm impacts the vector database’s core architecture, key features, and other considerations. Lastly, Ram shares his perspective on the future of vector databases in helping enterprises deliver RAG systems. The complete show notes for this episode can be found at twimlai.com/go/669.
Today we’re joined by Ben Zhao, a Neubauer professor of computer science at the University of Chicago. In our conversation, we explore his research at the intersection of security and generative AI. We focus on Ben’s recent Fawkes, Glaze, and Nightshade projects, which use “poisoning” approaches to provide users with security and protection against AI encroachments. The first tool we discuss, Fawkes, imperceptibly “cloaks” images in such a way that models perceive them as highly distorted, effectively shielding individuals from recognition by facial recognition models. We then dig into Glaze, a tool that employs machine learning algorithms to compute subtle alterations that are indiscernible to human eyes but adept at tricking the models into perceiving a significant shift in art style, giving artists a unique defense against style mimicry. Lastly, we cover Nightshade, a strategic defense tool for artists akin to a “poison pill,” which allows artists to apply imperceptible changes to their images that effectively “break” generative AI models that are trained on them. The complete show notes for this episode can be found at twimlai.com/go/668.
Today, we continue our NeurIPS series with Dan Friedman, a PhD student in the Princeton NLP group. In our conversation, we explore his research on mechanistic interpretability for transformer models, specifically his paper, Learning Transformer Programs. The LTP paper proposes modifications to the transformer architecture which allow transformer models to be easily converted into human-readable programs, making them inherently interpretable. In our conversation, we compare the approach proposed by this research with prior approaches to understanding the models and their shortcomings. We also dig into the approach’s function and scale limitations and constraints. The complete show notes for this episode can be found at twimlai.com/go/667.
Today we continue our AI Trends 2024 series with a conversation with Thomas Dietterich, distinguished professor emeritus at Oregon State University. As you might expect, Large Language Models figured prominently in our conversation, and we covered a vast array of papers and use cases exploring current research into topics such as monolithic vs. modular architectures, hallucinations, the application of uncertainty quantification (UQ), and using RAG as a sort of memory module for LLMs. Lastly, don’t miss Tom’s predictions on what he foresees happening this year as well as his words of encouragement for those new to the field. The complete show notes for this episode can be found at twimlai.com/go/666.
Today we kick off our AI Trends 2024 series with a conversation with Naila Murray, director of AI research at Meta. In our conversation with Naila, we dig into the latest trends and developments in the realm of computer vision. We explore advancements in the areas of controllable generation, visual programming, 3D Gaussian splatting, and multimodal models, specifically vision plus LLMs. We discuss tools and open source projects, including Segment Anything, a tool for versatile zero-shot image segmentation using simple text prompts, clicks, and bounding boxes; ControlNet, which adds conditional control to stable diffusion models; and DINOv2, a visual encoding model enabling object recognition, segmentation, and depth estimation, even in data-scarce scenarios. Finally, Naila shares her view on the most exciting opportunities in the field, as well as her predictions for upcoming years. The complete show notes for this episode can be found at twimlai.com/go/665.
Today we’re joined by Ed Anuff, chief product officer at DataStax. In our conversation, we discuss Ed’s insights on RAG, vector databases, embedding models, and more. We dig into the underpinnings of modern vector databases (like HNSW and DiskANN) that allow them to efficiently handle massive and unstructured data sets, and discuss how they help users serve up relevant results for RAG, AI assistants, and other use cases. We also discuss embedding models and their role in vector comparisons and database retrieval as well as the potential for GPU usage to enhance vector database performance. The complete show notes for this episode can be found at twimlai.com/go/664.
Today we’re joined by Markus Nagel, research scientist at Qualcomm AI Research, who helps us kick off our coverage of NeurIPS 2023. In our conversation with Markus, we cover his accepted papers at the conference, along with other work presented by Qualcomm AI Research scientists. Markus’ first paper, Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing, focuses on tackling activation quantization issues introduced by the attention mechanism and how to solve them. We also discuss Pruning vs Quantization: Which is Better?, which focuses on comparing the effectiveness of these two methods in achieving model weight compression. Additional papers discussed focus on topics like using scalarization in multitask and multidomain learning to improve training and inference, using diffusion models for a sequence of state models and actions, applying geometric algebra with equivariance to transformers, and applying a deductive verification of chain of thought reasoning performed by LLMs. The complete show notes for this episode can be found at twimlai.com/go/663.
Today we’re joined by Michael Kearns, professor in the Department of Computer and Information Science at the University of Pennsylvania and an Amazon scholar. In our conversation with Michael, we discuss the new challenges to responsible AI brought about by the generative AI era. We explore Michael’s learnings and insights from the intersection of his real-world experience at AWS and his work in academia. We cover a diverse range of topics under this banner, including service card metrics, privacy, hallucinations, RLHF, and LLM evaluation benchmarks. We also touch on Clean Rooms ML, a secured environment that balances accessibility to private datasets through differential privacy techniques, offering a new approach for secure data handling in machine learning. The complete show notes for this episode can be found at twimlai.com/go/662.
Today we’re joined by Mike Miller, director of product at AWS responsible for the company’s “edutainment” products. In our conversation with Mike, we explore AWS PartyRock, a no-code generative AI app builder that allows users to easily create fun and shareable AI applications by selecting a model, chaining prompts together, and linking different text, image, and chatbot widgets together. Additionally, we discuss some of the previous tools Mike’s team has delivered at the intersection of developer education and entertainment, including DeepLens, a computer vision hardware device, DeepRacer, a programmable vehicle that uses reinforcement learning to navigate a track, and lastly, DeepComposer, a generative AI model that transforms musical inputs and creates accompanying compositions. The complete show notes for this episode can be found at twimlai.com/go/661.
Today we’re joined by Cody Coleman, co-founder and CEO of Coactive AI. In our conversation with Cody, we discuss how Coactive has leveraged modern data, systems, and machine learning techniques to deliver its multimodal asset platform and visual search tools. Cody shares his expertise in the area of data-centric AI, and we dig into techniques like active learning and core set selection, and how they can drive greater efficiency throughout the machine learning lifecycle. We explore the various ways Coactive uses multimodal embeddings to enable their core visual search experience, and we cover the infrastructure optimizations they’ve implemented in order to scale their systems. We conclude with Cody’s advice for entrepreneurs and engineers building companies around generative AI technologies. The complete show notes for this episode can be found at twimlai.com/go/660.
Today we’re joined by Kyle Roche, founder and CEO of Griptape to discuss patterns and middleware for LLM applications. We dive into the emerging patterns for developing LLM applications, such as off prompt data—which allows data retrieval without compromising the chain of thought within language models—and pipelines, which are sequential tasks that are given to LLMs that can involve different models for each task or step in the pipeline. We also explore Griptape, an open-source, Python-based middleware stack that aims to securely connect LLM applications to an organization’s internal and external data systems. We discuss the abstractions it offers, including drivers, memory management, rule sets, DAG-based workflows, and a prompt stack. Additionally, we touch on common customer concerns such as privacy, retraining, and sovereignty issues, and several use cases that leverage role-based retrieval methods to optimize human augmentation tasks. The complete show notes for this episode can be found at twimlai.com/go/659.
Today we’re joined by Prem Natarajan, chief scientist and head of enterprise AI at Capital One. In our conversation, we discuss AI access and inclusivity as technical challenges and explore some of Prem and his team’s multidisciplinary approaches to tackling these complexities. We dive into the issues of bias, dealing with class imbalances, and the integration of various research initiatives to achieve additive results. Prem also shares his team’s work on foundation models for financial data curation, highlighting the importance of data quality and the use of federated learning, and emphasizing the impact these factors have on the model performance and reliability in critical applications like fraud detection. Lastly, Prem shares his overall approach to tackling AI research in the context of a banking enterprise, including prioritizing mission-inspired research aiming to deliver tangible benefits to customers and the broader community, investing in diverse talent and the best infrastructure, and forging strategic partnerships with a variety of academic labs. The complete show notes for this episode can be found at twimlai.com/go/658.
Today we’re joined by Jay Emery, director of technical sales & architecture at Microsoft Azure. In our conversation with Jay, we discuss the challenges faced by organizations when building LLM-based applications, and we explore some of the techniques they are using to overcome them. We dive into the concerns around security, data privacy, cost management, and performance as well as the ability and effectiveness of prompting to achieve the desired results versus fine-tuning, and when each approach should be applied. We cover methods such as prompt tuning and prompt chaining, prompt variance, fine-tuning, and RAG to enhance LLM output along with ways to speed up inference performance such as choosing the right model, parallelization, and provisioned throughput units (PTUs). In addition to that, Jay also shared several intriguing use cases describing how businesses use tools like Azure Machine Learning prompt flow and Azure ML AI Studio to tailor LLMs to their unique needs and processes. The complete show notes for this episode can be found at twimlai.com/go/657.
Today we’re joined by Richard Zhang, senior research scientist at Adobe Research. In our conversation with Richard, we explore the research challenges that arise when regarding visual generative AI from an ecosystem perspective, considering the disparate needs of creators, consumers, and contributors. We start with his work on perceptual metrics and the LPIPS paper, which allow us to better align human perception and computer vision and which remain used in contemporary generative AI applications such as stable diffusion, GANs, and latent diffusion. We look at his work creating detection tools for fake visual content, highlighting the importance of generalization of these detection methods to new, unseen models. Lastly, we dig into his work on data attribution and concept ablation, which aim to address the challenging open problem of allowing artists and others to manage their contributions to generative AI training data sets. The complete show notes for this episode can be found at twimlai.com/go/656.
Today we’re joined by Heather Gorr, principal MATLAB product marketing manager at MathWorks. In our conversation with Heather, we discuss the deployment of AI models to hardware devices and embedded AI systems. We explore factors to consider during data preparation, model development, and ultimately deployment, to ensure a successful project. Factors such as device constraints and latency requirements which dictate the amount and frequency of data flowing onto the device are discussed, as are modeling needs such as explainability, robustness and quantization; the use of simulation throughout the modeling process; the need to apply robust verification and validation methodologies to ensure safety and reliability; and the need to adapt and apply MLOps techniques for speed and consistency. Heather also shares noteworthy anecdotes about embedded AI deployments in industries including automotive and oil & gas. The complete show notes for this episode can be found at twimlai.com/go/655.
Today we’re joined by Yoshua Bengio, professor at Université de Montréal. In our conversation with Yoshua, we discuss AI safety and the potentially catastrophic risks of its misuse. Yoshua highlights various risks and the dangers of AI being used to manipulate people, spread disinformation, cause harm, and further concentrate power in society. We dive deep into the risks associated with achieving human-level competence in enough areas with AI, and tackle the challenges of defining and understanding concepts like agency and sentience. Additionally, our conversation touches on solutions to AI safety, such as the need for robust safety guardrails, investments in national security protections and countermeasures, bans on systems with uncertain safety, and the development of governance-driven AI systems. The complete show notes for this episode can be found at twimlai.com/go/654.
Today we’re joined by Miriam Friedel, senior director of ML engineering at Capital One. In our conversation with Miriam, we discuss some of the challenges faced when delivering machine learning tools and systems in highly regulated enterprise environments, and some of the practices her teams have adopted to help them operate with greater speed and agility. We also explore how to create a culture of collaboration, the value of standardized tooling and processes, leveraging open-source, and incentivizing model reuse. Miriam also shares her thoughts on building a ‘unicorn’ team, and what this means for the team she’s built at Capital One, as well as her take on build vs. buy decisions for MLOps, and the future of MLOps and enterprise AI more broadly. Throughout, Miriam shares examples of these ideas at work in some of the tools their team has built, such as Rubicon, an open source experiment management tool, and Kubeflow pipeline components that enable Capital One data scientists to efficiently leverage and scale models. The complete show notes for this episode can be found at twimlai.com/go/653.
Today we’re joined by Riley Goodside, staff prompt engineer at Scale AI. In our conversation with Riley, we explore LLM capabilities and limitations, prompt engineering, and the mental models required to apply advanced prompting techniques. We dive deep into understanding LLM behavior, discussing the mechanism of autoregressive inference, comparing k-shot and zero-shot prompting, and dissecting the impact of RLHF. We also discuss the idea that prompting is a scaffolding structure that leverages the model context, resulting in achieving the desired model behavior and response rather than focusing solely on writing ability. The complete show notes for this episode can be found at twimlai.com/go/652.
Today we’re joined by Sara Hooker, director at Cohere and head of Cohere For AI, Cohere’s research lab. In our conversation with Sara, we explore some of the challenges with multilingual models like poor data quality and tokenization, and how they rely on data augmentation and preference training to address these bottlenecks. We also discuss the disadvantages and the motivating factors behind the Mixture of Experts technique, and the importance of common language between ML researchers and hardware architects to address the pain points in frameworks and create a better cohesion between the distinct communities. Sara also highlights the impact and the emotional connection that language models have created in society, the benefits and the current safety concerns of universal models, and the significance of having grounded conversations to characterize and mitigate the risk and development of AI models. Along the way, we also dive deep into Cohere and Cohere for AI, along with their Aya project, an open science project that aims to build a state-of-the-art multilingual generative language model as well as some of their recent research papers. The complete show notes for this episode can be found at twimlai.com/go/651.
Today we’re joined by Luke Zettlemoyer, professor at University of Washington and a research manager at Meta. In our conversation with Luke, we cover multimodal generative AI, the effect of data on models, and the significance of open source and open science. We explore the grounding problem, the need for visual grounding and embodiment in text-based models, the advantages of discretization tokenization in image generation, and his paper Scaling Laws for Generative Mixed-Modal Language Models, which focuses on simultaneously training LLMs on various modalities. Additionally, we cover his papers on Self-Alignment with Instruction Backtranslation, and LIMA: Less Is More for Alignment. The complete show notes for this episode can be found at twimlai.com/go/650.
Today we’re joined by Alex Hanna, the Director of Research at the Distributed AI Research Institute (DAIR). In our conversation with Alex, we discuss the topic of AI hype and the importance of tackling the issues and impacts it has on society. Alex highlights how the hype cycle started, concerning use cases, incentives driving people towards the rapid commercialization of AI tools, and the need for robust evaluation tools and frameworks to assess and mitigate the risks of these technologies. We also talked about DAIR and how they’ve crafted their research agenda. We discuss current research projects like DAIR Fellow Asmelash Teka Hadgu’s research supporting machine translation and speech recognition tools for the low-resource Amharic and Tigrinya languages of Ethiopia and Eritrea, in partnership with his startup Lesan.AI. We also explore the “Do Data Sets Have Politics” paper, which focuses on coding various variables and conducting a qualitative analysis of computer vision data sets to uncover the inherent politics present in data sets and the challenges in data set creation. The complete show notes for this episode can be found at twimlai.com/go/649.
Today we’re joined by Nataniel Ruiz, a research scientist at Google. In our conversation with Nataniel, we discuss his recent work around personalization for text-to-image AI models. Specifically, we dig into DreamBooth, an algorithm that enables “subject-driven generation,” that is, the creation of personalized generative models using a small set of user-provided images of a subject. The personalized models can then be used to generate the subject in various contexts using a text prompt. Nataniel gives us a deep dive into the fine-tuning approach used in DreamBooth, the potential reasons behind the algorithm’s effectiveness, the challenges of fine-tuning diffusion models in this way, such as language drift, and how the prior preservation loss technique avoids this setback, as well as the evaluation challenges and metrics used in DreamBooth. We also touch on his other recent papers, including SuTI, StyleDrop, HyperDreamBooth, and lastly, Platypus. The complete show notes for this episode can be found at twimlai.com/go/648.
Today we’re joined by Shreya Rajpal, founder and CEO of Guardrails AI. In our conversation with Shreya, we discuss ensuring the safety and reliability of language models for production applications. We explore the risks and challenges associated with these models, including different types of hallucinations and other LLM failure modes. We also talk about the susceptibility of the popular retrieval augmented generation (RAG) technique to closed-domain hallucination, and how this challenge can be addressed. We also cover the need for robust evaluation metrics and tooling for building with large language models. Lastly, we explore Guardrails, an open-source project that provides a catalog of validators that run on top of language models to enforce correctness and reliability efficiently. The complete show notes for this episode can be found at twimlai.com/go/647.
Today we’re joined by Roland Memisevic, a senior director at Qualcomm AI Research. In our conversation with Roland, we discuss the significance of language in humanlike AI systems and the advantages and limitations of autoregressive models like Transformers in building them. We cover the current and future role of recurrence in LLM reasoning and the significance of improving grounding in AI—including the potential of developing a sense of self in agents. Along the way, we discuss Fitness Ally, a fitness coach trained on a visually grounded large language model, which has served as a platform for Roland’s research into neural reasoning, as well as recent research that explores topics like visual grounding for large language models and state-augmented architectures for AI agents. The complete show notes for this episode can be found at twimlai.com/go/646.
Today we’re joined by James Zou, an assistant professor at Stanford University. In our conversation with James, we explore the differences in ChatGPT’s behavior over the last few months. We discuss the issues that can arise from inconsistencies in generative AI models, how he tested ChatGPT’s performance in various tasks, drawing comparisons between March 2023 and June 2023 for both GPT-3.5 and GPT-4 versions, and the possible reasons behind the declining performance of these models. James also shared his thoughts on how surgical AI editing akin to CRISPR could potentially revolutionize LLM and AI systems, and how adding monitoring tools can help in tracking behavioral changes in these models. Finally, we discuss James' recent paper on pathology image analysis using Twitter data, in which he explores the challenges of obtaining large medical datasets and data collection, as well as detailing the model’s architecture, training, and the evaluation process. The complete show notes for this episode can be found at twimlai.com/go/645.
Today we’re joined by Sophia Sanborn, a postdoctoral scholar at the University of California, Santa Barbara. In our conversation with Sophia, we explore the concept of universality between neural representations and deep neural networks, and how these principles of efficiency provide an ability to find consistent features across networks and tasks. We also discuss her recent paper on Bispectral Neural Networks, which focuses on the Fourier transform and its relation to group theory and on the use of the bispectrum to achieve invariance in deep neural networks. We also cover how geometric deep learning extends the concept of CNNs to other domains, the similarities in the fundamental structure of artificial and biological neural networks, and how applying similar constraints leads to the convergence of their solutions. The complete show notes for this episode can be found at twimlai.com/go/644.
Today we’re joined by Gokul Swamy, a PhD student at the Robotics Institute at Carnegie Mellon University. In the final conversation of our ICML 2023 series, we sat down with Gokul to discuss his accepted papers at the event, leading off with “Inverse Reinforcement Learning without Reinforcement Learning.” In this paper, Gokul explores the challenges and benefits of inverse reinforcement learning, and the potential and advantages it holds for various applications. Next up, we explore the “Complementing a Policy with a Different Observation Space” paper, which applies causal inference techniques to accurately estimate sampling balance and make decisions based on limited observed features. Finally, we touch on “Learning Shared Safety Constraints from Multi-task Demonstrations,” which centers on learning safety constraints from demonstrations using the inverse reinforcement learning approach. The complete show notes for this episode can be found at twimlai.com/go/643.
Today we’re joined by Su-In Lee, a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. In our conversation, Su-In details her talk from the ICML 2023 Workshop on Computational Biology, which focuses on developing explainable AI techniques for the computational biology and clinical medicine fields. Su-In discusses the importance of explainable AI contributing to feature collaboration, the robustness of different explainability approaches, and the need for interdisciplinary collaboration between the computer science, biology, and medical fields. We also explore her recent paper on the use of drug combination therapy, challenges with handling biomedical data, and how her team aims to make meaningful contributions to the healthcare industry by aiding in cause identification and treatments for cancer and Alzheimer’s disease. The complete show notes for this episode can be found at twimlai.com/go/642.
Today we’re joined by Bayan Bruss, Vice President of Applied ML Research at Capital One. In our conversation with Bayan, we cover a pair of papers his team presented at this year’s ICML conference. We begin with the paper Interpretable Subspaces in Image Representations, where Bayan gives us a deep dive into the interpretability framework, embedding dimensions, contrastive approaches, and how their model can accelerate image representation in deep learning. We also explore GOAT: A Global Transformer on Large-scale Graphs, a scalable global graph transformer. We talk through the computation challenges, homophilic and heterophilic principles, model sparsity, and how their research proposes methodologies to get around the computational barrier when scaling to large-scale graph models. The complete show notes for this episode can be found at twimlai.com/go/641.
Today we’re joined by Atul Deo, General Manager of Amazon Bedrock. In our conversation with Atul, we discuss the process of training large language models in the enterprise, including the pain points of creating and training machine learning models, and the power of pre-trained models. We explore different approaches to how companies can leverage large language models, how to deal with hallucination, and the transformative role of retrieval augmented generation (RAG). Finally, Atul gives us an inside look at Bedrock, a fully managed service that simplifies the deployment of generative AI-based apps at scale. The complete show notes for this episode can be found at twimlai.com/go/640.
Today we’re joined by David Rosenberg, head of the machine learning strategy team in the Office of the CTO at Bloomberg. In our conversation with David, we discuss the creation of BloombergGPT, a custom-built LLM focused on financial applications. We explore the model’s architecture, validation process, benchmarks, and its distinction from other language models. David also discusses the evaluation process, performance comparisons, progress, and the future directions of the model. Finally, we discuss the ethical considerations that come with building these types of models, and how his team has approached dealing with these issues. The complete show notes for this episode can be found at twimlai.com/go/639.
Today we’re joined by Robert Osazuwa Ness, a senior researcher at Microsoft Research, Professor at Northeastern University, and Founder of Altdeep.ai. In our conversation with Robert, we explore whether large language models, specifically GPT-3, 3.5, and 4, are good at causal reasoning. We discuss the benchmarks used to evaluate these models and the limitations they have in answering specific causal reasoning questions, while Robert highlights the need for access to weights, training data, and architecture to correctly answer these questions. The episode discusses the challenge of generalization in causal relationships and the importance of incorporating inductive biases, explores the model's ability to generalize beyond the provided benchmarks, and the importance of considering causal factors in decision-making processes. The complete show notes for this episode can be found at twimlai.com/go/638.
Today we’re joined by Alice Xiang, Lead Research Scientist at Sony AI, and Global Head of AI Ethics at Sony Group Corporation. In our conversation with Alice, we discuss the ongoing debate between privacy and fairness in computer vision, diving into the impact of data privacy laws on the AI space while highlighting concerns about unauthorized use and lack of transparency in data usage. We explore the potential harm of inaccurate AI model outputs and the need for legal protection against biased AI products, and Alice suggests various solutions to address these challenges, such as working through third parties for data collection and establishing closer relationships with communities. Finally, we talk through the history of unethical data collection practices in CV and the emergence of generative AI technologies that exacerbate the problem, the importance of operationalizing ethical data collection and practice, including appropriate consent, representation, diversity, and compensation, and the need for interdisciplinary collaboration in AI ethics and the growing interest in AI regulation, including the EU AI Act and regulatory activities in the US. The complete show notes for this episode can be found at twimlai.com/go/637.
Today we're joined by Mohit Bansal, Parker Professor, and Director of the MURGe-Lab at UNC, Chapel Hill. In our conversation with Mohit, we explore the concept of unification in AI models, highlighting the advantages of shared knowledge and efficiency. He addresses the challenges of evaluation in generative AI, including biases and spurious correlations. Mohit introduces groundbreaking models such as UDOP and VL-T5, which achieved state-of-the-art results in various vision and language tasks while using fewer parameters. Finally, we discuss the importance of data efficiency, evaluating bias in models, and the future of multimodal models and explainability. The complete show notes for this episode can be found at twimlai.com/go/636.
Today we kick off our coverage of the 2023 CVPR conference joined by Fatih Porikli, a Senior Director of Technology at Qualcomm. In our conversation with Fatih, we covered quite a bit of ground, touching on a total of 12 papers/demos, focusing on topics like data augmentation and optimized architectures for computer vision. We explore advances in optical flow estimation networks, cross-model, and stage knowledge distillation for efficient 3D object detection, and zero-shot learning via language models for fine-grained labeling. We also discuss generative AI advancements and computer vision optimization for running large models on edge devices. Finally, we discuss objective functions, architecture design choices for neural networks, and efficiency and accuracy improvements in AI models via the techniques introduced in the papers.
Today we’re joined by Chris Lattner, Co-Founder and CEO of Modular. In our conversation with Chris, we discuss Mojo, a new programming language for AI developers. Mojo is unique in this space and simplifies things by making the entire stack accessible and understandable to people who are not compiler engineers. It also offers Python programmers the ability to make it high-performance and capable of running accelerators, making it more accessible to more people and researchers. We discuss the relationship between the Modular Engine and Mojo, the challenge of packaging Python, particularly when incorporating C code, and how Mojo aims to solve these problems to make the AI stack more dependable. The complete show notes for this episode can be found at twimlai.com/go/634
Today we’re joined by Jilei Hou, a VP of Engineering at Qualcomm Technologies. In our conversation with Jilei, we focus on the emergence of generative AI, and how they've worked towards providing these models for use on edge devices. We explore how the distribution of models on devices can help amortize large models' costs while improving reliability and performance, as well as the challenges of running machine learning workloads on devices, including model size and inference latency. Finally, we explore how these emerging technologies fit into the existing AI Model Efficiency Toolkit (AIMET) framework. The complete show notes for this episode can be found at twimlai.com/go/633
Today we’re joined by Joon Sung Park, a PhD Student at Stanford University. Joon shares his passion for creating AI systems that can solve human problems and his work on the recent paper Generative Agents: Interactive Simulacra of Human Behavior, which showcases generative agents that exhibit believable human behavior. We discuss using empirical methods to study these systems and the conflicting papers on whether AI models have a worldview and common sense. Joon talks about the importance of context and environment in creating believable agent behavior and shares his team's work on scaling emerging community behaviors. He also dives into the importance of a long-term memory module in agents and the use of knowledge graphs in retrieving associative information. The goal, Joon explains, is to create something that people can enjoy and empower people, solving existing problems and challenges in the traditional HCI and AI field.
Today we’re joined by Hugo Larochelle, a research scientist at Google Deepmind. In our conversation with Hugo, we discuss his work on transfer learning, understanding the capabilities of deep learning models, and creating the Transactions on Machine Learning Research journal. We explore the use of large language models in NLP, prompting, and zero-shot learning. Hugo also shares insights from his research on neural knowledge mobilization for code completion and discusses the adaptive prompts used in their system. The complete show notes for this episode can be found at twimlai.com/go/631.
Today we’re joined by Dan Fu, a PhD student at Stanford University. In our conversation with Dan, we discuss the limitations of state space models in language modeling and the search for alternative building blocks that can help increase context length without being computationally infeasible. Dan walks us through the H3 architecture and Flash Attention technique, which can reduce the memory footprint of a model and make it feasible to fine-tune. We also explore his work on improving language models using synthetic languages, the issue of long sequence length affecting both training and inference in models, and the hope for finding something sub-quadratic that can perform language processing more effectively than the brute force approach of attention. The complete show notes for this episode can be found at https://twimlai.com/go/630
Today we continue our coverage of ICLR 2023 joined by Dhruv Batra, an associate professor at Georgia Tech and research director of the Fundamental AI Research (FAIR) team at META. In our conversation, we discuss Dhruv’s work on the paper Emergence of Maps in the Memories of Blind Navigation Agents, which won an Outstanding Paper Award at the event. We explore navigation with multilayer LSTM and the question of whether embodiment is necessary for intelligence. We delve into the Embodiment Hypothesis and the progress being made in language models and caution on the responsible use of these models. We also discuss the history of AI and the importance of using the right data sets in training. The conversation explores the different meanings of "maps" across AI and cognitive science fields, Dhruv’s experience in navigating mapless systems, and the early discovery stages of memory representation and neural mechanisms. The complete show notes for this episode can be found at https://twimlai.com/go/629
Today we’re joined by Jerry Liu, co-founder and CEO of Llama Index. In our conversation with Jerry, we explore the creation of Llama Index, a centralized interface to connect your external data with the latest large language models. We discuss the challenges of adding private data to language models and how Llama Index connects the two for better decision-making. We discuss the role of agents in automation, the evolution of the agent abstraction space, and the difficulties of optimizing queries over large amounts of complex data. We also discuss a range of topics from combining summarization and semantic search, to automating reasoning, to improving language model results by exploiting relationships between nodes in data. The complete show notes for this episode can be found at twimlai.com/go/628.
Today we kick off our coverage of the 2023 ICLR conference joined by Christos Louizos, an ML researcher at Qualcomm Technologies. In our conversation with Christos, we explore his paper Hyperparameter Optimization through Neural Network Partitioning and a few of his colleagues' works from the conference. We discuss methods for speeding up attention mechanisms in transformers, scheduling operations for computation graphs, estimating channels in indoor environments, and adapting to distribution shifts at test time with neural network modules. We also talk through the benefits and limitations of federated learning, exploring sparse models, optimizing communication between servers and devices, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/627.
Today we’re joined by Marti Hearst, Professor at UC Berkeley. In our conversation with Marti, we explore the intricacies of AI language models and their usefulness in improving efficiency but also their potential for spreading misinformation. Marti expresses skepticism about whether these models truly have cognition compared to the nuance of the human brain. We discuss the intersection of language and visualization and the need for specialized research to ensure safety and appropriateness for specific uses. We also delve into the latest tools and algorithms such as Copilot and ChatGPT, which enhance programming and help in identifying comparisons, respectively. Finally, we discuss Marti’s long research history in search and her breakthrough in developing a standard interaction that allows for finding items on websites and library catalogs. The complete show notes for this episode can be found at https://twimlai.com/go/626.
Today we’re joined by Ben Goertzel, CEO of SingularityNET. In our conversation with Ben, we explore all things AGI, including the potential scenarios that could arise with the advent of AGI and his preference for a decentralized rollout comparable to the internet or Linux. Ben shares his research in bridging neural nets, symbolic logic engines, and evolutionary programming engines to develop a common mathematical framework for AI paradigms. We also discuss the limitations of Large Language Models and the potential of hybridizing LLMs with other AGI approaches. Additionally, we chat about their work using LLMs for music generation and the limitations of formalizing creativity. Finally, Ben discusses his team's work with the OpenCog Hyperon framework and Simuli to achieve AGI, and the potential implications of their research in the future. The complete show notes for this episode can be found at https://twimlai.com/go/625
Today we’re joined by Jeff Boudier, head of product at Hugging Face 🤗. In our conversation with Jeff, we explore the current landscape of open-source machine learning tools and models, the recent shift towards consumer-focused releases, and the importance of making ML tools accessible. We also discuss the growth of the Hugging Face Hub, which currently hosts over 150k models, and how formalizing their collaboration with AWS will help drive the adoption of open-source models in the enterprise. The complete show notes for this episode can be found at twimlai.com/go/624
Today we’re joined by Vinesh Sukumar, a senior director and head of AI/ML product management at Qualcomm Technologies. In our conversation with Vinesh, we explore how mobile and automotive devices have different requirements for AI models and how their AI stack helps developers create complex models on both platforms. We also discuss the growing interest in text-based input and the shift towards transformers, generative content, and recommendation engines. Additionally, we explore the challenges and opportunities for ML Ops investments on the edge, including the use of synthetic data and evolving models based on user data. Finally, we delve into the latest advancements in large language models, including Prometheus-style models and GPT-4. The complete show notes for this episode can be found at twimlai.com/go/623.
Today we’re joined by Anastasis Germanidis, Co-Founder and CTO of RunwayML. Amongst all the product and model releases over the past few months, Runway threw its hat into the ring with Gen-1, a model that can take still images or video and transform them into completely stylized videos. They followed that up just a few weeks later with the release of Gen-2, a multimodal model that can produce a video from text prompts. We had the pleasure of chatting with Anastasis about both models, exploring the challenges of generating video, the importance of alignment in model deployment, the potential use of RLHF, the deployment of models as APIs, and much more! The complete show notes for this episode can be found at twimlai.com/go/622.
Today we’re joined by Tom Goldstein, an associate professor at the University of Maryland. Tom’s research sits at the intersection of ML and optimization and has previously been featured in the New Yorker for his work on invisibility cloaks, clothing that can evade object detection. In our conversation, we focus on his more recent research on watermarking LLM output. We explore the motivations behind adding these watermarks, how they work, and different ways a watermark could be deployed, as well as political and economic incentive structures around the adoption of watermarking and future directions for that line of work. We also discuss Tom’s research into data leakage, particularly in stable diffusion models, work that is analogous to recent guest Nicholas Carlini’s research into LLM data extraction.
Today we’re joined by Anna Ivanova, a postdoctoral researcher at MIT Quest for Intelligence. In our conversation with Anna, we discuss her recent paper Dissociating language and thought in large language models: a cognitive perspective. In the paper, Anna reviews the capabilities of LLMs by considering their performance on two different aspects of language use: 'formal linguistic competence', which includes knowledge of rules and patterns of a given language, and 'functional linguistic competence', a host of cognitive abilities required for language understanding and use in the real world. We explore parallels between linguistic competence and AGI, the need to identify new benchmarks for these models, whether an end-to-end trained LLM can address various aspects of functional competence, and much more! The complete show notes for this episode can be found at twimlai.com/go/620.
Today we’re joined by Monroe Kennedy III, an assistant professor at Stanford, director of the Assistive Robotics and Manipulation Lab, and a national director of Black in Robotics. In our conversation with Monroe, we spend some time exploring the robotics landscape, getting Monroe’s thoughts on the current challenges in the field, as well as his opinion on choreographed demonstrations like the dancing Boston Dynamics machines. We also dig into his work around two distinct threads, Robotic Dexterity (what does it take to make robots capable of doing useful manipulation tasks with and for humans?) and Collaborative Robotics (how do we go beyond advanced autonomy in robots towards making effective robotic teammates capable of working with human counterparts?). Finally, we discuss DenseTact, an optical-tactile sensor capable of visualizing the deformed surface of a soft fingertip and using that image in a neural network to perform calibrated shape reconstruction and 6-axis wrench estimation. The complete show notes for this episode can be found at twimlai.com/go/619.
Today we’re joined by Nicholas Carlini, a research scientist at Google Brain. Nicholas works at the intersection of machine learning and computer security, and his recent paper “Extracting Training Data from LLMs” has generated quite a buzz within the ML community. In our conversation, we discuss the current state of adversarial machine learning research, the dynamic of dealing with privacy issues in black box vs accessible models, what privacy attacks in vision models like diffusion models look like, and the scale of “memorization” within these models. We also explore Nicholas’ work on data poisoning, which looks to understand what happens if a bad actor can take control of a small fraction of the data that an ML model is trained on. The complete show notes for this episode can be found at twimlai.com/go/618.
Today we’re joined by Vinodkumar Prabhakaran, a Senior Research Scientist at Google Research. In our conversation with Vinod, we discuss his two main areas of research: using ML, specifically NLP, to explore social disparities, and how these same social disparities are captured and propagated within machine learning tools. We explore a few specific projects, the first using NLP to analyze interactions between police officers and community members, determining factors like level of respect or politeness and how they play out across a spectrum of community members. We also discuss his work on understanding how bias creeps into the pipeline of building ML models, whether it be from the data or the person building the model. Finally, for those working with human annotators, Vinod shares his thoughts on how to incorporate principles of fairness to help build more robust models. The complete show notes for this episode can be found at https://twimlai.com/go/617.
Today we’re joined by Robert Osazuwa Ness, a senior researcher at Microsoft Research, to break down the latest trends in the world of causal modeling. In our conversation with Robert, we explore advances in areas like causal discovery, causal representation learning, and causal judgements. We also discuss the impact causality could have on large language models, especially in some of the recent use cases we’ve seen like Bing Search and ChatGPT. Finally, we discuss the benchmarks for causal modeling, the top causality use cases, and the most exciting opportunities in the field. The complete show notes for this episode can be found at twimlai.com/go/616.
Today we’re joined by Dimitris Zermas, a principal scientist at agriscience company Sentera. Dimitris’ work at Sentera is focused on developing tools for precision agriculture using machine learning, including hardware like cameras and sensors, as well as ML models for analyzing the vast amount of data they acquire. We explore some specific use cases for machine learning, including plant counting, the challenges of working with classical computer vision techniques, database management, and data annotation. We also discuss their use of approaches like zero-shot learning and how they’ve taken advantage of a data-centric mindset when building a better, more cost-efficient product.
Today we’re joined by Anima Anandkumar, Bren Professor of Computing And Mathematical Sciences at Caltech and Sr Director of AI Research at NVIDIA. In our conversation, we take a broad look at the emerging field of AI for Science, focusing on both practical applications and longer-term research areas. We discuss the latest developments in the area of protein folding, and how much it has evolved since we first discussed it on the podcast in 2018, the impact of generative models and stable diffusion on the space, and the application of neural operators. We also explore the ways in which prediction models like weather models could be improved, how foundation models are helping to drive innovation, and finally, we dig into MineDojo, a new framework built on the popular Minecraft game for embodied agent research, which won a 2022 Outstanding Paper Award at NeurIPS. The complete show notes for this episode can be found at twimlai.com/go/614
Today we continue our AI Trends 2023 series joined by Sameer Singh, an associate professor in the department of computer science at UC Irvine and fellow at the Allen Institute for Artificial Intelligence (AI2). In our conversation with Sameer, we focus on the latest and greatest advancements and developments in the field of NLP, starting out with one that took the internet by storm just a few short weeks ago, ChatGPT. We also explore top themes like decomposed reasoning, causal modeling in NLP, and the need for “clean” data. We also discuss projects like HuggingFace’s BLOOM, the debacle that was the Galactica demo, the impending intersection of LLMs and search, use cases like Copilot, and of course, we get Sameer’s predictions for what will happen this year in the field. The complete show notes for this episode can be found at twimlai.com/go/613.
Today we’re taking a deep dive into the latest and greatest in the world of Reinforcement Learning with our friend Sergey Levine, an associate professor at UC Berkeley. In our conversation with Sergey, we explore some game-changing developments in the field including the release of ChatGPT and the onset of RLHF. We also explore more broadly the intersection of RL and language models, as well as advancements in offline RL and pre-training for robotics models, inverse RL, Q learning, and a host of papers along the way. Finally, you don’t want to miss Sergey’s predictions for the top developments of the year 2023! The complete show notes for this episode can be found at twimlai.com/go/612
Today we conclude our coverage of the 2022 NeurIPS series joined by Catherine Nakalembe, an associate research professor at the University of Maryland, and Africa Program Director under NASA Harvest. In our conversation with Catherine, we take a deep dive into her talk from the ML in the Physical Sciences workshop, Supporting Food Security in Africa using Machine Learning and Earth Observations. We discuss the broad challenges associated with food insecurity, as well as Catherine’s role and the priorities of Harvest Africa, a program focused on advancing innovative satellite-driven methods to produce automated within-season crop type and crop-specific condition products that support agricultural assessments. We explore some of the technical challenges of her work, including the limited, but growing, access to remote sensing and earth observation datasets and how the availability of that data has changed in recent years, the lack of benchmarks for the tasks she’s working on, examples of how they’ve applied techniques like multi-task learning and task-informed meta-learning, and much more. The complete show notes for this episode can be found at twimlai.com/go/611.
Today we conclude our AWS re:Invent 2022 series joined by Michael Kearns, a professor in the department of computer and information science at UPenn, as well as an Amazon Scholar. In our conversation, we briefly explore Michael’s broader research interests in responsible AI and ML governance and his role at Amazon. We then discuss the announcement of service cards, and their take on “model cards” at a holistic, system level as opposed to an individual model level. We walk through the information represented on the cards, as well as explore the decision-making process around specific information being omitted from the cards. We also get Michael’s take on the years-old debate of algorithmic bias vs dataset bias, what some of the current issues are around this topic, and what research he has seen (and hopes to see) addressing issues of “fairness” in large language models. The complete show notes for this episode can be found at twimlai.com/go/610.
Today we continue our NeurIPS 2022 series joined by Tony Jebara, VP of engineering and head of machine learning at Spotify. In our conversation with Tony, we discuss his role at Spotify, how the company’s use of machine learning has evolved over the last few years, and the business value that machine learning, specifically recommendations, holds at the company. We dig into his talk on the intersection of reinforcement learning and lifetime value (LTV) at Spotify, which explores the application of Offline RL for user experience personalization. We discuss the various papers presented in the talk, and how they all map toward determining and increasing a user’s LTV. The complete show notes for this episode can be found at twimlai.com/go/609.
More than any system before it, ChatGPT has tapped into our enduring fascination with artificial intelligence, raising in a more concrete and present way important questions and fears about what AI is capable of and how it will impact us as humans. One of the concerns most frequently voiced, whether sincerely or cloaked in jest, is how ChatGPT, or systems like it, will impact our livelihoods. In other words, “will ChatGPT put me out of a job???” In this episode of the podcast, I seek to answer this very question by conducting an interview in which ChatGPT is asking all the questions. (The questions are answered by a second ChatGPT, as in my own recent Interview with it, Exploring Large Language Models with ChatGPT.) In addition to the straight dialogue, I include my own commentary along the way and conclude with a discussion of the results of the experiment, that is, whether I think ChatGPT will be taking my job as your host anytime soon. Ultimately, though, I hope you’ll be the judge of that and share your thoughts on how ChatGPT did at my job via a comment below or on social media.
Today we continue our re:Invent 2022 series joined by Kumar Chellapilla, a general manager of ML and AI Services at AWS. We had the opportunity to speak with Kumar after announcing their recent addition of geospatial data to the SageMaker Platform. In our conversation, we explore Kumar’s role as the GM for a diverse array of SageMaker services, what has changed in the geospatial data landscape over the last 10 years, and why Amazon decided now was the right time to invest in geospatial data. We discuss the challenges of accessing and working with this data and the pain points they’re trying to solve. Finally, Kumar walks us through a few customer use cases, describes how this addition will make users more effective than they currently are, and shares his thoughts on the future of this space over the next 2-5 years, including the potential intersection of geospatial data and stable diffusion/generative models. The complete show notes for this episode can be found at twimlai.com/go/607
Today we’re joined by Disha Singla, a senior director of machine learning engineering at Capital One. In our conversation with Disha, we explore her role as the leader of the Data Insights team at Capital One, where they’ve been tasked with creating reusable libraries, components, and workflows to make ML usable broadly across the company, as well as a platform to make it all accessible and to drive meaningful insights. We discuss the construction of her team, as well as the types of interactions and requests they receive from their customers (data scientists), productionized use cases from the platform, and their efforts to transition from batch to real-time deployment. Disha also shares her thoughts on the ROI of machine learning and getting buy-in from executives, how she sees machine learning evolving at the company over the next 10 years, and much more! The complete show notes for this episode can be found at twimlai.com/go/606
Today we’re excited to kick off our coverage of the 2022 NeurIPS conference with Johann Brehmer, a research scientist at Qualcomm AI Research in Amsterdam. We begin our conversation discussing some of the broader problems that causality will help us solve, before turning our focus to Johann’s paper Weakly supervised causal representation learning, which seeks to prove that high-level causal representations are identifiable in weakly supervised settings. We also discuss a few other papers that the team at Qualcomm presented, including neural topological ordering for computation graphs, as well as some of the demos they showcased, which we’ll link to on the show notes page. The complete show notes for this episode can be found at twimlai.com/go/605.
Today we’re excited to kick off our 2022 AWS re:Invent series with a conversation with Emad Mostaque, Founder and CEO of Stability.ai. Stability.ai is a very popular name in the generative AI space at the moment, having taken the internet by storm with the release of its stable diffusion model just a few months ago. In our conversation with Emad, we discuss the story behind Stability's inception, the model's speed and scale, and the connection between stable diffusion and programming. We explore some of the spaces that Emad anticipates being disrupted by this technology, his thoughts on the open-source vs API debate, how they’re dealing with issues of user safety and artist attribution, and of course, what infrastructure they’re using to stand the model up. The complete show notes for this episode can be found at https://twimlai.com/go/604.
Today we're joined by ChatGPT, the latest and coolest large language model developed by OpenAI. In our conversation with ChatGPT, we discuss the background and capabilities of large language models, the potential applications of these models, and some of the technical challenges and open questions in the field. We also explore the role of supervised learning in creating ChatGPT, and the use of PPO in training the model. Finally, we discuss the risks of misuse of large language models, and the best resources for learning more about these models and their applications. Join us for a fascinating conversation with ChatGPT, and learn more about the exciting world of large language models. The complete show notes for this episode can be found at https://twimlai.com/go/603
Are AI-generating algorithms the path to artificial general intelligence (AGI)? Today we’re joined by Jeff Clune, an associate professor of computer science at the University of British Columbia, and faculty member at the Vector Institute. In our conversation with Jeff, we discuss the broad, ambitious goal of the AI field, artificial general intelligence, where we are on the path to achieving it, and his opinion on what we should be doing to get there, specifically focusing on AI-generating algorithms. With the goal of creating open-ended algorithms that can learn forever, Jeff shares his three pillars of an AI-GA: meta-learning architectures, meta-learning algorithms, and auto-generating learning environments. Finally, we discuss the inherent safety issues with these learning algorithms and Jeff’s thoughts on how to combat them, and what the not-so-distant future holds for this area of research. The complete show notes for this episode can be found at twimlai.com/go/602.
Today we’re joined by Cedric Cocaud, the chief engineer of the Wayfinder Group at Acubed, the innovation center for aircraft manufacturer Airbus. In our conversation with Cedric, we explore some of the technical challenges of innovation in the aircraft space, including autonomy. Cedric’s work on Project Vahana, Acubed’s foray into air taxis, attempted to leverage work in the self-driving car industry to develop fully autonomous planes. We discuss some of the algorithms being developed for this work, the data collection process, and Cedric’s thoughts on using synthetic data for these tasks. We also discuss the challenges of labeling the data, including programmatic and automated labeling, and much more.
Today we’re joined by Heather Nolis, a principal machine learning engineer at T-Mobile. In our conversation with Heather, we explored her machine learning journey at T-Mobile, including their initial proof of concept project, which held the goal of putting their first real-time deep learning model into production. We discuss the use case, which aimed to build a customer intent model that would pull relevant information about a customer during conversations with customer support. This process has now become widely known as blank assist. We also discuss the decision to use supervised learning to solve this problem and the challenges they faced when developing a taxonomy. Finally, we explore the idea of using small models vs uber-large models, the hardware being used to stand up their infrastructure, and how Heather thinks about the age-old question of build vs buy.
Today we’re joined by return guest Ken Goldberg, a professor at UC Berkeley and the chief scientist at Ambi Robotics. It’s been a few years since our initial conversation with Ken, so we spent a bit of time talking through the progress that has been made in robotics in the time that has passed. We discuss Ken’s recent work, including the paper Autonomously Untangling Long Cables, which won Best Systems Paper at the RSS conference earlier this year, covering the complexity of the problem and why it is classified as a systems challenge, as well as the advancements in hardware that made solving this problem possible. We also explore Ken’s thoughts on the push towards simulation by research entities and large tech companies, and the potential for causal modeling to find its way into robotics. Finally, we discuss the recent showcase of Optimus, Tesla and Elon Musk’s “humanoid” robot, and how far we are from it being a viable piece of technology. The complete show notes for this episode can be found at twimlai.com/go/599.
Today friend of the show and esteemed guest host John Bohannon is back with another great interview, this time around joined by Oren Etzioni, former CEO of the Allen Institute for AI, where he is currently an advisor. In our conversation with Oren, we discuss his philosophy as a researcher and how that has manifested in his pivot to institution builder. We also explore his thoughts on the current landscape of NLP, including the emergence of LLMs and the hype being built up around AI systems from folks like Elon Musk. Finally, we explore some of the research coming out of AI2, including Semantic Scholar, an AI-powered research tool analogous to arxiv, and the somewhat controversial Delphi project, a research prototype designed to model people’s moral judgments on a variety of everyday situations.
Over the last few years, it’s been established that your ML team needs at least some basic tooling in order to be effective, providing support for various aspects of the machine learning workflow, from data acquisition and management, to model development and optimization, to model deployment and monitoring. But how do you get there? Many tools available off the shelf, both commercial and open source, can help. At the extremes, these tools fall into one of two buckets: end-to-end platforms that try to provide support for many aspects of the ML lifecycle, and specialized tools that offer deep functionality in a particular domain or area. At TWIMLcon: AI Platforms 2022, our panelists debated the merits of these approaches in The Great MLOps Debate: End-to-End ML Platforms vs Specialized Tools.
Much of the way we talk and think about MLOps comes from the perspective of large consumer internet companies like Facebook or Google. If you work at a FAANG company, these approaches might work well for you. But what about if you work at one of the many small, B2B companies that stand to benefit through the use of machine learning? How should you be thinking about MLOps and the ML lifecycle in that case? In this live podcast interview from TWIMLcon: AI Platforms 2022, Sam Charrington explores these questions with Jacopo Tagliabue, whose perspectives and contributions on scaling down MLOps have served to make the field more accessible and relevant to a wider array of practitioners.
Today we’re joined by Ali Rodell, a senior director of machine learning engineering at Capital One. In our conversation with Ali, we explore his role as the head of model development platforms at Capital One, including how his 25+ years in software development have shaped his view on building platforms and the evolution of the platforms space over the last 10 years. We discuss the importance of a healthy open source tooling ecosystem, Capital One’s use of various open source capabilities like Kubeflow and Kubernetes to build out platforms, and some of the challenges that come along with modifying/customizing these tools to work for him and his teams. Finally, we explore the range of user personas that need to be accounted for when making decisions about tooling, supporting things like Jupyter notebooks and other low level tools, and how that can be potentially challenging in a highly regulated environment like the financial industry. The complete show notes for this episode can be found at twimlai.com/go/595
Today we’re joined by Vasi Philomin, vice president of AI services at AWS, for our first in-person interview since 2019! In our conversation with Vasi, we discussed the recently released Amazon Code Whisperer, a developer-focused coding companion. We begin by exploring Vasi’s role and the various products under the banner of cognitive and non-cognitive services, how those came together, where Code Whisperer fits into the equation, and some of the differences between Code Whisperer and some of the other recently released coding companions like GitHub Copilot. We also discuss the training corpus for the model, how they’ve dealt with the potential issues of bias that arise when training LLMs with crawled web data, and Vasi’s thoughts on what the path of innovation looks like for Code Whisperer. At the end of our conversation, Vasi was gracious enough to share a quick live demo of Code Whisperer, so you can catch that here.
TWIMLcon: AI Platforms 2022 is just a day away! If you're interested in all things MLOps and Platforms/Infrastructure technology, this is the event for you! Register now at https://twimlcon.com/attend for FREE!
Today we’re joined by Vidyut Naware, the director of machine learning and artificial intelligence at PayPal. As the leader of the ML/AI organization at PayPal, Vidyut is responsible for all things applied, from R&D to MLOps infrastructure. In our conversation, we explore the work being done in four major categories: hardware/compute, data, applied responsible AI, and tools, frameworks, and platforms. We also discuss their use of federated learning and delayed supervision models for use cases like anomaly detection and fraud prevention, research into quantum computing and causal inference, as well as applied use cases like graph machine learning and collusion detection. The complete show notes for this episode can be found at twimlai.com/go/593
Today we’re back with another installment of our Data-Centric AI series, joined by Wendy Foster, a director of engineering & data science at Shopify. In our conversation with Wendy, we explore the differences between data-centric and model-centric approaches and how they manifest at Shopify, including on her team, which is responsible for utilizing merchant and product data to assist individual vendors on the platform. We discuss how they address, maintain, and improve data quality, emphasizing the importance of coverage and “freshness” data when solving constantly evolving use cases. Finally, we discuss how data is taxonomized at the company and the challenges that present themselves when producing large-scale ML models, future use cases that Wendy expects her team to tackle, and we briefly explore Merlin, Shopify’s new ML platform (that you can hear more about at TWIMLcon!), and how it fits into the broader scope of ML at the company. The complete show notes for this episode can be found at twimlai.com/go/592
Today we’re joined by Bayan Bruss, a Sr. director of applied ML research at Capital One. In our conversation with Bayan, we dig into his work in applying various deep learning techniques to tabular data, including taking advancements made in other areas like graph CNNs and other traditional graph mining algorithms and applying them to financial services applications. We discuss why despite a “flood” of innovation in the field, work on tabular data doesn’t elicit as much fanfare despite its broad use across businesses, Bayan’s experience with the difficulty of making deep learning work on tabular data, and what opportunities have been presented for the field with the emergence of multi-modality and transformer models. We also explore a pair of papers from Bayan’s team, focused on both transformers and transfer learning for tabular data. The complete show notes for this episode can be found at twimlai.com/go/591
Today we’re joined by Orit Peleg, an assistant professor at the University of Colorado, Boulder. Orit’s work focuses on understanding the behavior of disordered living systems, by merging tools from physics, biology, engineering, and computer science. In our conversation, we discuss how Orit found herself exploring problems of swarming behaviors and their relationship to distributed computing system architecture and spiking neurons. We look at two specific areas of research, the first focused on the patterns observed in firefly species, how the data is collected, and the types of algorithms used for optimization. Finally, we look at how Orit’s research with fireflies translates to a completely different insect, the honeybee, and what the next steps are for investigating these and other insect families. The complete show notes for this episode can be found at twimlai.com/go/590
In this extra special episode of the TWIML AI Podcast, friend of the show John Bohannon leads a jam-packed conversation with Hugging Face’s recently appointed head of research Douwe Kiela. In our conversation with Douwe, we explore his role at the company, how his perception of Hugging Face has changed since joining, and what research entails at the company. We discuss the emergence of the transformer model and of BERT-ology, the recent shift to solving more multimodal problems, the importance of this subfield as one of the “Grand Directions” of Hugging Face’s research agenda, and the importance of BLOOM, the open-access Multilingual Language Model that was the output of the BigScience project. Finally, we get into how Douwe’s background in philosophy shapes his view of current projects, as well as his projections for the future of NLP and multimodal ML. The complete show notes for this episode can be found at twimlai.com/go/589
Today we’re joined by Bill Vass, a VP of engineering at Amazon Web Services. Bill spoke at the most recent AWS re:MARS conference, where he delivered an engineering keynote focused on some recent updates to Amazon SageMaker, including its support for synthetic data generation. In our conversation, we discussed all things synthetic data, including the importance of data quality when creating synthetic data, and some of the use cases that this data is being created for, including warehouses and, in the case of one of their more recent acquisitions, iRobot, synthetic house generation. We also explore Astro, the household robot for home monitoring, including the types of models it is running, what type of on-device sensor suite it has, the relationship between the robot and the cloud, and the role of simulation. The complete show notes for this episode can be found at twimlai.com/go/588
Today we’re joined by Jeff Gehlhaar, vice president of technology at Qualcomm Technologies. In our annual conversation with Jeff, we dig into the relationship between Jeff’s team on the product side and the research team, many of whom we’ve had on the podcast over the last few years. We discuss the challenges of real-world neural network deployment and doing quantization on-device, as well as a look at the tools that power their AI Stack. We also explore a few interesting automotive use cases, including automated driver assistance, and what advancements Jeff is looking forward to seeing in the next year. The complete show notes for this episode can be found at twimlai.com/go/587
Today we close out our ICML 2022 coverage joined by Sharad Goel, a professor of public policy at Harvard University. In our conversation with Sharad, we discuss his Outstanding Paper award winner Causal Conceptions of Fairness and their Consequences, which seeks to understand what it means to apply causality to the idea of fairness in ML. We explore the two broad classes of intent that have been conceptualized under the subfield of causal fairness and how they differ, the distinct ways causality is treated in economic and statistical contexts vs a computer science and algorithmic context, and why policies created in the context of causal definitions are broadly suboptimal. The complete show notes for this episode can be found at twimlai.com/go/586
Today we continue our ICML coverage joined by Melika Payvand, a research scientist at the Institute of Neuroinformatics at the University of Zurich and ETH Zurich. Melika spoke at the Hardware Aware Efficient Training (HAET) Workshop, delivering a keynote on Brain-inspired hardware and algorithm co-design for low power online training on the edge. In our conversation with Melika, we explore her work at the intersection of ML and neuroinformatics, what makes the proposed architecture “brain-inspired”, and how techniques like online learning fit into the picture. We also discuss the characteristics of the devices that are running the algorithms she’s creating, and the challenges of adapting online learning-style algorithms to this hardware. The complete show notes for this episode can be found at twimlai.com/go/585
Today we’re joined by Arash Behboodi, a machine learning researcher at Qualcomm Technologies. In our conversation with Arash, we explore his paper Equivariant Priors for Compressed Sensing with Unknown Orientation, which proposes using equivariant generative models as a prior, shows that signals with unknown orientations can be recovered with iterative gradient descent on the latent space of these models, and provides additional theoretical recovery guarantees. We discuss the differences between compression and compressed sensing, how he was able to evolve a traditional VAE architecture to understand equivariance, and some of the research areas where he’s applying this work, including cryo-electron microscopy. We also discuss a few of the other papers that his colleagues have submitted to the conference, including Overcoming Oscillations in Quantization-Aware Training, Variational On-the-Fly Personalization, and CITRIS: Causal Identifiability from Temporal Intervened Sequences. The complete show notes for this episode can be found at twimlai.com/go/584
Today we continue our Data-Centric AI Series joined by Audrey Smith, the COO at MLtwist, and a recent participant in our panel on DCAI. In our conversation, we do a deep dive into data labeling for ML, exploring the typical journey for an organization to get started with labeling, her experience when making decisions around in-house vs outsourced labeling, and what commitments need to be made to achieve high-quality labels. We discuss how organizations that have made significant investments in labelops typically function, how someone working on an in-house labeling team approaches new projects, the ethical considerations that need to be taken for remote labeling workforces, and much more! The complete show notes for this episode can be found at twimlai.com/go/583
Today we’re joined by Richard Socher, the CEO of You.com. In our conversation with Richard, we explore the inspiration and motivation behind the You.com search engine, and how it differs from the traditional Google search engine experience. We discuss some of the various ways that machine learning is used across the platform, including how they surface relevant search results and some of the recent additions like code completion and a text generator that can write complete essays and blog posts. Finally, we talk through some of the projects we covered in our last conversation with Richard, namely his work on Salesforce’s AI Economist project. The complete show notes for this episode can be found at twimlai.com/go/582
Today we wrap up our coverage of the 2022 CVPR conference joined by Aljosa Osep, a postdoc at the Technical University of Munich & Carnegie Mellon University. In our conversation with Aljosa, we explore his broader research interests in achieving robot vision, and his vision for what it will look like when that goal is achieved. The first paper we dig into is Text2Pos: Text-to-Point-Cloud Cross-Modal Localization, which proposes a cross-modal localization module that learns to align textual descriptions with localization cues in a coarse-to-fine manner. Next up, we explore the paper Forecasting from LiDAR via Future Object Detection, which proposes an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Finally, we discuss Aljosa’s third and final paper Opening up Open-World Tracking, which proposes a new benchmark to analyze existing efforts in multi-object tracking and constructs a baseline for these tasks. The complete show notes for this episode can be found at twimlai.com/go/581
Today we continue our CVPR series joined by Kate Saenko, an associate professor at Boston University and a consulting professor for the MIT-IBM Watson AI Lab. In our conversation with Kate, we explore her research in multimodal learning, which she spoke about at the Multimodal Learning and Applications Workshop, one of a whopping 6 workshops she spoke at. We discuss the emergence of multimodal learning, the current research frontier, and Kate’s thoughts on the inherent bias in LLMs and how to deal with it. We also talk through some of the challenges that come up when building out applications, including the cost of labeling, and some of the methods she’s had success with. Finally, we discuss Kate’s perspective on the monopolizing of computing resources for “foundational” models, and her paper Unsupervised Domain Generalization by learning a Bridge Across Domains. The complete show notes for this episode can be found at twimlai.com/go/580
Today we kick off our annual coverage of the CVPR conference joined by Fatih Porikli, Senior Director of Engineering at Qualcomm AI Research. In our conversation with Fatih, we explore a trio of CVPR-accepted papers, as well as a pair of upcoming workshops at the event. The first paper, Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation, presents a novel framework to integrate semantic and instance contexts for panoptic segmentation. Next up, we discuss Imposing Consistency for Optical Flow Estimation, a paper that introduces novel and effective consistency strategies for optical flow estimation. The final paper we discuss is IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, which proposes a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness, and lighting from a single image of an indoor scene. For each paper, we explore the motivations and challenges and get concrete examples to demonstrate each problem and solution presented. The complete show notes for this episode can be found at twimlai.com/go/579
Today we’re joined by Adam Wood, Director of Data Governance and Data Quality at Mastercard. In our conversation with Adam, we explore the challenges that come along with data governance at a global scale, including dealing with regional regulations like GDPR and federating records at scale. We discuss the role of feature stores in keeping track of data lineage and how Adam and his team have dealt with the challenges of metadata management, how large organizations like Mastercard are dealing with enabling feature reuse, and the steps they take to alleviate bias, especially in scenarios like acquisitions. Finally, we explore data quality for data science and why Adam sees it as an encouraging area of growth within the company, as well as the investments they’ve made in tooling around data management, catalog, feature management, and more. The complete show notes for this episode can be found at twimlai.com/go/578
In the latest installment of our Data-Centric AI series, we’re joined by a friend of the show Mike Del Balso, Co-founder and CEO of Tecton. If you’ve heard any of our other conversations with Mike, you know we spend a lot of time discussing feature stores, or as he now refers to them, feature platforms. We explore the current complexity of data infrastructure broadly and how that has changed over the last five years, as well as the maturation of streaming data platforms. We discuss the wide vs deep paradox that exists around ML tooling, and the idea around the “ML Flywheel”, a strategy that leverages data to accelerate machine learning. Finally, we spend time discussing internal ML team construction, some of the challenges that organizations face when building their ML platforms teams, and how they can avoid the pitfalls as they arise. The complete show notes for this episode can be found at twimlai.com/go/577
Today we continue our Data-centric AI series joined by Shayan Mohanty, CEO at Watchful. In our conversation with Shayan, we focus on the data labeling aspect of the machine learning process, and ways that a data-centric approach could add value and reduce cost by multiple orders of magnitude. Shayan helps us define “data-centric”, while discussing the main challenges that organizations face when dealing with labeling, how these problems are currently being solved, and how techniques like active learning and weak supervision could be used to more effectively label. We also explore the idea of machine teaching, which focuses on using techniques that make the model training process more efficient, and what organizations need to be successful when trying to make the aforementioned mindset shift to DCAI. The complete show notes for this episode can be found at twimlai.com/go/576
This week, we continue our conversations around the topic of Data-Centric AI joined by a friend of the show Adrien Gaidon, the head of ML research at the Toyota Research Institute (TRI). In our chat, Adrien expresses a fourth, somewhat contrarian, viewpoint to the three prominent schools of thought that organizations tend to fall into, as well as a great story about how the breakthrough came via an unlikely source. We explore his principle-centric approach to machine learning as well as the role of self-supervised machine learning and synthetic data in this and other research threads. Make sure you’re following along with the entire DCAI series at twimlai.com/go/dcai. The complete show notes for this episode can be found at twimlai.com/go/575
Today we kick things off with a conversation with D. Sculley, a director on the Google Brain team. Many listeners of today’s show will know D. from his work on the paper, The Hidden Technical Debt in Machine Learning Systems, and of course, the infamous diagram. D. has recently translated the idea of technical debt into data debt, something we spend a bit of time on in the interview. We discuss his view of the concept of DCAI, where debt fits into the conversation of data quality, and what a shift towards data-centrism looks like in a world of increasingly larger models, e.g. GPT-3 and the recent PaLM models. We also explore common sources of data debt, what the community can do and has done to mitigate these issues, the usefulness of causal inference graphs in this work, and much more! If you enjoyed this interview or want to hear more on this topic, check back on the DCAI series page weekly at https://twimlai.com/podcast/twimlai/series/data-centric-ai. The complete show notes for this episode can be found at twimlai.com/go/574
Today we’re joined by Rob Walker, VP of decisioning & analytics and GM of one-to-one customer engagement at Pegasystems. Rob, who you might know from his previous appearances on the podcast, joins us to discuss his work on AI and ML in the context of customer engagement and decisioning, and the various problems that need to be solved, including the “next best” problem. We explore the distinction between the idea of the next best action and determining it from a recommender system, how machine learning and heuristics currently co-exist in engagements, scaling model evaluation, and some of the challenges they’re facing when dealing with problems of responsible AI and how they’re managed. Finally, we spend a few minutes digging into the upcoming PegaWorld conference, and what attendees should anticipate at the event. The complete show notes for this episode can be found at twimlai.com/go/573
Today we close out our coverage of the ICLR series joined by Meg Mitchell, chief ethics scientist and researcher at Hugging Face. In our conversation with Meg, we discuss her participation in the WikiM3L Workshop, as well as her transition into her new role at Hugging Face, which has afforded her the ability to prioritize coding in her work around AI ethics. We explore her thoughts on the work happening in the fields of data curation and data governance, her interest in the inclusive sharing of datasets and creation of models that don't disproportionately underperform or exploit subpopulations, and how data collection practices have changed over the years. We also touch on changes to data protection laws happening in some pretty uncertain places, the evolution of her work on Model Cards, and how she’s using this and recent Data Cards work to lower the barrier to entry to responsibly informed development of data and sharing of data. The complete show notes for this episode can be found at twimlai.com/go/572
Today we continue our ICLR coverage joined by Been Kim, a staff research scientist at Google Brain, and an ICLR 2022 Invited Speaker. Been, whose research has historically been focused on interpretability in machine learning, delivered the keynote Beyond interpretability: developing a language to shape our relationships with AI, which explores the need to study AI machines as scientific objects, in isolation and with humans, which will provide principles for tools, but also is necessary to take our working relationship with AI to the next level. Before we dig into Been’s talk, she characterizes where we are as an industry and community with interpretability, and what the current state of the art is for interpretability techniques. We explore how the Gestalt principles appear in neural networks, Been’s choice to characterize communication with machines as a language as opposed to a set of principles or foundational understanding, and much much more. The complete show notes for this episode can be found at twimlai.com/go/571
Today we’re joined by Auke Wiggers, an AI research scientist at Qualcomm. In our conversation with Auke, we discuss his team’s recent research on data compression using generative models. We discuss the relationship between historical compression research and the current trend of neural compression, and the benefit of neural codecs, which learn to compress data from examples. We also explore the performance evaluation process and the recent developments that show that these models can operate in real-time on a mobile device. Finally, we discuss another ICLR paper, “Transformer-based transform coding”, that proposes a vision transformer-based architecture for image and video coding, and some of his team’s other accepted works at the conference. The complete show notes for this episode can be found at twimlai.com/go/570
Today we’re joined by Irwan Bello, formerly a research scientist at Google Brain, and now on the founding team at a stealth AI startup. We begin our conversation with an exploration of Irwan’s recent paper, Designing Effective Sparse Expert Models, which acts as a design guide for building sparse large language model architectures. We discuss mixture of experts as a technique, the scalability of this method, its applicability beyond NLP tasks, and the datasets this experiment was benchmarked against. We also explore Irwan’s interest in the research areas of alignment and retrieval, talking through interesting lines of work for each area including instruction tuning and direct alignment. The complete show notes for this episode can be found at twimlai.com/go/569
Today we’re joined by friend of the show Timnit Gebru, the founder and executive director of DAIR, the Distributed Artificial Intelligence Research Institute. In our conversation with Timnit, we discuss her journey to create DAIR, their goals, and some of the challenges she’s faced along the way. We start in the obvious place, Timnit being “resignated” from Google after writing and publishing a paper detailing the dangers of large language models, the fallout from that paper and her firing, and the eventual founding of DAIR. We discuss the importance of the “distributed” nature of the institute, how they’re going about figuring out what is in scope and out of scope for the institute’s research charter, and what building an institution means to her. We also explore the importance of independent alternatives to traditional research structures, if we should be pessimistic about the impact of internal ethics and responsible AI teams in industry due to the overwhelming power they wield, examples she looks to of what not to do when building out the institute, and much much more! The complete show notes for this episode can be found at twimlai.com/go/568
Today we’re joined by Doina Precup, a research team lead at DeepMind Montreal, and a professor at McGill University. In our conversation with Doina, we discuss her recent research interests, including her work in hierarchical reinforcement learning, with the goal being agents learning abstract representations, especially over time. We also explore her work on reward specification for RL agents, where she hypothesizes that a reward signal in a complex environment could lead an agent to develop attributes of intuitive intelligence. We also dig into quite a few of her papers, including On the Expressivity of Markov Reward, which won a NeurIPS 2021 outstanding paper award. Finally, we discuss the analogy between hierarchical RL and CNNs, her work in continual RL, and her thoughts on the evolution of RL in the recent past and present, and the biggest challenges facing the field going forward. The complete show notes for this episode can be found at twimlai.com/go/567