Google's Cloud Next 2026 keynote? Fire. 🔥 The TPU is now two chips instead of one — 8t for training, 8i for inference — but more interestingly, it's two scale-up networking topologies too. Austin Lyons (Chipstrat) and Vik Sekar (Vik's Newsletter) walk through what actually changed, one day after the announcement. OCS? Yes. AECs? Yep. Copper? Yep. Optics? Yep. We cover Virgo (Google's 47 petabit/second scale-out fabric, built entirely on OCS), Boardfly (the new scale-up topology for MoE inference that cuts hop count from 16 to 7), and the 3D torus Google still uses for training. Why is optical circuit switching the substrate of Google's data center? Why do active electrical cables still carry scale-up traffic inside racks? Why did Google split the CPU layer too, with custom ARM Axion head nodes to keep the TPUs fed? Along the way we trace the Dragonfly topology lineage to a 2008 paper by John Kim, Bill Dally, Steve Scott, and Dennis Abts. Abts went on to build Groq's rack-scale interconnect before landing at Nvidia. Chapters: 0:00 Intro 0:21 Two TPUs for two workloads 2:31 HBM, SRAM, and Axion CPUs 7:22 Why networking is the new bottleneck 17:14 Virgo: rebuilding scale-out on optics 25:24 3D torus Rubik's Cube scale-up for training 34:50 Boardfly: scale-up for MoE inference 42:07 Workload-specific everything Follow Chipstrat: Newsletter: https://www.chipstrat.com X: https://x.com/austinsemis Follow Vik: Newsletter: https://www.viksnewsletter.com/ X: https://x.com/vikramskr
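Hop count is doing a lot of work in that Boardfly claim, so here is a generic sketch of how topology choice moves it: a plain d-dimensional torus versus a textbook Dragonfly (per the 2008 Kim, Dally, Scott, and Abts paper). Boardfly's actual structure isn't spelled out above, so none of these numbers are Google's.

```python
# Generic hop-count comparison: plain 3D torus vs. a textbook Dragonfly.
# Illustrative only; not Boardfly, and not Google's actual numbers.

def torus_diameter(nodes_per_dim: int, dims: int = 3) -> int:
    """Worst-case hop count in a torus: half-way around each dimension."""
    return dims * (nodes_per_dim // 2)

def dragonfly_max_minimal_hops() -> int:
    """Minimal routing in a Dragonfly is at most local -> global -> local: 3 hops."""
    return 3

for k in (4, 8, 16):
    print(f"{k}x{k}x{k} 3D torus: diameter {torus_diameter(k)} hops")
print(f"Dragonfly (minimal routing): at most {dragonfly_max_minimal_hops()} hops")
```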
Matt Steiner, VP of Monetization Infrastructure, Ranking & AI Foundations at Meta, walks through how Meta's ad system actually works, and why the infrastructure behind it differs from what you'd build for LLMs. We cover Andromeda (retrieval on a custom NVIDIA Grace Hopper SKU Meta co-designed), Lattice (consolidating N ranking models into one), GEM (Meta's Generative Ads Recommendation foundation model), and the adaptive ranking model, a roughly one-trillion-parameter recommender served at sub-second latency. We get into why recommender workloads aren't embarrassingly parallel like LLMs (the "personalization blob"), what that means for Meta's MTIA custom silicon roadmap, and how LLM-written kernels (KernelEvolve) flipped the economics of running a heterogeneous hardware fleet. Demand for software engineering has actually gone up as the price has come down. Meta now wants ~100x more optimized kernels per chip. Read the full transcript at https://www.chipstrat.com/p/an-interview-with-meta-vp-matt-steiner Chapters: 0:00 Intro and scale 0:39 How Meta's ad system works 2:00 Meta Andromeda and the custom NVIDIA SKU 3:30 Lattice: consolidating ranking models 5:00 GEM, Meta's ads foundation model 6:30 Adaptive ranking for power users 8:17 The scale: 3B DAUs at sub-second latency 9:40 Why longer interaction histories matter 10:45 The anniversary gift analogy 12:57 A decade of compute evolution 15:21 Meta's infra as a CP-SAT problem 16:07 Co-designing Grace Hopper with NVIDIA 17:47 Matching compute shape to workload 18:26 Influencing hardware and software roadmaps 20:23 MTIA: why ads aren't LLMs 22:07 The personalization blob and I/O ratios 26:38 One trillion parameters at sub-second latency 28:26 Heterogeneous hardware trade-offs 29:30 KernelEvolve: LLMs writing custom kernels 33:30 GenAI and recommender systems cross-pollination 35:21 The 2-year infrastructure outlook 37:00 Why demand for software engineering is rising 38:53 How Matt stays on top of it all Relevant reading: KernelEvolve (Meta Engineering): https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/ Follow Chipstrat: Newsletter: https://www.chipstrat.com X: https://x.com/chipstrat
Austin and Vik discuss Credo's acquisition of Dust Photonics, XPO as the new standard for scale-out (maybe instead of CPO?) and some thoughts about Nuvacore entering the CPU scene for agentic AI. Gavin Baker's tweet: https://x.com/GavinSBaker/status/2044410644301046031?s=20 Vik's Substack: https://www.viksnewsletter.com Austin's Substack: https://www.chipstrat.com Chapters 00:00 Introduction to the Semiconductor Landscape 02:49 The Rise of Nuvacore and CPU Innovations 05:27 The Demand for CPUs in the AI Era 07:59 Photonics: The Next Frontier in Semiconductors 10:26 Credo's Acquisition of Dust Photonics 13:12 Vertical Integration in Semiconductor Companies 15:15 The Future of Copper and Optical Technologies 20:28 The Evolution of AI Training Models 25:28 Innovations in Optical Interconnects 31:10 The Future of Data Center Connectivity 36:56 Strategic Implications in the Optical Ecosystem
In this episode, Austin and Vik discuss whether Intel is finally back, pointing to CPU partnerships with Google and heterogeneous inference with SambaNova as its market cap soars above $300B. Vik tries to get his OpenClaw instance to dream every night. Chapters 00:00 Anthropic's New Direction: Chip Development 02:30 Navigating Subscription Changes and Token Costs 05:25 Exploring Alternative AI Models 08:10 The Economics of AI: Rent vs. Buy 10:56 Intel's Resurgence and Market Dynamics 15:23 Intel's Strategic Partnerships and Market Positioning 19:37 The Role of IPUs in Modern Computing 25:08 Coexistence of x86 and ARM Architectures 29:55 Innovations in Chip Architecture and Future Prospects
Reiner Pope is the co-founder and CEO of MatX, the startup building chips designed from first principles for LLMs. Before MatX, Reiner was on the Google Brain team training LLMs, and his co-founder Mike Gunter was on the TPU team. They left Google one week before ChatGPT was released. A counterintuitive throughput insight from the conversation: “Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.” We get into: • The hybrid SRAM + HBM bet, and why pipeline parallelism finally works • Overcoming the CUDA moat • Why frontier labs are willing to bet on an AI ASIC startup • Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not) • Why 95% of model-side news is noise for chip design • Why sparse MoE drives MatX to “the most interconnect of any announced product” • How MatX uses AI for its own chip design • The biggest challenges ahead Chapters: 00:00 “We left Google one week before ChatGPT” 00:24 Intro: who is MatX 01:17 Origin story: leaving Google for LLM chips 02:21 GPT-3 and the “too expensive” problem 04:25 Why buy hardware that is not a GPU 05:52 Overcoming the CUDA moat 08:46 Early investors 09:35 The name MatX 09:59 The chip: matrix multiply + hybrid SRAM/HBM 12:11 Why pipeline parallelism finally works 14:22 Reading papers and Google going dark 15:20 Research agenda: attention and numerics 17:06 Five specs and meeting customers where they are 19:24 Why frontier labs are the natural first customer 20:32 Workloads: training, prefill, decode 22:18 Little’s law and the throughput case for low latency 24:29 Interconnect and MoE topology 26:35 Inside the team: 100 people, full stack 28:32 Agentic AI: 95% noise for hardware 30:35 KV cache sizing in an agentic world 32:11 How MatX uses AI for chip design (Verilog + BlueSpec) 34:23 Go to market: proving credibility under NDA 35:12 Porting effort for frontier labs 36:34 Biggest skepticism: manufacturing at gigawatt scale 37:32 Hiring plug Austin Lyons @ Chipstrat: https://www.chipstrat.com Vik Sekar @ Vik's Newsletter: https://www.viksnewsletter.com/
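To make the Little's law point in the quote above concrete, here is a minimal back-of-envelope sketch. Every number is an illustrative assumption (Llama-70B-class KV dimensions, a 64 GB HBM budget for KV cache, a fixed request rate), not a MatX spec; the only point is that cutting latency shrinks the in-flight batch and frees HBM for longer contexts.

```python
# Little's law sketch: lower latency -> smaller batch -> more HBM left for context.
# All parameters are illustrative assumptions, not MatX specifications.

def kv_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    """KV cache bytes stored per token: K and V for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def max_context_len(hbm_kv_budget_gb, arrival_rate_rps, latency_s):
    """Little's law: concurrent requests (batch) = arrival rate x latency.
    KV occupancy grows with batch, so a smaller batch leaves room for longer context."""
    batch = arrival_rate_rps * latency_s              # L = lambda * W
    return int(hbm_kv_budget_gb * 1e9 / (batch * kv_bytes_per_token()))

rate = 50                                             # requests arriving per second (assumed)
for latency in (2.0, 0.5):                            # target seconds per request
    ctx = max_context_len(hbm_kv_budget_gb=64, arrival_rate_rps=rate, latency_s=latency)
    print(f"latency {latency}s -> batch {rate * latency:.0f} -> max context ~{ctx:,} tokens")
```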
Intel Foundry just partnered with Elon Musk’s Terafab. We get into what Terafab actually is, why vertically integrated fabs make sense but the economics don’t (yet!), and what Intel is doing here (hint: no idea). Then: OpenAI acquires TBPN for an estimated $100-300M. Not sure why, but the more interesting thing is the value of niche audiences when five companies control a trillion dollars in AI capex. And finally, Citrini Research sent an analyst to the Strait of Hormuz with a Pelican case full of spy gear, $15K cash, and Cuban cigars. The most unhinged research trip in Substack history. Austin Lyons — Chipstrat (https://chipstrat.com) Vik Sekar — Vik's Newsletter (https://www.viksnewsletter.com) Subscribe for weekly episodes on semiconductors, AI, infrastructure, and the business of chips.
In this episode, Austin and Vik analyze NVIDIA's $2 billion investment in Marvell NVLink Fusion, exploring its implications for AI infrastructure, interconnect protocols, and the broader chip ecosystem. They also discuss the current memory market surge, DRAM pricing, and Intel's strategic fab buyback, providing deep insights into industry trends and future directions. On Substack Vik: https://www.viksnewsletter.com/ Austin: https://www.chipstrat.com/ Chapters 00:00 NVIDIA's $2 Billion Investment in Marvell 20:11 The Memory Market Crisis 20:16 The Future of Memory Pricing and Consumer Impact 22:55 The Cycle of Supply and Demand in Memory 27:23 AI's Impact on Memory Demand 31:46 Long-Term Agreements and Market Stability 35:07 Intel's Strategic Fab Buyback 40:44 Monopoly Analogy: Intel's Market Strategy
In this episode, Austin and Vik analyze recent developments in GloFo patent lawsuits, the impact of TurboQuant on AI inference, and ARM's strategic move into silicon for agentic AI workloads. Read Vik's substack: https://www.viksnewsletter.com Read Austin's substack: https://www.chipstrat.com Chapters 00:00 Patent Wars in Semiconductor Industry 07:14 Understanding TurboQuant and Its Implications 24:42 Innovations in Memory Management 28:00 The Rise of ARM AGI CPUs 32:56 Agentic AI and CPU Compatibility 39:54 Performance Metrics in Agentic AI 44:52 ARM's Market Timing and Challenges
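For readers new to quantization, here is a generic post-training int8 sketch of what quantization buys for inference: smaller weights at a small reconstruction error. This is plain symmetric per-tensor quantization, not TurboQuant's actual algorithm, which the episode covers in more detail.

```python
# Generic symmetric int8 weight quantization sketch (not TurboQuant's method).
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 plus one float scale per tensor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).mean()
print(f"memory: {w.nbytes/1e6:.0f} MB fp32 -> {q.nbytes/1e6:.0f} MB int8, mean abs error {err:.4f}")
```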
Austin and Vik break down a packed week in semiconductors, covering GTC, OFC, and Micron earnings. The conversation kicks off with Jensen Huang's bold claim that engineers should spend $250K/year on AI tokens, and whether companies will buy tokens or token generators (i.e., on-prem hardware like the Dell Pro Max with GB300). They dig into the CapEx vs OpEx tradeoffs, data security concerns, and how sharing GPU resources might end up looking a lot like the old EDA license model. Next up: Micron crushed earnings and appears to be designed into Vera Rubin for HBM4 — despite months of rumors saying otherwise. Austin and Vik unpack the nuance around HBM pin speeds, memory node base dies, and what Micron's massive new fab investments in Taiwan, Singapore, Idaho, and New York mean for the memory cycle. The back half of the episode dives into optical interconnects for AI scale-up. A new industry consortium (OCI-MSA) has formed with Meta, Broadcom, NVIDIA, and OpenAI to standardize optical components. Vik explains why traditional indium phosphide lasers might be overkill for short-reach scale-up, and makes the case for micro LEDs — a "slow but wide" approach that could fill the gap between copper and conventional optics. They also touch on Credo's expanding product portfolio (and the infamous purple-to-orange cable saga), plus Lumentum's new VCSEL work for scale-up. Vik - https://www.viksnewsletter.com/ Austin - https://www.chipstrat.com/ CHAPTERS 0:00 Intro & GTC/OFC Conference Overload 2:09 Jensen's $250K Token Budget Per Engineer 5:08 On-Prem Inference vs. Cloud Token Spending (Dell Pro Max, CapEx vs OpEx) 6:44 Sharing GPU Resources Like EDA Licenses 8:16 Data Security & On-Prem Privacy Concerns 9:53 Matthew Berman's Fine-Tuned Open Claw Agent 10:35 Vik Sets Up Open Claw on a Home Server 11:53 Always Be Clauden (ABC) – Managing Agents from Your Phone 13:34 Micron Earnings & HBM4 in Vera Rubin 16:39 HBM Pin Speeds & the Micron Design-In Debate 20:17 Micron's New Fab Investments & Memory Cycle Fears 23:49 Why AI Drives a Step Change in Memory Demand 26:30 Optical Compute Interconnect MSA (OCI-MSA) 29:48 Scale-Up Optics: Do We Need New Technology? 30:58 Micro LEDs – The "Slow but Wide" Approach 35:45 Micro LEDs vs. Copper vs. Traditional Optics 36:55 Credo's Product Spectrum & the Purple Cable Story 39:31 VCSELs & Lumentum's 1060nm Scale-Up Play
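The "buy tokens or buy a token generator" framing from this episode lends itself to a quick break-even sketch. All figures below are illustrative assumptions (token prices, box price, power draw), not numbers from the episode or from Dell/NVIDIA; the shape of the comparison is the point, not the values.

```python
# Rough CapEx-vs-OpEx sketch: cloud token spend vs. an amortized on-prem box.
# Every number is an illustrative assumption.

def annual_cloud_cost(tokens_per_year, dollars_per_million_tokens):
    return tokens_per_year / 1e6 * dollars_per_million_tokens

def annual_onprem_cost(box_price, useful_life_years, power_kw,
                       hours_per_year=8760, dollars_per_kwh=0.12):
    capex_per_year = box_price / useful_life_years     # straight-line amortization
    power_per_year = power_kw * hours_per_year * dollars_per_kwh
    return capex_per_year + power_per_year

tokens = 5e9    # tokens consumed per engineer per year (assumed)
cloud = annual_cloud_cost(tokens, dollars_per_million_tokens=10)
onprem = annual_onprem_cost(box_price=60_000, useful_life_years=3, power_kw=1.2)
print(f"cloud OpEx ~${cloud:,.0f}/yr  vs  on-prem CapEx+power ~${onprem:,.0f}/yr")
```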
Vik and Austin unpack the Nvidia GTC keynote with fresh, top-of-mind takes, breaking down the key announcements and what matters and what doesn't. They discuss Groq's LPX, optics + copper for scale-up, new CPU requirements, CPO for networking, what agents mean for software, and much, much more. Check out Austin's substack: https://www.chipstrat.com Check out Vik's substack: https://www.viksnewsletter.com Chapters 00:00 Introduction and Keynote Context 03:18 Keynote Highlights and Gaming Innovations 06:18 Generative AI: The Three Eras 09:28 Inference: The New Revenue Generator 12:21 NVIDIA's Tiered Approach to AI Models 15:30 The Groq Chip and Its Role 18:35 Vera Rubin System: A Full Data Center 21:18 CPU Demand and Performance 24:31 Networking Innovations and Future Directions 32:32 Innovations in PCB Technology 34:06 Scaling GPU Systems 36:57 Understanding the STX Rack and AI Storage 38:23 The Rosa CPU and Its Significance 40:07 Digital Twin Platforms and AI Factories 43:53 NVIDIA's New Software Innovations 47:09 The Future of Token Budgets in AI 54:15 Balancing CapEx and OpEx in AI Deployments
Austin recaps moderating an agentic AI panel at Synopsys Converge, then gives an in-depth technical breakdown of Meta's MTIA custom silicon. Why they're building it, how chiplets let them ship a new chip every 6 months, and how the roadmap is shifting toward gen AI inference. Vik digs into Applied Optoelectronics (AAOI), the vertically integrated Texas laser shop whose stock went from $1.48 to $100+, and whether history is about to rhyme. Austin Lyons: https://www.chipstrat.com Vik Sekar: https://www.viksnewsletter.com/ Topics covered: • Agentic AI in chip design — how it changes roles for junior and senior engineers • Optical circuit switching and what it means for Arista's business model • Meta's ad-serving pipeline: Andromeda, Lattice, and the GEM foundation model • Why custom silicon (MTIA) makes sense at Meta's scale • MTIA chiplet strategy — 4 generations in 2 years • AAOI's vertical integration, Amazon's $4B warrant deal, and the 2017 parallel Chapters: 0:00 Intro 1:26 Synopsys Converge — Agentic AI Panel 9:44 Vik's Article: Optical Circuit Switching & Arista 14:43 Meta MTIA — A New Chip Every 6 Months 21:32 Why Custom Silicon Makes Sense for Meta 27:22 MTIA Chiplet Strategy & Roadmap 33:56 Gen AI Fits Meta's Business Model 36:31 How Meta Ships Chips So Fast 40:30 Applied Optoelectronics (AAOI) Deep Dive 45:02 Amazon's $4B Warrant Deal 48:54 Can AAOI's Lasers Compete with Lumentum? 53:16 AAOI's Aggressive Capacity Buildout 55:35 History Rhymes: AAOI's 2017 Boom & Bust 1:00:55 Wrap-Up #semiconductors #chips #tech #meta #MTIA #AAOI #optics #inference #AI
Austin and Vik break down the optics vs. copper debate that rocked semis this week. Nvidia dropped $4 billion on Lumentum and Coherent, Credo posted a blowout quarter betting on copper, and then Hock Tan shocked everyone by claiming 400G per lane works over copper in Broadcom’s labs — potentially pushing CPO out to 2030+. Plus, Vik’s 4D chess conspiracy theory on why Hock Tan is talking up copper when Broadcom is a CPO company. Like, subscribe, and drop your thoughts on the copper vs. optics debate in the comments! Subscribe to our newsletters: * Chipstrat by Austin Lyons — chipstrat.com * Vik’s Semiconductor Newsletter by Vik Sekar — viksnewsletter.com Chapters (00:00) - Newsletter Plugs: Groq LPUs & Broadcom’s Laser Business (03:15) - Dynamo & the Rise of Workload-Specific Hardware (08:04) - Austin’s Broadcom Laser Deep Dive (09:53) - The Week’s Whiplash: Optics Monday, Copper Wednesday (17:50) - Why Nvidia Invested $4B: Geopolitics, Supply & the HBM Playbook (24:15) - CPO Lasers & Optical Circuit Switches (26:16) - Credo Earnings: 200% YoY Growth & the Copper Bull Case (31:09) - Reliability, AECs & Oracle’s GPU Cluster Problem (35:48) - Credo’s Optics Play: Micro-LED Active Cables & the CPO Timing Risk (38:45) - Broadcom Earnings: Hock Tan’s Copper Bombshell (43:34) - Customer-Owned Tooling: Hock Tan Says “Good Luck” (44:25) - Vik’s 4D Chess Theory: Why Hock Tan Talks Up Copper (47:03) - Wrap-Up: It’s Both — The Real Question Is Timing
This week, we move from optics technology to optics companies. We walk the AI optical supply chain from bottom to top. Main debate: Who has a moat? Who is already priced for perfection? *Not investment advice, do your own due diligence*

AXTI - Indium phosphide substrate supplier. Critical bottleneck in the laser stack. Major China export-control risk. Massive stock run vs thin earnings.
Tower Semiconductor - Leading silicon photonics foundry. 5x capacity expansion with customer prepayments. Strong process lock-in. Pure-play optics exposure.
GlobalFoundries - 300mm monolithic photonics platform + Chips Act support. Optics growing fast but still small piece of overall business.
Lumentum - Dominant EML laser supplier. Explosive AI demand. Strong technical moat. Valuation and capex sensitivity are key risks.
Coherent - Vertically integrated from substrate to module. 6-inch InP push could lower costs structurally. Execution and margin mix matter.
Fabrinet - Optics assembly partner. High NVIDIA exposure. Scales with industry, but dependent on upstream supply.
Corning - AI data centers require far more fiber than traditional cloud. $6B Meta deal adds visibility. Timing of scale-up optics is the swing factor.

Timestamps
00:01 Intro
06:59 AXT $AXTI
13:38 Tower Semiconductor $TSEM
23:58 GlobalFoundries $GFS
32:43 Lumentum $LITE
39:38 Coherent $COHR
47:09 Fabrinet $FN
54:07 Corning $GLW

Austin's Substack: https://www.chipstrat.com/
Vik's Substack: https://www.viksnewsletter.com/
Austin and Vik delve into the evolving landscape of optics and networking, particularly in relation to AI and data centers. The conversation covers various scales of networking, including scale across, scale out, and scale up, while also addressing the demand-supply dynamics in laser manufacturing and the future of optical circuit switches. The episode highlights the technological advancements and market opportunities in the optics sector, emphasizing the significance of these developments for the future of AI.

Takeaways
Silicon photonics is becoming crucial for data center connectivity.
Optics is essential for overcoming copper's limitations in speed and distance.
Scale across technology is vital for connecting data centers.
Scale out optics is the standard for connecting GPUs between racks.
Co-packaged optics can reduce energy consumption in data centers.
The scale up market for optics is emerging as a new opportunity.
Indium phosphide wafers are a critical bottleneck in laser manufacturing.
Optical circuit switches are gaining traction in data centers.
2026 is anticipated to be a pivotal year for optical networking.

Chapters 00:00 Introduction to AI and CPU Bottlenecks 03:00 The Rise of Silicon Photonics 06:01 Understanding Optical Networking and Data Centers 08:49 Scale Across: Connecting Data Centers 11:56 Scale Out: Optimizing Data Center Connectivity 14:53 Scale Up: The Future of GPU Connectivity 23:32 The Shift from Copper to Optical Connections 26:13 Challenges and Reliability of Lasers 30:47 Understanding Co-Packaged Optics 34:17 Market Dynamics: Demand and Supply of Lasers 40:46 Emerging Technologies: Optical Circuit Switches

Check out Austin's Substack: https://www.chipstrat.com
Check out Vik's Substack: https://www.viksnewsletter.com
In this episode of the Semi Doped podcast, Austin and Vik delve into the current state of the semiconductor industry, focusing on the memory crisis driven by increasing demand from AI applications. They discuss the implications of rising memory prices, the impact of hyperscaler spending on the market, and the strategic moves of major players like Google, Microsoft, Meta, and Amazon in the AI landscape.

Takeaways
Memory prices are skyrocketing, impacting consumer electronics.
The memory crisis is affecting the production of lower-end devices.
DRAM prices have doubled in a single quarter, creating challenges for manufacturers.
Nanya Tech's revenue growth indicates a booming memory market.
AI applications are driving unprecedented demand for memory.
Hyperscalers are significantly increasing their capital expenditures for AI infrastructure.
The integration of AI into advertising is reshaping business models for companies like Google and Meta.

Chapters 00:00 The State of Memory in Semiconductors 03:08 Nvidia's GPU Dilemma and Market Dynamics 06:13 The Impact of AI on Memory Demand 09:08 NAND Flash and Context Memory Trends 11:59 The Future of Memory Supply and Demand 15:12 AI Infrastructure and CapEx Spending 17:47 Google's Strategic Investments in AI 20:58 The Advertising Business Model and AI Integration 30:26 Revenue vs. Expenses: A Balancing Act 31:08 The Future of TPUs vs. GPUs in Cloud Computing 35:31 Microsoft vs. Google: AI Investments and Market Reactions 38:22 AI Integration in Enterprises: Microsoft’s Unique Position 39:57 The Power of Microsoft’s Reach in AI 40:30 GitHub: A Hidden Gem for Microsoft’s AI Strategy 43:52 Meta’s AI Strategy: Advertising and Revenue Growth 51:18 Amazon’s Massive CapEx: Implications for the Future 54:00 Looking Ahead: Predictions for 2027 and Beyond

Check out Austin's substack: https://www.chipstrat.com/
Check out Vik's substack: https://www.viksnewsletter.com/
In this episode, Vik and Wayne Nelms discuss the emerging financial exchange for GPU compute, exploring its implications for the AI infrastructure market. They cover the value of compute, pricing dynamics, hedging strategies, and the future of GPU and memory trading. Wayne shares insights on partnerships, the depreciation of GPUs, and how inference demand may reshape hardware utilization. The conversation highlights the importance of financial products in facilitating data center development and optimizing profitability in the evolving landscape of compute resources.

Takeaways
Wayne Nelms is the CTO of Ornn, focusing on GPU compute as a commodity.
The value of compute is still being defined in the market.
Hedging strategies are essential for managing compute costs.
The pricing of GPUs varies significantly across providers.
Memory trading is becoming a crucial aspect of the compute market.
Partnerships can enhance trading platforms and market efficiency.
Depreciation of GPUs is not linear and varies by use case.
Inference demand may change how GPUs are utilized in the future.
Transparency in pricing benefits smaller players in the market.
Financial products can facilitate data center development and profitability.

Chapters 00:00 Introduction to GPU Compute Futures 03:13 The Value of Compute in Today's Market 05:59 Understanding GPU Pricing Dynamics 08:46 Hedging and Futures in Compute 11:52 The Role of Memory in AI Infrastructure 15:14 Partnerships and Market Expansion 17:46 Depreciation and Residual Value of GPUs 20:57 Future of Data Centers and Compute Demand 24:01 The Impact of Financialization on AI Infrastructure 27:04 Looking Ahead: The Future of Compute Markets

Keywords GPU compute, financial exchange, futures market, data centers, AI infrastructure, pricing strategies, hedging, memory trading, Ornn

Follow Wayne Nelms (@wayne_nelmz on X)
Check out Ornn's website: https://www.ornnai.com/
Check out Vik's Substack: https://www.viksnewsletter.com/
Check out Austin's Substack: https://www.chipstrat.com/
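For the hedging discussion, here is a minimal sketch of how a forward or futures position pins a compute buyer's cost. This is generic futures mechanics under assumed contract terms, not a description of Ornn's actual products or prices.

```python
# Minimal GPU-hour hedge sketch: a long futures position offsets spot-price moves.
# Contract terms and prices are illustrative assumptions, not Ornn's products.

def hedged_cost(spot_price, forward_price, gpu_hours):
    """Buyer pays spot for the hours; a long futures position pays (spot - forward),
    so the net cost per hour is pinned to the forward price."""
    futures_pnl = (spot_price - forward_price) * gpu_hours
    spot_cost = spot_price * gpu_hours
    return spot_cost - futures_pnl

gpu_hours = 100_000
forward = 2.50                                   # $/GPU-hour locked in today (assumed)
for spot in (1.80, 2.50, 3.40):                  # possible spot prices at delivery
    net = hedged_cost(spot, forward, gpu_hours)
    print(f"spot ${spot:.2f}/hr -> net cost ${net:,.0f} (= ${net/gpu_hours:.2f}/hr)")
```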
Vik and Val Bercovici discuss the evolution of storage solutions in the context of AI, focusing on Weka's innovative approaches to context memory, high bandwidth flash, and the importance of optimizing GPU usage. Val shares insights from his extensive experience in the storage industry, highlighting the challenges and advancements in memory requirements for AI models, the significance of latency, and the future of storage technologies.

Takeaways
Context memory is crucial for AI performance.
The demand for memory has drastically increased.
Latency issues can hinder AI efficiency.
High bandwidth flash offers new storage capabilities.
Weka's Axon software enhances GPU storage utilization.
Token warehouses can significantly reduce costs.
Augmented memory grids improve memory access speeds.
Networking innovations are essential for AI storage solutions.
Understanding memory hierarchies is vital for optimization.
The future of storage will involve more advanced technologies.

Chapters 00:00 Introduction to Weka and AI Storage Solutions 05:18 The Evolution of Context Memory in AI 09:30 Understanding Memory Hierarchies and Their Impact 16:24 Latency Challenges in Modern Storage Solutions 21:32 The Role of Networking in AI Storage Efficiency 29:42 Dynamic Resource Utilization in AI Networks 30:04 Introducing the Context Memory Network 31:13 High Bandwidth Flash: A Game Changer 32:54 Weka's Neural Mesh and Storage Solutions 35:01 Axon: Transforming GPU Storage into Memory 39:00 Augmented Memory Grid Explained 42:00 Pooling DRAM and CXL Innovations 46:02 Token Warehouses and Inference Economics 52:10 The Future of Storage Innovations

Resources
Manus AI $2B Blog: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
Also listen to this podcast on your favorite platform. https://www.semidoped.fm/
Check out Vik's Substack: https://www.viksnewsletter.com/
Check out Austin's Substack: https://www.chipstrat.com/
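A small sketch of the token-warehouse economics point: if previously computed KV cache can be reused instead of recomputed, the per-request prefill cost drops with the cache hit rate. Prices below are illustrative assumptions, not Weka's or any provider's actual rates.

```python
# Prefill cost vs. KV cache hit rate, with illustrative (assumed) token prices.

def prefill_cost(context_tokens, hit_rate, full_price_per_m=3.00, cached_price_per_m=0.30):
    """Blend full-price (recomputed) and cached-price (reused KV) input tokens."""
    cached = context_tokens * hit_rate
    fresh = context_tokens - cached
    return (fresh * full_price_per_m + cached * cached_price_per_m) / 1e6

for hit_rate in (0.0, 0.5, 0.9):
    cost = prefill_cost(context_tokens=200_000, hit_rate=hit_rate)
    print(f"200k-token context, {hit_rate:.0%} KV cache hits -> ~${cost:.2f} per request")
```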
Austin and Vik discuss the emerging trend of AI agents, particularly focusing on Claude Code and OpenClaw, and the resulting hardware implications.

Key Takeaways:
2026 is expected to be a pivotal year for AI agents.
The rise of agentic AI is moving beyond marketing to practical applications.
Claude Code is being used for more than just coding; it aids in research and organization.
Integrating AI with tools like Google Drive enhances productivity.
Security concerns arise with giving AI agents access to personal data.
Local computing options for AI can reduce costs and increase control.
AI agents can automate repetitive tasks, freeing up human time for creative work.
The demand for CPUs is increasing due to the needs of AI agents.
AI can help summarize and organize information but may lack deep insights.
The future of AI will involve balancing automation with human oversight.

Chapters (00:00) Introduction: Why 2026 may be the year of AI agents (01:12) What people mean by agents and the OpenClaw naming chaos (02:41) Agents behaving badly: crypto losses and social posting (03:38) Claude Code as a research tool, not a coding tool (05:54) Terminal-first workflows vs GUI-based agents (07:44) Connecting Claude Code to Gmail, Drive, and Calendar via MCP (09:12) Token waste, authentication friction, and workflow optimization (10:54) Automating newsletter ingestion and research archives (12:33) Giving agents login credentials and security tradeoffs (13:50) Filtering signal from noise with topic constraints (16:36) AI-driven idea generation and its limitations (17:34) When automation effort is not worth it (19:02) Are agents ready for non-technical users? (20:55) Why OpenClaw should not run on your personal laptop (21:33) Safe agent deployment: VPS vs local servers (23:33) The true cost of agents: infrastructure plus inference (24:18) What OpenClaw adds beyond Claude Code (26:53) Agents require managerial thinking and self-awareness (28:18) Local inference vs cloud APIs (30:46) Cost control with OpenRouter and model hierarchies (32:31) Scaling agents forces model and cost optimization (33:00) AI aggregation vs creator analytics (35:58) AI as discovery, not a replacement for reading (38:17) When summaries are enough and when they are not (39:47) Why AI cannot understand what is not said (41:18) Agentic AI is driving unexpected CPU demand (41:49) Intel caught off guard by CPU shortages (44:53) Security, identity, and encryption shift work to CPUs (46:10) Closing thoughts: agents are real, early, and uneven

Deploy your secure OpenClaw instance with DigitalOcean: https://www.digitalocean.com/blog/moltbot-on-digitalocean
Visit the podcast website: https://www.semidoped.fm
Austin's Substack: https://www.chipstrat.com/
Vik's Substack: https://www.viksnewsletter.com/
Maia 100 was a pre-GPT accelerator. Maia 200 is explicitly post-GPT for large multimodal inference. Saurabh Dighe says if Microsoft were chasing peak performance or trying to span training and inference, Maia would look very different. Higher TDPs. Different tradeoffs. Those paths were pruned early to optimize for one thing: inference price-performance. That focus drives the claim of ~30% better performance per dollar versus the latest hardware in Microsoft’s fleet.

Interesting topics include:
• What “30% better price-performance” actually means
• Who Maia 200 is built for
• Why Microsoft bet on inference when designing Maia back in 2022/2023
• Large SRAM + high-capacity HBM
• Massive scale-up, no scale-out
• On-die NIC integration

Maia is a portfolio platform: many internal customers, varied inference profiles, one goal. Lower inference cost at planetary scale.

Chapters: (00:00) Introduction (01:00) What Maia 200 is and who it’s for (02:45) Why custom silicon isn’t just a margin play (04:45) Inference as an efficient frontier (06:15) Portfolio thinking and heterogeneous infrastructure (09:00) Designing for LLMs and reasoning models (10:45) Why Maia avoids training workloads (12:00) Betting on inference in 2022–2023, before reasoning models (14:40) Hyperscaler advantage in custom silicon (16:00) Capacity allocation and internal customers (17:45) How third-party customers access Maia (18:30) Software, compilers, and time-to-value (22:30) Measuring success and the Maia 300 roadmap (28:30) What “30% better price-performance” actually means (32:00) Scale-up vs scale-out architecture (35:00) Ethernet and custom transport choices (37:30) On-die NIC integration (40:30) Memory hierarchy: SRAM, HBM, and locality (49:00) Long context and KV cache strategy (51:30) Wrap-up
OpenAI's partnership with Cerebras and Nvidia's announcement of context memory storage raise a fundamental question: as agentic AI demands long sessions with massive context windows, can SRAM-based accelerators designed before the LLM era keep up—or will they converge with GPUs?

Key Takeaways
1. Context is the new bottleneck. As agentic workloads demand long sessions with massive codebases, storing and retrieving KV cache efficiently becomes critical.
2. There's no one-size-fits-all. Sachin Khatti (OpenAI, ex-Intel) signals a shift toward heterogeneous compute—matching specific accelerators to specific workloads.
3. Cerebras has 44GB of SRAM per wafer — orders of magnitude more than typical chips — but the question remains: where does the KV cache go for long context?
4. Pre-GPT accelerators may converge toward GPUs. If they need to add HBM or external memory for long context, some of their differentiation erodes.
5. Post-GPT accelerators (Etched, MatX) are the ones to watch. Designed specifically for transformer inference, they may solve the KV cache problem from first principles.

Chapters
- 00:00 — Intro
- 01:20 — What is context memory storage?
- 03:30 — When Claude runs out of context
- 06:00 — Tokens, attention, and the KV cache explained
- 09:07 — The AI memory hierarchy: HBM → DRAM → SSD → network storage
- 12:53 — Nvidia's G1/G2/G3 tiers and the missing G0 (SRAM)
- 14:35 — Bluefield DPUs and GPU Direct Storage
- 15:53 — Token economics: cache hits vs misses
- 20:03 — OpenAI + Cerebras: 750 megawatts for faster Codex
- 21:29 — Why Cerebras built a wafer-scale engine
- 25:07 — 44GB SRAM and running Llama 70B on four wafers
- 25:55 — Sachin Khatti on heterogeneous compute strategy
- 31:43 — The big question: where does Cerebras store KV cache?
- 34:11 — If SRAM offloads to HBM, does it lose its edge?
- 35:40 — Pre-GPT vs Post-GPT accelerators
- 36:51 — Etched raises $500M at $5B valuation
- 38:48 — Wrap up
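A minimal sketch of the KV cache sizing question above, and where the cache could land in the HBM → DRAM → SSD → network-storage hierarchy. Model dimensions are illustrative Llama-70B-class numbers and the tier capacities are assumptions; only the 44 GB SRAM figure comes from the episode notes.

```python
# KV cache sizing sketch and which memory tier it fits in.
# Model dims are illustrative Llama-70B-class numbers; tier capacities are assumptions
# (the 44 GB SRAM figure is from the episode notes, the rest are placeholders).

def kv_cache_gb(context_tokens, batch, n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes   # K and V per layer
    return context_tokens * batch * per_token / 1e9

tiers = [("on-chip SRAM", 44), ("HBM", 192), ("DRAM", 2_000), ("local SSD", 30_000)]

for ctx in (8_000, 128_000, 1_000_000):
    need = kv_cache_gb(ctx, batch=8)
    home = next((name for name, cap_gb in tiers if need <= cap_gb), "network storage")
    print(f"{ctx:>9,} tokens x batch 8 -> ~{need:,.0f} GB KV cache -> fits in {home}")
```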
Innoviz CEO Omer Keilaf believes the LIDAR market is down to its final players—and that Innoviz has already won its seat. In this conversation, we cover the Level 4 gold rush sparked by Waymo, why stalled Level 3 programs are suddenly accelerating, the technical moat that separates L4-grade LIDAR from everything else, how a one-year-old startup won BMW, and why Keilaf thinks his competitors are already out of the race. Omer Keilaf founded Innoviz in 2016. Today it's a publicly traded Tier 1 supplier to BMW, Volkswagen, Daimler Truck, and other global OEMs. Chapters 00:00 Introduction 00:17 Why Start a LIDAR Company in 2016? 01:32 The Personal Story Behind Innoviz 03:12 Transportation Is Still Our Biggest Daily Risk 04:28 The 2012 Spark: Xbox Kinect and 3D Sensing 06:32 From Mobile to Automotive: Finding the Right Platform 07:54 "I Didn't Know What LIDAR Was, But I'd Do It Better" 08:19 How a One-Year-Old Startup Won BMW 10:04 Surviving the First Product 11:23 From Tier 2 to Tier 1: The Volkswagen Win 13:47 Lessons Learned Scaling Through Partners 14:45 The SPAC Decision: A Wake-Up Call from a Competitor 16:42 From 200 LIDAR Companies to a Handful 17:27 NREs: How Tier 1 Status Funds R&D 18:44 Why Automotive-First Is the Right Strategy 19:45 Consolidation Patterns: Cameras, Radars, Airbags 20:31 "The Music Has Stopped" 21:07 Non-Automotive: Underserved Markets 23:51 Working with Secretive OEMs 25:27 The Press Release They Tried to Stop 26:42 CES 2025: 85% of Meetings Were Level 4 27:40 Why Level 3 Programs Are Suddenly Accelerating 28:33 The EV/ADAS Coupling Problem 29:49 Design Is Everything: The Holy Grail Is Behind the Windshield 31:13 The Three-Year RFQ: Grill → Roof → Windshield 32:32 Innoviz3: Small Enough for Behind-the-Windshield 34:40 Innoviz2 for L4, Innoviz3 for Consumer L3 36:38 What's the Real Difference Between L2, L3, and L4 LIDAR? 38:51 The Mud Test: Why L4 Demands 100% Availability 40:50 "We're the Only LIDAR Designed for Level 4" 42:52 Patents and the Maslow Pyramid of Autonomy 44:15 Non-Automotive Markets: Agriculture, Mining, Security 46:15 Closing
Austin and Vik discuss why LiDAR is important for autonomy, how modern systems work, and how the technology has evolved. They compare Time of Flight and FMCW architectures, explain why wavelength choice matters, and walk through the tradeoffs between 905 nm and 1550 nm across eye safety, cost, and performance. The discussion closes with a clear-eyed look at competition, Chinese suppliers, and supply chain risk. Chapters (00:00) Introduction to LiDAR and why it matters (05:40) The case for LiDAR in autonomous vehicles (12:41) Wavelengths, eye safety, and system tradeoffs (15:38) How LiDAR works: Time of Flight vs. FMCW (20:12) Mechanical vs. solid-state LiDAR designs (27:31) Market dynamics, competition, and geopolitics
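A minimal sketch of the two ranging approaches compared in the episode: Time of Flight turns a measured round-trip delay into distance, while FMCW turns a measured beat frequency into distance given the chirp parameters. The parameter values are illustrative, not any vendor's specs.

```python
# Time of Flight vs. FMCW ranging, with illustrative parameter values.
C = 299_792_458.0  # speed of light, m/s

def tof_range_m(round_trip_s: float) -> float:
    """Pulsed ToF: light travels out and back, so distance is half the round trip."""
    return C * round_trip_s / 2

def fmcw_range_m(beat_hz: float, chirp_bw_hz: float, chirp_time_s: float) -> float:
    """FMCW: the echo mixes with the outgoing chirp; beat frequency ~ (2d/c) * (B/T)."""
    return beat_hz * C * chirp_time_s / (2 * chirp_bw_hz)

print(f"ToF: 1.0 us round trip -> {tof_range_m(1.0e-6):.1f} m")
print(f"FMCW: 20 MHz beat, 1 GHz chirp over 10 us -> {fmcw_range_m(20e6, 1e9, 10e-6):.1f} m")
```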
Episode Summary
Austin and Vik break down NVIDIA’s CES 2026 keynote, focusing on Vera Rubin, DGX Spark and DGX Station, uneducated investor panic, and physical AI.

Key Takeaways
DGX Spark brings server-class NVIDIA architecture to the desktop at low power, aimed at developers, enthusiasts, and enterprises experimenting locally.
DGX Station functions more like a mini-AI rack on-prem: Grace Blackwell for inference and development without full racks.
The historical parallel is mainframes to minicomputers, expanding compute TAM rather than displacing cloud usage.
On-prem AI converts some GPU rental OpEx into CapEx, appealing to CFOs.
NVIDIA positioned autonomy as physical AI with vision-language-action models and early Mercedes-Benz deployments in 2026.
Vera Rubin integrates CPU, GPU, DPU, networking, and photonics into a single platform, emphasizing Ethernet for scale-out. (Where was the Infiniband switch?)
The new Vera CPU highlights rising CPU importance for agentic workloads through higher core counts, SMT, and large LPDDR capacity.
Rubin GPU’s move to HBM4 and adaptive precision targets inference efficiency gains and lower cost per token.
Context memory storage elevates SSDs and DPUs, enabling massive KV cache offload beyond HBM and DRAM.
Cable-less rack design and warm-water cooling show NVIDIA’s shift from raw performance toward manufacturability and enterprise polish.
Austin and Vik discuss key insights from the IEDM conference. They explore the significance of IEDM for engineers and investors, the networking opportunities it offers, and the latest innovations in silicon photonics, complementary FETs, NAND flash memory, and GaN-on-silicon chiplets.

Takeaways
Penta-level NAND flash memory could disrupt the SSD market
GaN-on-Silicon chiplets enhance power efficiency
Complementary FETs
Optical scale-up has a power problem
The future of transistors is still bright
Key Topics
What Nvidia actually bought from Groq and why it is not a traditional acquisition
Why the deal triggered claims that GPUs and HBM are obsolete
Architectural trade-offs between GPUs, TPUs, XPUs, and LPUs
SRAM vs HBM: speed, capacity, cost, and supply chain realities
Groq LPU fundamentals: VLIW, compiler-scheduled execution, determinism, ultra-low latency
Why LPUs struggle with large models and where they excel instead
Practical use cases for hyper-low-latency inference:
  Ad copy personalization at search latency budgets
  Model routing and agent orchestration
  Conversational interfaces and real-time translation
  Robotics and physical AI at the edge
  Potential applications in AI-RAN and telecom infrastructure
Memory as a design spectrum: SRAM-only, SRAM plus DDR, SRAM plus HBM
Nvidia’s growing portfolio approach to inference hardware rather than one-size-fits-all

Core Takeaways
GPUs are not dead. HBM is not dead.
LPUs solve a different problem: deterministic, ultra-low-latency inference for small models.
Large frontier models still require HBM-based systems.
Nvidia’s move expands its inference portfolio surface area rather than replacing GPUs.
The future of AI infrastructure is workload-specific optimization and TCO-driven deployment.
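A rough back-of-envelope sketch of the SRAM vs HBM trade-off above: at batch 1, decode is roughly memory-bandwidth-bound, so tokens per second is about memory bandwidth divided by the bytes of weights read per token. Capacities and bandwidths below are illustrative assumptions, not Groq or NVIDIA specifications.

```python
# SRAM-vs-HBM back-of-envelope: batch-1 decode limited by memory bandwidth.
# All capacities and bandwidths are illustrative assumptions, not vendor specs.

memories = {
    "SRAM pool (many LPU-style chips)": {"capacity_gb": 20,  "bw_gb_s": 80_000},
    "HBM on a single GPU":              {"capacity_gb": 192, "bw_gb_s": 8_000},
}

for model_gb in (9, 140):   # e.g. a small model at INT8 vs a 70B-class model at FP16 (assumed)
    for name, m in memories.items():
        if model_gb > m["capacity_gb"]:
            print(f"{model_gb:>4} GB model | {name:<33} | does not fit -> needs more chips or HBM")
        else:
            tps = m["bw_gb_s"] / model_gb          # ignores compute, KV reads, interconnect
            print(f"{model_gb:>4} GB model | {name:<33} | ~{tps:,.0f} tokens/s, bandwidth bound")
```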