Why I Joined Clockwork: Building the future of AI infrastructure

After six transformative years at Sysdig – my third startup over the last 17 years following a decade-long journey at NetApp that I thoroughly loved – my plan was to take a break from full-time roles. I’d envisioned spending time on boards, mentoring startups, and exploring if I really could become a teacher – something I always felt I wanted to do.

That plan did not last long, and I blame Balaji Prabhakar and Clockwork.io for that!

Falling in love with elegant innovation: Software-Driven Fabric Magic!

What first drew me to Clockwork was the sheer elegance of how the team had solved an age-old problem that has plagued distributed systems for decades: synchronizing clocks to within tens of nanoseconds – entirely through software. No proprietary hardware. No expensive infrastructure overhauls. Just pure software innovation born from years of research at Stanford by Balaji’s PhD student, Yilong Geng.

As I spent time with Balaji, Yilong and the other members of the founding team, Deepak Merugu and Dan Zheng – first as a board member and then over the last five months as CEO – a much larger vision crystallized.

AI workloads are the most demanding and distributed workloads in history, and cluster communication gates performance and efficiency – communication is the new Moore’s Law in AI infrastructure! Real-world GPU clusters achieve only 30-55% of their theoretical performance, and the bottleneck is communication: between GPUs, across clusters, between clouds. When thousands of accelerators must stay perfectly synchronized, even one lagging link forces entire jobs to pause.

Clockwork’s answer is a Software-Driven Fabric (SDF): treat the fabric as a programmable software control plane that (i) provides microsecond-granular, one-way-delay visibility from the physical network through the communication libraries and into the job, and (ii) applies path-aware, closed-loop control so that small faults do not crash jobs, and congestion and contention do not slow AI innovation. Because it’s software, it runs anywhere – any accelerator, any network, any cloud – with no proprietary hardware lock-in.

My passion and pursuit is category creation

I was an early employee at NetApp and left after 10 years as head of products and engineering – I witnessed firsthand the excitement of shaping NAS into a category and scaling a company from startup to industry leader.

That excitement of category creation is something I have pursued at every startup since NetApp. Omneon became the market leader for transmission video servers by turning commodity servers into specialized systems for real-time video. At Nimble Storage, we pioneered predictive analytics and leveraged flash-optimized file systems to take the company from inception through IPO to an acquisition by HPE. At Sysdig, we defined runtime security for Kubernetes, growing revenue twentyfold.

I see Software-Driven Fabrics as the next fundamental category that will define how AI workloads operate. This isn’t incremental improvement – it’s a paradigm shift.

Customer validation that speaks volumes

We started proofs of concept of our FleetIQ platform for AI in March, and it reached general availability (GA) in August, laying the foundation for our product launch today. In those five short months, we completed 15 proof-of-value engagements, and the results have been extraordinary:

  • Uber is rolling out Clockwork across their infrastructure, citing our ability to cut fault detection and localization from hours to minutes and to significantly improve service tail latency.
  • DCAI, operator of Denmark’s flagship AI supercomputer Gefion, is leveraging our platform to deliver resilient, reliable infrastructure for quantum computing, drug discovery, and advanced weather forecasting – “lowering costs, eliminating wasted GPU cycles, and helping deliver a sovereign AI capability second to none.”
  • Two of Europe’s leading neoclouds are embedding Clockwork as part of their customer GPU clusters to improve mean time between failures and to ensure non-stop AI training in the face of network failures.
  • A global hyperscaler is rolling us out across their cloud for fleet monitoring and resilience – across both front-end and back-end networks.
  • A global leader in communications software is leveraging us to accelerate training of their AI models that power meeting insights and more intelligent meeting experiences.
  • A global social network is accelerating their AI initiatives for content discovery and professional connections and making the network more intelligent and effective for their massive user base.

Building organizations and shaping culture

Throughout my career, I’ve been passionate about building organizations where people thrive. NetApp has frequently been listed on the “Fortune 100 Best Companies to Work For”. Nimble Storage was named to the “Silicon Valley Business Journal’s Best Places to Work” list and Sysdig was certified as a “Great Place to Work”. I’m still deeply moved when former colleagues tell me their time at these companies was among their best working experiences – that’s the ultimate compliment, not just for me, but for the entire leadership teams we built together. At Clockwork, I look forward to continuing this tradition. We’re building a company where brilliant minds can do their best work, where innovation is celebrated, and where we carry others with us as we win.

Looking ahead

I believe AI is the most transformative force in technology in my lifetime. The AI infrastructure market is at an inflection point. Countries are building sovereign AI capabilities. Enterprises are moving from AI experiments to production deployments. The cost of inefficiency is becoming untenable. And the need for reliable, performant, economically sustainable AI infrastructure has never been more critical.

I wanted to spend the next chapter building something that addresses this need. We’re not just building a product or a company – we’re pioneering a new category that will fundamentally change how enterprises run AI at scale. With our remarkable team, breakthrough technology, and the trust of industry leaders, Clockwork is uniquely positioned to lead this transformation. We’re making AI infrastructure work the way it should – fast, efficient, and fault-tolerant, on any hardware, at any scale. That is the essence of a Software-Driven Fabric.

To the Clockwork team

Thank you for welcoming me on this incredible journey. Together, we’re not just optimizing infrastructure – we’re accelerating the future of intelligence.