Datacast
Episode 131: Data Infrastructure for Consumer Platforms, Algorithmic Governance, and Responsible AI with Krishna Gade




The 131st episode of Datacast is my conversation with Krishna Gade, the founder and CEO of Fiddler AI, an AI Observability startup that helps AI-forward organizations build trusted AI solutions and connect model outcomes to business KPIs.

Our wide-ranging conversation touches on his research on document clustering in grad school, his early career working on Bing's search engine at Microsoft, his time as an engineering leader at Twitter and Pinterest scaling their data engineering, his experience at Facebook building the News Feed ranking platform, the founding story of Fiddler AI and the Model Performance Management framework, model governance for modern enterprises, and the lessons he learned from hiring, finding design partners, and fundraising, among other topics.

Please enjoy my conversation with Krishna!


Show Notes

Krishna's Contact Info

Fiddler's Resources

Mentioned Content

People

  1. Goku Mohandas (Made With ML and Anyscale)

  2. Krishnaram Kenthapadi (Chief AI Officer & Chief Scientist at Fiddler)

Books

  1. "The Hard Thing About Hard Things" (Ben Horowitz)

  2. "The Five Dysfunctions of a Team" (Patrick Lencioni)

Notes

My conversation with Krishna was recorded more than a year ago. To catch up on what has happened since, I'd recommend checking out these Fiddler resources:

  1. Strategic investments in Fiddler by Alteryx Ventures, Mozilla Ventures, Dentsu Ventures, and Scale Asia Ventures.

  2. Fiddler introduced an end-to-end workflow for robust Generative AI in May 2023.

  3. Krishna's thought leadership on LLMOps and the missing link in Generative AI.


About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.


Key Takeaways

Here are the highlights from my conversation with Krishna:

On His Educational Background

Like many Indian students who come to America, I completed my undergraduate studies in the late nineties and then pursued graduate school in the U.S. One area that particularly interested me was data mining.

During the late nineties and early 2000s, significant research was conducted in association rule mining. One of the seminal papers in this field focused on analyzing transaction data from Walmart and uncovering exciting patterns.

I vividly remember one of the most intriguing discoveries from that research, which involved a humorous example. The researchers found a strong correlation between purchases of beer and diapers, which initially puzzled them. They wondered why such a high correlation existed between these two items. It turned out that many fathers, on their way back home, would stop by Walmart to buy beer for themselves and diapers for their children. This finding was quite fascinating.

As a result, Walmart rearranged its aisles based on this insight to help customers find items more efficiently.

These techniques also have many applications in web and text mining. I focused extensively on document clustering, which was crucial in the early days of search engines like Google, Yahoo, and Bing, when teams were crawling the web and trying to make sense of the vast amount of text available. I spent a significant amount of time developing clustering algorithms that could effectively analyze these large text datasets.
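The beer-and-diapers story is an association rule of the form {beer} → {diapers}. Such rules are usually scored by support (how often both items appear in the same basket), confidence (how often diapers appear given that beer does), and lift (confidence divided by how often diapers appear overall; a value above 1 signals a positive association). The sketch below is a rough illustration under stated assumptions: it counts item pairs by brute force rather than running the full Apriori algorithm, `mine_pair_rules` is a hypothetical helper, and the baskets are made-up toy data, not the retail data from the study Krishna recalls.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force X -> Y rules over item pairs (not the full Apriori algorithm).

    Returns {(x, y): (support, confidence, lift)} for rules passing both thresholds.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y); > 1 is a positive association
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence, lift)
    return rules


# Hypothetical toy baskets, not real transaction data.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer", "diapers", "milk"},
    {"milk", "bread"},
    {"chips", "bread"},
    {"milk", "bread", "diapers"},
]

rules = mine_pair_rules(baskets, min_support=0.3, min_confidence=0.6)
for (x, y), (s, c, l) in sorted(rules.items()):
    print(f"{x} -> {y}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data every beer basket also contains diapers, so {beer} → {diapers} gets confidence 1.0 and lift 1.5, while pairs that co-occur only once fall below the support threshold.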

The University of Minnesota has an excellent computer science program. I was fortunate to work with two professors who were pioneers in data mining and graph partitioning: Vipin Kumar, who is still at the university, and George Karypis, who is currently on leave and working at Amazon. They conducted groundbreaking research in bioinformatics, data mining, text mining, and climate modeling. Working with them gave me valuable opportunities to collaborate, publish papers, and gain experience in these fields. Overall, it was an incredible experience.
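To make the document clustering work above more concrete, here is a minimal sketch of bisecting (repeated-bisection) k-means with cosine similarity. This is one well-known method from that era's document-clustering literature, including work from Karypis and Kumar's group, but it is not necessarily the algorithm Krishna developed. Assumptions: documents are represented as raw term-frequency vectors normalized to unit length (no TF-IDF weighting or stemming), seeds are chosen deterministically, and all function names are hypothetical.

```python
import math
from collections import Counter


def unit_vector(weights):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm > 0 else {}


def cosine(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the sparser vector
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def centroid(vectors):
    """Normalized mean direction of a group of unit vectors."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return unit_vector(total)


def split_in_two(indices, vectors, iterations=10):
    """Two-way spherical k-means over the documents listed in indices."""
    first = indices[0]
    # Deterministic seeds: the first document and the one least similar to it.
    second = min(indices, key=lambda i: cosine(vectors[first], vectors[i]))
    centers = [vectors[first], vectors[second]]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for i in indices:
            closer_to_first = cosine(vectors[i], centers[0]) >= cosine(vectors[i], centers[1])
            groups[0 if closer_to_first else 1].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all documents identical
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(docs, k):
    """Repeatedly bisect the largest cluster until there are k clusters."""
    vectors = [unit_vector(Counter(doc.lower().split())) for doc in docs]
    clusters = [list(range(len(docs)))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        target = clusters.pop(0)
        left, right = split_in_two(target, vectors) if len(target) > 1 else (target, [])
        if not left or not right:
            clusters.append(target)  # cannot split further
            break
        clusters.extend([left, right])
    return sorted(sorted(c) for c in clusters)


docs = [
    "search engine ranking query results",
    "query ranking search relevance",
    "beer diapers grocery basket",
    "grocery basket milk beer",
]
print(bisecting_kmeans(docs, 2))  # [[0, 1], [2, 3]]
```

Splitting the largest cluster each time is one common choice; variants instead split the cluster with the lowest internal similarity.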

On Working on Bing's search engine at Microsoft

After working on data mining in grad school, I wanted to build a web search system. Microsoft had a small team of 25 people dedicated to this project, which was like a startup within the company. The goal was to create a search engine to compete with Google.

Joining this team was an exciting opportunity because we were starting from scratch: we were all new and figuring things out together. At the time, Microsoft was using Linux servers in its data centers, including for Hotmail. Bing was one of the first software systems built on clusters of Windows Server 2003 machines, and this infrastructure was eventually repurposed for Azure.

Working on Bing, I was part of the search quality team, which allowed me to contribute to groundbreaking work. We focused on building PageRank-style, graph-based algorithms to determine the importance of web pages. I also had the opportunity to work on Bing's first autocomplete feature, which offered quick query suggestions as users typed in the search box.
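The graph-based importance algorithms mentioned above build on PageRank. The following is a minimal sketch of the textbook power-iteration formulation, not Bing's production algorithm. Assumptions: the link graph is a plain dictionary, pages with no outlinks ("dangling" pages) spread their rank evenly over all pages, and the damping factor of 0.85 follows the value suggested in the original PageRank paper. The four-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a {page: [pages it links to]} graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes a damped share of its rank to every page it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "c" ends up most important because three pages link to it, while "d", with no inlinks, keeps only the baseline (1 - 0.85) / 4 share.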

We also worked on innovative projects, such as driving search traffic back to Bing. Microsoft had many MSN pages and properties, and we wanted to place contextual content on them, similar to how AdSense places contextual ads. We called it SearchSense: it displayed relevant search queries on MSN pages that linked to Bing. For example, if you were on an MSN page related to Michael Jordan, you could see a query that would bring you back to Bing search.

Working on search engines allowed me to explore various aspects of computer science, including distributed systems, machine learning, and algorithm development. It was a valuable experience that laid the foundation for my career. I spent around five to six years at Bing and cherish the knowledge and skills I gained.

On Competing Against Google

Google's biggest advantage over Bing was the amount of data it had. Google had already launched a toolbar across various browsers, which allowed it to collect a large amount of long-tail query data.

In the case of search engines, the more you use them, the better they can become because they can gather more data. At Bing, we developed highly sophisticated algorithms, including one of the first neural networks for a large-scale use case.

Before that, we also productized an algorithm called RankNet, a two-layer neural network used to score and rank web search results. We had a team of Microsoft researchers working on these problems. However, Google had better data because of its much larger volume of long-tail queries, which made it challenging for Bing to match or surpass Google in search quality.
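In its published form, RankNet scores each document with a neural network, turns the score difference for a pair of documents into a probability P_ij = sigmoid(s_i - s_j) that i should rank above j, and trains with cross-entropy against the desired ordering. The sketch below keeps that pairwise cost and its gradient (with the sigmoid's scale parameter fixed at 1) but, as a simplification, swaps the two-layer network for a linear scorer. The helper names, feature vectors, and relevance grades are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ranknet_pair_cost(s_i, s_j, target=1.0):
    """RankNet cross-entropy for one pair; target is the desired P(i ranked above j).

    Equivalent to -target*log(P_ij) - (1 - target)*log(1 - P_ij) with
    P_ij = sigmoid(s_i - s_j), rewritten to avoid taking log(0).
    """
    diff = s_i - s_j
    softplus = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))  # log(1 + e^-diff)
    return softplus + (1.0 - target) * diff


def score(weights, features):
    """Linear scorer used here in place of RankNet's two-layer network (a simplification)."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(docs, epochs=50, lr=0.1):
    """docs: list of (features, relevance) for one query; returns linear-scorer weights."""
    weights = [0.0] * len(docs[0][0])
    # Every pair where doc i is more relevant than doc j, so the target P(i above j) is 1.
    pairs = [(xi, xj) for xi, ri in docs for xj, rj in docs if ri > rj]
    for _ in range(epochs):
        for xi, xj in pairs:
            p_ij = sigmoid(score(weights, xi) - score(weights, xj))
            # dC/d(s_i - s_j) = P_ij - target; chain rule through the linear scorer.
            weights = [w - lr * (p_ij - 1.0) * (a - b) for w, a, b in zip(weights, xi, xj)]
    return weights


# Hypothetical two-feature documents with graded relevance for a single query.
docs = [([1.0, 0.2], 2), ([0.5, 0.5], 1), ([0.1, 0.9], 0)]
weights = train_pairwise(docs)
print([round(score(weights, x), 3) for x, _ in docs])
```

Because the cost only depends on score differences, the same training loop works for any differentiable scorer; replacing `score` with a small neural network is what makes it RankNet proper.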

We constantly compared our performance to Google using a search quality metric called NDCG (Normalized Discounted Cumulative Gain). We aimed to see how Bing performed compared to Google for the same queries.
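NDCG rewards rankings that put highly relevant results near the top: each result's gain is discounted by its position, and the total (DCG) is divided by the DCG of the ideal ordering so scores fall between 0 and 1. Below is a minimal sketch using the exponential-gain form (2^rel - 1) / log2(position + 1), which is common in web search evaluation; some teams use linear gain instead. The relevance grades for the two engines are hypothetical, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """DCG with exponential gain (2^rel - 1) and a log2(position + 1) discount."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 1)
        for position, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0 = bad ... 3 = perfect) for one query's top 5 results.
engine_a = [3, 2, 3, 0, 1]
engine_b = [2, 3, 0, 1, 3]
print(f"A: {ndcg_at_k(engine_a, 5):.3f}  B: {ndcg_at_k(engine_b, 5):.3f}")
```

For simplicity, the ideal ranking here is built from the same judged results being scored; a full evaluation would build it from every judged document for the query and then average NDCG@k over a large query set.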

On Scaling Twitter Search


Datacast follows the narrative journey of data practitioners and researchers to unpack the career lessons they learned along the way. James Le hosts the show.