AI testing, benchmarks and evals

Thoughtworks Technology Podcast

642 subscribers

Added seven years ago

Content provided by Thoughtworks. All podcast content, including episodes, graphics and podcast descriptions, is uploaded and provided directly by Thoughtworks or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://fa.player.fm/legal

About a year ago · 36:03

Generative AI's popularity has led to a renewed interest in quality assurance, which is perhaps unsurprising given the inherent unpredictability of the technology. That is why, over the last year, the field has seen a number of techniques and approaches emerge, including evals, benchmarking and guardrails. Although each term refers to something different, together they share a common aim: improving the reliability and accuracy of generative AI.

To discuss these techniques and the renewed enthusiasm for testing across the industry, host Lilly Ryan is joined by Shayan Mohanty, Head of AI Research at Thoughtworks, and John Singleton, Program Manager for Thoughtworks' AI Lab. They discuss the differences between evals, benchmarking and testing and explore both what they mean for businesses venturing into generative AI and how they can be implemented effectively.

Learn more about evals, benchmarks and testing in this blog post by Shayan and John (written with Parag Mahajani): https://www.thoughtworks.com/insights/blog/generative-ai/LLM-benchmarks,-evals,-and-tests

223 episodes

#User Experience #Software Development #Business #Technology #Thoughtworks #Tech #Careers

All episodes

What we're talking about when we talk about context engineering 20:11

7 days ago

Everyone seems to be talking about context engineering. That was certainly the case in our recent discussions for the upcoming edition of the Technology Radar (volume 33, due early November 2025). And although we ran into the term on the Technology Podcast just a few weeks ago, we thought it would be useful to tackle exactly what people mean when they talk about context engineering. We know context is important when it comes to AI, but what does it mean to engineer it? On this episode of the Technology Podcast, host and Thoughtworks CTO Rachel Laycock is joined by Thoughtworkers Alessio Ferri (Lead Software Engineer) and Bharani Subramaniam (CTO for India and the Middle East) to discuss what context engineering is, how it's being done and what it tells us about the evolution of AI. This certainly won't be the last word on context engineering, ours or anyone else's, but it might help clarify and cement your understanding as the term comes to dominate technology conversations.…

Mean time to shared understanding: Bridging the gap between citizen developers and developers 28:24

21 days ago

Although the concept of the 'citizen developer' isn't new, with the rise of AI the relationship between those building software without much technical experience and seasoned software developers is becoming more significant. That's not to say there's conflict exactly, but there are often competing interests and demands — which can lead to tension, organizational friction and governance challenges. On this episode of the Technology Podcast, host Ken Mugrage facilitates a debate (of sorts) between Christopher Hastings, Global Tech Product Lead at Thoughtworks (and citizen developer) and Scott Davies, Head of Technology for Thoughtworks Europe (very much in the developer camp). They discuss the needs and interests of both sides, how to avoid regressing to the dark ages of shadow IT and how citizen developers can be properly empowered by engineering teams.…

Organizational design and Team Topologies after AI 42:45

5 weeks ago

Managing technological change in an organization, particularly a large and complex one, has always been challenging. But with the rapid adoption of AI in all kinds of spheres, from knowledge management to software development to content creation, it's becoming more difficult than ever. How do you strike a balance between governance and safety on the one hand and autonomy and empowerment on the other? How should teams be structured and how should they work together? In this episode of the Technology Podcast, Matthew Skelton and Manuel Pais, authors of the influential book Team Topologies, join hosts Birgitta Böckeler and Ken Mugrage to discuss what AI means for organizational design. They cover how AI is changing team capabilities, what it means for cognitive load and knowledge sharing and how to ensure there's structure and control without constraining experimentation and creativity. With the second edition of Team Topologies set to be published in September 2025, Matthew and Manuel used the conversation to explore the evolution of their ideas and what they've learned from working with, and listening to the stories of, many different organizations around the world. Learn more about Team Topologies: https://teamtopologies.com/…

Context engineering: Tackling legacy systems with generative AI 40:41

7 weeks ago

Generative AI can be incredibly powerful when it comes to legacy modernization. Not only can it help us better understand a large, aging codebase, it can even help us reverse engineer a legacy system when we don't have access to the complete source code. Doing it, though, requires a specific approach that's being described as 'context engineering'. This is something we've been exploring a lot in recent months at Thoughtworks. On this episode of the Technology Podcast, Thoughtworks' lead for AI-enabled software engineering, Birgitta Böckeler, and tech principal Chandirasekar Thiagarajan join hosts Ken Mugrage and Neal Ford to discuss how it works. They explain the process, the tools and what the work is teaching them about both generative AI and legacy modernization. Read Birgitta's blog post on reverse engineering with AI: https://www.thoughtworks.com/insights/blog/generative-ai/blackbox-reverse-engineering-ai-rebuild-application-without-accessing-code…

Navigating AI opportunities at MYOB 57:23

9 weeks ago

How should businesses go about actually navigating AI? It's one thing to strategize and generate new ideas, but what needs to be done to put it into practice in a way that's effective and commercially impactful? In this episode of the Technology Podcast, new host Nigel Dalton is joined by his Thoughtworks colleague May Xu — Head of Tech for Thoughtworks APAC — and Simon Noonan, CTO at Australian business software company MYOB. Thoughtworks has been working closely with MYOB for a number of years now; May and Simon explain how they collaborate and offer their perspectives on everything from leadership to architecture in a world where AI has become imperative. Learn more about Thoughtworks' partnership with MYOB: https://www.thoughtworks.com/clients/myob…

Caring about documentation in the LLM era (w/ Heidi Waterhouse) 39:19

11 weeks ago

In an age of vibe coding and LLMs, do we really need to care about documentation? Do we need to spend time and energy producing it, time when we could just be shipping code? Of course we do, particularly if we want to communicate and share software with other humans. To discuss documentation in 2025, Technology Podcast host Lilly Ryan is joined by Heidi Waterhouse, a very special guest with an esteemed and varied career in technical communication. In this episode, Lilly and Heidi tackle the challenges of documentation in a world increasingly infused with AI-generated code and text, explore whether prompt engineering is really just technical writing in disguise and examine the difficulties of writing for highly specific audiences. They also cover Progressive Delivery, Heidi's upcoming book about bridging the gap between software delivery and business value. Written with James Governor, Kim Harrison and Adam Zimman, it's due to be released in the latter part of 2025. Find out more about Heidi Waterhouse by visiting her website: https://heidiwaterhouse.com/ Learn more about Progressive Delivery: https://itrevolution.com/product/progressive-delivery/…

Why the tech industry needs Expert Generalists (w/ Martin Fowler) 43:21

13 weeks ago

The technology industry has embraced specialisms — not just in different fields or job roles, like web development or security, but even in terms of particular platforms or stacks. But are we losing something as every tech professional is forced to push themselves into increasingly smaller niches? Martin Fowler and Unmesh Joshi think so. They've been thinking a lot about the importance of what they call "Expert Generalists" — professionals who "can dissect unfamiliar challenges, spot first-principles patterns and make confident design decisions with the assurance of a specialist." In this episode of the Technology Podcast, Martin and Unmesh join hosts Prem Chandrasekaran and Lilly Ryan to discuss how they came to identify the importance of expert generalists and why it was important to not just talk about the issue, but to explicitly name it. They also explore how they believe the industry can cultivate and encourage expert generalists, despite an entrenched tendency to overlook their value. Read Martin and Unmesh's article, written with Gitanjali Venkatraman: https://martinfowler.com/articles/expert-generalist…

The three new fallacies of distributed computing 46:56

15 weeks ago

Back in 1994, Peter Deutsch and his colleagues at Sun Microsystems identified what they described as the "eight fallacies of distributed computing": flawed assumptions that often get made when teams move from monolithic to distributed software architectures. In recent years, software architecture experts and regular writing partners Neal Ford and Mark Richards have identified three further fallacies of distributed computing: versioning is easy; compensating updates always work; and observability is optional. In this episode of the Technology Podcast, Neal and Mark join host Prem Chandrasekaran to talk through these three new fallacies, before digging deeper into other important issues in software architecture, including modular monoliths and governing architectural characteristics. Listen for a fresh perspective on software architecture and to explore key ideas shaping the discipline in 2025. Learn more about the second edition of Neal and Mark's Fundamentals of Software Architecture: https://www.oreilly.com/library/view/fundamentals-of-software/9781098175504/…

MCP and SRE: Why the future of IT operations is agent-driven 28:33

17 weeks ago

What if your AI agents could think more like IT operations staff and less like tools? In this episode, we catch up with Zichuan Xiong to explore the Model Context Protocol (MCP), a powerful new way to give AI agents deeper awareness of the tools, information and history they need to work effectively in the operations space. Unlike traditional APIs that just trigger functions, MCP adds a semantic layer of context that helps AI understand what to do, why it matters and how to do it better. Whether you’re deep in site reliability engineering (SRE) or just curious about the next leap in AIOps, this episode unpacks how MCP could be the missing layer between today’s tools and tomorrow’s autonomous systems. If you want to find out more, check out this piece by Zichuan et al.: https://www.thoughtworks.com/insights/blog/machine-learning-and-ai/mcp-critical-ai-driven-sre…

Unpacking Google I/O 2025 29:51

19 weeks ago

Google I/O 2025 took place in May. It's always a great opportunity to find out how Google is trying to shape the industry agenda, but this year the predominance of Gemini meant the event was a chance to get a better look at how Google will play its hand in the AI market in the months to come. To dissect the headlines from this year's Google I/O and explore what we can learn about Google's strategic focus, and how the company is thinking about AI, host Ken Mugrage is joined by Andy Yates on the Technology Podcast. As Head of Ecosystems Development at Thoughtworks, Andy plays an important role in helping the organization and its clients understand, analyze and engage with the major platforms and vendors. This edition of Google I/O, he explains, was significant and particularly useful for helping us understand how the world is going to consume AI products and services as the technology becomes more and more embedded in the mainstream. Read more of Andy's perspective on Google I/O 2025 on the Thoughtworks blog: https://www.thoughtworks.com/insights/blog/technology-strategy/google-io-2025-key-takeaways…

Accelerating mainframe modernization using generative AI 38:28

21 weeks ago

Mainframe modernization is hard: there's a huge amount of complexity that needs to be understood before it can be effectively addressed. Generative AI, however, can be a particularly powerful tool for understanding mainframe legacy codebases, something we've been exploring with Mechanical Orchard while working together on its Imogen modernization platform. In this episode of the Technology Podcast, hosts Ken Mugrage and Alexey Boas are joined by Thoughtworks CTO Rachel Laycock and Mechanical Orchard CEO and Founder Rob Mee to discuss the partnership between the two organizations. They discuss how the collaboration began, the challenges of leveraging generative AI tools for such risky projects and what the wider implications are for AI in software engineering. Listen for a fresh perspective on both legacy modernization and generative AI. Learn more about Thoughtworks' partnership with Mechanical Orchard: https://www.thoughtworks.com/about-us/partnerships/technology/mechanical-orchard Read more about our work on mainframe modernization: https://www.thoughtworks.com/insights/blog/rewriting-the-outcomes--how-thoughtworks-and-mechanical-orchard-…

Exploring the fundamentals of software engineering 27:51

23 weeks ago

You might think you know software engineering, but what are the really fundamental elements? What are the concepts, ideas and practices that are completely essential? What makes software engineering what it is? Thoughtworker Nate Schutta and Dan Vega are attempting to address those questions in their upcoming book with O'Reilly, The Fundamentals of Software Engineering. Covering topics ranging from reading code through to the importance of learning to learn, it promises to offer fresh insight into the skills and knowledge needed to be a successful software engineer. In this episode of the Thoughtworks Technology Podcast, Nate and Dan join hosts Neal Ford and Ken Mugrage to discuss the book and to dive into what the fundamental elements of software engineering really are. Listen for a fresh perspective on the discipline and a deep dive that shows it's about far more than just writing code. Learn more about The Fundamentals of Software Engineering: https://www.oreilly.com/library/view/fundamentals-of-software/9781098143220/…

Themes in Technology Radar Vol.32 38:41

25 weeks ago

Thoughtworks Technology Radar Vol.32 was published at the start of April 2025. Featuring 105 blips, it offered a timely snapshot of what's interesting and important in the industry. Through the process of putting it together, we also identified a collection of key themes that speak to the things that shaped our conversations. This time, there were four: supervised agents in coding assistants, evolving observability, the R in RAG and taming the data frontier. We think they point to some of the key challenges and issues that the industry as a whole is currently grappling with. To dig deeper and explore what they tell us about software in 2025, regular host Neal Ford takes the guest seat alongside Birgitta Böckeler to talk to Lilly Ryan and Prem Chandrasekaran. They explain how the themes are identified and discuss their wider implications. Read the latest volume of the Thoughtworks Technology Radar: https://www.thoughtworks.com/radar…

We need to talk about vibe coding 36:53

27 weeks ago

The term 'vibe coding', which first appeared in a post on X by Andrej Karpathy in early February 2025, has set the software development world abuzz: everyone seems to have their own take on what it is, how it's done and whether it's a bold new chapter in the history of programming or an insult to anyone who's ever written a line of code. Clearly, then, we need to talk about vibe coding, and that's precisely what we do on this episode of the Technology Podcast. With Thoughtworkers Birgitta Böckeler (AI for Software Delivery Lead) and Lilly Ryan (Cybersecurity Principal) joining hosts Neal Ford and Prem Chandrasekaran, we dive into the different understandings and applications of the concept, and discuss what happens when a meme collides with reality.…

Infrastructure as code in 2025 29:07

29 weeks ago

Nearly ten years after the first edition of Infrastructure as Code was published by O'Reilly, Kief Morris is publishing a third edition of the book. But why a new edition now? What's changed in technology and business over the last decade? Quite a lot, as it happens. To talk about what's new — both in the infrastructure world and in the book itself — Kief Morris joins host Ken Mugrage on the Technology Podcast. They discuss each edition and what's new in this one, and dive into the infrastructure challenges and issues that need to be tackled in 2025, from tooling and deployment to maintenance and infrastructure evolution. Learn more about Infrastructure as Code, Third Edition: https://www.thoughtworks.com/en-gb/insights/books/infrastructure-as-code-3rd-ed…
