Oct 23
26m 40s

Benchmarking Legal AI: Measuring the Del...

Percipient - Chad Main
About this episode

Is artificial intelligence custom-made for legal tasks better than general AI tools like Google Gemini and ChatGPT? That is the topic of this episode featuring Legalbenchmarks.ai Founder Anna Guo. Anna is a former BigLaw lawyer who left the practice to become an entrepreneur and now focuses her energies on quantifying the utility of AI in the legal industry. Anna's initial anecdotal research for colleagues quickly revealed strong community interest in a systematic approach to evaluating legal AI tools. This led to the creation of Legalbenchmarks.ai, which is dedicated to finding out where humans plus AI truly perform better than humans alone or AI alone.

The core of the research involves measuring the "delta," or the extent to which AI can elevate human performance. To date, Legalbenchmarks.ai has conducted two major studies: one on information extraction from legal sources and a second on contract review and redlining.

Key Findings from the Studies:

  • Accuracy vs. Qualitative Usefulness: The highest-performing general-purpose AI tools (like Gemini) were often more accurate and consistent than the legal-specific tools. However, the legal-specific AI tools often received higher marks for qualitative usefulness and helpfulness, as they align more closely with existing legal workflows.

  • Methodology: The testing goes beyond simple accuracy. It includes a three-part assessment: Reliability (objective accuracy and legal adequacy), Usability (qualitative metrics like helpfulness and coherence for tasks such as brainstorming), and Platform Workflow Support (integration, citation checks, and other features).

  • Human-AI Performance: In the contract analysis study, AI tools matched or exceeded the human baseline for reliability in producing first drafts. Crucially, the data demonstrated that the common belief that "human plus AI will always outperform AI alone" was false; the top-performing AI tool alone still had a higher accuracy rate than the human-plus-AI combo.

  • Risk Analysis: A significant finding was that, in high-risk scenarios, legal AI tools flagged material risks, such as compliance or unenforceability issues, that human lawyers missed entirely. This suggests AI can act as a crucial safety net.

  • Strengths Comparison: AI excels at brainstorming, challenging human bias, and performing mass-scale routine tasks (e.g., mass contract review for simple terms). Humans retain a significant edge in ingesting nuanced context and making commercially reasonable decisions, which AI's strict instruction-following can sometimes miss.

Discussion Highlights:

  • [0:00] – Introduction and background of Anna Guo and Legalbenchmarks.ai.

  • [4:30] – The impetus for starting systematic AI benchmarking.

  • [6:00] – Explaining the concept of measuring the "delta" in performance.

  • [9:00] – Detailed breakdown of the three-part AI assessment methodology.

  • [15:00] – Discussion of the contrasting results: general LLM accuracy vs. legal AI qualitative value.

  • [19:00] – Results on AI performance matching human reliability in contract drafting.

  • [21:00] – Debunking the myth about Human + AI always outperforming AI alone.

  • [23:00] – The finding that legal AI excels at surfacing material risks that lawyers miss.

  • [27:00] – A SWOT analysis of when to use humans and when to use AI.

  • [30:00] – Future roadmap for Legalbenchmarks.ai research.

 
