logo
episode-header-image
Jan 2025
37m 23s

Fraud Detection with Graphs

Kyle Polich
About this episode

In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications.

We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets.

This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.).

Our guest shows that while other graph methods (such as GNN or Label Propagation) lack in scalability or having trouble with heterogeneous graphs, his method can tackle them because of the "locality assumption" – fraud will be a local phenomenon in the graph – and by relying on this assumption, we can get faster and more accurate results.

-------------------------------

Want to listen ad-free?  Try our Graphs Course?  Join Data Skeptic+ for $5 / month of $50 / year

https://plus.dataskeptic.com

Up next
Oct 9
Sustainable Recommender Systems for Tourism
In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable to ... Show More
38m 2s
Sep 22
Interpretable Real Estate Recommendations
In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich interviews Dr. Kunal Mukherjee, a postdoctoral research associate at Virginia Tech, about the paper "Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations" The discussion explores ... Show More
32m 57s
Sep 8
Why Am I Seeing This?
In this episode of Data Skeptic, we explore the challenges of studying social media recommender systems when exposure data isn't accessible. Our guests Sabrina Guidotti, Gregor Donabauer, and Dimitri Ognibene introduce their innovative "recommender neutral user model" for inferri ... Show More
49m 36s
Recommended Episodes
Aug 18
High Performance And Low Overhead Graphs With KuzuDB
SummaryIn this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. ... Show More
1h 1m
Sep 2024
Data for Dummies: A Crash Course for Non-Technical PMs (with Mo Hallaba)
In today's data-driven landscape, organizations often find themselves drowning in a sea of data, yet struggling to glean actionable insights from it. Many companies are eager to label themselves as data-centric, but the reality is that not everyone is equally adept at interp ... Show More
23m 23s
Feb 2025
How Can GenAI Make Analytics More Accessible to Product Teams? (with Mario Ciabarra)
Whether you prefer the term data-driven, or data-informed, or data-dazzled, it doesn't matter—today's tech cannot survive without high quality data sets AND the tools to use them effectively. But we also can't afford to think about data as the responsibility of jus ... Show More
27m 46s
Apr 2023
2344: Cloudera: Moving Beyond Big Data to Hybrid Data Mastery
I sit down with Chris Royles, EMEA Field CTO at Cloudera, to discuss the evolution of Big Data and why hybrid data is the next challenge for businesses to tackle. In this episode, we explore how the term 'Big Data' has become dated and how the rapid rise of hybrid data has shifte ... Show More
39m 54s
Apr 2015
Starting Simple and Machine Learning in Meds
In episode nine we talk with George Dahl, of  the University of Toronto, about his work on the Merck molecular activity challenge on kaggle and speech recognition. George recently successfully defended his thesis at the end of March 2015. (Congrats George!) We learn about how net ... Show More
38m 24s
Aug 26
From Academia to Industry: Bridging Data Engineering Challenges
SummaryIn this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and ... Show More
50m 54s
Jul 2022
IoT, IIoT and Managing Edge Data
Brian Gilmore (@BrianMGilmore, Director IoT/Emerging Technology @InfluxDB) talks about Edge and Industrial Edge Computing, as well as application and data challenges at the edge.SHOW: 634CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotwCHECK OUT OUR NEW PODCAST - "CLOUDCAST ... Show More
35m 37s
Dec 2024
Watching the watchers. IoT vulnerabilities exposed by AI. [Research Saturday]
This week, we are joined by Andrew Morris, Founder and CTO of GreyNoise, to discuss their work on "GreyNoise Intelligence Discovers Zero-Day Vulnerabilities in Live Streaming Cameras with the Help of AI." GreyNoise discovered two critical zero-day vulnerabilities in IoT-connected ... Show More
21m 15s
Jan 2024
Designing Data Platforms For Fintech Companies
Summary Working with financial data requires a high degree of rigor due to the numerous regulations and the risks involved in security breaches. In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platfor ... Show More
47m 57s
Nov 2017
Tackling deforestation: use existing data better
Emerging technology can help companies monitor their supply chains and track deforestation commitments better. But there is now so much data available, how can companies make sense of it all? In this podcast, Niels Wielaard, managing director of remote sensing tech firm Satellige ... Show More
7m 17s