The Dos and Don’ts of Synthetic Data with Minhaaj Rehman
Ever heard of ‘synthetic data’?
Synthetic data is data that is artificially created (from statistical models), rather than generated by actual events. It is designed to keep the statistical characteristics of production data, minus the sensitive stuff.
By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated, according to Gartner.
Organisations may use synthetic data over actual data because they can get it more quickly, easily and cheaply.
But there are concerns with this approach, because synthetic data is based on models and algorithms designed by humans, and so it can carry their biases.
More data doesn’t necessarily equal better data.
Is synthetic data a brilliant tool for improving data quality, reducing data acquisition costs, managing privacy and reducing overfitting?
Or does synthetic data put us on a slippery slope of hard-to-interrogate models that are technically replacing fact with fiction?
To answer these questions, I recently spoke to Minhaaj Rehman, who is CEO & Chief Data Scientist at Psyda, an AI-enabled academic and industrial research agency.
In this episode of Leaders of Analytics, you will learn:
- What synthetic data is and how it is generated
- The most common uses for synthetic data
- The arguments for and against using synthetic data
- When synthetic data is most helpful and when it is most risky
- How to implement best practices for mitigating the risks associated with synthetic data, and much more.
Episode timestamps:
00:00 Intro
03:00 What Psyda Does
04:23 Academic Work and Modern Education
06:38 Getting into Data Science
11:30 What is Synthetic Data
13:30 Common Applications for Synthetic Data
18:50 Pros & Cons of using Synthetic Data
21:29 Risks of using Synthetic Data
23:48 When should Synthetic Data be Used
29:23 Synthetic Data is Cleaner than Real Data
34:05 Using Synthetic Data for Risk Mitigation
36:05 Resources on Learning More about Synthetic Data
38:05 Human Biases in Decision Making
Connect with Minhaaj:
Minhaaj on LinkedIn: https://www.linkedin.com/in/minhaaj/
Minhaaj's website and podcast: https://minhaaj.com/