What is Synthetic Data? No, It's Not 'Fake' Data


Synthetic data is artificially generated rather than collected from actual events, but it's not "fake" data. It replicates the properties of real data without the difficulties of capturing it, such as confidentiality constraints, low volume, or expensive validation. With synthetic data, it's easier and less costly to train AI models; however, it's not a panacea. For example, synthetic data may not fully represent the unexpected events that happen in the real world. In this video, Martin Keen explains what synthetic data is, its uses, benefits, and challenges; he wraps up his presentation by explaining how it's generated.

#datascience #businesssolutions #lightboard #ibm #computerscience #data #machinelearning


Comments

I am amazed how this dude can write backwards so perfectly

danielmaciel

Amazing series and very classical and engrossing style of explanation... keep up the good work

tmastana

What is very interesting about this concept is the validity and reliability of the data. Why don't they talk about it? It's essential when we talk about mathematical sets of any data!

amazingwarrior

Really love these IBM mini lectures, they are very insightful. Helped me during my college days, and are also helpful for learning as a hobby. Thanks!

bejxtyn

Can synthetic data be as effective as real data? Wouldn't a model trained on synthetic data give false results when used against real data?

anandkalhore

Yes, cool stuff. We use synthetic data for tracking trucks in the field. We take existing labeled data and transform the truck in three dimensions to get additional data for the model.

rickharold
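The transformation idea in the comment above can be sketched roughly as follows. This is only an illustration, not the commenter's actual pipeline: it assumes each labeled sample is a set of 3D keypoints, and the names `rotate_about_z`, `augment`, and the sample layout are hypothetical. Each rotated copy keeps the original label, which is what makes the extra data "free".

```python
import math

def rotate_about_z(points, angle_deg):
    """Rotate (x, y, z) points about the vertical z axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c, z) for x, y, z in points]

def augment(sample, angles):
    """Create one synthetic sample per angle; the label is carried over unchanged."""
    return [
        {"label": sample["label"], "points": rotate_about_z(sample["points"], ang)}
        for ang in angles
    ]

# Hypothetical labeled sample: a few 3D keypoints on a truck (made-up values).
labeled = {"label": "truck", "points": [(1.0, 0.0, 0.5), (0.0, 2.0, 0.5), (-1.0, 0.0, 1.5)]}

augmented = augment(labeled, [90, 180, 270])
print(len(augmented), "synthetic samples from 1 labeled sample")
```

A real system would more likely render the transformed object back into images; the point here is only that a known transformation of labeled data yields new labeled data.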

I find it difficult to stop thinking about Martin Keen, and his prediction about Southampton's future in the Premier League. It's quite remarkable that both Southampton and Leicester will be battling it out in the Championship to regain their positions in the top tier in 2025. A great example of the problems with synthetic data.

lozanojavier

You are a very good teacher. Do you have a full course on this?

talalrahim

I think this video might have jinxed Southampton. Instead of winning the Premier league they are now getting relegated.😢

HoustonKhanyile

Great series from IBM in general and this instructor specifically. Slightly hopeful on the Southampton bit but if you can't dream, what's the point of it all😃

mthoko

Takeaway:
Made-up data can be used to deal with biased real-world data and can be obtained from data sources or by transforming existing data, for example by adding noise or using GANs.

karengomez
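The "adding noise" route mentioned in the takeaway above might look something like this minimal sketch. It assumes a single numeric column and Gaussian jitter; the function name `synthesize_with_noise` and the `noise_scale` and `copies` parameters are hypothetical, and the input values are made up for illustration.

```python
import random
import statistics

def synthesize_with_noise(real_values, noise_scale=0.1, copies=5, seed=0):
    """Return synthetic values made by jittering each real value.

    noise_scale is a fraction of the real data's standard deviation
    (an assumed tuning knob, not a standard parameter).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sigma = statistics.pstdev(real_values) * noise_scale
    return [v + rng.gauss(0.0, sigma) for v in real_values for _ in range(copies)]

real = [12.0, 15.5, 14.2, 13.8, 16.1]  # made-up "real" measurements
synthetic = synthesize_with_noise(real)

print(len(real), "real values ->", len(synthetic), "synthetic values")
print("real mean:", round(statistics.mean(real), 2))
print("synthetic mean:", round(statistics.mean(synthetic), 2))
```

Small noise keeps the synthetic values statistically close to the originals, which also means it offers limited privacy protection; a GAN-based generator is a different and much heavier technique that this sketch does not cover.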

Synthetic data has been very useful in my field (gene regulatory networks; maps of interactions that affect gene expression within cells). We can't manually test the interactions of tens of thousands of genes, especially across tens/hundreds of thousands of species, so we predict them using large molecular datasets.

The problem is, how can you evaluate the accuracy of a prediction algorithm if you don't know what's true or false? Synthetic data is super useful, since you can generate data with known interactions that you can compare to. Algorithms can then be ranked on how closely their predictions match the synthetic dataset. A great example is the GNW DREAM Network Inference Challenge, if you want to see how they use this!

seanrrr
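The evaluation idea described above, scoring algorithms against a synthetic network whose true interactions are known, can be sketched as below. This is a simplified illustration under stated assumptions: real benchmarks simulate expression data from the network and run actual inference methods on it, whereas here the two "algorithms" are hard-coded stand-ins, and all names (`make_ground_truth`, `precision_recall`, `algo_a`, `algo_b`) are hypothetical.

```python
import random

def make_ground_truth(n_genes, n_edges, seed=0):
    """Generate a synthetic network: a known set of directed gene-gene edges."""
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_genes) for j in range(n_genes) if i != j]
    return set(rng.sample(pairs, n_edges))

def precision_recall(predicted, truth):
    """Compare predicted edges with the known true edges."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

def f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

truth = make_ground_truth(n_genes=10, n_edges=15)
known = sorted(truth)

# Stand-in predictions. Self-loops such as (0, 0) are never in the ground
# truth, so they act as guaranteed false positives.
predictions = {
    "algo_a": set(known[:12]) | {(0, 0), (1, 1), (2, 2)},
    "algo_b": set(known[:5]) | {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)},
}

ranking = sorted(
    ((name, f1(*precision_recall(pred, truth))) for name, pred in predictions.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(name, round(score, 2))
```

Because the true edges are known by construction, the scores are exact rather than estimated, which is the property that makes synthetic benchmarks useful when real ground truth is unavailable.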

What kind of transparent white board is he using to write on? Very cool. Have not quite seen this before.

quantumpotential

Can we also add regional human corruption to make synthetic data more reliable, and should it be treated as noise?

KNOT-zdwh

Why it is not called "fake" data is not made clear in the video.

nagkumar

How is this not basing later models on copies of copies of potentially incorrect data? Won't we end up with piles of structurally sound, true seeming noise eventually?

almor

Interesting, if rather simplistic. Having spent the past 5/6 years developing a synthetic police-data model, it is not easy or cheap (if time is factored in). Rows and rows of financial transactions might be easy to generate, less so, complex family groups, locations, incidents and crimes, vehicles, organisations, where these are interlinked, related and reflect real-world scenarios. Whilst IBM has some excellent tools such as i2 and Watson, the real data in those systems would be unlikely to be made available for synthesising.

ianoldfield

using the prem was the perfect hook icl

nicoles_handle

nice, now I can generate data for my HIV viral load detector model at no cost

watipasokamanga

Thanks for the video.

May I ask... is this a British accent?

itdataandprocessanalysis

What is Synthetic Data?

What is Synthetic Data? : Simply Explained

What is Synthetic Data? And why is AI-generated Synthetic Data superior? (Part 3/5)

5 ways to generate synthetic data | Synthetic data generation machine learning | Synthetic data

Synthetic Data

Synthetic data tutorial: What is synthetic data?

Is Synthetic Data The Future of AI? (And How To Make Your Own)

CompTIA AI SysOp+ Foundations Master AI System Operations – Full Course No ConfigLAB 3hr

Put the science back into data - Use synthetic data!

What is synthetic data? An introduction to privacy-enhancing synthetic data

Synthetic data generation with CTGAN

What is Synthetic Data?

Why do we need synthetic data?

Synthetic data explained in the context of major companies #aipodcast #ai #podcast

How is Synthetic Data Generated?

Generate Synthetic Data in 60 seconds | Gretel.ai

Synthetic data generation for AI training — Blender Conference 2024

What is synthetic data? #dataprivacy #datascience #data #syntheticdata

The Secret To AGI - Synthetic Data

Is Synthetic Data the Future of Machine Learning? 🤖📊

The Game Changing Evolution of Synthetic Data Generation: Magpie

AI Model Collapse The Risks of Recursive Training and Synthetic Data in Generative AI

NVIDIA's MASSIVE Model Creates Synthetic Data, But Is It Actually Good?