'Not with my data,' many people say to AI companies

Artificial intelligence just seems to keep growing and growing and growing, fueled by largely unregulated access to massive amounts of free, publicly available data. But unfortunately for our future robot overlords, that data has begun drying up as websites and organizations have begun restricting access to their information. What does this mean for the AI industry? Let’s take a look.

#science #sciencenews #ai #tech
Comments
Author

Having to know the name of the bot to block it feels eerily like an exorcism

somerandompersonintheinternet
Author

AI: "We'll conquer humanity... right after this cat video."

DataIsBeautifulOfficial
Author

The fun story with Facebook, they were asked, in Australia, if they were using accounts to train AI, and the answer was "we'll have to look into it". Then it turns out it is completely legal in Australia to use the data and they come out "Yeah. Yeah we're using any and all photos shared by Australians in our AI." No pretense. No concern.

doublepinger
Author

If you're going to talk about copyright hypocrisy, make sure to mention the Internet Archive and how it's being sued into oblivion by people who think libraries are a disservice to humanity.

michaelleue
Author

"We need high quality data. Let's crawl the internet for that!"

Something does not fit together...

mangalores-x_x
Author

Sorry, a correction regarding robots.txt:
1) You can make a rule to block all bots and then allow only certain ones (like Googlebot) via an allow list.
2) robots.txt is still only as useful as a stop sign put on an empty field of grass. Bots from Perplexity etc. just ignore it.
Hence any at least somewhat reliable blocking requires technical measures... usually using more AI, just not of the generative kind. Thus one can see it is data theft, as much as someone breaking a window is committing physical theft.

stephan
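
The allow-list idea in the comment above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration under stated assumptions: the rules, the `example.com` URL, and the bot name `SomeOtherBot` are made up for the example, and the parser only reports what a compliant crawler would do. Nothing here stops a crawler that ignores the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot everywhere, disallow every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# What a well-behaved crawler would conclude before fetching a page.
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/article")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/article")

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

The check runs on the crawler's side, which is why the second point in the comment still holds: the file states a policy but does not enforce it.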
Author

Happy birthday 🎈❤, Dr. Sabine (well, tomorrow), so good that you were born, great to have you here in our universe.

Thomas-gk
Author

Wow, so the movie Short Circuit (1986) was prescient. AI "needs input."
Number 5: "Malfunction. Need input."
Stephanie Speck: "Input. That's information! Listen, I am full of it."

kimwelch
Author

Definitely talk about copyright hypocrisy!

drazenimoti
Author

This is something I laughed about when this whole thing started - GIGO - Garbage In, Garbage Out.

RocketCityTim
Author

robots.txt doesn't block access, it just tells the crawlers which files they should not access. So there is no need to rename the crawlers; they can just ignore these instructions.

thomasmueller
Author

Technically, robots.txt does not prevent crawler activity but merely advises on it. Some crawlers respect this advice, while many ignore it and often do not properly identify themselves as bots.

dmitrysmirnov
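
Since robots.txt is only advisory, actual blocking has to happen on the server. A minimal sketch of the simplest such measure, refusing requests by User-Agent substring, is below. The function name `status_for_request` and the token list are assumptions chosen for illustration, not a complete or authoritative blocklist. As the comment above points out, a crawler that does not identify itself passes straight through this kind of check.

```python
# Illustrative User-Agent substrings to refuse; an assumption for this sketch,
# not an authoritative or complete list of AI crawlers.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "PerplexityBot")


def status_for_request(user_agent: str) -> int:
    """Return the HTTP status a server might send for this User-Agent header.

    403 if the header contains a blocked token (case-insensitive), else 200.
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200


# A crawler that announces itself is refused.
honest_bot = status_for_request("Mozilla/5.0 (compatible; GPTBot/1.0)")

# The same crawler sending a browser-like header is not caught by this check.
disguised_bot = status_for_request("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")

print(honest_bot, disguised_bot)
```

This is why the more reliable defenses mentioned in the thread rely on behavior-based detection rather than on the name a client reports.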
Author

robots.txt is just a sign in your front yard that says "don't take my data, please." It has no effect on crawlers that ignore it. And web crawlers are literally called robot spiders.

scottmiller
Author

I wouldn't be surprised if OpenAI just bought the data from companies with less restricted web crawlers.

michaelblacktree
Author

Most VPN companies will actually sell your data. You can only guarantee privacy by running your own VPN server, or by using a VPN that doesn't require your ID and paying with crypto or cash.

kras_mazov
Author

About robots.txt: you can say that nobody except a few named crawlers is allowed to crawl. Also, crawlers can just ignore robots.txt. I don't know if there is any legal ruling on whether robots.txt is legally binding.

bloody_albatross
Author

6'13" exactly. I've seen job ads that recruit scientists to train models for $50 per hour, which is slightly higher than what a science professor gets on average in the US.

suichiao
Author

I'm a software developer and, at work, we're putting a lot of effort into implementing AI to answer phone calls, route calls, etc... you know, customer service stuff. Then, the other day I was looking up a restaurant and Google offered to have AI call for me, check the wait times and/or make a reservation... that's when I realized... we're building a network of AI bots that all talk to each other using English as their API? I don't know what that means but... it doesn't sound good at all.

charliemopps
Author

An additional problem is that more and more "data" on the internet is itself generated by AI, or at least LLMs. So AI will increasingly be training on so-called data that it produced itself. This will lead to total uniformity (we're already about 80-90% there) and a convergence on idiocracy.

chrishall
Author

Really interesting and well put together

ericlani