GPT-4o, AI overviews and our multimodal future


In Episode 3 of Mixture of Experts, host Tim Hwang is joined by Shobhit Varshney, Chris Hay, and Bryan Casey for the OpenAI vs. Google showdown. Shobhit analyzes the showcase demos released by OpenAI and Google. Chris breaks down latency and cost in relation to GPT-4o and Gemini 1.5 Flash. Finally, after years of people proclaiming the death of search, Bryan answers the big question: are LLMs forcing the death of Google search?

The Rise of Multimodality: 3:13
Collapsing Latency & Cost: 16:54
LLMs Eat Google Search: 30:12

The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity.
Comments

I took a photo of this podcast in my YouTube browser and asked GPT-4o to analyze it. It took only 5 seconds, and here is the analysis:

The image shows a virtual panel discussion hosted by IBM, as indicated by the "think | IBM" branding at the top left corner. There are four participants visible, each in their own video feed, suggesting a remote conferencing setup.

The title of the discussion, visible at the bottom, is "AI models and our multimodal future." This indicates the conversation likely centers around advancements in artificial intelligence, particularly focusing on multimodal AI, which involves systems that can process and understand multiple types of data simultaneously (e.g., text, images, audio).

Other details such as the video being liked 22 times, options to share, download, and access more content, and the YouTube browser tabs at the top, suggest this is a live-streamed or recorded event available for viewing online.

ThoughtfulAl
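
For anyone who wants to reproduce this experiment, here is a minimal sketch using the OpenAI Python SDK's vision-capable chat completions endpoint. The screenshot filename and prompt are placeholders, you would need your own API key, and model availability and pricing are worth verifying.

```python
import base64

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "podcast_screenshot.png" is a placeholder filename: encode the
# local screenshot as base64 so it can be sent as a data URL.
with open("podcast_screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```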

Beautiful show, first time here. Interesting and informative guests. You have a very comprehensive discussion panel going.

RomaineGangaram

8:46 My take on this is that the extended context window is how it gets rooted in the enterprise. These huge context windows, 1-2M tokens, are enough to provide the context required. It bumps up the inference cost, sure, but it's how you give the model context without relying on RAG.

MartinBroadhurst

Let's be super clear about Google advertising. The way to think about paid online advertising is a comparison to what organized crime used to call protection. So in my best Tony Soprano voice I'll now sell you some online advertising. "Hey, I hear you want to sell some service X",
"You know the guy down the street sells service X, and he's doing alright",

"You know he donates to the.... well... let's say he pays to play",
"Are you going to pay to play?"
"You know you've got a nice place here, I'd hate to see it burn down..."

That's essentially it. You'll hear talk about Google's success being down to amazing search results. This is incorrect. When Google launched, most of us were on 56 kbps dial-up connections. The search competition had pages full of crud that took a lifetime to load. Google didn't have any crud, so it loaded quickly. That was it. Now Google just serves up a load of trash while making hundreds of millions of dollars serving up that crud.

Superteastain

I think 'o' actually stands for orbitofrontal cortex.

It could be a smaller, narrow receptive input network that doesn't really retain or memorize beyond simple and critical pathways, paired with a much larger network that assesses the weighted inputs, in a bottom-up/top-down approach. Because of this, I think 4o is a double-ended model whose two halves work together, one distilling input and the other assessing it.

This region of the brain is multimodal, but just as in our organic builds, vision is the primary input, and the other modalities largely map onto visual representations (hear a garbage truck outside, and you visualize what that truck looks like in your head). This region is also extremely low latency by necessity, as responses to visual input need to be near-automatic (driving a car, walking).

All things considered, I think this is the analogue of our orbitofrontal cortex, and perhaps the applicability extends far wider than theorized prior to implementing the solution. Shy of having the equivalent biological need to survive, I think this is AGI, and we've only seen the baby brother. I don't think we'll get the whole enchilada this year or the next; rather, as they've been saying, an agentic junior version to do the bidding of paid subscribers. A breakthrough of this kind would also line up with the senior scientists (and the alignment-conscious) taking their leave, as management has turned from the primary objective, allocating infrastructure resources to pressing forward with the model's expansion over building safety for it.

Charles-Darwin
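
OpenAI has not published GPT-4o's architecture, so the comment above is speculation, but the two-network, bottom-up/top-down arrangement it describes can at least be rendered as a toy sketch. Everything below (the class names, dimensions, and PyTorch framing) is invented for illustration only.

```python
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):
    """Small, low-latency front end: compresses raw input into a short
    code rather than retaining or memorizing much itself (bottom-up)."""
    def __init__(self, in_dim: int = 1024, code_dim: int = 64):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, code_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encode(x)

class AssessorNet(nn.Module):
    """Much larger back end that weighs the distilled code (top-down)."""
    def __init__(self, code_dim: int = 64, hidden: int = 2048, out_dim: int = 10):
        super().__init__()
        self.assess = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, code: torch.Tensor) -> torch.Tensor:
        return self.assess(code)

perception, assessor = PerceptionNet(), AssessorNet()
x = torch.randn(1, 1024)            # stand-in for a multimodal embedding
decision = assessor(perception(x))  # distill first, then assess
```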

Given the latency differences between quick/shallow answer vs. longer pause/logically deeper answer, I hope systems will have a built-in default option to pick the depth of answer you want. I'm imagining a few keywords to trigger a deeper-thinking mode with the implicit assumption that you are willing to tolerate a longer pause (and/or a possible charge to your account?) - just as you would expect with a human. I assume that if I say something like "take some time to think about this and give me the best response you can" then the AI will, indeed, take some time to give the most deeply intelligent response. Am I thinking about this plausibly? Or am I being naive?

gaylenwoof
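
Nothing stops anyone from building that routing layer today on top of existing APIs. A toy sketch of the idea follows; the model names and trigger phrases are hypothetical placeholders, not features of any shipping product.

```python
# Trigger phrases that opt the user into a slower, deeper answer.
DEEP_TRIGGERS = ("take some time", "think about this", "best response you can")

def pick_model(prompt: str) -> str:
    """Route to a deeper (slower, possibly pricier) tier when the
    prompt contains an explicit 'think harder' phrase."""
    lowered = prompt.lower()
    if any(phrase in lowered for phrase in DEEP_TRIGGERS):
        return "deep-model"   # longer pause, deeper reasoning
    return "fast-model"       # low-latency default

print(pick_model("What's the capital of France?"))                  # fast-model
print(pick_model("Take some time to think about this trade-off."))  # deep-model
```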

Modes of media are fragmented... here and there.

What is more useful is smartphone functions like banking and ordering products... this is what is called "multifaceted integration".

DrJanpha

Parts geolocation has been around for ten years, along with maintenance scheduling, etc.

lighteningrod

🎉❤😊 When law enforcement and security footage is translated from image to text... the jury and judge will be more confident in their determination of reasonable doubt.

The information will be more accurate and reliable, and hopefully justice will be served. 🎉❤😅

dianedean

The environmental metric should come first... unless you'd like to sunbathe at 80 degrees Celsius on an average summer day someday...

flashmo

Unfortunately, the Google demo was pre-generated, and they provided a tiny, pathetic little disclaimer in the bottom right corner of the screen.

Besides, if it goes like the last presentation, everything AI Google announces either doesn't work at all, never arrives, or arrives six months later in a stripped-down form... So this podcast putting the OpenAI and Google presentations on the same plane is borderline bizarre, and probably unprofessional too.

markmuller

Google search, for me, has been garbage. Northern Light used to be way better for search, though not for selling ads.

bamhre

from Generative to Objective-Driven. 🙃

AlgoNudger

Imagine that incredibly annoying, ingratiating female 4o voice dealing with a police officer if you were pulled over, or any other authority, face-to-face scenario... would the authority figure punch you first or simply tear your iPhone in two?

Hordebarraged