All publications

Were RNNs All We Needed? (Paper Explained)

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained)

Scalable MatMul-free Language Modeling (Paper Explained)

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained)

xLSTM: Extended Long Short-Term Memory

[ML News] OpenAI is in hot waters (GPT-4o, Ilya Leaving, Scarlett Johansson legal action)

ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)

[ML News] Chips, Robots, and Models

TransformerFAM: Feedback attention is working memory

[ML News] Devin exposed | NeurIPS track for high school students

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

[ML News] Llama 3 changes the game

Hugging Face got hacked

[ML News] Microsoft to spend 100 BILLION DOLLARS on supercomputer (& more industry news)

[ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)

Flow Matching for Generative Modeling (Paper Explained)

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)

[ML News] Grok-1 open-sourced | Nvidia GTC | OpenAI leaks model names | AI Act

[ML News] Devin AI Software Engineer | GPT-4.5-Turbo LEAKED | US Gov't Report: Total Extinction

[ML News] Elon sues OpenAI | Mistral Large | More Gemini Drama

On Claude 3

No, Anthropic's Claude 3 is NOT sentient

[ML News] Groq, Gemma, Sora, Gemini, and Air Canada's chatbot troubles