BigBird Research Ep. 4 - Where Does BigBird Help?

Weekly Research Group, April 29th, 2021

So far, I’ve struggled to get BigBird to outperform the original BERT (using the simple strategy of truncating the input to BERT’s 512-token limit). This week, the group helped me figure out how to craft a code example that best demonstrates where BigBird is most likely to be useful.
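For concreteness, here is a minimal sketch of that truncation baseline, assuming the Hugging Face `transformers` library and the standard `bert-base-uncased` checkpoint; the example texts and label count are placeholders, not my actual dataset:

```python
# Minimal truncation baseline for BERT, assuming Hugging Face `transformers`;
# the corpus and num_labels below are placeholders.
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

texts = ["A document far longer than 512 tokens ..."]  # placeholder corpus

# Everything past BERT's 512-token limit is simply discarded. Long-document
# models like BigBird exist precisely to avoid throwing away this tail.
inputs = tokenizer(
    texts, truncation=True, max_length=512, padding=True, return_tensors="pt"
)
outputs = model(**inputs)
```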

In the process, we touched on:
- The authors’ recommendation to use sparse attention only for sequences longer than 1,024 tokens (see the sketch after this list).
- Why BigBird is valuable for Question Answering.
- Possible strategies for addressing GPU memory concerns with BigBird (also touched on in the sketch below).
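To make the first and third points concrete, here is a hedged sketch using the Hugging Face BigBird implementation. The `attention_type` values ("block_sparse" and "original_full") and `gradient_checkpointing_enable()` are real `transformers` APIs, but the length-based switch and the choice of gradient checkpointing as the memory strategy are my illustration, not necessarily what the group settled on:

```python
# Sketch: pick BigBird's attention pattern based on input length, assuming
# the Hugging Face `transformers` BigBird implementation and the public
# "google/bigbird-roberta-base" checkpoint.
from transformers import BigBirdForQuestionAnswering

seq_len = 4096  # illustrative input length

# Per the recommendation above, sparse attention only pays off beyond
# ~1,024 tokens; for shorter inputs, fall back to full attention.
attention_type = "block_sparse" if seq_len > 1024 else "original_full"

model = BigBirdForQuestionAnswering.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type=attention_type,
)

# One common way to ease GPU memory pressure: recompute activations in the
# backward pass instead of storing them, trading compute for memory.
model.gradient_checkpointing_enable()
```

The appeal for Question Answering is the same switch viewed from the task side: QA contexts routinely run to thousands of tokens, where truncation would cut off the passage containing the answer, so block-sparse attention lets the model see the whole document at a manageable cost.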

Outside of BigBird, we also talked about how to use a classifier to help label a large unlabeled dataset, and strategies for detecting the author of a piece of text.
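On the labeling point, one standard pattern is pseudo-labeling: run a trained classifier over the unlabeled pool and keep only its high-confidence predictions. A minimal sketch, assuming the `model` and `tokenizer` from the baseline snippet above have already been fine-tuned; the 0.95 threshold is an arbitrary illustration:

```python
import torch

unlabeled_texts = ["an unlabeled document ..."]  # placeholder pool

model.eval()
with torch.no_grad():
    enc = tokenizer(
        unlabeled_texts, truncation=True, max_length=512,
        padding=True, return_tensors="pt"
    )
    probs = torch.softmax(model(**enc).logits, dim=-1)

confidence, pseudo_labels = probs.max(dim=-1)

# Keep only confident predictions as training labels; uncertain examples
# stay in the unlabeled pool (or go to a human annotator).
for text, label, conf in zip(
    unlabeled_texts, pseudo_labels.tolist(), confidence.tolist()
):
    if conf > 0.95:
        print(label, text[:60])
```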

I’ll be implementing the group’s suggestions this week, and we’ll see how it went at the next session!
Comments

I love what you guys are doing. It will be great to compare BigBird vs. chunking. I’m working on a dataset where most examples are longer than 512 tokens, so I really need to know how the two approaches perform on it.

ifeanyindukwe

Hey Chris, really loved the video. Could you please share some resources on learning distributed training in PyTorch? As someone just getting started, distributed training is really intimidating. Perhaps you could also make a YouTube video explaining all the details.

stephennfernandes