[Seminar] Physically Interactable 3D Indoor Scene Synthesis

Speaker: SeungWon Seo
Comments

Thank you for the great presentation!
I found your paper really interesting, so I ended up with quite a few questions. I hope that's okay:

Q1. On Slide 9, you mentioned that the conditioned diffusion model takes a time embedding and a floor plan embedding as conditions. How are these embeddings obtained, and what forms do they take? I'm particularly curious about the floor plan embedding.

Q2. On Slide 9, you explained that the conditions are added to all layers. Does this mean they are injected through simple element-wise addition, as opposed to the cross-attention mechanism used in Stable Diffusion, where the conditions influence the attention computation for each layer's output?
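(For readers following along, the contrast drawn in Q2 can be sketched in a few lines. This is an illustrative toy in plain NumPy with random weights; names like `W_c` are invented here, and it does not reflect the paper's actual architecture.)

```python
import numpy as np

rng = np.random.default_rng(0)

def additive_conditioning(h, cond, W_c):
    # Alternative A: project the condition embedding and simply add it
    # to every token's features (broadcast over the object dimension).
    return h + cond @ W_c

def cross_attention(h, cond_tokens, W_q, W_k, W_v):
    # Alternative B: queries come from the layer's features, while keys
    # and values come from the condition tokens (Stable Diffusion style).
    q = h @ W_q
    k = cond_tokens @ W_k
    v = cond_tokens @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return h + attn @ v  # residual connection

d, n_obj, n_cond = 16, 8, 4
h = rng.standard_normal((n_obj, d))       # per-object features in one layer
cond = rng.standard_normal(d)             # e.g. time + floor-plan embedding
W_c = rng.standard_normal((d, d)) * 0.1
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

h_add = additive_conditioning(h, cond, W_c)
h_attn = cross_attention(h, np.stack([cond] * n_cond), W_q, W_k, W_v)
```

Both paths produce features of the same shape; the difference is whether the condition modulates every token uniformly (addition) or content-dependently (attention).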

Q3. Could you provide a more intuitive explanation of the modifications to the denoising process for constraint satisfaction mentioned on Slide 10?

Q4. The guidance function is constructed for three objectives, but I'm particularly concerned about the computational complexity of the reachability guidance. It seems to require numerous calculations and constraint-function evaluations. With such a large combination of functions, I wonder whether the model can learn effectively and whether the training time would be excessively long. Did the paper propose any specific training strategies or complementary techniques to address this? While the Author Insight section hints at this, honestly, it makes me wonder how it was trained at all!

정승재_teclados

Your presentation was very impressive, and it was interesting that the method can create indoor scenes with physically plausible interactions.

I have two questions:
Q1. You said that the 3D-FRONT dataset is used for training and the GAPartNet dataset is used for inference. How do you compensate for the differences between objects in the two datasets? Even if a similar object is selected, some positional differences are expected to remain.

Q2. What exactly is the difference between the Object Collision Rate and Scene Collision Ratio metrics?
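(One common convention for such metric pairs, stated purely as an assumption since the paper's exact definitions aren't quoted here: the object-level rate counts objects involved in any collision, while the scene-level ratio counts scenes containing at least one collision. A minimal sketch with hypothetical per-scene data:)

```python
# Hypothetical per-scene records: object count and detected colliding pairs.
scenes = [
    {"n_objects": 5, "colliding_pairs": [(0, 1)]},
    {"n_objects": 4, "colliding_pairs": []},
    {"n_objects": 6, "colliding_pairs": [(2, 3), (3, 5)]},
]

def object_collision_rate(scenes):
    # Fraction of all generated objects that participate in >= 1 collision.
    total = sum(s["n_objects"] for s in scenes)
    colliding = sum(len({i for pair in s["colliding_pairs"] for i in pair})
                    for s in scenes)
    return colliding / total

def scene_collision_ratio(scenes):
    # Fraction of generated scenes that contain >= 1 colliding pair.
    return sum(bool(s["colliding_pairs"]) for s in scenes) / len(scenes)

ocr = object_collision_rate(scenes)   # 5 colliding objects out of 15
scr = scene_collision_ratio(scenes)   # 2 scenes with collisions out of 3
```

Under this reading the two metrics can diverge sharply: a single bad pair per scene keeps the object-level rate low while driving the scene-level ratio to 1.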

Thank you!

tjswodud-cc

Thank you! It was a very insightful seminar 😊

I have two questions.

Q1. How did you address the mismatch issues when combining the static dataset (3D-FRONT) with the interactive dataset (PartNet)?

Q2. For articulated objects, does the model consider the full range of motion when generating the layout?

misong-kim