CppCon 2018: Stoyan Nikolov “OOP Is Dead, Long Live Data-oriented Design”



For decades C++ developers have built software around OOP concepts that ultimately failed us - we didn’t see the promises of code reuse, maintenance or simplicity fulfilled, and performance suffers significantly. Data-oriented design can be a better paradigm in fields where C++ is most important - game development, high-performance computing, and real-time systems.

The talk will briefly introduce data-oriented design and focus on practical real-world examples of applying DoD where previously OOP constructs were widely employed.

Examples will be shown from modern web browsers. They are overwhelmingly written in C++ with OOP - that’s why most of them are slow memory hogs. In the talk I’ll draw parallels between the design of systems in Chrome and their counterparts in the HTML renderer Hummingbird. As we’ll see, Hummingbird is multiple times faster because it ditches OOP for good in all performance-critical areas.

We will see how real-world C++ OOP systems can be re-designed in a C++ data-oriented way for better performance, scalability, maintainability and testability.

Stoyan Nikolov, Coherent Labs AD
Chief Software Architect

Stoyan Nikolov is the Chief Software Architect and Co-Founder of Coherent Labs. He designed the architecture of all products of the company. Stoyan has more than 10 years of experience in games. Currently he heads the development of Hummingbird - the fastest HTML rendering engine in the industry - and of LensVR, the first VR-centric web browser. Previously he worked on multiple graphics & core engine systems and on large-scale ERP solutions. Stoyan has degrees in Applied Mathematics and Computer Graphics. He is interested in high-performance computing, graphics, multithreading, VR and browser development.

Coherent Labs AD

Coherent Labs is a leading game middleware company that develops cross-platform game user interface products. It aims to solve complex problems for major gaming companies such as Arena Net, NCSoft, Bluehole, and hundreds of others, and to help them create stunning and high-performance UI. Using its experience in web, game technologies, and user interface, the company is developing a Virtual Reality browser.


Comments

References for easy googling:
"Data-Oriented Design and C++", Mike Acton, CppCon 2014
"Pitfalls of Object Oriented Programming", Tony Albrecht
"Introduction to Data-Oriented Design", Daniel Collin
"Data-Oriented Design", Richard Fabian
"Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)", Noel Llopis
"OOP != classes, but may == DOD", Lane Roathe
"Data Oriented Design Resources", Daniele Bartolini

Condog

Whoever lined the ad break up for the end of the talk before the questions, well done and thank you. It was wonderful to get through this talk uninterrupted.

onerimeuse

A sad thing is web people think 13 fps from rendering a few quads is completely normal

nexovec

Great to see another data oriented talk at CppCon

SasLuca

Lots of OOP fans are being triggered here, lol. Basically, it boils down to memory access patterns. If you can arrange your computations to simply "flow", best without hiccups, i.e. without branching and chasing pointers all over the place, and if you can put data being accessed by hot code in the same place, so they can be prefetched in the cache, you win big, performance-wise. That is what DOD is all about: to better match program data to the way hardware operates. Now, in cold code, OOP is just fine: it dramatically increases programmer efficiency at the cost of runtime efficiency, a cost we're willing to pay. But in hot code, HPC code, and real-time code, DOD is vastly superior, as it much better matches the problem to the hardware.

fhajji
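
A minimal C++ sketch of the memory-layout point made in the comment above. The particle types and field names are hypothetical, invented for illustration; nothing here comes from the talk or from Hummingbird. It contrasts an array-of-structures layout, where hot and cold fields share cache lines, with a structure-of-arrays layout, where the hot loop walks contiguous floats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: every particle drags its cold data through the
// cache whenever the hot fields are read. (Hypothetical example type.)
struct ParticleAoS {
    float x, y;           // hot: touched every frame
    float vx, vy;         // hot: touched every frame
    char debug_name[48];  // cold: rarely read, but shares the cache lines
};

void integrate(std::vector<ParticleAoS>& ps, float dt) {
    for (ParticleAoS& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

// Structure of arrays: each hot field is stored contiguously, so the
// loop below reads memory linearly and never touches cold data.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy;

    std::size_t size() const { return x.size(); }

    void add(float px, float py, float pvx, float pvy) {
        x.push_back(px);
        y.push_back(py);
        vx.push_back(pvx);
        vy.push_back(pvy);
    }
};

void integrate(ParticlesSoA& p, float dt) {
    const std::size_t n = p.size();
    // All columns must describe the same number of particles.
    assert(p.y.size() == n && p.vx.size() == n && p.vy.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```

Whether the second layout is actually faster depends on the data sizes and access pattern, so it is worth measuring rather than assuming.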

The more I research about Data-Oriented Design, the more I believe Object-Oriented projects have just been saved by great hardware. It's just a luxurious way of programming that's completely detached from the actual behavior of the machine.
Your program might be object-oriented but your machine isn't.

taia-bf

Stoyan: "Study Chromium, it's made by the best engineers in the world, there's a lot to learn!"
Also Stoyan: Shows how the best engineers in the world designed an overengineered system with poor performance.

obiwanus

The speaker could have given a much better answer to that second question, which asked about what exactly he thinks should be "dead" in OOP.

OOP was designed to help with program design, maintainability and reusability. Things like encapsulation and abstraction are core concepts of OOP, and they were developed in order to aid in the design of huge programs. It's a tool to create a million-lines-of-code program in such a manner that it remains manageable, understandable and maintainable. When properly used, it makes code simpler, safer and easier to understand and to develop further. It also helps in making code reusable, meaning that it's relatively easy to take some functionality that could be used in another part of the same program, or even a completely different program, and use it there with little or no modification. This helps reduce code repetition and overall work. It also helps with stability and with reducing the number of bugs, because when a piece of code has been extensively tested, you can be more confident it won't be a problem when used in another place or program. OOP does a relatively good job at this.

The problem with OOP is that it wasn't designed, at all, with low-level efficiency in mind. Heck, back when OOP was first developed as a paradigm computers didn't even have any caches, pipelines or anything like that. There were no "cache misses" because there were no caches. Memory access speed was pretty much constant no matter how you read or wrote there. The performance penalty from conditionals and branching wasn't such a big concern back then either. It was but decades later that processor architectures went into a direction where the scattering of all the data randomly in memory became an efficiency problem.

Thus, if we want maximum efficiency in terms of cache usage, what needs to "die" in OOP is the concept and instinct of decentralizing the instantiation of the data that has a given role. The data needs to be taken out of classes and centralized into arrays, and thus we need to break encapsulation and modularity in this manner. We also need to minimize branching, which in terms of OOP means minimizing the use of dynamic binding (in addition to trying to minimize conditionals).

DjVortex-w
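
One way to read "centralize the data into arrays and minimize dynamic binding" is sketched below in C++. The animation types are hypothetical and are not the Chromium or Hummingbird code discussed in the talk. Instead of one heterogeneous list of base-class pointers with a virtual `tick()`, each concrete kind lives in its own contiguous array and is processed in its own loop, so there is no per-element dispatch.

```cpp
#include <cassert>
#include <vector>

// Plain data, one struct per kind of animation (hypothetical types).
struct FadeAnim { float opacity; float rate; };
struct MoveAnim { float x; float speed; };

// All instances of a given role are owned centrally, by value,
// in one array per type rather than scattered across objects.
struct Animations {
    std::vector<FadeAnim> fades;
    std::vector<MoveAnim> moves;
};

// One tight loop per type: no virtual call and no type switch
// inside the loops.
void tick(Animations& a, float dt) {
    assert(dt >= 0.f && "time step must not be negative");
    for (FadeAnim& f : a.fades) f.opacity += f.rate * dt;
    for (MoveAnim& m : a.moves) m.x += m.speed * dt;
}
```

The trade-off is the one the comment names: the data is no longer hidden behind a class interface, so any invariants have to be maintained by the code that owns the arrays.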

Fantastic talk. The questions were very provocative, and the speaker handled them excellently.

duminicad

A breath of fresh air. Every time I try to read the source of an open source project written in C++, I find that, for even the simplest thing, I end up having to hunt down and read a few dozen different member functions in quite a few distinct classes (in distinct files). Object-orientitis I call it. Though I find the arguments here are more against excessive abstraction and splitting things into an excessive number of objects pointed to.

Chalisque

"[OOP and DOD] are just tools in your toolbox" — this is good advice! Unfortunately, many developers treat OOP (and TDD, and...) almost like a religion; not as a tool, but as a rigid set of beliefs that must be adhered to at all times, lest you evoke the anger of The Prophets; Fowler and Uncle Bob, hallowed be their names. But OOP is just a tool. Go explore, have fun, learn new things, and expand your toolbox. You'll see that hammer-oriented carpentry is limited :-)

sqaxomonophonen

I believe GoF's Flyweight is exactly what this talk is about. Part of the problem with OOP lies in how we approach OO design and how we teach it. Being a teacher myself, I always encounter students who have this Animal and Dog and Cat style of OO design, so from the very beginning we have this very naive mindset about how to model the world in software. Like the Animation class in Chromium.

My take is to see OO as more of a system/API-level thing, more like modules in Oberon or a Service in Spring. That is where OOP really shines.

jatvarthur

I find it funny that for every one of these talks, there is always someone trying to make a jab along the lines of "if you use data-oriented design, then after ten years of data bloat you have to do a lot of hard work to keep the code running as fast", as if that is a downside of data-oriented design. Well, duh. Of course it's hard work. The same bloat happens in object-oriented code as well, but there it's too difficult to see past all the classes and indirection, so most people just give up and accept the slowdown as "inevitable".

SLiV

Better hide that PUBG picture if you want to talk about high performance...

Gargantupimp

It's a nice feeling to notice that I reinvented DOD just after watching some videos about caches.

panakap

I'm now trying to use Data Oriented Design in a business app, where different aspects are moved into components (in the sense of game Entity Component Systems). The reason is that it is much easier to synchronize with remote systems and avoid fatal sync failures; let's see how it goes. While it is a nice talk, I have not seen Data Oriented Programming outside the games industry.

On the other hand, a normalized in-memory relational database system is doing exactly what an ECS is supposed to do.

llothar
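
The database analogy in the comment above can be sketched in C++ as follows. This is an illustration under stated assumptions, not how any particular ECS library is built: `Table`, `Position`, `Velocity` and `move_system` are hypothetical names. Each component type is a table with one row per entity, the entity id acts as the primary key, and a "system" is an inner join of two tables on that key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Entity = std::uint32_t;  // plays the role of a primary key

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// One table per component type: densely packed rows, the owning entity
// of each row, and an index from entity id to row number.
template <typename T>
struct Table {
    std::vector<Entity> owner;
    std::vector<T> rows;
    std::unordered_map<Entity, std::size_t> index;

    void insert(Entity e, const T& value) {
        assert(index.find(e) == index.end() && "at most one row per entity");
        index[e] = rows.size();
        owner.push_back(e);
        rows.push_back(value);
    }

    // Returns nullptr when the entity has no row in this table.
    T* find(Entity e) {
        auto it = index.find(e);
        return it == index.end() ? nullptr : &rows[it->second];
    }
};

// The "system": roughly SELECT ... FROM Velocity JOIN Position USING (entity).
// Entities without both components are skipped.
void move_system(Table<Position>& pos, const Table<Velocity>& vel, float dt) {
    for (std::size_t i = 0; i < vel.rows.size(); ++i) {
        if (Position* p = pos.find(vel.owner[i])) {
            p->x += vel.rows[i].dx * dt;
            p->y += vel.rows[i].dy * dt;
        }
    }
}
```

A hash lookup per row is the simplest join to write, not the fastest; it is used here only to keep the sketch short.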

I'd like to see the code and cache hit data from the guy who asked the question at 51:05. I'm 99% sure his code is a prime example of what Nikolov is talking about.

nextlifeonearth

As always, talks like this drew lots of heat from OOP fans in the Q&A. Maybe the title should have been less inflammatory :P.

dementedchicken

In HPC, data-oriented design is the norm. It comes from the way people used to write programs in FORTRAN, where not even structs were available. Scientific codes with a long history have a high likelihood of being written by people who care about performance and know about the hardware.

andreanobile

I think the author was right to mention that OOP is much more applicable than DoD under some circumstances, and I would like to expand on this point with a few personal thoughts. I'm sorry that I've done a poor job of structuring them. To my mind, it's too early to kill OOP, as some comments below propose. Personally, I see the best fit for OOP in classical enterprise business-oriented programs, where dynamic polymorphism is not just an OOP feature we use because it exists but something that plays a crucial role in building highly maintainable architectures. For example, I mean the Dependency Inversion Principle (the D in SOLID), which allows the flow of control and the direction of dependency to run in opposite directions.

I can't clearly see an application of DoD to business architectures. DoD demands that you know your domain very well in advance, which is not usually possible. In classical OOP you can separate your business logic, which seldom changes, into a separate component and provide it with plenty of interfaces to decouple it from the details. You can experiment with the details as much as you want while leaving the core of your application logic unchanged. Moreover, that inversion of dependency allows you to get rid of even transitive dependencies on the details, which can end up giving you the ability to compile and deploy your business logic separately. In DoD you concentrate on the data more than on the behaviour, which is not always the right way to design a system.

Another significant downside of DoD I see is the lack of context in your data structures. Encapsulation does a good thing in that you give other programmers not only information about the data you have but also a hint about how your data is usually used. Moreover, in classic OOP you are allowed to add restrictions on how the internal data of the object is used. Imagine directly accessing and modifying std::vector's raw data pointers. Of course, obsessive encapsulation can lead to bloating your objects with a bunch of methods whose logic belongs to different parts of the system. But that is an obvious violation of the Single Responsibility and Interface Segregation Principles. In that case, it would be appropriate to weaken the restrictions on accessing the object's data and move the odd logic to the corresponding subsystem.

That leads to my final point. Residing in the middle between those design paradigms is usually the best practice. I'd support my point with a personal example. Recently I had an opportunity to apply DoD to a game I'm currently working on as a pet project, to organize my game objects. Although I really liked the ECS pattern, I didn't want to restrict myself to putting the logic only in the systems. That's why I added all the necessary virtual methods to components. That allowed me to use the Entity-Component and Entity-Component-System patterns together. And now some components that better know what they need and how to act, like the Player's and Enemy's components, have all the logic packed with them, while other components that are part of more complex systems, like the Physics colliders and rigidBodies, just hold the data. To my mind, that's taking the best from both worlds, which gives me plenty of flexibility without the restrictions of a particular design philosophy. Even though such components technically don't differ, as they both have virtual functions that are called every tick, I can separate them into different classes that can be stored in different arrays and, moreover, introduce some custom memory allocators to improve data locality and reduce cache misses. As you can see, there are plenty of optimizations that can be added on demand.

SmnTin
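
A rough C++ sketch of the hybrid layout described in the comment above. All names are hypothetical, and it simplifies the comment's design in one respect: the data-only components here are plain structs without virtual functions. Logic-heavy components keep their virtual `update()`, but each concrete type is stored by value in its own contiguous pool, while the physics data is updated by a free-standing system.

```cpp
#include <cassert>
#include <vector>

// Entity-Component style: the component carries its own behaviour.
struct Component {
    virtual ~Component() = default;
    virtual void update(float dt) = 0;
};

// 'final' lets the compiler resolve update() statically when it is
// called through the concrete type, as Pool does below.
struct PlayerLogic final : Component {
    float stamina = 0.f;
    void update(float dt) override { stamina += 10.f * dt; }
};

// One pool per concrete component type: elements are stored by value,
// so iteration is contiguous rather than a walk over heap pointers.
template <typename T>
struct Pool {
    std::vector<T> items;
    void update_all(float dt) {
        for (T& c : items) c.update(dt);
    }
};

// Entity-Component-System style: the component only holds data and a
// separate system owns the logic. (Assumed simplification: no virtuals.)
struct RigidBody {
    float y = 0.f;
    float vy = 0.f;
};

void physics_system(std::vector<RigidBody>& bodies, float dt) {
    assert(dt >= 0.f && "time step must not be negative");
    for (RigidBody& b : bodies) b.y += b.vy * dt;
}
```

The custom allocators the comment mentions are left out; `std::vector` already gives contiguous storage within each pool.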