What Is A Monorepo And Why You Should Care - Monorepo vs. Polyrepo

What is a monorepo and why you should care? Why would you use mono-repo in the first place? What are the differences between mono repo and polyrepo or multirepo? What are the pros and cons of monorepo? Should you switch to monorepo or keep using poly-repos or multi-repos?

#monorepo #polyrepo #multirepo

▬▬▬▬▬▬ Timecodes ⏱ ▬▬▬▬▬▬
00:00 What is monorepo?
00:32 Who is using monorepos?
01:06 Advantages
02:01 Easier collaboration
03:50 Simplified dependency management
05:04 Easier refactoring
07:25 Disadvantages
12:46 Should you switch to monorepo?
Comments
Author

Are you using a monorepo, polyrepos, or both?

DevOpsToolkit
Author

This video might be 2 years old but it's still very relevant, and I found it very helpful. It expresses all of the concerns I have around mono vs multi-repo that I couldn't quite put into words, and which you have so well elucidated. Thanks so much for making this.

kandredfpv
Author

I think you're spot on here, I'd just call out that not everyone uses the Microservices architecture, so I think it's key to mention that the repo strategy you choose should take your chosen architecture into consideration. Regardless of architecture, a good separation of concerns/related functionality should be universal, regardless of whether that's at the repo level or not.

scottaltham
Author

Many of the problems you describe happen when working on loads of applications in the same project. I'm using monorepos for a single app with its backend, frontend etc and I've found it really helpful. Then again we are a very small team.

Another problem I've heard of is that the CI build needs to build everything, as it cannot detect which projects changed. Though this could be good for quickly spotting build-related regressions... maybe?
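A minimal sketch of how a pipeline could narrow a build down to the projects that actually changed, assuming the list of changed files is already available (for example from `git diff --name-only` between two commits). The function name, the directory layout, and the project roots are hypothetical and only serve as an illustration:

```python
from pathlib import PurePosixPath


def changed_projects(changed_files, project_roots):
    """Return the set of project roots that contain at least one changed file."""
    roots = [PurePosixPath(root) for root in project_roots]
    affected = set()
    for name in changed_files:
        path = PurePosixPath(name)
        for root in roots:
            # A file belongs to a project if the project root is one of its parents.
            if root in path.parents:
                affected.add(str(root))
    return affected


# Hypothetical repo layout: two apps and one shared library.
files = ["apps/frontend/src/index.ts", "apps/backend/main.go", "README.md"]
roots = ["apps/frontend", "apps/backend", "libs/shared"]
print(sorted(changed_projects(files, roots)))
```

Files outside every project root (like the top-level README here) trigger nothing in this sketch; a real pipeline might instead treat such changes as a reason to build everything.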

As you say all of these issues can be mitigated but it's important to consider them. Thanks for the video, very informative.

zebcode
Author

We have a monorepo at work from before I joined. Not all of the company's projects are in it, but a bunch of related applications are. The dream of a single PR for changes in multiple apps is just a dream. Even in our monorepo, I see two PRs for the same feature change: one by the frontend dev and one by the backend dev. I am not bothered by it; in fact, I prefer it that way, as I review backend code with higher rigor. Other than frontend-backend code, projects should generally be very loosely coupled, and in that case a single PR makes no sense. So the advertised benefit of a single PR across projects will be useful only when you are doing things wrong, i.e. when you have tightly coupled code.

The commit noise is just too much in a monorepo. Like you said, it is difficult (or not possible) to assign commit or PR rights to individual projects inside a monorepo. So the first thing I have to check in every PR is whether the dev is making modifications in projects he/she should not be touching in the context of that PR. This problem would never occur in a polyrepo.

When you hear someone praising monorepos, ask what kind of apps they are building. Monorepo supporters are usually working with monoliths or distributed monoliths where all apps are updated in lock-step with each other. Such projects have no need to solve for situations where each app/microservice can be deployed at different times.

In my case, we have several microservices in a monorepo. Monorepos are a bad fit for microservices. I am managing it by creating tags for each version of a specific app.
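As a rough illustration of per-app tagging, here is a sketch that assumes tags follow a hypothetical `app-name/vMAJOR.MINOR.PATCH` convention (the comment above does not say which naming scheme is actually used) and picks the newest version of each app from a list of tag names:

```python
def latest_tags(tags):
    """Map each app name to its highest version, given tags like 'billing/v1.4.0'."""
    latest = {}
    for tag in tags:
        app, sep, version = tag.rpartition("/v")
        if not sep:
            continue  # not an app-scoped tag, skip it
        try:
            parts = tuple(int(p) for p in version.split("."))
        except ValueError:
            continue  # version is not purely numeric, skip it
        # Compare versions as integer tuples so that 1.10.0 sorts above 1.9.0.
        if app not in latest or parts > latest[app]:
            latest[app] = parts
    return {app: "v" + ".".join(map(str, parts)) for app, parts in latest.items()}


# Hypothetical tag list, e.g. as printed by `git tag`.
tags = ["billing/v1.9.0", "billing/v1.10.0", "search/v0.3.1", "v2.0.0"]
print(latest_tags(tags))
```

Prefixing the tag with the app name keeps each service's release history separate even though all of them share one commit history.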

lhxperimental
Author

Very interesting video, as usual :). I've had a similar argument with a relatively inexperienced team where developers were in favor of a monorepo. My main concern was that it would encourage bad practices and tight coupling in a microservices architecture (which in the end it did!). One very interesting aspect you've mentioned is API and versioning. A typical (micro)service development process has business logic and externally facing APIs in the same code base, deployed together. Understanding that APIs must be versioned and backwards compatible encourages using patterns like API management systems, gateways, etc. to allow decoupling of the backend code from the externally facing API.
Complexity in the CI/CD pipeline is even more important with cloud native applications and distributed systems.

Peter
Author

A point on dependency management: with a monorepo, we have tools that can automatically bump dependency versions, while with polyrepos you have to do that in each repo individually.

vipulgoswami
Author

I disagree with a few of the downsides presented here:

1. Difficulty in discovery. The idea that it's harder to find stuff in a single place seems off. Whether your code is in one repo or multiple, you will still need someone to explain which repos / directories you do and don't need. The fact that there might be millions of lines of code in a part of the repo that doesn't relate to you shouldn't matter if your team lead / hiring manager tells you which parts do relate to you. In fact, I would argue that with poly-repos it can be harder to discover that you're missing a necessary dependency. All you know is your build fails, but figuring out that the build fails because you're missing another repo, rather than an external package, is an extra step.

2. Searching. While it's true that you can use built-in tools in various VCS's to search through code, those search features are often very half-baked. When you have a single repo it's trivial to do complex searches using regex, exclusions, scripts, and extra processing layers. If you're using a web interface, then you only have access to the features that this web interface offers. If you need multiple processing steps, forget about it. Granted, you could just clone all the repos into a single directory and go from there, but then you're just re-creating the structure of a mono repo from parts.
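To make the "regex plus exclusions" point concrete, here is a small sketch of a scripted search over a single checked-out tree. The function name, the default excluded directories, and the suffix filter are assumptions chosen for illustration, not something taken from the comment:

```python
import os
import re


def search_tree(root, pattern, exclude_dirs=(".git", "node_modules"), suffixes=None):
    """Yield (path, line_number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for filename in filenames:
            if suffixes and not filename.endswith(tuple(suffixes)):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file, skip it
```

Because the results come back as plain tuples, they can be fed into further processing steps (grouping by project, counting hits, feeding a refactoring script), which is the kind of layering a web search box usually does not allow.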

3. Dependency building. If you have a project with extremely long build time, you can still have an external dependency registry for some dependencies.

4. Refactoring. With sufficiently advanced tooling, you can do massive refactoring using context-aware programs. Generally any time I've done massive refactoring pushes, the cause was something along the lines of "we're upgrading the language / core module / framework used by a lot of systems" or "we're adding an additional static analysis step, and we need to fix a huge mistake we've been making across the entire code base."

Granted, frequent refactoring of all your smaller projects is certainly a code smell, but on the other hand not all cross-project refactoring needs to be universal. I've frequently seen smaller refactoring efforts that touch an interface layer between two distinct projects. That sort of refactoring effort isn't necessarily a bad thing. We don't want dozens of systems depending on each other, but having a few closely related systems communicating is generally a desired state of affairs. Granted, an API should be versioned, but the idea that every service should be a wholly distinct thing that we should leave alone is a very contractor-centric type of thinking. When an executive comes along and says that we need a new feature next month, they usually don't mean "we want 25% of that feature next month, then another 35% the month after, then another 10% in 2 weeks, and the last 30% the following quarter." I understand that's how a lot of consultancies work, but that honestly usually only creates additional friction between staff developers and executives when the employees of the company have to explain to the execs that the new shiny feature they promised to their clients only has partial back-end support, and the only reason the story is marked "completed" is because the people implementing this feature didn't have any insight or visibility into another critical service.

Also in terms of the disadvantages:

1. Slow cloning. This one is correct, but only really for the initial clone. Realistically as long as you have a fairly up-to-date version of the repo, you should never be in a position where you pull and have to wait for longer than a few seconds. If you do, then most likely someone is using the repository wrong.

Additionally, this sort of problem is inherently one that's going to happen only in *extremely* large companies, who are likely going to have large teams to mitigate it. If you don't have those sort of resources, then chances are good you're simply not going to have that large a repo. It's a classic chicken and egg scenario.

2. Configuring change tracking pipelines. If you have a smart repo structure, this is trivial in any CI system I have used. In fact there is extensive tooling and support for this in open source.

Beyond that, as much as you don't want a universal build on every commit, the simple fact that you *can* fire off a universal build fairly trivially is a major advantage. You can set it on a schedule to run once a day at night, or once a week over the weekend, and get an updated snapshot of overall system health. What's more, as projects are added, you only have to modify the one global pipeline. You can even set up rules to validate that new projects update these configurations. This is much easier than having to track down multiple distinct configurations, which all reference distinct repositories, and must be updated as part of an update process which must be documented and executed by god knows what team.

3. Pull requests. If you want to direct PRs to code owners, most VCS's have the concept baked in. This is arguably less work than adding new users to multiple repositories, since you can generally configure this in one place for all repos. If you're getting notified about all the changes in the system, that really means you just haven't set up your monorepo to deal appropriately with a large system.
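The idea of routing PRs to code owners can be sketched as a lookup from changed paths to the team owning the closest enclosing directory. This is a simplified illustration with a hypothetical mapping and hypothetical team names; real code-owner files have their own pattern syntax and precedence rules, which this does not reproduce:

```python
from pathlib import PurePosixPath

# Hypothetical mapping of directory prefixes to owning teams.
OWNERS = {
    "apps/frontend": ["@web-team"],
    "apps/backend": ["@api-team"],
    "apps/backend/billing": ["@billing-team"],
}


def owners_for(path, owners=OWNERS):
    """Return the owners of the most specific directory prefix containing path."""
    candidate = PurePosixPath(path)
    # Walk from the closest parent directory outward; the first hit is the most specific.
    for parent in candidate.parents:
        if str(parent) in owners:
            return owners[str(parent)]
    return []


def reviewers_for(changed_files, owners=OWNERS):
    """Collect the distinct owners that should review a set of changed files."""
    reviewers = set()
    for name in changed_files:
        reviewers.update(owners_for(name, owners))
    return sorted(reviewers)


print(reviewers_for(["apps/frontend/app.ts", "apps/backend/billing/invoice.py"]))
```

With a mapping like this kept in one place, only the teams whose directories a PR touches would be asked to review it, rather than everyone watching the repo.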

Basically the entire video seems to be focused on the idea that if you have a huge, multi-repo organization, and you want to move to a monorepo layout, then you will need fundamental changes to your process. This is a perfectly valid point, but it's sort of different from what the title of the video seems to suggest. If you're starting a major project and you need to decide between multirepo and monorepo, then you will need to set up a lot of tooling either way, and that tooling will require an extensive time and money investment no matter the choice you make.

If you're thinking of moving TO monorepos, then sure, all the points make sense, but that's not really a fault with monorepos. That's just a problem with large, multi-project refactoring efforts. I just take an issue with presenting a video about the issues with monorepos, and only mentioning at the end that the entire thing is about *switching to monorepos*.

TikiTDO
Author

Your shooting location is pure awesomeness!
Apart from that, the video is really helpful (:

lerneninverschiedenenforme
Author

At the last place I worked, we used a polyrepo approach with a central repo containing submodule repos. While parallel development was easier with this approach, the code organization became messy over time, with a lot of submodule commit versions to track. After researching more, I now believe a monorepo approach (though it has its own downsides too) would be more suitable in scenarios like this.

anisbhsl
Author

Thank you for a great explanation. There is no black and white. It depends on requirements and resources.

m.k.
Author

Awesome video! I agree on every point but I have understood these things through many difficulties. If I had found this video earlier it would have helped me tremendously.
Your presentation is clear, precise and methodical.
Thanks for your contribution!

grappachu
Author

I enjoyed this video a lot because I am against using a monorepo to build microservices! In my single repo, I create a folder and a git branch per microservice, and each of those branches serves as the main branch of its microservice. Working from those per-microservice main branches makes it very easy to develop and build pipelines, while also maintaining a single repo for the microservice projects! This branch-based polyrepo strategy is very useful, but I don't know why many famous tech companies like Google and FB don't use it.

rahatsshowcase
Author

Monorepos don't prevent you from having backwards-compatible APIs. They give you the opportunity, but not the obligation, to refactor.

soberhippie
Author

Thank you for the video, I discovered the existence of such a thing as service mesh!

veronikaberezhnaia
Author

My biggest pain point has been access control. Since the apps and libraries in a mono-repo are tightly coupled with each other it is difficult if not impossible to give separate access to people.

its_maalik
Author

turborepo kill it all... great video 👍

khaledsanny
Author

First of all, I'm a regular viewer of your channel and love your videos! Even though I disagree with a lot of what you say in this one :-) Just out of curiosity: how much experience do you have with monorepos and the related tooling?

Now for my comments...

Being able to search the entire codebase is indeed easier on a monorepo. Not sure I understand how the presence of stuff you aren't always interested in makes searching harder though. You can restrict your search to the folders you're interested in.
Cross-project atomic commits are at least as big a benefit to collaboration, I find. For example, if you work on a service that requires backend and frontend changes, they can be included in the same PR. This reduces the chances of breaking changes and helps you stay end-user focused.

The problems you describe around dependency management aren't monorepo-specific, I think. In a polyrepo setup you would also need to rebuild everything that depends on reused code. The dependency tree you describe exists in a polyrepo situation too, unless you decide not to reuse any code or APIs, but that doesn't sound like a recipe for success. Yes, you need specialized tools to describe and make use of the dependencies, but for a monorepo those tools exist at least.
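The "rebuild everything that depends on reused code" step can be sketched as a walk over the reversed dependency graph. The project names and the `depends_on` mapping below are hypothetical; dedicated monorepo tools do considerably more, and this only shows the core idea:

```python
from collections import deque


def affected_by(changed, depends_on):
    """Return every project that must be rebuilt when the `changed` projects change."""
    # Invert the graph: for each project, record which projects depend on it.
    dependents = {}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(project)

    # Breadth-first walk from the changed projects through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency tree: each project lists what it depends on.
graph = {
    "frontend": ["ui-lib", "api-client"],
    "backend": ["logging"],
    "api-client": ["logging"],
    "ui-lib": [],
    "logging": [],
}
print(sorted(affected_by(["logging"], graph)))
```

The same graph exists whether the projects live in one repo or many; the difference is only where the `depends_on` information can be read from.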

Indeed, if you frequently need system wide refactorings, that is an architectural smell. And since monorepo makes the use of internal libraries easier, it requires extra discipline not to end up with a distributed monolith. The basic rule for internal libraries is: don't share domain specific classes or logic, only horizontal technological layers. Services should talk to each other via APIs not via libraries. But I have seen distributed monoliths in a polyrepo setup too. This was a result of sharing internal domain oriented libraries, which is just as possible, but with additional versioning headache.
So while I agree that frequent system wide refactorings are a smell, they may still be necessary to get out of the bad situation. And within a monorepo you can at least *consider* doing them.

WRT the disadvantages:
- Slow git operations: this really depends on the size of your codebase. In my experience this is not an issue on medium-size (maintained by, say, 50 engineers) codebases. Incremental pulls only take seconds, unless somebody has committed binaries, which shouldn't be done anyway. For pristine CI builds it _can_ take some effort to keep the clones fast enough: cached snapshots, shallow checkouts, etc.

- Encourages tight coupling: agreed. It makes the use of shared libraries easier and therefore also the abuse of shared libraries. However, disentangling tight coupling is a lot easier in a monorepo too.

- Pipeline issues: I disagree pretty strongly with this one. For very simple projects, with services that can live in splendid isolation, pipelines are simple. But that's not what we are talking about. As soon as you have services that depend on each other, you need to orchestrate your CI. This is virtually impossible to do with polyrepos.
On a monorepo master can be broken in different ways. If any of the leaf projects is broken, this does not affect anybody else and doesn't block development. If a widely used library is broken, this will stop the work, as it should, if you want to practice continuous integration. Monorepos will make it much less likely to happen though. Imagine publishing a shared library in a polyrepo setup. How do you even know if it breaks any of the clients using it?
With respect to the noise monorepos create, that can easily be solved with a folder structure, code owner files or automated folder based PR labeling.

Yes, I am in the monorepo camp as you may have gathered by now :-) If I have to summarize the polyrepo vs monorepo tradeoffs in one sentence: while there are a few bad practices that monorepos make easier to do, they illuminate architectural problems that tend to stay hidden with polyrepos while at the same time making it much more possible to address them.

sjappelodorus
Author

6:55 exactly! Your deployables should be mapped 1:1 to repo. Refactoring in a monorepo is harder, and the scaling makes this harder. Instead of changing the API of one library/service and then changing how its dependencies use it, you need to make all of those changes at the same time, slowing down your entire org while the change happens.

ephilihp
Author

I just about coughed up my cereal at the end when prompted to subscribe lest I get notifications of any change to your monorepo hahaha. Had I not already been subscribed, this threat would have served as sufficient call to action for me to subscribe.

Awesome channel, you are a walking encyclopedia.

davemeech