The Problem with Research Software Engineering

A discussion about how to make research software engineering a bit better!

Comments

A bit of a different video today about something that's been on my mind. I know it's a bit of a rant and more or less a clip from my livestream, but I thought some people might benefit from it! Let me know if you like this type of content as well. If so, I am happy to do more "lecture-style" videos on various topics.

LeiosLabs

Having recently started as a "real" software engineer after finishing my PhD, I recognize many of these problems. We did do version control and unit-testing for our research software, but I often passed up on good software documentation in favor of writing the actual research articles. I've also had many requests from colleagues to share my code for making high-quality graphs. Most of the time I had to reply with: "You can have my code, but it won't work directly on any other data than mine. Please take my code as-is, and use it as an example to try writing something of your own." I know I could have made my graphing tools much more modular and general, but at the end of the day I needed to have my thesis finished.

bartzijlstra

As someone who has worked in both pure software development and pure CS research positions, I completely agree. Especially when it comes to documentation and peer review of code, I'm shocked by the lack of standardization. Asking a researcher for access to their code is a true roll of the dice.

AngryArmadillo

I worked as a research assistant in a chemistry laboratory that primarily deals with simulation. The lab head is still using a FORTRAN code for nucleation simulation. I believe that code is at least 20 years old. When I tried to read the code, it had variables like 'xxx' and 'yyy'.

rentristandelacruz

Congrats on your PhD! I completely agree with everything you say in this video.

zebulon

Competitive programmers may be able to help. You can get relatively clean and simple code from very complex new algorithms if you ask competitive programmers. We are trained to code common algorithms really quickly and occasionally search for better (faster, more memory efficient, working online, etc.) algorithms to implement so we can use them as "secret weapons" during contests.
As an example: given a tree graph of N nodes, it is widely known that you can find its centroid decomposition in O(N log N) time. However, a quick Google search will lead you to a paper demonstrating O(N) centroid decomposition which has no code. To verify, we usually just read the paper, code the algorithm ourselves, and stress test it against the verified slower algorithm with thousands of randomly generated cases.
Might it be possible for researchers to get competitive programmers to verify their work?
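
For anyone curious what that stress-testing loop looks like in practice, here is a minimal Python sketch. It does not implement the O(N) centroid decomposition from the paper mentioned above. As a simpler stand-in, it finds the centroid(s) of a single tree two ways: a slow O(N^2) reference that deletes each node and measures the remaining pieces, and a faster O(N) version based on subtree sizes. All function names (`random_tree`, `centroids_slow`, `centroids_fast`, `stress_test`) are made up for this illustration.

```python
import random
from collections import deque


def random_tree(n, rng):
    """Return an adjacency list for a random tree on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for v in range(1, n):
        p = rng.randrange(v)  # attach v to a random earlier node
        adj[p].append(v)
        adj[v].append(p)
    return adj


def centroids_slow(adj):
    """Trusted reference: delete each node and measure the pieces, O(N^2)."""
    n = len(adj)
    result = []
    for removed in range(n):
        seen = [False] * n
        seen[removed] = True
        largest = 0
        # In a tree, each neighbour of the removed node starts its own piece.
        for start in adj[removed]:
            size = 0
            queue = deque([start])
            seen[start] = True
            while queue:
                u = queue.popleft()
                size += 1
                for w in adj[u]:
                    if not seen[w]:
                        seen[w] = True
                        queue.append(w)
            largest = max(largest, size)
        if 2 * largest <= n:
            result.append(removed)
    return result


def centroids_fast(adj):
    """Candidate under test: one pass over subtree sizes, O(N)."""
    n = len(adj)
    parent = [-1] * n
    order = []
    stack = [0]
    while stack:  # iterative DFS, so deep trees do not hit the recursion limit
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                stack.append(w)
    size = [1] * n
    largest_child = [0] * n
    for u in reversed(order):  # descendants are processed before ancestors
        p = parent[u]
        if p != -1:
            size[p] += size[u]
            largest_child[p] = max(largest_child[p], size[u])
    return [u for u in range(n)
            if 2 * max(largest_child[u], n - size[u]) <= n]


def stress_test(trials=1000, max_n=30, seed=0):
    """Return the first tree where the two versions disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        adj = random_tree(rng.randint(1, max_n), rng)
        if centroids_slow(adj) != centroids_fast(adj):
            return adj
    return None


if __name__ == "__main__":
    print("counterexample:", stress_test())
```

Small trees are a deliberate choice here: when the two versions disagree, the returned counterexample is small enough to debug by hand. The fixed seed keeps any failure reproducible.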

Pa_Nic

I work in the DSP field and we work closely with people in academia. I 100% agree with what you say. So much time could have been saved if the code handed to us was written better or even followed the paper.
I think a big thing is that some older people in academia have the attitude of "if you used simulations, you didn't solve the problem." I personally think it's weird to see people not use software as a tool for verification on both generated and real data.

AaronPM

This was very helpful. I'm going to look more into JOSS.
As a physicist interested in scientific computing, unit testing seems like almost a foreign concept, and I feel fairly inadequate compared to my computer science peers.
I've had enough exposure to the importance of version control that it prompted me to learn git myself. For anyone else in a similar position, look at the MIT Missing Semester (Jan 2020 IAP) for similar computer-sciencey "filler" education.
More videos about CliMA would be cool : )

apurbabiswas

My experience with researchers writing code was that the piece of software they needed most was git. So much

Axman

Thank you for posting this. Going through my PhD now, I experience many of these pains that you've clearly outlined here. If we could continue to grow this discussion and build a scientific community more embracing of software engineering practices, starting with git and code re-usability, the long term gains would certainly outpace the short term learning pains.

brandonnelson

Thank you so much for posting this video. Something I've heard a lot about algorithms is that when a paper claims great performance, it's very likely that the implementation will be very costly and won't actually outperform the current solution. Of course, there are also some breakthroughs.

youtubereview

As a master's student working in a research group, I could not agree more with all the things you just said.

felixrichter

This video is so spot on! Publishers need to see this.

alijassim

Congrats on your PhD!

Thank you for your perspective on research software engineering. I have never seen courses offered at my university for scientists on how to write good software, and in the end it comes down to teaching yourself.
I work in the same field (PhD candidate in computational fluid dynamics with LBM / physics) and I've seen lots of bad code as well, due to the points you've discussed. But that's not always the case.

The incentive to write clean code is there at the latest once you work on software as a team. We refactor on a regular basis and make sure every line is properly documented.
Because of the teamwork, version control becomes a necessity as well.
Testing code is actually most of the work. If code is not testable and the results are not reproducible, it is trash code, no matter in which field.
The main incentive for our software project actually was hardware (GPU) efficiency and performance. No other software on the market is capable of comparable performance, so we had to write our own.
Regarding job chances, research software engineering is not a dead end at all. If you really master scientific programming, you don't have to apply for a job because companies will apply for your time.

ProjectPhysX

Right on point. I am currently trying to refactor an old academic codebase consisting of Matlab, Python, Java, and C++ that are glued together using Matlab, and it is just a nightmare. And yes, Matlab is evil: you often see thousands of lines of code without encapsulation and a huge namespace. I genuinely think that many more people would have used the code if it had been written in a more professional fashion.

gavinpeng

This is an essential topic for research. More incentive should be given to research software development. Much high-quality research depends on how well a simulation or model has been formulated and executed. Better programming practices in developing research software will lead to better research.

rifatahamed

This video speaks to me so much. I was a software engineer/systems engineer before going back to grad school, and I was the only computational-focused person in my lab for Neuroscience. There were other folks who knew how to program (and some who couldn't do more than a stats script), but writing "good" code (as loaded as that is) was just not a priority because no one else was ever going to see it (because there was no avenue to share and no one wants to replicate results anyway).
Lo and behold, my code ended up being pretty useful for some other work (related to TBI), and it is fortunately very well documented, so I was able to share it. It's far from perfect, and finding the balance of where to stop on it because it was good enough was a huge challenge. I would've loved to submit it to a journal and get it more polished, but there was no value in that (at least relative to the other priorities I had to graduate).
I wish I knew how to help push the culture forward in this space. I left academia after graduating, so I'm afraid I'm not being very helpful. I've started publishing again recently around my volunteer work, so maybe that's my avenue to help.

KevinHorecka

found you through OIST's youtube channel, love your videos! thanks for sharing your passions

tallon

This video has really made clear some issues I've noticed at my current (research focussed) job and it's very satisfying to hear it stated succinctly

HatersGonnaHate

Thank you for this video. I'm a 3rd year doctoral student in Applied Math, and specifically in the scientific computing subdisciplines you mention. I'm currently finalizing a moderate-size (about 4000 lines of C) codebase to be open sourced along with a paper submission. There is a serious crunch-time feeling which is causing various holes in documentation as well as crappy, inefficient fixes. You're definitely right, writing well-documented code feels impossible when one is also supposed to be pushing out theoretical breakthroughs of some flavor.

On the other hand, it is also very hard to write code that works without a strong grounding in the theory of a subject.

SoopaPop