What Is Julia and Why Should I Care
Background
Now that we’ve gotten a sense of what this book aims to achieve, let’s take a step back and talk about what Julia is, and why you should care. Julia is a relatively new free programming language that was first released in 2012 by some computational scientists at MIT, and has been growing in popularity ever since. It’s a general purpose language, meaning that it can be used for pretty much anything, but it’s particularly well suited for scientific computing, and is often compared to R, Python, and Matlab. There are plenty of reasons why you might want to learn Julia, but the main ones are that it’s fast, easy to learn (as far as coding goes), and has a great community. And with that, let’s dive in to the benefits and drawbacks of Julia!
Motivation
Why write a book for Julia?
What’s wrong with R, or Stata, or Python, or (insert your favourite language here)?
Excellent questions. I use R every day and think it’s a great language for epidemiology, statistics, visualization, and many more things. But it’s not exactly a fast language. And for me, and plenty more epidemiologists, that can be a problem when working with large data sets or running simulations. Large data is less of an issue because package developers often have already rewritten the core code in a faster language, typically C++. But this is less that ideal if you want to look under the hood as it requires you to learn a second language (and C++ isn’t exactly known for being friendly to newcomers). The bigger issues, though, come when you need to write something from scratch, such as creating a large and complex simulation model, R is sometimes just too slow to actually run on your laptop in a reasonable time period, breaking up the development workflow. In this instance, you have two options: 1) suck it up and just hope your first attempt at the code was correct and don’t need to change anything, or 2) learn and rewrite your slow code in something faster. This is the classic “2 language problem”, and there are plenty of excellent articles on this, such as this Nature Methods article about Julia for Biologists, and this blog post from the creators of Julia.
OK, but not everyone needs to run big simulations or work with big data. Why learn Julia over R in that case?
Another good question that covers a decent chunk of working epidemiologists. If you don’t need performance, I think it’s a slightly less compelling case, but there are still plenty of good reasons to think about learning Julia.
Firstly, it has an excellent syntax and design structure that makes it one of the easier (and fun!) languages out there to learn. It’s incredibly helpful to be able to think in plain language and pretty much just implement that in code - for loops are quick so you don’t need to do mental gymnastics to think about how to vectorize your code. I know that’s a jargony term that might not mean anything to you, but in slower languages, you have to think about more abstract and unintuitive ways of implementing what you want, whereas much of your instincts are idiomatic in Julia i.e. are a fast and preferred implentation. This means that you can do more things more easily in Julia than you could in R, which has shoehorned many capabilities into it with varying degrees of success. Metaprogramming and multiple-dispatch are killer features that are core to how Julia works, and will eventually allow you to write simple and interpretable code that would require tons of if
statements that result in hard-to-read code in other languages, such as R. And the package manager is excellent, making it so easy to create fully reproducible code, whereas R and Python almost seem to be it intentionally hard to do.
Secondly, and this is a pretty big one, but Julia is a general purpose language whose syntax is rather similar to Python a lot of the time. Whether we like it or not, Python isn’t going anywhere. There’s a decent chance that at some point you will come across some Python code, or maybe even need to write some for a collaboration, and knowing Julia will make it pretty simple to pick up the key points, without needing to rely on using slow Python in your day-to-day life.
Finally, if you need it, Julia is first-rate when it comes to scientific computing. A lot of aspects of epidemiology rely on differential equations, math, and increasingly, machine learning. Not only is there the excellent SciML ecosystem that houses a huge collection of state-of-the-art ODE solvers, but also, because Unicode symbols are valid code in Julia, you can literally write mathematical equations that you lifted out of a paper or book, and there’s a decent chance it’ll run with only minor modifications!
It Can’t All Be Sunshine and Roses
Correct. Julia is by no means perfect, and you will have to decide if the benefits outweigh the tradeoffs (I believe they do for a lot of us)!
Firstly, while Julia has an active, friendly, and very enthusiatic community that is often willing to help when you get stuck, it is a relatively new language so support will be more limited than can be found in others. Related to this, not everything you will want to do will have a perfect pre-built package ready to go. Most of the time, this actually isn’t too big a deal as it’s either simple enough that you can quickly figure it out, or complex enough that you probably want to have ownership over the implementation, but occassionally it is genuinely an issue. Both of these issues, however, are improving rapidly as the Julia userbase grows and more people share their questions, expertise, and even turning solutions into packages that can be shared!
The next common stumbling block is related to performance. Despite what I said about it being a very fast language, it’s quite a bit slower at the start of a session than R or Python, for example. This is known as the “Time to first X” (TTFX) problem, and while everything will be almost instantaneous the second time you run it, it can be annoying. Thankfully, with the recent release of Julia 1.9.0, this is effectively a problem of the past, with most Julia sessions loading packages and creating your first plot within a few seconds, not a minute.
And the last main issue with Julia is that packages aren’t always documented as clearly as in other languages, and debugging can be a painful experience. Every package is different, and while the Documenter.jl package does an excellent job of standardizing package documentation, unfortuately not everyone does a great job of writing robust tests and guides to help the user. Sometimes you just need to dig in the weeds to figure something out.
But, that’s part of the motivation for writing this book. For a lot of what we do as epidemiologists, we can work with very established packages that have solid code bases, so we won’t need to worry too much about the code quality as it’s already been tested by thousands of individuals. Instead, this book is to help guide someone new to Julia through the core tasks of epidemiology, and show you how to use the tools that exist in the community, but may have confusing documentation to someone new to the language, or even programming in general.