Programming for Science

What Is Good Code?

The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code. - Hadley Wickham

There are several practices and guidelines for writing good code.

I will talk about styling and testing.

rOpenSci (Stats) Software Review

Non-adversarial, constructive, transparent, no rejection.

Helps disseminate best practice.

Builds a community of practice.

Book: rOpenSci Packages: Development, Maintenance, and Peer Review

Code Smells

Structures in code that suggest refactoring. - Martin Fowler

Refactoring means modifying code to make it easier to understand and change, without altering its behaviour.

Code smells create cognitive load, and code that has them is more likely to contain errors.

Code - Naming things

Code smell: linguistic antipattern

The code does something different from what the names involved suggest.

Example: a variable is_valid that contains an integer (the name suggests a boolean).
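
A minimal Python sketch of this smell and one way to remove it. All function names here (is_valid_smelly, count_missing, is_valid) are hypothetical and chosen only for illustration.

```python
def is_valid_smelly(rows):
    """Smell: the name promises a boolean, but an integer count comes back."""
    return sum(1 for row in rows if row is None)


def count_missing(rows):
    """The name now says what is returned: a count (an integer)."""
    return sum(1 for row in rows if row is None)


def is_valid(rows):
    """The name suggests a boolean, and a boolean is what comes back."""
    return count_missing(rows) == 0


rows = [1.2, None, 3.4]
print(is_valid_smelly(rows))  # an integer, despite the name
print(count_missing(rows))
print(is_valid(rows))
```

With the smelly version, a reader who writes "if is_valid_smelly(rows):" gets the opposite of what the name suggests, because any non-zero count of missing values is treated as true.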

Variable and function names should use only lowercase letters, numbers, and underscores.
Variable names should be nouns and function names should be verbs.
Avoid re-using names of common functions and variables.
Strive for names that are concise and meaningful (this is not easy!).
Agree on one naming convention for variables and apply it consistently across your code base.
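
The naming guidelines above can be sketched in Python as a before/after pair. Both functions are hypothetical examples, and they compute the same value; only the names differ.

```python
# Harder to read: mixed naming styles, a verb-like name for a value,
# and the built-in name `list` re-used for a parameter.
def Temps(list):
    calcMean = sum(list) / len(list)
    return calcMean


# Easier to read: lowercase letters and underscores throughout,
# a verb for the function, nouns for the variables,
# and no built-in name re-used.
def compute_mean(temperatures):
    mean_temperature = sum(temperatures) / len(temperatures)
    return mean_temperature
```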

Code - Organizing things

Style guides: conventions for writing code.

Always put a space after a comma, never before.
Do not put spaces inside or outside parentheses for regular function calls.
Place a space before and after =.
For R:
- Use styler and lintr packages.
- The tidyverse style guide
For Python:
- PEP 8, the standard Python style guide
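
A short Python sketch of the spacing conventions listed above, using a hypothetical scale function. One caveat worth noting: PEP 8 puts spaces around = in assignments but omits them for keyword arguments and default values, so the rule about = applies slightly differently in Python than in R.

```python
def scale(values, factor=2):  # PEP 8: no spaces around = for a default value
    return [value * factor for value in values]


# Not recommended: space before commas, spaces inside parentheses,
# and no spaces around the assignment =.
doubled_messy=scale( [1 ,2 ,3] )

# Recommended: a space after each comma and never before,
# no spaces inside the parentheses, and spaces around the assignment =.
doubled_clean = scale([1, 2, 3])
tripled = scale([1, 2, 3], factor=3)  # keyword argument: no spaces around =
```

Both calls behave the same; the difference is only in how easily the code can be read.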

Testing

How can we be sure our code produces reliable results? We can’t - not completely - but we can test its behavior against our expectations to decide if we are sure enough. - Research Software Engineering with Python: Building Software that Makes Research Possible

Assume that mistakes will happen and guard against them (defensive programming).

Testing - Assertions

Assertions: used to catch errors. We add assertions to our code so that it checks itself as it runs (Python’s assert statement or R’s stopifnot function).

A precondition is something that must be true at the start of a function in order for it to work correctly.
A postcondition is something that the function guarantees is true when it finishes.
An invariant is something that is true for every iteration in a loop.
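
The three kinds of assertion can be sketched in one small Python function. The function normalize and its tolerance of 1e-9 are assumptions made for this example.

```python
def normalize(values):
    """Rescale non-negative numbers so that they sum to 1."""
    # Preconditions: must be true at the start for the function to work.
    assert len(values) > 0, "values must not be empty"
    assert all(value >= 0 for value in values), "values must be non-negative"
    total = sum(values)
    assert total > 0, "values must not all be zero"

    result = []
    running_sum = 0.0
    for value in values:
        share = value / total
        result.append(share)
        running_sum += share
        # Invariant: true on every iteration of the loop.
        assert running_sum <= 1.0 + 1e-9, "shares so far must not exceed 1"

    # Postcondition: guaranteed to be true when the function finishes.
    assert abs(sum(result) - 1.0) < 1e-9, "shares must sum to 1"
    return result
```

Calling normalize([1, 1, 2]) passes every check, while normalize([]) stops at the first precondition with an AssertionError instead of failing later with a less helpful message.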

Testing - Unit Testing

Unit testing: used to prevent errors. A unit test checks the correctness of a single unit of software, for example a function.

A unit test will typically have:

a fixture, which is the thing being tested (the data);
an actual result, which is what the code produces when given the fixture; and
an expected result that the actual result is compared to.
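
The three parts can be seen in a minimal Python unit test. The function celsius_to_kelvin is a hypothetical unit under test, not taken from any package.

```python
def celsius_to_kelvin(celsius):
    """Hypothetical unit under test: convert Celsius to Kelvin."""
    return celsius + 273.15


def test_celsius_to_kelvin_at_freezing_point():
    fixture = 0.0                        # the data given to the unit
    actual = celsius_to_kelvin(fixture)  # what the code produces
    expected = 273.15                    # what we expect it to produce
    assert actual == expected
```

The test is a plain function whose name starts with test_, which is the naming pattern that pytest collects by default; it can also be called directly.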

We request that packages have a test suite, preferably with unit tests for all functions, so that key functionality is covered (75% test coverage).

Testing - Integration Testing

Integration testing: unit tests give us some confidence that our units of code work in isolation. Integration testing checks that they work correctly together.

Integration tests are structured the same way as unit tests: a fixture is used to produce an actual result that is compared against the expected result.

However, creating the fixture and running the code can be considerably more complicated.
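
A minimal Python sketch of an integration test, assuming two hypothetical units (read_temperatures and compute_mean) that are combined in summarize_file. The fixture is a temporary CSV file, which shows why building it takes more work than in a unit test.

```python
import csv
import tempfile
from pathlib import Path


def read_temperatures(path):
    """Unit 1 (hypothetical): read the 'celsius' column from a CSV file."""
    with open(path, newline="") as handle:
        return [float(row["celsius"]) for row in csv.DictReader(handle)]


def compute_mean(values):
    """Unit 2 (hypothetical): arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def summarize_file(path):
    """The two units working together: read a file, then summarize it."""
    return compute_mean(read_temperatures(path))


def test_summarize_file():
    # The fixture is a file on disk rather than a single in-memory value.
    with tempfile.TemporaryDirectory() as directory:
        fixture = Path(directory) / "temperatures.csv"
        fixture.write_text("celsius\n10.0\n20.0\n30.0\n")
        actual = summarize_file(fixture)
    expected = 20.0
    assert actual == expected
```

The structure is the same as in a unit test (fixture, actual result, expected result); only the setup and the amount of code exercised are larger.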

Testing - Testing Frameworks

Testing frameworks: help to run and manage many unit tests.

For R: testthat
For Python: pytest

The testing chapter of the rOpenSci Dev Guide recommends several other packages if you need to test database access, plot creation, or interaction with web resources, among other tasks.

Testing - Continuous Integration

Continuous integration (CI): runs tests automatically whenever a change is made.

It tells developers immediately if changes have caused problems.
It can be set up to run tests with several different configurations of the software or on several different operating systems.

The rOpenSci Dev Guide has a chapter on Continuous Integration Best Practices.

Where to learn more

R materials:

rOpenSci Packages: Development, Maintenance, and Peer Review
Tidyverse Style Guide
Testing in R: testthat, swamp, HTTP testing in R

Python materials:

General material:

The Programmer’s Brain by Felienne Hermans.