You probably know git. It's a version control system, a collaboration platform, and so much more. I recently used it as a step-by-step tutorial platform to illustrate a programming how-to. You can find it on GitHub.
I was asked to give a talk about Building Packages for R and Python at the 2019 edition of the openmunich conference. The aim of my talk was to advertise developing algorithms in C++, because it is then very easy to port them to Python and R with as little additional code as possible by using Boost.Python (the link points to the latest release of Boost, but Boost.Python has actually been a part of Boost for some time now) and Rcpp.
During my talk I wanted to demonstrate how to build a minimal viable library for both Python and R. Of course, PowerPoint is not the right tool to show that. That's why I put together a git repository to accompany my talk, which contains a step-by-step tutorial from a simple C++ program to usable Python and R libraries. You can find it on GitHub.
git and step-by-step?
How is a git repo suitable for a step-by-step tutorial? Short answer: tags.
On GitHub, you can access tags using the Branch button. Also, tags automatically appear as releases.
Many open source projects use tags to label certain commits and, thereby, mark a release of their software. You can use these tags to check out specific commits and get a copy of a certain version of the software.
In my case, I used tags to point to certain steps of the tutorial. That way, participants were able to inspect a particular step by checking out the corresponding tag. At that "point in time", the repository contains a viable version of the program, or the library, that actually compiles and works. The only question is: how do you put together such a git?
bash-magic
Although the scope of the tutorial was small, it contained 10 steps, and, hence, 10 commits to the repo. Enough for me to pull out some bash magic and automate the process of creating the git. What do we need for this?
- A URL to the remote repository of your course git,
- a folder with your actual content,
- a file containing a list of tags,
- a file listing which files should be committed for which tag, and, finally,
- a file listing which files should be deleted for which commit (and tag).
This is all my bash script needs to automate the process. The script will add initial files, it will add and delete files belonging to certain commits, and it will turn parts of committed files on and off. Of course, it also sets the tags correctly.
tags and file content
A crucial part of automating this is to hide parts of the final source code in earlier steps of the tutorial. For example, the code of a very basic Python module looks like this:
#include "fancyalgorithms/fancy_functions.hpp"
#include <boost/python.hpp>
namespace py = boost::python;
BOOST_PYTHON_MODULE(fancymodule)
{
    py::def("fancy_increment", fancy_increment);
}
Throughout the tutorial, more and more lines will be added:
#include "fancyalgorithms/fancy_functions.hpp"
#include <boost/python.hpp>
namespace py = boost::python;
BOOST_PYTHON_MODULE(fancymodule)
{
    py::def("fancy_increment", fancy_increment);
    py::class_<FancyObject>("FancyObject", py::init<>())
        .def(py::init<int,int>())
        .add_property("min", &FancyObject::get_min, &FancyObject::set_min)
        .add_property("max", &FancyObject::get_max, &FancyObject::set_max)
        .def("random_increment", &FancyObject::random_increment);
}
But, of course, we don't want to manage a separate file for each stage of the tutorial. Instead, we want to hide parts of the final file in earlier steps. Hence, I annotated the source file to mark the parts that should be hidden. In fact, the source file looks like this:
#include "fancyalgorithms/fancy_functions.hpp"
#include <boost/python.hpp>
//<<400-container
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
//>>400-container
namespace py = boost::python;
BOOST_PYTHON_MODULE(fancymodule)
{
//<<100-simple_python
    py::def("fancy_increment", fancy_increment);
//>>100-simple_python
//<<200-a_class
    py::class_<FancyObject>("FancyObject", py::init<>())
        .def(py::init<int,int>())
//<<300-class_members
        .add_property("min", &FancyObject::get_min, &FancyObject::set_min)
        .add_property("max", &FancyObject::get_max, &FancyObject::set_max)
//>>300-class_members
        .def("random_increment", &FancyObject::random_increment);
//>>200-a_class
//<<400-container
    py::class_< std::vector<int> > ("IntList")
        .def(py::vector_indexing_suite< std::vector<int> >());
    py::def("fancy_increment_container", fancy_increment_container);
//>>400-container
}
I add the annotations as comments in the source file. The annotation //<<100-simple_python marks the beginning of a block that becomes part of the file in the commit tagged 100-simple_python; the block ends at the corresponding annotation //>>100-simple_python. As you can see for 200-a_class and 300-class_members, blocks may overlap (here, one is nested inside the other).
From these implicit tags, a tag list is created automatically by collecting the annotations and sorting them alphabetically. The resulting tag list effectively shows the steps of the tutorial:
100-simple_python
200-a_class
300-class_members
[...]
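The collection step can be sketched in a few lines. This is a Python illustration under stated assumptions, not the awk/bash code of create_course_git: the function name collect_tags and the regular expression are hypothetical and only mirror the annotation format shown above.

```python
import re

# Assumed annotation format: an opening marker such as "//<<100-simple_python"
# on a line of its own (leading whitespace tolerated).
OPEN_ANNOTATION = re.compile(r"^\s*//<<(\S+)\s*$")


def collect_tags(lines):
    """Return the distinct tags of all opening annotations, sorted alphabetically."""
    tags = set()
    for line in lines:
        match = OPEN_ANNOTATION.match(line)
        if match:
            tags.add(match.group(1))
    return sorted(tags)


# Shortened version of the annotated module source.
source = """\
//<<400-container
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
//>>400-container
BOOST_PYTHON_MODULE(fancymodule)
{
//<<100-simple_python
py::def("fancy_increment", fancy_increment);
//>>100-simple_python
//<<200-a_class
py::class_<FancyObject>("FancyObject", py::init<>())
//<<300-class_members
.add_property("min", &FancyObject::get_min, &FancyObject::set_min)
//>>300-class_members
.def("random_increment", &FancyObject::random_increment);
//>>200-a_class
}
""".splitlines()

tags = collect_tags(source)
print(tags)
```

Because the order is purely alphabetical, equal-width numeric prefixes matter: a tag named 1000-something would sort before 200-a_class.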
This order also configures which code blocks refer to earlier commits and should stay for later stages. For example, when preparing the source file for 300-class_members, all code from 100-simple_python and 200-a_class should remain visible, of course. Generally, this is done via awk (the link takes you to the relevant section of create_course_git, if you're interested).
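To make the visibility rule concrete, here is a minimal Python sketch of such a filter. It is not the awk code from create_course_git: the function name render_for_tag is hypothetical, and the sketch assumes that annotation lines themselves are dropped from the output and that a block is visible whenever its tag sorts at or before the requested tag.

```python
import re

# Assumed annotation format: "//<<tag" opens a block, "//>>tag" closes it.
ANNOTATION = re.compile(r"^\s*//(<<|>>)(\S+)\s*$")


def render_for_tag(lines, tag):
    """Return the lines visible at `tag`, without the annotation lines."""
    visible = []
    open_blocks = []  # tags of all blocks the current line is inside of
    for line in lines:
        match = ANNOTATION.match(line)
        if match:
            marker, name = match.groups()
            if marker == "<<":
                open_blocks.append(name)
            elif name in open_blocks:
                open_blocks.remove(name)
            else:
                raise ValueError(f"closing annotation without opening: {name}")
            continue  # annotation lines never reach the output
        # Visible only if every enclosing block belongs to this step or an
        # earlier one; plain string comparison gives the alphabetical order.
        if all(name <= tag for name in open_blocks):
            visible.append(line)
    return visible


# Shortened version of the annotated module source.
ANNOTATED = """\
BOOST_PYTHON_MODULE(fancymodule)
{
//<<100-simple_python
py::def("fancy_increment", fancy_increment);
//>>100-simple_python
//<<200-a_class
py::class_<FancyObject>("FancyObject", py::init<>())
//<<300-class_members
.add_property("min", &FancyObject::get_min, &FancyObject::set_min)
//>>300-class_members
.def("random_increment", &FancyObject::random_increment);
//>>200-a_class
}
""".splitlines()

for step in ("100-simple_python", "200-a_class", "300-class_members"):
    print(f"--- {step}")
    print("\n".join(render_for_tag(ANNOTATED, step)))
```

Tracking all open blocks rather than only the innermost one means that a line inside 300-class_members stays hidden at 200-a_class even though its outer block is already visible.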
From the final course git, participants can run git checkout 100-simple_python and have a copy of the first code listing in their local git. Later, they will run git checkout 300-class_members and have the second code listing to work with.
tags trigger file addition and deletion
In relation to the above-mentioned tag list, the two files commits_thefiles and commits_thefiles_delete state which files should be added and deleted for which commit (only the first file is created automatically). Since both use the same syntax, this is an excerpt from commits_thefiles:
.gitignore
core/src/main.cpp
core/Makefile
core/fancyalgorithms/fancy_functions.hpp
100-simple_python : pkg-python/src/fancymodule.cpp
100-simple_python : pkg-python/Makefile
100-simple_python : pkg-python/test/fancymodule/__init__.py
100-simple_python : pkg-python/test/main.py
500-getting_started_r : pkg-r/Makefile
You can see lines starting with tags, as well as lines without tags. Lines without tags cause files to be added to the initial commit. In this case, it is the dead-simple C++ library that we want to open up to Python and R. The lines starting with a tag point to files that should be added to the commit of the corresponding tag.
So, starting from the final version of your tutorial, the corresponding course git fills with new files and new file content. The only remaining question is: how can you use it?
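The file format can be read with a small parser. The following Python sketch is only an illustration: the function name parse_commit_list is hypothetical, and it assumes that a tag is separated from the path by " : " exactly as in the excerpt, and that paths themselves never contain that separator.

```python
from collections import defaultdict

INITIAL = None  # key used in this sketch for files of the initial commit


def parse_commit_list(text):
    """Map each tag (or INITIAL) to the list of files named for it."""
    files_by_tag = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        tag, separator, path = line.partition(" : ")
        if separator:
            files_by_tag[tag.strip()].append(path.strip())
        else:
            # No tag in front of the path: the file goes into the initial commit.
            files_by_tag[INITIAL].append(line)
    return dict(files_by_tag)


excerpt = """\
.gitignore
core/src/main.cpp
core/Makefile
core/fancyalgorithms/fancy_functions.hpp
100-simple_python : pkg-python/src/fancymodule.cpp
100-simple_python : pkg-python/Makefile
100-simple_python : pkg-python/test/fancymodule/__init__.py
100-simple_python : pkg-python/test/main.py
500-getting_started_r : pkg-r/Makefile
"""

files_by_tag = parse_commit_list(excerpt)
for tag, files in files_by_tag.items():
    print(tag or "(initial commit)", files)
```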
how to use it
First of all, get the scripts from GitHub. Then, put together your final source code, and:
- Add annotations whose alphabetical order corresponds to the order of the steps of your tutorial.
- Run create_course_git. Most importantly, this creates the file commits_thefiles.
- Go through commits_thefiles to remove lines for unnecessary files and add tags to configure which files are added at which point in time.
- Add files and tags to commits_thefiles_delete to configure the deletion of files.
- Re-run create_course_git. This puts together the repo in a local folder, then pushes everything to the remote repository, and, finally, syncs the tags with the remote repo.
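To give a feeling for what such a run amounts to, the following Python sketch lists a plausible sequence of git commands without executing any of them. It is a guess at the overall shape, not the logic of create_course_git: the function name plan_git_commands, the commit messages, the remote name origin, and the example URL are all hypothetical, and re-rendering the annotated files for each tag is left out.

```python
def plan_git_commands(tags, add_by_tag, delete_by_tag, remote_url):
    """Yield git invocations as argument lists; nothing is executed here."""
    yield ["git", "init"]
    yield ["git", "remote", "add", "origin", remote_url]

    # Initial commit: files listed without a tag (stored under the key None).
    for path in add_by_tag.get(None, []):
        yield ["git", "add", path]
    yield ["git", "commit", "-m", "initial commit"]

    # One commit and one tag per tutorial step, in tag-list order.
    for tag in tags:
        for path in add_by_tag.get(tag, []):
            yield ["git", "add", path]
        for path in delete_by_tag.get(tag, []):
            yield ["git", "rm", path]
        yield ["git", "commit", "-m", tag]
        yield ["git", "tag", tag]

    # Publish the commits first, then the tags.
    yield ["git", "push", "origin", "HEAD"]
    yield ["git", "push", "origin", "--tags"]


commands = list(plan_git_commands(
    tags=["100-simple_python", "200-a_class"],
    add_by_tag={None: [".gitignore"], "100-simple_python": ["pkg-python/Makefile"]},
    delete_by_tag={},
    remote_url="https://example.com/course.git",  # hypothetical URL
))
for command in commands:
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the order of commits and tags before anything touches a repository.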