how should I write notes | Caspershire Meta

“In a never ending quest to seek for knowledge, one must engage in the pursuit of retaining it.”

Retaining information has been particularly hard for me over the last couple of months. When it comes to reading, a PhD is no joke. I would hit walls that required reading at least two papers to understand a particular biological phenomenon, and at times I still could not make heads or tails of it after spending so much time reading.

If reading and understanding alone are challenging, retaining what I read is a Gordian knot.

text notes

Back when I was in undergrad and during the first year of my PhD, the majority of my notes were saved in .txt files. There were two reasons for that: (a) .txt files can be viewed on virtually any device, and (b) the absence of formatting makes them a whole lot easier to deal with when taking notes at speed. Taking notes in LibreOffice would require cursor movements to activate a bullet point (maybe around 0.5 second), but in a text file a single hit on the <Tab> key costs me only around 0.1 second. Yes, I know, this is strange, but speed is important.

In hindsight, it probably would have been better to write my notes in Markdown. That would have let me use some arcane scripting technique to convert the .md files into pleasant-looking HTML or PDF.
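The conversion does not have to be that arcane. As a rough sketch, assuming pandoc is installed (I did not use it for these notes), a small loop could turn every .md file in a folder into a standalone HTML page. The function name md_to_html and the ~/notes folder are made up for illustration.

```shell
# Sketch: convert every Markdown note in a folder to standalone HTML.
# Assumes pandoc is installed; md_to_html is a hypothetical helper name.
md_to_html() {
  notes_dir="$1"
  for md in "$notes_dir"/*.md; do
    [ -e "$md" ] || continue            # folder contains no .md files
    # -s asks pandoc for a complete HTML document, not just a fragment
    pandoc -s "$md" -o "${md%.md}.html"
  done
}

# usage (hypothetical folder):
# md_to_html ~/notes
```

Each note.md ends up with a note.html next to it, so the plain text files stay the source of truth.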

but, text notes…

Text notes are only suitable for retaining information that comes from a single source (or maybe two). For taking notes in a classroom during a lecture, they work just fine.

Now, when tracking information obtained from multiple sources on the same topic, there are a number of things to track: who said it, how s/he arrived at it, what s/he said about the previous study, when this happened, who said otherwise, and why s/he said otherwise.

As a scientist, I do need to care about these specific details. Science is not static. Every day, researchers make new discoveries that hinge on previous findings. Sometimes the new discoveries corroborate previous studies, and sometimes they contradict them.

This calls for a more elaborate approach to taking notes, and it points to a solution that I could see myself using: a personal wiki. A wiki allows inter-linking between documents (i.e. pages), something that plain text documents cannot do. A click on a hyperlink brings you to another page. I want this superpower.

but, how…?

The first and perhaps obvious choice would be Wikipedia, or technically the system that powers Wikipedia: MediaWiki. I have experience deploying and maintaining a MediaWiki instance. Safe to say, if I could avoid using it, I would. Why? While it is compatible with SQLite, it runs better with MySQL/PostgreSQL for a number of technical reasons. My beef with MySQL is that it consumes so much RAM. I am okay with MySQL if I am running a mission-critical multi-user MediaWiki instance, but I am not okay with it for personal use because I think it is overkill.

Also, whenever I considered deploying a MediaWiki instance, the thought was almost always accompanied by the desire to also deploy VisualEditor, which requires a Parsoid instance, and that would be another hit on memory (since it runs on Node.js).

In summary, no, I do not want a MediaWiki instance, because I feel it would be overkill for my personal pursuit. That leaves me with two possible alternatives. Well, three, if I wanted to pay for a monthly subscription to Atlassian Confluence. I don’t think I will go down that route.

So, two possible alternatives: TiddlyWiki and DokuWiki. The good thing about both is that they do not require an extensive server stack. TiddlyWiki lives and thrives as a single HTML file that you can carry on a USB drive, and it requires no webserver. DokuWiki needs a webserver (e.g. Apache2 or Nginx) and a PHP interpreter, but it does come in a portable version that you can put in Dropbox or on a USB drive.

picking my poison

Retaining knowledge and ensuring its survival throughout my graduate education requires a degree of complexity. While TiddlyWiki wins in terms of portability and mobility, I have a feeling that it will not be sufficiently complex to cater to my need for a full-blown wiki that closely resembles MediaWiki.

Therefore, I prefer to go with DokuWiki. I already have a VPS ready for a DokuWiki deployment, so the remaining problem that deserves attention is backing up. This is where DokuWiki, in my personal opinion, excels.

DokuWiki writes data to disk as plain text files and does not rely on a complex database system the way MediaWiki relies on MySQL/PostgreSQL. MediaWiki relies on a database because it needs something to manage many people adding data simultaneously. Since I am the only one writing data to my DokuWiki instance, I don’t need this feature.

As for backing up the data (the text files generated by DokuWiki), there are two potentially good solutions. The first is git, which is as simple as a commit and a push to a remote or local git server. The second is a specialized tool that pushes the files to Amazon S3.

The question is: which one has the most appeal?

backing up: pushing only changes

The official DokuWiki site has a dedicated page for backing up data to AWS S3. It is not a plugin, but rather a set of scripts (Ruby/Python) that you can run manually or automatically (with a cronjob). Looking at the scripts, what they do is compress the specified directories (that hold the data), datestamp them, and send the datestamped archive to Amazon S3.
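To make the compress-and-datestamp idea concrete, here is a minimal shell sketch of the same approach. It is not the script from the DokuWiki page; make_dated_archive, the archive name, and the bucket in the commented upload line are all made up for illustration, and the upload assumes aws-cli is already configured.

```shell
# Sketch: pack a directory into a datestamped tar.gz and print its path.
# make_dated_archive is a hypothetical helper, not part of DokuWiki.
make_dated_archive() {
  src_dir="$1"     # directory to back up, e.g. the DokuWiki data folder
  out_dir="$2"     # where the archive should be written
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="$out_dir/dokuwiki-$stamp.tar.gz"
  # -C switches to the parent folder so the archive holds relative paths
  tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
  echo "$archive"
}

# usage, with an invented bucket name:
# archive=$(make_dated_archive /home/aixnr/cms/dokuwiki/data /tmp)
# aws s3 cp "$archive" s3://my-wiki-backups/
```

Every run produces a complete new archive, no matter how little has changed since the previous one.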

This solution is great, except for one thing: no delta. Over time, the compressed archives pile up even though the changes between them are not significant. Thus, I would prefer a backup system that pushes only the changes (i.e. deltas). The tool rsync could do this. However, rsync does not support AWS S3. The alternatives are aws-cli and rclone.

If at all possible, I would prefer a much simpler alternative. Enter git. I should say that what deterred me from going forward with AWS S3 was its relatively lengthy workflow, which would require me to (1) set up another access key through AWS identity management (AWS IAM), (2) configure aws-cli or rclone to connect to my AWS account, and so on and so forth. The relative ease of setting up git is undeniable. I just had to initialize the repository and make sure my SSH public key was added to the remote git server, and I was ready to go.

However, my heart is not entirely closed to playing with rclone. Maybe that could be a side project for another day.

setting up my wiki: The Caspershire Atlas

Prior to this, Caspershire Atlas served a WordPress instance in an LXD container (which I am still writing about). Getting WordPress to play with LXD was quite a challenge because of a number of assumptions that WordPress makes, primarily that a typical installation runs on shared hosting. In the end it worked, and I should at some point write it up and tell the world about it.

My current DokuWiki setup runs on php7.2-fpm (Ubuntu package php-fpm), with caddy as the HTTP webserver. caddy handles pretty links using DokuWiki’s .htaccess mode and obtains a free SSL certificate from Let’s Encrypt.

Since my webserver setup with caddy is a little unconventional (as opposed to using Nginx or Apache2), I bumped into a problem that (I believe) came from the PHP settings.

To fix it, I edited the www.conf file.

# open the file
sudo vim /etc/php/7.2/fpm/pool.d/www.conf

On lines 23 and 24, I changed the user and group directives to my server’s current $USER (e.g. aixnr). I did the same for the listen.owner and listen.group directives on lines 47 and 48. Then I restarted the PHP daemon and the problem went away. If you are using a traditional webserver like Nginx or Apache2, you most likely will not bump into this issue, because both serve traffic as the user and group www-data, and PHP is set to work with www-data by default.
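The same edit can be done without opening an editor. The following is only a sketch: it assumes the four directives are active (not commented out with a semicolon) and written as "name = value", which may differ on your system. set_pool_owner is a made-up helper name, and the service name in the usage lines is my guess at the Ubuntu unit for php7.2-fpm.

```shell
# Sketch: point the PHP-FPM pool at a different user in one pass.
# set_pool_owner is a hypothetical helper; try it on a copy of www.conf first.
set_pool_owner() {
  conf="$1"     # path to the pool file, e.g. a copy of www.conf
  owner="$2"    # user (and group) PHP-FPM should run as
  # rewrite only active "name = value" lines; keep a .bak copy of the original
  sed -i.bak -E \
    "s/^(user|group|listen\.owner|listen\.group) = .*/\1 = $owner/" "$conf"
}

# usage on a copy, then install it and restart the daemon (names assumed):
# cp /etc/php/7.2/fpm/pool.d/www.conf /tmp/www.conf
# set_pool_owner /tmp/www.conf aixnr
# sudo cp /tmp/www.conf /etc/php/7.2/fpm/pool.d/www.conf
# sudo systemctl restart php7.2-fpm
```

Commented-out directives such as ;listen.mode are left untouched, and the .bak file makes it easy to roll back.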

Here is my caddy directive for serving my DokuWiki instance.

addres.website.net {
  tls [email protected]
  fastcgi / /run/php/php7.2-fpm.sock php 
  root /home/aixnr/cms/dokuwiki
  internal /data
  internal /conf
  internal /bin
  internal /inc

  rewrite /_media {
    r (.*)
    to /lib/exe/fetch.php?media={1}
  }

  rewrite {
    to {path} {path}/ /doku.php?id={path_escaped}&{query}
  }
}

This directive enables the free Let’s Encrypt SSL certificate (always a good thing), prevents access to certain folders, and rewrites requests so that DokuWiki’s .htaccess-style nice URLs (a.k.a. pretty permalinks) work.

backing it up

The data directory of a DokuWiki instance is where all the data resides, but we do not have to back up the whole thing. Here is my .gitignore file.

attic/
cache/
index/
locks/
meta/
media_attic/
media_meta/
tmp/

*.php
.htaccess.dist

COPYING
README
VERSION

.htaccess
_dummy
deleted.files
security.png
security.xcf

Based on this .gitignore, I am only interested in backing up the media and pages folders. To back up the data, run the git commands as usual.

# stage new, modified, and deleted files
git add -A

# commit changes
git commit

# push to remote server
git push origin master
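These commands assume the data directory is already a git repository with a remote called origin. For completeness, a first-time setup might look roughly like the sketch below. init_backup_repo is a made-up helper name and the remote URL in the usage line is a placeholder, so treat this as an outline rather than the exact steps I ran.

```shell
# Sketch: turn the data directory into a git repository and do a first push.
# init_backup_repo is a hypothetical helper; the .gitignore must already exist.
init_backup_repo() (
  data_dir="$1"      # e.g. /home/aixnr/cms/dokuwiki/data
  remote_url="$2"    # SSH or local path of an empty remote repository
  cd "$data_dir" || exit 1
  git init -q &&
    git add -A &&                         # respects .gitignore
    git commit -q -m "Initial import" &&
    git branch -M master &&               # make sure the branch is named master
    git remote add origin "$remote_url" &&
    git push -q -u origin master
)

# usage, with a placeholder remote:
# init_backup_repo /home/aixnr/cms/dokuwiki/data git@example.com:aixnr/atlas.git
```

The function body runs in a subshell, so the cd does not change the directory of the shell that calls it.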

This is what the manual process looks like. How do we automate it? We need a short script that commits our changes (deltas) with a datestamp and pushes them to a remote git server.

automatic backup with cron

Perhaps the obvious answer is a cronjob. While there are ways to watch a directory for changes, such a setup feels a little too complicated for me, and there is a fair amount of logic that would need hammering out. Hence, the simpler alternative is to use cron.

First, we need an executable bash script to perform the commit and push:

#!/bin/bash

#-------------------------------------------
# Perform backup to a remote git repository
#-------------------------------------------

# cron does not start in this folder, so move there explicitly
cd /home/aixnr/cms/dokuwiki/data || exit 1

# stage new, modified, and deleted files
git add -A

# commit with a datestamp, then push
git commit -m "Content update $(date +'%H:%M %d/%m/%Y %Z')"
git push origin master

As always, chmod +x is needed to make the script executable.

Important note to myself here. When I first wrote this script, I left out the cd line because I assumed that if the script sat in the same folder as the target directories, it would be fine. Wrong. Since the script is not executed from the folder it lives in, it must be told where to go.
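A possible refinement, offered as a sketch rather than the script actually in use: when nothing has changed since the last run, git commit exits with an error and the push has nothing to send. Checking the staging area first keeps the hourly log quiet. backup_if_changed is a made-up function name, and it assumes the repository already has a remote named origin with a master branch.

```shell
# Sketch: commit and push only when the staging area differs from HEAD.
# backup_if_changed is a hypothetical helper name.
backup_if_changed() (
  cd "$1" || exit 1          # runs in a subshell, so the caller's cwd is kept
  git add -A
  # git diff --cached --quiet exits 0 when nothing is staged
  if git diff --cached --quiet; then
    echo "$(date +'%H:%M %d/%m/%Y %Z') no changes, skipping"
    exit 0
  fi
  git commit -q -m "Content update $(date +'%H:%M %d/%m/%Y %Z')" &&
    git push -q origin master
)

# usage:
# backup_if_changed /home/aixnr/cms/dokuwiki/data
```

With this variant, an unchanged wiki produces one short log line per run instead of a failed commit followed by a push.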

I ran a quick test afterward. Before that, I created an empty log file in the ~/tmp directory.

# create a log file
touch ~/tmp/backup.log

# run the test in home directory
/home/aixnr/cms/dokuwiki/data/backup.sh >> /home/aixnr/tmp/backup.log 2>&1

If the log says everything is fine, it is time to add the command above to cron.

# Open crontab
crontab -e

I would like this script to be executed hourly, so this is what I have in my crontab.

0 * * * * /home/aixnr/cms/dokuwiki/data/backup.sh >> /home/aixnr/tmp/backup.log 2>&1

Say we would like to monitor the output for a few hours.

# monitor the output
tail -f ~/tmp/backup.log

This method worked well for my case and I am happy that I have an automatic backup solution in place now.

what’s coming next?

Now I am all set to write, comprehend, and regurgitate new knowledge into my secondary brain (the Atlas) and hopefully this project will last long enough to see my hardbound thesis.