My Plain Way To Do Plain Text Social Science

Burns, C. Sean

A research workflow can be a rather personal process. Here I describe a text-based approach to the research process. This is, for me, an ideal way to do research, but it's not always practical given the way many others work.

People use the same classes of tools to accomplish essentially the same kinds of work. We use statistical applications, programming languages, and qualitative data analysis software to prepare and analyze data. Digital lab notebooks and the like are used to keep notes, logs, and reflections about data and the research environment. Some people use bibliographic reference managers to build and curate reference libraries, which can then be integrated into word processing programs to write papers.

People also use different instances of tools within each class. For example, some may use Python, R, SPSS, Stata, or something else for statistical work. And a host of specific software exists for all of the other classes of tools listed above. My guess is that people often adopt these tools based on their first substantial encounter with them. Perhaps they were in graduate school and were introduced to SPSS in their first statistics course, and so that's what they use now because that's what they have come to know. The surrounding research culture matters in these things.

I think because my first experiences with a computer were via the command line (DOS from the late 1980s to the early 1990s), and perhaps because of some quirks in the way I prefer to interact with the world, I favor command line approaches to most of the ways that I use a computer. I use command line text editors and a command line email client for a lot of my work, as well as a command line web browser and other CLI utilities for assorted other tasks. Basically, I want to work in the terminal as much as possible.

For instance, I use the Vim text editor for most of my writing, though nowadays I also use ed(1) a lot, mostly for getting words down "on paper" and for revising. I also like to use R and Python on the command line or in Vim. I like RStudio, but I prefer to convert Vim into an R IDE using some Vim plugins. Since these kinds of tools are all text-centered, they integrate really well with version control systems, like Git. (To be fair, applications like RStudio follow a text-friendly model, but I still prefer a text user interface as much as possible.)

I have to budge when I collaborate with others. When collaborating, I cannot write papers (completely) in Vim because most of the people in the world around me do not use a text editor to write their papers. Instead, they use word processor software and other graphical applications. And since I want to work with these people, I need to conform to their ways rather than ask them to conform to mine. People know the basics of using a word processor, but my way requires effort and time to acquire proficiency. And I have yet to meet the person who wants to work with me so badly that they would try my way of doing analysis and writing academic papers!

Kieran Healy wrote a nice discussion of this issue, and of the utility of plain text in the research workflow, called The Plain Person's Guide To Plain Text Social Science. He coined terms for the scenario described above: the Office model and the Engineering model. In the Office model, word processors function as the main tool for writing and for managing writing; in the Engineering model, the text editor takes primacy. Each approach has implications. For example, someone who uses Word, LibreOffice Writer, or Google Docs might rely on tracked changes and file naming schemes to manage drafts and versions of a manuscript, and might share them via a barrage of emails or a synced Dropbox folder. Someone who uses a text editor, though, can use Git and a remote repository, like GitHub or GitLab, to manage versions. This latter approach might help with reproducibility. I'd encourage readers of this page to read Healy's discussion of the issue.
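
As a rough illustration of the Engineering model side of that contrast, here is a minimal sketch of how Git commits can stand in for a file naming scheme. It is not taken from Healy's guide or from any particular project: the file name paper.md, the commit messages, and the author identity are all hypothetical, and the work happens in a throwaway temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: Git commits stand in for draft_v1.docx, draft_v2.docx, and so on.
# File names, commit messages, and the author identity are hypothetical.
set -euo pipefail

workdir="$(mktemp -d)"
cd "${workdir}"
git init --quiet

# Local identity so the sketch runs without global Git configuration
git config user.name "Example Author"
git config user.email "author@example.org"

# First draft
printf 'Plain text helps with version control.\n' > paper.md
git add paper.md
git commit --quiet -m "First draft of introduction"

# Revision: the same file is edited in place, with no renamed copies
printf 'Plain text works well with version control.\n' > paper.md
git commit --quiet -am "Tighten opening sentence"

# Show what changed between the two drafts, then the history
git --no-pager diff HEAD~1 HEAD -- paper.md
git --no-pager log --oneline

# Sharing with collaborators would add a remote, for example:
# git remote add origin <URL of a GitHub or GitLab repository>
# git push -u origin HEAD
```

The diff plays roughly the role that tracked changes play in a word processor, and the log replaces a folder full of renamed copies.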

Aside from all these implications, when I work solo, I get to work in a way that feels really good to me because I get to use the tools I favor. That doesn't mean I can avoid all graphical or non-terminal applications and workflows, but I can avoid a lot of them, as well as much of the Office model way of doing things. I still need a graphical web browser, and I like and use Zotero to manage bibliographic references and to store notes. Although I can read modern PDF and word processor documents on the command line, it's not an ideal way to read them, especially ones that contain figures, plots, and other graphics. So I regularly use a graphical PDF reader, Firefox, and LibreOffice to read such files.

When I work solo, I often start writing in ed(1). It's not just the standard editor; it's the grandparent of all distraction-free editors. As a line editor, it's also really nice for editing and revising text line by line. Eventually Vim takes over, but I'll still return to ed(1) to revise my writing. Vim and ed(1) share quite a few commands since the former descends from the latter. But Vim is a more modern tool, and I use a variety of plugins to extend its functionality. The Vim plugins that I currently use to manage my research workflow and my academic writing include:

Nvim-R: Nvim-R turns Vim into an IDE for R.
ncm-R: ncm-R adds R autocompletion.
vim-gitgutter: vim-gitgutter adds Git functionality by marking changed lines in the sign column. I don't use it on my laptop because it causes noticeable delays, but it works well on my desktop.
tabular: tabular filters and aligns text, especially markdown formatted tables.
zotcite: zotcite inserts notes and citations and generates bibliographies from Zotero.
goyo.vim: goyo.vim converts Vim into an even more distraction-free writing interface. It centers the text on the screen and adds padding around the text.
limelight.vim: limelight.vim dims the text outside the current paragraph and supplements the goyo.vim plugin.
vim-pencil: vim-pencil adjusts line wrapping and related settings for writing prose, including markdown.
vim-wordy: vim-wordy checks word usage problems.
thesaurus_query.vim: thesaurus_query.vim provides a thesaurus.

The Vim and R integration (Nvim-R) took some adjustment when I started using it, and I still use a cheat sheet to help, but that's mostly because I don't use it every day. Getting the zotcite plugin figured out and configured took a couple of hours, but it was worth it.

With these tools in place, I can write papers in markdown, insert code and results directly from R if I'm using R, and add Zotero keys for inserting references using the zotcite plugin. When I'm ready, I use pandoc to render the markdown file and to generate the bibliography and in-text citations.

This is what a manuscript might look like in Vim using zotcite, markdown, and some YAML at the top.


---
title: Title of paper
author:
  - First Name Last Name
keywords: [keyword1, keyword2]
abstract: |
  This is the abstract.

  This is the second paragraph in the abstract.
published: true
---

# Introduction

This is a sample manuscript. To add a **parenthetical name and year as an
in-text citation**, surround the Zotero key with square brackets
[@author_name].

# Methods

To add a **narrative in-text citation, that is, name (year)**, simply insert
the Zotero key without brackets: @author_name.

To convert markdown with citations using APA format, I use the Bash script below. It took me a bit of time to figure out how to default to the APA format. Turns out all I had to do was point to an apa.csl file (Citation Style Language), which fortunately is part of the desktop Zotero installation.


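#!/usr/bin/env bash

# Render a markdown manuscript to standalone HTML with APA-style citations.
# Usage: makeapa manuscript.md
makeapa () {
  local sourcefile="$1"
  # zotcite's pandoc filter (zotref.py), installed here via a Vim plugin manager
  local zotfile="${HOME}/.vim/plugged/zotcite/python3/zotref.py"
  # APA style file that ships with the Zotero desktop installation
  local apafile="${HOME}/Zotero/styles/apa.csl"
  # Strip the .md suffix and write NAME.html to the current directory;
  # the zotcite filter runs before --citeproc formats the citations
  pandoc "${sourcefile}" -s \
    -o "$(basename -s .md "${sourcefile}").html" \
    -F "${zotfile}" --citeproc --csl "${apafile}"
}

makeapa "$@"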

With this setup, I can keep my data, my manuscript, my code, and whatever else I need in a single directory that's versioned with Git.
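
A single versioned project directory of that kind might be set up along the following lines. This is only a sketch under stated assumptions: the directory layout, the file names, and the choice to leave rendered HTML untracked are hypothetical and not a description of any actual project.

```shell
#!/usr/bin/env bash
# Sketch: data, code, and manuscript together in one Git repository.
# Every file and directory name below is hypothetical.
set -euo pipefail

project="$(mktemp -d)/my-study"
mkdir -p "${project}/data" "${project}/code"
cd "${project}"

# Placeholder files standing in for the real manuscript, data, and code
printf '# Introduction\n' > manuscript.md
printf 'id,score\n1,42\n' > data/survey.csv
printf '# analysis script placeholder\n' > code/analysis.R

# Assumption: rendered HTML can be rebuilt with pandoc, so it is left untracked
printf '*.html\n' > .gitignore

git init --quiet
# Local identity so the sketch runs without global Git configuration
git config user.name "Example Author"
git config user.email "author@example.org"
git add .
git commit --quiet -m "Add data, code, and manuscript"

# List the tracked files
git ls-files
```

Because everything lives in one repository, a single commit can record the state of the data, the analysis code, and the prose at the same moment.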

All this being said, I have yet to be able to use such a pure Engineering model workflow while collaborating. But maybe one day soon.