Site Automation w/ Scripts

The New Way: Script Update

September 20, 2023: I've long wanted to merge the scripts below into a single script, and after several years, I've finally gotten around to it. They're now combined into a single script I call post, which can be reviewed at: https://github.com/cseanburns/personal_website/blob/master/scripts/post. To use it, I run it on the markdown file that I want to add to The Text, like so: ./post new_post.md.

The Old Way

This site started off with a pretty simple design and site structure, but then I started posting to The Text on a regular basis, and that made me want to automate the process. Then things got complicated, and that made me want to complicate things even more.

So to automate posting to The Text, I ended up writing a series of shell scripts that are kind of funny to me, because they're just a hodgepodge of utilities and other oddities. I could have written them more coherently if I had put the work in, but I was playing around with ed a lot at the time, so I thought, why not use ed in the scripts? And sed, and so on.

The scripts below are subject to change, but I keep updated copies on GitHub. For now, here's my automation process:

makepost

I use markdown to write posts for The Text. To convert markdown to HTML, I wrote a small bash function that I keep in my .bash_functions file. It looks like this:


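# convert markdown to HTML without smart formatting
makepost () {
  local sourcefile="$1"
  # basename drops the directory and the "md" suffix ("post.md" -> "post."),
  # so the output lands in the current directory as "post.html"
  pandoc -f gfm -t html "$sourcefile" > \
    "$(basename -s md "$sourcefile")html"
}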
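The output filename in makepost comes from a small basename trick: -s md strips the trailing "md" (but not the dot), and "html" is appended to what's left. The sketch below isolates that step in two hypothetical helpers. The second one, which strips ".md" with parameter expansion and keeps the HTML file next to its source, is an assumed alternative, not what makepost does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that only compute the output filename.

# What makepost does: basename drops the directory and the trailing "md",
# leaving "new_post.", and then "html" is appended.
html_name_basename() {
  printf '%s\n' "$(basename -s md "$1")html"
}

# Assumed alternative (not the original): strip ".md" with parameter
# expansion, which keeps any directory part of the path.
html_name_expansion() {
  printf '%s\n' "${1%.md}.html"
}

html_name_basename "drafts/new_post.md"    # prints new_post.html
html_name_expansion "drafts/new_post.md"   # prints drafts/new_post.html
```

The two only differ when makepost is run on a file outside the current directory.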

make-article

After I create the HTML file using the makepost function, I need to process this file. For each file (post) I add some special id attributes and some extra HTML elements. To add these things, I mostly use a heredoc with ed commands:


#!/usr/bin/bash

# Original date: 2022-01-28

# Run ``makepost`` (in .bash_functions) first.
# ``makepost`` converts the markdown file into HTML.
# The ed(1) heredoc below adds additional HTML elements, etc
# to make the post ready for the WWW/index.html file.

# Usage: ./make_article post-name.html

base_dir="$HOME/workspace/personal_website/WWW/markdown-posts/"
cd "${base_dir}" || exit
post="$1"

ed -s "${base_dir}"/"${post}" << 'EOF'
0i
<article id="
.
1r !date +\%Y\%m\%d
1,2j
1a
">
.
1,2j
3s/[0-9]\">/\
&\

.
4r !date +\%Y-\%m-\%d
4a
<time datetime="
.
6a
">
.
8s/<\/h3>/<\/time><\/h3>
3,8j
$a
</article>

.
wq
EOF
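Since ed scripts are hard to read, here's a rough sed equivalent of what the heredoc above produces. Judging by the line numbers the ed commands use, it assumes line 1 of the pandoc output is the h2 title and line 2 is an h3 heading holding the date. The wrap_article name is hypothetical, it relies on GNU sed's one-line i and a commands, and the dates are passed in as arguments so the output is reproducible. Unlike the original, it prints to stdout instead of editing in place, matches the first `">` on line 2 without requiring a digit before it, and doesn't add the trailing blank line.

```shell
#!/usr/bin/env bash
# Rough sed equivalent of the make-article ed heredoc (hypothetical helper).
# Assumes line 1 is the <h2> title and line 2 is the <h3> date heading.
# Needs GNU sed for the one-line "i" and "a" commands.
wrap_article() {
  local file="$1" compact_date="$2" iso_date="$3"
  sed -e "1i <article id=\"${compact_date}\">" \
      -e "2s|\">|\"><time datetime=\"${iso_date}\">|" \
      -e '2s|</h3>|</time></h3>|' \
      -e '$a </article>' \
      "$file"
}

# Demo on a throwaway file that mimics pandoc output.
sample=$(mktemp)
printf '%s\n' \
  '<h2 id="hello-world">Hello, World</h2>' \
  '<h3 id="january-28-2022">January 28, 2022</h3>' \
  '<p>First post.</p>' > "$sample"
wrap_article "$sample" 20220128 2022-01-28
rm -f "$sample"
```

On the sample, the h3 line comes out as `<h3 id="january-28-2022"><time datetime="2022-01-28">January 28, 2022</time></h3>`, wrapped between `<article id="20220128">` and `</article>`.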

insert-article

Even though the above script modifies the HTML file, it's still a standalone file, and I need to add it to the main index.html file. It also has to go in a specific spot in that file: right after line 36. I don't like having that hard-coded, but I'll fix that later. Anyway, to insert the post into its place, I use sed:


#!/usr/bin/bash

# Original date: 2022-01-28

# Add new post into proper place of index.html file
# Line 36 is always where new posts are added,
# until I change something above that line

# Usage: ./insert_article post-name.html

base_dir="$HOME/workspace/personal_website/WWW/markdown-posts/"
post="${base_dir}/$1"
target_file="$HOME/workspace/personal_website/WWW/index.html"

sed -i "36 r ${post}" "${target_file}"
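One way to drop the hard-coded line 36 would be to put a marker comment in index.html where new posts should go and let sed find it. This is a sketch under that assumption: the `<!-- new-posts -->` marker and the insert_after_marker name are hypothetical, and it needs GNU sed for -i. The marker should appear exactly once, because sed's r command fires on every matching line.

```shell
#!/usr/bin/env bash
# Hypothetical variant of insert-article: insert after a marker comment
# instead of after a fixed line number.
insert_after_marker() {
  local post="$1" target="$2"
  local marker='<!-- new-posts -->'   # assumed marker, not in the original site
  # Fail loudly if the marker is missing instead of silently doing nothing.
  if ! grep -qF -- "$marker" "$target"; then
    echo "marker not found in $target" >&2
    return 1
  fi
  # "r" queues the post file for output right after each matching line.
  sed -i "/${marker}/ r ${post}" "$target"
}
```

With the marker placed once in WWW/index.html, the call would be insert_after_marker post-name.html followed by the path to that index.html, and edits above the marker no longer shift the insertion point.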

WWW-toc-generator

Now that the new post is in its proper place in that file, I parse the file to create an entry for the table of contents of The Text posts. I do that with grep and sed:


#!/usr/bin/bash

# Original Date: 2020-03-27
#
# This script creates a <li> item from the most recent The Text post in
# ../WWW/index.html and inserts the item into the table of contents on
# ../index.html.
#
# Usage: ./WWW-toc-generator
   
homepage="$HOME/workspace/personal_website/index.html"
thetext_file="$HOME/workspace/personal_website/WWW/index.html"

# Get the title of the most recent The Text entry. Substitute the original
# HTML tags for the ones needed for the <li> line in the table of contents
# in ../index.html
grep "h2 id=" "${thetext_file}" | head -n1 | \
    sed -e 's/h2 id/a href/' -e\
    's/h2/a/' -e\
    's/\"/\"WWW\/index.html\#/' -e\
    's/^/\<li\>/' -e\
    's/$/\<\/li\>/' > tmp.html

# Add the new <li> entry to the top of the table of contents in ../index.html
# and ../WWW/index.html 
sed -i "/ol reversed/ a $(cat tmp.html)" "${homepage}"
sed -i "/ol reversed/ a $(cat tmp.html)" "${thetext_file}"

# Cleanup
trash tmp.html
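The same entry can be built without a temporary file by capturing it in a variable. This sketch (function names hypothetical, GNU grep and sed assumed) keeps the WWW/index.html# link prefix and the "ol reversed" anchor from the script above, but swaps in two anchored substitutions, `<h2 id="` and `</h2>`, for the original's s/h2/a/, which would also rewrite an "h2" that happens to appear in a post's id or title.

```shell
#!/usr/bin/env bash
# Build the <li> entry from the first <h2 id=...> line on stdin.
toc_entry() {
  grep -m1 'h2 id=' |
    sed -e 's|<h2 id="|<li><a href="WWW/index.html#|' \
        -e 's|</h2>|</a></li>|'
}

# Add the entry right after the "ol reversed" line, as the original does.
add_toc_entry() {
  local entry="$1" file="$2"
  sed -i "/ol reversed/ a ${entry}" "$file"
}

# Usage, with the paths from the original script:
#   entry=$(toc_entry < "$HOME/workspace/personal_website/WWW/index.html")
#   add_toc_entry "$entry" "$HOME/workspace/personal_website/index.html"
```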

make-rss

I added an RSS feed at some point, and of course I didn't want to write RSS entries by hand for each new post. This is my script to automate that process instead.

One of the cool things about writing this script was that I got to learn about lookahead and lookbehind assertions, which GNU grep supports through its -P option. They're documented in the pcre (Perl-compatible regular expressions) man page. This script is the wildest of them all, and for the first time in my life, it makes me think Perl might be fun to learn.


#!/usr/bin/bash

## Generates an RSS entry for the most recent post in ../WWW/index.html
## Date: 10/11/2020
## Sean Burns
## Usage: ./make-rss

## Part 1: Get values

# Create a date for the <lastBuildDate> xml element.
rss_date="$(date -R)"
echo "${rss_date}" > tmp.xml

# Extract the title of the most recent The Text entry from the index.html file.
# The second grep uses GNU grep's -P (Perl-compatible) mode for positive
# lookbehind and lookahead matching. See:
# https://stackoverflow.com/questions/13242469/how-to-use-sed-grep-to-extract-text-between-two-words
# A nice explanation for how lookahead and lookbehind work:
# https://fabianlee.org/2021/01/09/bash-grep-with-lookbehind-and-lookahead-to-isolate-desired-text/
grep 'h2 id=' ../WWW/index.html | \
  head -n1 | \
  grep -o -P '(?<=>).*(?=<)' >> tmp.xml

# Insert a temporary blank line
printf "\n" >> tmp.xml

# Create a link using the baseurl and the <h2> element from index.html.
printf "https://cseanburns.github.io/csb/WWW/index.html#" >> tmp.xml
grep 'h2 id=' ../WWW/index.html | \
  head -n1 | \
  grep -o -P '(?<=").*(?=")' >> tmp.xml

# Copy the date to a new line in order to create <lastBuildDate> and
# <pubDate> lines.
printf '1t4\nw\n' | \
  ed -s tmp.xml

# Delete the blank line.
printf '3d\nw\n' | \
  ed -s tmp.xml

# Copy link from above to use with the <guid> XML element
printf '3t4\nw\n' | \
  ed -s tmp.xml
 
## Part 2: Add all of the XML tags for all the values above

sed -i '1 s/^/<lastBuildDate>/' tmp.xml
sed -i '1 s/$/<\/lastBuildDate>/' tmp.xml

sed -i '2 s/^/<title>/' tmp.xml
sed -i '2 s/$/<\/title>/' tmp.xml

sed -i '3 s/^/<link>/' tmp.xml
sed -i '3 s/$/<\/link>/' tmp.xml

sed -i '4 s/^/<pubDate>/' tmp.xml
sed -i '4 s/$/<\/pubDate>/' tmp.xml

sed -i '5 s/^/<guid>/' tmp.xml
sed -i '5 s/$/<\/guid>/' tmp.xml

# Add the opening and closing <item> XML elements for the new entry block
ed -s tmp.xml << EOF
2i
<item>
.
6a
</item>
.
wq
EOF
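For comparison, here's a sketch that builds the same item block in one pass with printf instead of a series of ed and sed edits. It reuses the two grep -P lookaround patterns and the base URL from the script above. The make_rss_item name is hypothetical, it needs a grep built with PCRE support, and it leaves out the lastBuildDate line that make-rss writes at the top of tmp.xml. Both patterns are greedy, so they return everything between the first > and the last < (or the first and last double quote) on the line.

```shell
#!/usr/bin/env bash
# Sketch: build one RSS <item> from an <h2> line (hypothetical helper).
BASE_URL="https://cseanburns.github.io/csb/WWW/index.html#"

make_rss_item() {
  local h2_line="$1" pub_date="$2" title anchor
  # Lookbehind (?<=>) and lookahead (?=<): the heading text.
  title=$(printf '%s\n' "$h2_line" | grep -o -P '(?<=>).*(?=<)')
  # Same idea with double quotes: the value of the id attribute.
  anchor=$(printf '%s\n' "$h2_line" | grep -o -P '(?<=").*(?=")')
  printf '<item>\n<title>%s</title>\n<link>%s%s</link>\n<pubDate>%s</pubDate>\n<guid>%s%s</guid>\n</item>\n' \
    "$title" "$BASE_URL" "$anchor" "$pub_date" "$BASE_URL" "$anchor"
}

# date -R gives the RFC 2822 style date that RSS expects.
make_rss_item '<h2 id="hello-world">Hello, World</h2>' "$(date -R)"
```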

insert-xml

The above script only generates the XML entry. It doesn't insert that entry into the index.xml file for the RSS feed. I have this script for that:


#!/usr/bin/bash

# Original date: 2022-01-26

# Insert contents of the tmp.xml
# into the RSS file at ../WWW/index.xml 

# Same base directory the other scripts and sendpost use
target_file="$HOME/workspace/personal_website/WWW/index.xml"

cat tmp.xml "${target_file}" > new.xml
ed -s new.xml << EOF
1m15
14d
1,6m15
8s/^/    /
10,15s/^/        /
11,14s/^/  /
.
wq
EOF

mv new.xml "${target_file}" 
rm tmp.xml
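The ed commands above depend on the exact line layout of index.xml (1m15, 14d, 1,6m15). A sketch that doesn't: rewrite the existing lastBuildDate in place, then insert the new item before the first existing item, or before the closing channel tag when the feed has no items yet. The update_feed name and its arguments are hypothetical; it assumes a single lastBuildDate element and the item block in its own file (tmp.xml minus its first line), uses GNU sed -i, and skips the indentation the original adds.

```shell
#!/usr/bin/env bash
# Hypothetical, layout-independent version of insert-xml.
update_feed() {
  local item_file="$1" feed="$2" build_date="$3"
  # Replace the contents of the (single) <lastBuildDate> element.
  sed -i "s|<lastBuildDate>.*</lastBuildDate>|<lastBuildDate>${build_date}</lastBuildDate>|" "$feed"
  # Copy the feed, emitting the new item just before the first <item>
  # (or before </channel> if there are no items yet).
  awk -v item="$item_file" '
    !done && (/<item>/ || /<\/channel>/) {
      while ((getline line < item) > 0) print line
      close(item)
      done = 1
    }
    { print }
  ' "$feed" > "$feed.new" && mv "$feed.new" "$feed"
}
```

Keeping newest items first matches what the original ed commands do, and the fallback to the closing channel tag covers the very first post.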

sendpost

Finally, I can deploy the updated files to the server:


# deploy updated pages to server
sendpost () {
  local basedir="$HOME/workspace/personal_website"
  local main_directory="SERVER:~/public_html/"
  local sub_directory="SERVER:~/public_html/WWW/"
  scp "$basedir/index.html" "$main_directory"
  scp "$basedir/WWW/index.html" "$sub_directory"
  scp "$basedir/WWW/index.xml" "$sub_directory"
}
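A small variation on sendpost, sketched under the same placeholder SERVER host: the two WWW files share a destination, so a single scp call can copy both, and a DRY_RUN switch prints the commands instead of running them. The sendpost_sketch name and the DRY_RUN variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variant of sendpost. SERVER is a placeholder host.
# Set DRY_RUN=1 to print the scp commands instead of running them.
sendpost_sketch() {
  local basedir="$HOME/workspace/personal_website"
  local run=()
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    run=(echo)
  fi
  "${run[@]}" scp "$basedir/index.html" "SERVER:~/public_html/"
  # Both WWW files go to the same directory, so one scp call handles them.
  "${run[@]}" scp "$basedir/WWW/index.html" "$basedir/WWW/index.xml" \
    "SERVER:~/public_html/WWW/"
}
```

Running DRY_RUN=1 sendpost_sketch first is a cheap way to check the paths before anything is uploaded.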

I'll continue to bake these scripts until they're more polished, but for now, they do the job, and I'm good with that.