Systems Librarianship

Author: C. Sean Burns
Date: 2023-01-03
Email: sean.burns@uky.edu
Website: cseanburns.net
GitHub: @cseanburns

Introduction

This short book is a work in progress. The first draft should be completed by the end of April 2023.

I am writing this book as I teach a course on Systems Librarianship. The goal of both the book and the course is to provide a technical introduction to the basics of systems librarianship using Linux.

The course and book goals include:

  1. how to use the Linux command line in order to become more efficient computer users and more comfortable with using computers in general;
  2. how to use cloud computing resources and create virtual machines;
  3. how to manage projects using Git and GitHub;
  4. how to create a LAMP server, build websites, and create a bare-bones OPAC;
  5. how to install and configure content management systems; and
  6. how to install and configure an integrated library system.

About This Book

The Systems Librarianship course is a brand new course (2023). I created the course to help future and current librarians become proficient in the kind of technology used to manage and provide electronic resources.

Since I use this book for my Systems Librarianship course, which I hope to teach each spring semester, this book will be a live document. Each semester that I teach this course, I will update the content in order to address changes in the technology and to edit for clarity when I discover some aspect of the book causes confusion or does not provide enough information.

A small part of this book will draw from my course on Linux Systems Administration, which I teach in the fall semesters.

This book is not a comprehensive introduction to systems librarianship. For example, this book does not cover software coding or managerial duties, like issuing requests for proposals for software products or budgeting. It is designed as an entry-level course in the technical aspects of systems librarianship, and it is meant to go hand-in-hand with other courses taught in our program. That includes my course on electronic resource management but also other courses that my colleagues teach.

The book will start off as a series of transcripts, and over time, my hope is to build it out to a full-fledged textbook on systems librarianship. I am using mdBook to build this work.

The content in this book is open access and licensed under the GNU GPL v3.0. Feel free to fork it on GitHub and modify it for your own needs.

History of Unix and Linux

An outline of the history of Unix and Linux.

Location: Bell Labs, part of AT&T (New Jersey), late 1960s through early 1970s

  • Starts with an operating system called Multics.
  • Multics was a time sharing system
    • That is, more than one person could use it at once.
  • But Multics had issues and was slowly abandoned.
  • Ken Thompson found an old PDP-7. Started to write UNIX.
    • The ed line editor was written.
    • Pronounced as two letters, 'e-d', though some sound it out as 'ed'.
  • This version of UNIX would later be referred to as Research Unix.
  • Dennis Ritchie, the creator of the C programming language, joined Thompson's efforts.

Location: Berkeley, CA (University of California, Berkeley), early to mid 1970s

  • The code for UNIX was not 'free software' but low cost and easily shared.
  • Ken Thompson visited Berkeley and helped install Version 6 of UNIX.
  • Bill Joy and others contributed heavily.
    • Joy created the vi text editor, the predecessor of the popular Vim editor, as well as many other important programs, and he was a co-founder of Sun Microsystems.
  • This installation of UNIX would eventually become known as the Berkeley Software Distribution, or BSD.

AT&T

  • Until its breakup in 1984, AT&T was not allowed to profit off patents that were not directly related to its telecommunications businesses.
  • This agreement with the US government helped protect the company from antitrust charges, and as a result, it could not commercialize UNIX.
  • This changed after the breakup. System V UNIX became the standard bearer of commercial UNIX.

Location: Boston, MA (MIT), early 1980s through early 1990s

  • In the late 1970s, Richard Stallman noticed that software was becoming commercialized.
    • As a result, hardware vendors stopped sharing the code they developed to make their hardware work.
  • Software code became eligible for copyright protection with the Copyright Act of 1976.
  • Stallman, who thrived in a hacker culture, began to battle against this turn of events.
  • Stallman created the GNU project, the free software philosophy, GNU Emacs, a popular and important text editor, and he wrote many other programs.
  • The GNU project is an attempt to create a completely free, Unix-like operating system called GNU.
  • By the early 1990s, Stallman and others had developed all the utilities needed for a full operating system, except for a kernel, which they called the GNU Hurd.
  • This included the Bash shell, written by Brian Fox.
  • The GNU philosophy includes several propositions that define free software:

The four freedoms, per the GNU Project:

  0. The freedom to run the program as you wish, for any purpose (freedom 0).
  1. The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  2. The freedom to redistribute copies so you can help others (freedom 2).
  3. The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Source: The Four Freedoms, GNU Project

The Unix wars and the lawsuit, late 1980s through the early 1990s

  • AT&T, after its breakup, began to commercialize Unix, and differences between AT&T Unix and BSD Unix arose.
  • The former was aimed at commercialization, and the latter aimed at researchers and academics.
  • UNIX Systems Laboratories, Inc. (USL, part of AT&T) sued Berkeley Software Design, Inc. (BSDi), a company formed to commercialize BSD Unix, and the University of California, Berkeley for copyright and trademark violations.
  • USL ultimately lost the case, but the lawsuit delayed adoption of BSD Unix.

Linux, Linus Torvalds, University of Helsinki, Finland, early 1990s

  • On August 25, 1991, Linus Torvalds announced that he had started working on a free operating system kernel for the 386 CPU architecture and for his specific hardware.
  • This kernel would later be named Linux.
  • Linux technically refers only to the kernel.
    • An operating system kernel handles startup, devices, memory, resources, etc.
    • A kernel does not provide user land utilities---the kinds of software that people use when using computers.
  • Torvalds' motivation was to learn about OS development but also to have access to a Unix-like system.
    • He already had access to a Unix-like system called MINIX, but MINIX had technical and copyright restrictions.
  • Torvalds has stated that if a BSD or GNU Hurd operating system had been available, then he might not have created the Linux kernel.
  • But Torvalds and others took the GNU utilities and created what is now called Linux or GNU/Linux.

Distributions, early 1990s through today

  • Soon after Linux development began, people created their own Linux and GNU based operating systems and distributed them.
  • As such, these Linux operating systems became referred to as distributions.
  • The two oldest distributions that are still in active development are:
    • Slackware (first released in 1993)
    • Debian (first released in 1993)

Short History of BSD, 1970s through today

  • Unix version numbers 1-6 eventually led to BSD 1-4.
  • Through BSD 4.3, all versions contained some AT&T code.
    • Desire to remove this code led to BSD Net/1.
  • All AT&T code was removed by BSD Net/2.
  • BSD Net/2 was ported to the Intel 386 processor.
    • This became 386BSD and was made available in 1992, a year after the Linux kernel was released.
  • 386BSD split into two projects: NetBSD and FreeBSD.
  • NetBSD split into another project: OpenBSD.
  • All three of these BSDs are still in active development.
  • From a bird's eye point of view, they each have different focuses:
    • NetBSD focuses on portability (macOS, NASA)
    • FreeBSD focuses on wide applicability (WhatsApp, Netflix, PlayStation 4, macOS)
    • OpenBSD focuses on security (has contributed a number of very important applications)

macOS is based on Darwin, is technically UNIX, and is partly based on FreeBSD, with some code coming from the other BSDs. See Why is macOS often referred to as 'Darwin'? for a short history.

Short History of GNU, 1980s through today

  • The GNU Hurd is still under active development, but it is in a pre-production state.
  • The last release was version 0.9, in December 2016.
  • A complete OS based on the GNU Hurd can be downloaded and run. For example: Debian GNU/Hurd.

Free and Open Source Licenses

In the free software and open source landscape, there are several important free and/or open source licenses in use. The two biggest are the GPL, which covers the software from the GNU project and the Linux kernel, and the BSD license, which covers the software from the BSDs. They each take very different approaches to free and/or open source software. The biggest difference is this:

  • Software based on software licensed under the GPL must also be licensed under the GPL. This is referred to as copyleft software, and the idea is to propagate free software.
  • Software based on software licensed under the BSD license may be closed source; primarily, it must only retain attribution of the original source code and author.

What is Linux?

The Linux Kernel

Technically, Linux is a kernel, and a kernel is a part of an operating system that oversees CPU activity like multitasking, as well as networking, memory management, device management, file systems, and more. The kernel alone does not make an operating system. It needs user land applications and programs, the kind we use on a daily basis, to form a whole, as well as ways for these user land utilities to interact with the kernel.
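If you are already on a Linux machine, you can see the kernel directly. This is just an illustrative check; the exact release string printed will vary by system:

# print the kernel's name, then its release string
uname -s
uname -r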

Linux and GNU

The earliest versions of the Linux kernel were combined with tools, utilities, and programs from the GNU project to form a complete operating system, though not necessarily one with a graphical user interface. This association continues to this day. Additional non-GNU, but free and open source, programs under different licenses have been added to form a more functional and user friendly system. However, since the Linux kernel needs user land applications to form an operating system, and since user land applications from GNU cannot work without a kernel, some argue that the operating system should be called GNU/Linux and not just Linux. This has not gained wide acceptance, though. Regardless, credit is due to both camps for their contributions, as well as to the many others who have made substantial contributions to the operating system.

Linux Uses

We are using Linux as a server in this course, which means we will use Linux to provide various services. Our first focus is to learn to use Linux itself, but by the end of the course, we will also learn how to provide web and database services. Linux can be used to provide other services that we won't cover in this course, such as:

  • file servers
  • mail servers
  • print servers
  • game servers
  • computing servers

Although it's a small overall percentage, many people use Linux as their main desktop/laptop operating system. I belong in this camp. Linux has been my main OS since the early 2000s. While our work on the Linux server means that we will almost entirely work on the command line, this does not mean that my Linux desktop environment is all command line. In fact, there are many graphical user environments, often called desktop environments, available to Linux users. Since I'm currently using the Ubuntu Desktop distribution, my default desktop environment is called GNOME. KDE is another popular desktop environment, but there are many other attractive and useful ones. And it's easy to install and switch between multiple ones on the same OS.

Linux has become quite a pervasive operating system. Linux powers hundreds of the fastest supercomputers in the world. It and other Unix-like operating systems are the foundation of most web servers. The Linux kernel also forms the basis of the Android operating system and of Chrome OS. The only place where Linux does not dominate is in the desktop/laptop space.

What is Systems Administration?

Introduction

What is systems administration or who is a systems administrator (or sysadmin)? Let's start off with some definitions provided by the National Institute of Standards and Technology:

An individual, group, or organization responsible for setting up and maintaining a system or specific system elements, implements approved secure baseline configurations, incorporates secure configuration settings for IT products, and conducts/assists with configuration monitoring activities as needed.

Or:

Individual or group responsible for overseeing the day-to-day operability of a computer system or network. This position normally carries special privileges including access to the protection state and software of a system.

See: Systems Administrator @NIST

Specialized Positions

In addition to the above definitions, which broadly define the role, there are a number of related or specialized positions. We'll touch on the first three in this course:

  • Web server administrator:
    • "web server administrators are system architects responsible for the overall design, implementation, and maintenance of Web servers. They may or may not be responsible for Web content, which is traditionally the responsibility of the Webmaster (Web Server Administrator" @NIST).
  • Database administrator:
    • like web admins, and to paraphrase above, database administrators are system architects responsible for the overall design, implementation, and maintenance of database management systems.
  • Network administrator:
    • "a person who manages a network within an organization. Responsibilities include network security, installing new applications, distributing software upgrades, monitoring daily activity, enforcing licensing agreements, developing a storage management program, and providing for routine backups" (Network Administrator @NIST).
  • Mail server administrator:
    • again to paraphrase the above, mail server administrators are responsible for the overall design, implementation, and maintenance of mail servers.

Depending on where a system administrator works, they may specialize in any of the above administrative areas, or if they work for a small organization, all of the above duties may be rolled into one position. Some of the positions have evolved quite a bit over the last couple of decades. For example, it wasn't too long ago when organizations would operate their own mail servers, but this has largely been outsourced to third-party providers, such as Google (via Gmail) and Microsoft (via Outlook). People are still needed to work with these third-party email providers, but the nature of the work is different than operating independent mail servers.

Certifications

It's not always necessary to get certified as a systems administrator to get work as one, but there might be cases where it is necessary; for example, in government positions or in large corporations. It also might be the case that you can get work as an entry level systems administrator and then pursue certification with the support of your organization.

Some common starting certifications are:

Plus, Google offers, via Coursera, a beginner's Google IT Support Professional Certificate that may be helpful.

Associations

Getting involved in associations and related organizations is a great way to learn and to connect with others in the field. Here are a few ways to connect.

LOPSA, or The League of Professional System Administrators, is a non-profit association that seeks to advance the field and membership is free for students.

ACM, or the Association for Computing Machinery, has a number of relevant special interest groups (SIGs) that might be beneficial to systems administrators.

NPA, or the Network Professional Association, is an organization that "supports IT/Network professionals."

Codes of Ethics

Systems administrators manage computer systems that contain a lot of data about us, and this raises privacy and competency issues, which is why some organizations have created codes of ethics. Both LOPSA and NPA have created such statements, and they are well worth reviewing and discussing.

Keeping Up

Technology changes fast. In fact, even though I teach this course about every year, I need to revise it each time, sometimes substantially, to reflect changes that have developed over short periods of time. As sysadmins, it's your responsibility to keep up, too.

I therefore suggest that you continue your education by reading and practicing. For example, there are lots of books on systems administration, and O'Reilly continually publishes on the topic. Red Hat, the maker of the Red Hat Enterprise Linux distribution and sponsor of Fedora Linux and CentOS Linux, provides the Enable Sysadmin site, with new articles each day, authored by systems administrators, on the field. Opensource.com, also supported by Red Hat, publishes articles on systems administration. Command Line Heroes is a fun and informative podcast on technology and sysadmin related topics. Linux Journal publishes great articles on Linux related topics.

Conclusion

In this section, I provided definitions of systems administration and introduced related or more specialized positions, such as database administrator, network administrator, and others.

I provided links to various certifications you might pursue as a systems administrator, and links to associations that might benefit you and your career.

Technology manages so much of our daily lives, and computer systems store lots of data about us. Since systems administrators manage these systems, they hold a great amount of responsibility to protect them and our data. Therefore, I provided links to two code of ethics statements that we will discuss.

It's also important to keep up with the technology, which changes fast. The work of a systems administrator is much different today than it was ten or twenty years ago, and that surely indicates that it could be much different in another ten to twenty years. If we don't keep up, we won't be of much use to the people we serve.

What is Systems Librarianship

Introduction

Of course, let's begin with the question: what is systems librarianship? Normally we might go to the literature to answer a question like this. Indeed, the literature is helpful, but it's sparse. The LISTA database returns only 131 results, across 45 years of coverage, for a search using the thesaurus term SYSTEMS Librarians. I can get more results if I expand the search query, but then I get less relevant results, and the main idea stays the same: this is an understudied area of librarianship.

It's been that way for a while. Susan K. Martin wrote the following over 35 years ago:

Of the specialist positions that exist in libraries, none is as underexamined as those of the systems librarians---the people who identify the needs of the library for automated systems, cause these systems to be implemented, and analyze the operations of the library (p. 57).

Perhaps as a result of this underexamination, there is sometimes confusion around the requirements and skills needed in this area of librarianship. Martin (1988) captured this tension when she wrote the following, which is still true today:

Over the years the library world has argued whether systems librarians should be librarians who have learned information technologies, or computer experts who have learned about libraries (p. 61).

The argument is partly a matter of jurisdiction. Abbott (1998), writing on librarianship in the sociology of professions, illustrated how:

The future of librarianship thus hinges on what happens to the perpetually changing work of the profession in its three contexts: the context of larger social and culture forces, the context of other competing occupations, and the context of competing organizations and commodities. To these complex contextual forces, any profession responds with varying policies and internal changes (pp. 434-5).

Essentially, Abbott means that professions, like librarianship, are always changing. The mechanisms for that change are structural and cultural (Abbott, 2010), but a changing profession means that its "link of jurisdiction" (Abbott, 1998, p. 435) changes, too. Not only that, but professions constantly compete with each other to adopt new areas of jurisdiction. So when we ask, as Martin (1988) did, whether librarians should learn information technologies or whether computer experts should learn libraries, I find myself thinking the former is more important for libraries and their patrons. It means that librarians are expanding their jurisdiction by also becoming computer experts, rather than computer experts expanding theirs.

That leads us to the next questions: what does it mean to be a computer expert for a systems librarian? What does a systems librarian need to do and know?

The answer is that it is a mix. Part of the work involves systems administration, but that term has broad meanings, and systems librarianship is more specific. Or rather, it has a more specific domain: the domain of libraries and librarianship.

A systems librarian might thus be considered a library systems administrator. Under this view, a systems librarian needs to know how libraries work, what they do, who their patrons are, and what their values are, and then use that knowledge to build the infrastructure that supports all of that.

Given this, and the technologies involved, such work requires constant learning. Jordan (2003) identified three areas of learning:

  • pre-service education in library schools
  • on the job training
  • professional development in the form of workshops, courses, and conferences (p. 273)

Pre-service, formal education is a small part of any professional's career, regardless of whether that profession is medicine, law, or librarianship. Thus the goal of pre-service education is to prepare people to adapt and grow in their fields. Jordan (2003) wrote that:

While formal training is undoubtedly important, the ability to learn new technologies independently lies at the foundation of systems librarians' professional life, because they often have to use technologies, or make planning decisions about specific technologies, before they become common enough to be the subject of formal training sessions (p. 273).

Even though Jordan's article is 20 years old and the technology has changed a lot, the basic duties of the systems librarian remain the same (Fu, 2014; Gonzales, 2020). Wilson (1998), as cited in Jordan (2003), refers to a list of the "typical responsibilities of systems librarians." These responsibilities look different today, because the technology is different, but conceptually, they're the same as they were then. In fact, this work will focus on a subset of this list that includes:

  • integrated library system management
  • server management
  • documentation
  • technology exploration and evaluation (Jordan, 2003, p. 274)

Gonzales (2020) highlights these and more current areas that include:

  • content management systems
  • electronic resource management systems
  • website redesign
  • help and support

Other items on Jordan's (2003) list are still relevant, but due to various constraints, this work will not cover the following areas:

  • network design and management
  • desktop computing
  • application development
  • planning and budget
  • specification and purchasing
  • miscellaneous technology support
  • technical risk management (p. 274)

In short, this work specifically focuses on a few of the bigger technical aspects of systems librarianship. Other works (or courses) and other sources will provide learning opportunities on the more managerial and administrative functions of systems librarianship and librarianship, in general.

If you are interested in learning more about network design and administration, then I encourage you to read my chapters on Networking and TCP/IP and DNS and Domain Names in my book on Systems Administration with Linux.

If you are interested in learning about application development, then you can pursue courses in a variety of programming languages, such as R, Python, JavaScript, and PHP, as well as courses on relational databases, such as MySQL or PostgreSQL, and so forth.

As Jordan (2003) identified, there is a lack of formalized training in systems librarianship in LIS schools. This is as true today as it was in 2003. This course was created to address the lack of that training. However, it can only be a start. Technology is constantly changing, and that means we must always embrace more informal learning opportunities. LIS programs are only two or so years long (if attending full time), but our careers, hopefully, will span decades. So all this course can ever be is just a starting point.

It is a big start, though. This course should lay a strong foundation for self-growth and self-education in the variety of technologies that we will learn and use here. Although they are separate areas of librarianship, my work (and course) on electronic resource management complements this one in many ways. For example, this work supports several parts of the technology section in the NASIG Core Competencies for Electronic Resources Librarians. It is no coincidence that these two areas of librarianship often overlap or are combined in a single librarian position.

Cloud Computing

Lastly, I want to mention cloud computing. This has become a major area of change in the last decade or so. It used to be more common for librarians to install their integrated library system software and store their bibliographic data on their premises. In the last ten years, there has been more migration to the cloud, which means that both the integrated library system software and the bibliographic data are stored off-site. Liu & Cai (2013) highlight the beginning of this trend toward cloud computing that continues to play a large role in systems librarianship (Naveed et al., 2021). As Liu and Cai note:

Systems librarians used to make their livings by managing hosted library systems. This situation is silently changing with the library systems moving onto the cloud (p. 26).

This trend has changed some aspects of systems librarianship. It means that systems librarians, while still doing technical work, need to work more closely with the vendors who host library systems. However, the trend does not erase all locally hosted solutions. Many libraries and other information agencies continue to support local collections and will either host those locally or work to get the bibliographic information for those collections ingested into their cloud-based integrated library systems.

Conclusion

The remainder of the course will be more technical and will prepare you to work with and understand the systems that support the modern library. We will cover a lot, too! We will begin by setting up virtual machine instances on Google Cloud. We will use a distribution of the Linux operating system for these virtual machines. We will then learn the basics of the Linux command line. Next, we will learn how to use the version control system called git. We will use git to document our workflows and push that documentation to GitHub.com. On our Linux servers, we will create a web server out of what is called a LAMP stack, which stands for Linux, Apache, MySQL, and PHP. We will use the web server to set up a basic website and a bare-bones OPAC. (I'll provide the code for this.) Then we will learn how to install and set up two content management systems: WordPress and Omeka. Lastly, we will spend the final two weeks of the semester installing and setting up the open source Koha ILS.

Let's get started!

References

Abbott, A. (1998). Professionalism and the future of librarianship. Library Trends, 46(3), 430–443. https://www.proquest.com/docview/220452054/abstract/A48FC30B10D94886PQ/1?accountid=11836

Abbott, A. (2010). Varieties of ignorance. The American Sociologist, 41(2), 174–189. https://www.jstor.org/stable/40664150

Fu, P. (2014). Supporting the next-generation ILS: The changing roles of systems librarians. Journal of Library Innovation, 5(1), 30–42.

Gonzales, B. M. (2020). Systems librarianship: A practical guide for librarians. Rowman & Littlefield Publishers. https://rowman.com/ISBN/9781538107133/Systems-Librarianship-A-Practical-Guide-for-Librarians

Jordan, M. (2003). The self‐education of systems librarians. Library Hi Tech, 21(3), 273–279. https://doi.org/10.1108/07378830310494445

Liu, W., & Cai, H. (Heather). (2013). Embracing the shift to cloud computing: Knowledge and skills for systems librarians. OCLC Systems & Services: International Digital Library Perspectives, 29(1), 22–29. https://doi.org/10.1108/10650751311294528

Martin, S. K. (1988). The role of the systems librarian. Journal of Library Administration, 9(4), 57–68. https://doi.org/10.1300/J111v09n04_06

Naveed, M. A., Siddique, N., & Mahmood, K. (2021). Development and validation of core technology competencies for systems librarian. Digital Library Perspectives, 38(2), 189–204. https://doi.org/10.1108/DLP-03-2021-0022

Ratledge, D., & Sproles, C. (2017). An analysis of the changing role of systems librarians. Library Hi Tech, 35(2), 303–311. https://doi.org/10.1108/LHT-08-2016-0092

Wilson, T. C. (1998). Systems librarian: Designing roles, defining skills. American Library Association. https://www.worldcat.org/title/1038159656

Using Google Cloud (gcloud)

This section introduces us to Google Cloud (gcloud). We will use this platform to create virtual instances of the Ubuntu Server Linux operating system.

Using gcloud for Virtual Machines

Virtual Machines

Our goal in this section is to create a virtual machine (VM) instance. A VM is basically a virtualized operating system that runs on a host operating system. That host operating system may also be Linux, but it could be Windows or macOS. In short, when we use virtual machines, instead of installing an operating system (like Linux, macOS, or Windows) on a physical machine, we use virtual machine software to mimic the process. The virtual machine thus runs on top of our main OS. It's like an app, where the app is a fully functioning operating system.

In my Linux Systems Administration course, we used to use VirtualBox to create virtual machines with Linux as the virtual operating system. This worked regardless of whether you or I were running Windows, macOS, or Linux as our main operating system. VirtualBox is freely available virtualization software, and using it let students and me run Linux as a server on our own desktops and laptops without changing the underlying OS on those machines (e.g., Windows, macOS).

However, even though we virtualize an operating system when we run a VM, the underlying operating system and CPU architecture are still important. When Apple, Inc. launched its new M1 (ARM-based) chip in 2020, it created problems for running operating systems built for other CPU architectures (i.e., x86_64 chips) as virtual machines.

Fortunately, we are able to solve that issue using a third-party virtualization platform. In this course, that means we're going to use gcloud (via Google). There are other options available that you can explore on your own.

Google Cloud / gcloud

Google Account

We need to have a personal Google account to get started with gcloud. I imagine most of you already have a Google account, but if not, go ahead and create one at https://www.google.com.

Google Cloud (gcloud) Project

Next, we need to create a Google Cloud project. Once you've created that project, you can enable billing for it, and then install the gcloud software on your local machine.

Follow Step 1 at the top of the Install the gcloud CLI page to create a new project. Also, review the page on creating and managing projects.

When you create your project, you can name it anything, but try to name it something to do with this course. E.g., I am using the name syslib-2023. Avoid using spaces when naming your project.

Then click on the Create button, and leave the organization field set to No Organization.

Google Billing

The second thing to do is to set up a billing account for your gcloud project. This does mean there is a cost associated with this product, but the good news is that our bills by the end of the semester should only amount to $5 to $10, at most. Follow Step 2 to enable billing for your new project. See also the page on how to create, modify, or close your self-serve Cloud Billing account.

Install the latest gcloud CLI version

After you have set up billing, the next step is to install gcloud on your local machines. The Install the gcloud CLI page provides instructions for different operating systems.

There are installation instructions for macOS, Windows, Chromebooks, and various Linux distributions. Follow these instructions closely for the operating system that you're using. Note that for macOS, you have to choose among three different CPU/chip architectures. If you have an older macOS machine (before November 2020 or so), it's likely that you'll select macOS 64-bit (x86_64). If you have a newer macOS machine, then it's likely you'll have to select macOS 64-bit (arm64, Apple M1 silicon). It's unlikely that any of you are using a 32-bit macOS operating system. If you're not sure which macOS system you have, then let me know and I can help you determine the appropriate platform. Alternatively, follow these instructions to find your processor information:

  • click on the Apple menu
  • choose About This Mac
  • locate the Processor or Chip information
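Alternatively, if you're comfortable opening the Terminal app now, you can ask the machine directly. This is just a quick check, not part of Google's instructions:

# prints arm64 on Apple silicon Macs and x86_64 on Intel Macs
uname -m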

After you have downloaded the gcloud CLI for your particular OS and CPU architecture, you will need to open a command prompt/terminal on your machine to complete the instructions that describe how to install the gcloud CLI. macOS users can use the Terminal app, which can be located using Spotlight. Windows users can use cmd.exe, which can also be located by search.

Windows users will download a regular .exe file, but macOS users will download a .tar.gz file. Since macOS is Unix, you can use the mv command to move that file to your $HOME directory. Then you extract it there using the tar command, and once it's extracted, you can change to the directory it creates with the cd command. For example, if you are downloading the x86_64 version of the gcloud CLI, then you would run the following commands:

For macOS users, this assumes the .tar.gz file was downloaded to your default Downloads folder:

cd ~/Downloads/
mv google-cloud-cli-392.0.0-darwin-x86_64.tar.gz ~/
cd ~/
tar -xzf google-cloud-cli-392.0.0-darwin-x86_64.tar.gz
cd google-cloud-sdk

Modify the above commands, as appropriate, if you're using the arm64 (Apple silicon) version of the gcloud CLI.

Initializing the gcloud CLI

Please follow the instructions from the Google Cloud documentation for your operating system.

Once you have downloaded and installed the gcloud CLI program, you need to initialize it on your local machine. Scroll down on the install page to the section titled Initializing the gcloud CLI. In your terminal/command prompt, run the initialization command, per the instructions at the above page:

gcloud init

And continue to follow the above instructions.

gcloud VM Instance

Once you've initialized gcloud, log into Google Cloud Console, which should take you to the Dashboard page.

Our first goal is to create a virtual machine (VM) instance. As a reminder, a VM is basically a virtualized operating system. That means instead of installing an operating system (like Linux, macOS, Windows, etc) on a physical machine, software is used to mimic the process.

gcloud offers a number of Linux-based operating systems to create VMs. We're going to use the Ubuntu operating system and specifically the Ubuntu 20.04 LTS version.

Ubuntu is a Linux distribution. There are many, many distributions of Linux, and most are probably listed on the DistroWatch site. A new version of Ubuntu is released every six months. The 20.04 signifies that this version was released in April 2020. LTS signifies Long Term Support. LTS versions are released every two years, and Canonical Ltd., the makers of Ubuntu, provide standard support for LTS versions for five years.

LTS versions of Ubuntu are stable. Non-LTS versions of Ubuntu receive nine months of standard support, and generally apply cutting edge technology, which is not always desirable for server operating systems. Each version of Ubuntu has a code name. 20.04 has the code name Focal Fossa. You can see a list of versions, code names, release dates, and more on Ubuntu's Releases page.

We will create our VM using the gcloud console. To do so, follow these steps from the Project page:

  • Click on the hamburger icon (three horizontal bars) in the top left corner.
  • Click on Compute Engine and then VM instances
  • Make sure your project is listed.
  • Next, click on Create Instance.
  • Provide a name for your instance.
    • E.g., I chose syslib-2023 (no spaces)
  • Under the Series drop down box, make sure E2 is selected.
  • Under the Machine type drop down box, select e2-micro (2 vCPU, 1 GB memory)
    • This is the lowest cost virtual machine and perfect for our needs.
  • Under Boot disk, click on the Change button.
  • In the window, select Ubuntu from the Operating system drop down box.
  • Select Ubuntu 20.04 LTS x86/64
  • Leave Boot disk type set to Balanced persistent disk.
  • Disk size should be set to 10 GB.
  • Click on the Select button.
  • Check the Allow HTTP traffic checkbox.
  • Finally, click on the Create button to create your VM instance.

Later in the semester when we install Koha, we will need to create a virtual machine with more CPUs and memory. We will be charged more for those machines. Since we do not yet need the extra resources, we will start off with fairly low powered machines.

Connect to our VM

After the new VM machine has been created, we need to connect to it via the command line. macOS users will connect to it via their Terminal.app. Windows users can connect to it via their command prompt.

We use a ssh command to connect to our VMs. The syntax follows this pattern:

gcloud compute ssh --zone "zone-info" "name-info" --project "project-id"

The values in the double quotes in the above command can be located in your Google Cloud console and in your VM instances section. See the course video for details.
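For example, a filled-in version of the command might look like the following. The zone, VM name, and project ID here are placeholders; substitute the values from your own console:

gcloud compute ssh --zone "us-central1-a" "syslib-2023" --project "my-project-id-123456"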

Update our Ubuntu VM

The VM will include a recently updated version of Ubuntu 20.04, but it may not be completely up to date. Thus, the first thing we need to do is update our machines. On Ubuntu, we'll use the following two commands, which you should also run:

sudo apt update
sudo apt -y upgrade

Then type exit to log out and quit the connection to the remote server.

exit

When you log into your machines, you'll note a command prompt that ends with a dollar sign $. This is where we type our commands. The command prompt also displays our location in the file system. The tilde ~ is a shorthand symbol for our home directory. By default, we are placed in our home directory whenever we login to our machines.
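For example, a freshly opened session might look like this; the username and VM name are placeholders:

sean@syslib-2023:~$ pwd
/home/sean

Here, the pwd command confirms that the tilde in the prompt stands for /home/sean.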

Snapshots

Lastly, we have installed a pristine version of Ubuntu, but it's likely that we will mess something up as we work on our systems. Or it could be that our systems may become compromised at some point. Therefore, we want to create a snapshot of our newly installed Ubuntu server. This will allow us to restore our server if something goes wrong later.

To get started:

  1. In the left hand navigation panel, click on Snapshots.

  2. At the top of the page, click on Create Snapshot.

  3. Provide a name for your snapshot: e.g., ubuntu-1.

  4. Provide a description of your snapshot: e.g.,

    This is a new install of Ubuntu 20.04.

  5. Choose your Source disk.

  6. Choose a Location to store your snapshot.

    • To avoid extra charges, choose Regional.
    • From the drop down box, select the same location (zone-info) as your VM.
  7. Click on Create

Please monitor your billing for this to avoid costs that you do not want to incur.

Conclusion

Congratulations! You have just completed your first installation of a Linux server.

To summarize, in this section, you learned about and created a VM with gcloud. This is a lot! After this course is completed, you will be able to fire up a virtual machine on short notice and deploy websites and more.

Learning the Command Line

It's obviously more common for people today to learn how to use a computer via a graphical user interface (GUI), but there are benefits to learning a command line interface (CLI). In this section, we learn some of the basics of using the Bash shell as our CLI. Our primary goal is to learn how to use the CLI as a file manager and to perform some text editing. However, if you find this interface appealing, know that Bash is a full-fledged programming language, and I encourage you to explore it as a scripting language.

There are three reasons, from a systems administration point of view, to prefer the CLI over the GUI. First, the GUI entails extra software, and the more software we have on a server, the more resources (memory, CPU, storage, etc.) that software consumes. We would much rather have our machine's resources used to provide the services we built the machine for than to run irrelevant software. Second, the extra software a GUI requires means that we expose our systems to additional security risks. That is, every time we install more software on our servers, the servers become more vulnerable, because all software has bugs. This means that we want to be conservative, careful, and protective of our systems. This is especially true for production systems. Third, graphical user interfaces do not provide a good platform for automation, at least not nearly as well as command line interfaces do. Working on the command line, because it is a text-based environment known as a shell, is a reproducible process. That is not as easily true in a GUI.

Fortunately, Linux, and many other Unix-like operating systems, have the ability to operate without graphical user interfaces. This is partly the reason why these operating systems have done so well in the server market.

In this section, our focus is learning the command line environment. We will do this using the Bash shell. We will learn how to use the shell, how to navigate around the filesystem, how to perform basic tasks, and explore other functions and utilities the shell has to offer.

Learn the Command Line Interface (CLI)

Introduction

There are two major interfaces that we use to interact with our computers. The most common interface is the graphical user interface, or GUI. This interface largely emphasizes non-textual interaction, such as the mouse, fingers (touch screens), remote controls (e.g., smart TVs), and most recently, wearable tech such as VR headsets and the like. All of the above mechanisms for interacting with our computer systems are worthwhile, but more importantly, they are all suited to specific ranges of engagement with our computers. That is, they afford certain kinds of actions (Dourish, 2001).

The other major way of interfacing with our computers is via the command line interface, or CLI. The CLI is also suited to specific ranges of engagement, and it's the kind of engagement that often allows us greater control over our systems.

One reason the CLI provides greater control over our systems is because the interaction is all text-based. Text-based interaction requires more specificity than graphical-based interaction. By that I mean, it requires us to provide written instructions to a computer and to know what instructions to give it when we want the computer to perform some specific action. This means that we have to memorize some common instructions in order to use our systems. This is not necessarily difficult because many of the most common instructions, or commands, are mnemonic, but it does take some getting used to.

A second reason the CLI provides greater control over the system is that because it's text-based, it can be automated. We will not cover programming in this work or course, but know that all the commands that we will learn can be put in a text file, made into an executable file, and run like a program. This makes text-based interaction rather powerful.
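As a small preview of that power, here is a minimal sketch of such a file, a shell script. The file name hello.sh and its contents are just an example:

#!/usr/bin/env bash
# greet the user, then print today's date
echo "Hello! Today is:"
date

Making the file executable with chmod +x hello.sh and then running it with ./hello.sh would print the greeting and the current date. We won't write scripts in this course, but every command we learn can be automated this way.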

Basic Commands

In light of that, I have developed two programs that will help you remember these basic commands. The commands that I'll ask you to learn encompass less than 0.3% of the commands that are available on a Linux system, but they are the most commonly used commands. Many of the other commands that are available are for very specific purposes. I'd estimate that despite having used the Linux command line for over 20 years, I've barely used 20% of them, and I might be stretching my estimate.

The first set of commands that I'll ask you to learn and practice include the following:

list files and directories.................. ls
print name of current/working directory..... pwd
create a new directory...................... mkdir
remove or delete an empty directory......... rmdir
change directory............................ cd
create an empty file........................ touch
print characters to output.................. echo
display contents of a text file............. cat
copy a file or directory.................... cp
move or rename a file or directory.......... mv
remove or delete a file or directory........ rm
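To preview how these commands fit together, here is a short, hypothetical session. The file and directory names are made up, and the > symbol, which we'll cover later, redirects echo's output into a file:

# create a new directory and change into it
mkdir projects
cd projects
# create an empty file, then write a line of text to a second file
touch notes.txt
echo "my first note" > ideas.txt
# display a file's contents, copy it, rename the copy, then delete it
cat ideas.txt
cp ideas.txt backup.txt
mv backup.txt ideas.bak
rm ideas.bak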

You will practice these commands using the program that I wrote called learn-the-cli (I will show you how to install this and the other programs shortly).

I also developed a flashcards program that will help you learn an additional fifteen commands. This program is based on one created by someone else for a different purpose (see source code link above for credit). I'll explain these additional commands as we proceed through the semester. In the meantime, I'll ask that you periodically run the flashcards program to familiarize yourself with these commands, which includes the ones in the list above but also a few additional ones.

The Filesystem

In addition to the various commands that I'll ask you to learn, you will also have to learn the structure of the Linux filesystem. A filesystem has several meanings, but in this context, I refer to where the directories on the Linux system are placed. I find this to be the most difficult thing that new Linux users have to learn for a couple of reasons. First, modern operating systems tend to hide the filesystem from their users. So even though, for example, macOS is Unix, many macOS users that I have taught are completely unfamiliar with the layout of directories on their system. This is because, per my observations, macOS Finder does not show the filesystem by default these days. Instead, it shows its users some common locations for folders. This might make macOS more usable for most users, but it makes learning the system more difficult.

What's common for both macOS and Linux operating systems is a filesystem based on a tree-like structure. These filesystems begin at what's called a root location. The root location is referenced by a forward slash: /. All directories branch off from root. The location of any directory is called a PATH. For example, our home directories on Linux will be located at the following PATH:

/home

That PATH begins at root / and ends at home.
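For example, for a hypothetical user named sean, the following commands use an absolute PATH to move to that user's home directory and confirm the location:

# change directory using the full PATH, starting from root
cd /home/sean
# print the current working directory; this would output /home/sean
pwd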

It is a little different for Windows users. Since Windows is not Unix-like, it uses a different filesystem hierarchy. Many Windows users might be familiar with the basics, such as the C: drive for the main storage device or the D: drive for an added USB stick. As such, the Windows operating system uses multiple root directories (C:, D:, E:, etc.). I encourage you to read the following article, A quick introduction to the Linux filesystem for Windows users. The article is published by Red Hat, which makes its own Linux distribution.

In short, learning the Linux filesystem requires adopting a new mental model about how the operating system organizes its directories and files. Like learning the basic commands, it's not too hard, but it may take time and practice before it sticks. To help you learn it, I wrote an additional program that will let you practice navigating around the Linux filesystem and making some changes to it. The program is called learn-the-filesystem. Before you use this program, I encourage you to read another Red Hat article, Navigating your filesystem in the Linux terminal. It includes sections that my program will also cover:

  • viewing file lists
  • opening a folder (aka, a directory)
  • closing a folder
  • navigating directories
  • absolute paths

Bash: The Bourne Again Shell

I should point out that the command line interface that we are using on our Linux servers is provided by a shell. A shell is "both an interactive command language and a scripting language" (see link above). We will use the shell strictly as a command language, but if you're interested someday, I'd encourage you to explore Bash as a scripting language (I personally script in Bash quite a lot). There are a variety of shells available for Linux and other Unix-like operating systems, but the most popular one and the one we will be using is called Bash.

Bash is an acronym for the Bourne Again Shell because it's based on the original Unix shell called the Bourne shell, written by Stephen Bourne. Bash itself was written by Brian Fox.

I think it's important to know the history of the technologies that we use, and Bash has a super interesting history that pre-dates Linux. Therefore, I highly encourage you to listen to the Command Line Heroes episode titled Heroes in a Bash Shell, narrated by Saron Yitbarek. The episode recounts Brian Fox's history with the Bash shell while he worked for the Free Software Foundation in the 1980s.

Conclusion

We will spend the next few weeks practicing these commands and learning the filesystem. We'll do this because knowing these things is integral to accomplishing everything else in this work, including installing and setting up our content management systems and the integrated library system.

In the video for this week, I'll show you how to install the three programs that I wrote or modified. We will use git to download them. Then we will move the programs to a specific directory in our executable PATH. This will allow us to run them simply by typing their names.

Installation

To install my practice programs, login to your Linux virtual instances, and run the following commands. You will learn more about these commands shortly.

First, let's take a look at the contents of your home directory (the default directory you're in when you connect to your virtual machine):

ls

Most likely, nothing will be listed.

Now let's retrieve the programs using the git command:

git clone https://github.com/cseanburns/learn-the-commandline.git

Run the ls command again, and you'll see a new directory called learn-the-commandline:

ls

Next, copy the programs to an executable path:

sudo cp learn-the-commandline/* /usr/local/bin

Run the first program and work through it in order to learn some of the basic commands:

learn-the-cli

When ready, run the second program in order to learn about the Linux filesystem:

learn-the-filesystem

Finally, periodically run the flashcards program to refresh your memory of the basic commands, plus some other commands that you'll learn about soon:

flashcards

References

Dourish, P. (2001). Where the Action Is: The Foundations of Embodied Interaction. MIT Press. https://doi.org/10.7551/mitpress/7221.001.0001

Text editors

As we learn more about how to work on the command line, we will need to write in plain text and edit configuration files. Most configuration files for Linux applications exist in the /etc directory and are regular text files. For example, later in the semester we will install the Apache Web Server, and we will need to edit Apache's configuration files in the process.

In order to edit and save text files, we need a text editor. Programmers use text editors to write programs, but because programmers often work in graphical user environments, they may often use graphical text editors or graphical Integrated Development Environments (IDEs). If you work in systems librarianship, you may often use a graphical text editor, but knowing something about how to use command line-based editors can be helpful.

What is Plain Text?

Plain text is the most basic way to store human-readable textual information. Whenever we use a word processor, like Microsoft Word, we are creating a complex series of files that instruct the application how to display the contents of the file as well as how the contents are formatted and arranged. This can easily be illustrated by using an archive manager to extract the contents of a .docx file. Upon examination, most of the files in a single .docx file are plain text files that are marked up in XML. The files are packaged as a .docx file and then rendered by an application, commonly Microsoft Word, but any application that can read .docx files will do.
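If you want to verify this yourself and have the unzip utility available, you can list the contents of a .docx file without fully extracting it; example.docx here is a placeholder name:

# a .docx file is really a zip archive of mostly XML files
unzip -l example.docx

Among other things, the listing would typically include word/document.xml, which holds the document's marked-up text.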

A plain text file contains only plain text. Its only arrangement is from top to bottom. It does not allow for any kind of additional formatting, and it does not include media. It is the closest thing the digital world has to the output produced by a typewriter, albeit a typewriter connected to the internet.

A lot of content is written in plain text. For example, HTML is written in plain text and the web browser uses the HTML markup to render how a page will look.

<p>This is using a HTML paragraph tag.
The web browser would normally render this like
the other paragraphs on this page.
However, it's written in a code block,
which allows us to display the HTML tags
and appear as if it's real source code.</p>

The rendered result is not plain text but HTML, just like the rendered result of all those XML files in a .docx file is not plain text but a .docx file. Software is written in plain text files because programming languages cannot evaluate content that is not plain text. Those of you who have learned how to use the R programming language wrote your R code in plain text, likely using the RStudio IDE. For our purposes, we need plain text files to modify configuration files for the various programs that we will install later.

Why Edit in Plain Text

Most of the time when we configure software, we might do it, for example, by using our mouse to find the settings menu in some application that we are using. All that does, for the most part, is make changes to some text file somewhere. We will have to be more direct since we are working on the command line only. That is, the kind of settings configurations we will do will require editing a variety of plain text files that the programs will use to modify how they work. Often the settings for programs can only be modified by editing their plain text configuration files.

We will also soon be working with Git and GitHub. These will also require us to use plain text. More on that in a couple of chapters.

nano

The nano text editor is one of the most user-friendly text editors available on the Linux command line, but it still requires some adjustment for a new command line user. The friendliest thing about nano is that it is modeless, which is what you're already accustomed to. This means nano can be used to enter and manipulate text without changing to insert or command mode. It is also friendly because, like many graphical text editors and other software, it uses control keys to perform its operations.

A modal text editor has modes such as insert mode and command mode. In insert mode, the user types text as they would in any kind of editor or word processor. The user switches to command mode to perform operations on the text, such as find and replace, saving, and cutting and pasting, but they cannot insert text as they would in insert mode. Switching between modes usually involves pressing some specific keys. In Vim and ed(1), my text editors of choice, the user starts in command mode and switches to insert mode by pressing the letter i or the letter a. The user may switch back to command mode by pressing the Esc key in Vim or by typing a period on a new line in ed(1).

The tricky part to learning nano is that the control keys are assigned to different keystroke combinations than what many graphical editors (or word processors) use by convention today. For example, instead of Ctrl-c or Cmd-c to copy text, in nano you press the M-6 key (press Alt, Cmd, or Esc key and 6) to copy. Then to paste, you press Ctrl-u instead of the more common Ctrl-v. Fortunately, nano lists the shortcuts at the bottom of the screen.

The shortcuts listed need some explanation, though. The caret ^ is shorthand for the keyboard's Control (Ctrl) key. Therefore, to save a file, we write it out by pressing Ctrl-o (although Ctrl-s will work, too). The M- prefix is also important and, depending on your keyboard configuration, may correspond to your Alt, Cmd, or Esc keys. To search for text, press ^W (Ctrl-w). If your goal is to copy, press M-6 to copy a line, move to where you want to paste the text, and press Ctrl-u to paste.

We can start nano simply by typing nano on the command line. This will open a new, unsaved file with no content. Alternatively, we can start nano by specifying a file name after typing nano. For example, if I want to open a file called example.txt, then I type the following command:

nano example.txt

If the file doesn't exist, this command will create it. If it does exist, the command will open it.

One of the other tricky things about nano is that the menu bar (really just a crib sheet, so to speak) is at the bottom of the screen instead of at the top, which is where we are mostly accustomed to finding it these days. Also, nano does not provide pop-up dialog boxes. Instead, all messages from nano, like what to name a file when we save it, appear at the bottom of the screen.

Lastly, nano also uses distinct terminology for some of its functions. The most important function to remember is the Write Out function, which means to save.

For the purposes of this class, that's all you really need to know about nano. Use it and get comfortable writing in it. Some quick tips:

  1. nano file.txt will open and display the file named file.txt.
  2. nano by itself will open to an empty page.
  3. Save a file by pressing Ctrl-o.
  4. Quit by pressing Ctrl-x; nano will prompt you to save any unsaved changes.
  5. Be sure to follow the prompts at the bottom of the screen.
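
Here is a quick end-to-end run, assuming a hypothetical file named notes.txt:

nano notes.txt
# type a line of text, then:
# press Ctrl-o (then Enter) to write out (save) the file
# press Ctrl-x to exit nano
cat notes.txt

The final cat command simply prints the file's contents back to the screen, confirming that the text was saved.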

Conclusion

In the prior lesson, we learned how to use the Bash interactive shell. We will continue to do that, but in this lesson we also begin to learn how to use a command line text editor, nano. We will use nano to edit configuration files and to publish text to GitHub.

Searching with grep

We have available some powerful utilities and programs to process, manipulate, and analyze text files. In this section, we will focus on the grep utility, which offers some advanced methods for searching the contents of text files.

Grep

The grep command is one of my most often used commands. Basically, grep "prints lines that match patterns" (see man grep). In other words, it's search, and it's super powerful.

grep works line by line. When we use it to search a file for a string of text, it returns every whole line that matches the string. This line by line behavior is part of the history of Unix-like operating systems, and it is important to remember that most utilities and programs that we use on the command line are line oriented.

"A string is any series of characters that are interpreted literally by a script. For example, 'hello world' and 'LKJH019283' are both examples of strings." -- Computer Hope. More generally, it's the literal characters that we type. It's data.

To visualize how grep works, let's consider a file called operating-systems.csv with content as seen below:

OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
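
If you want to follow along, one quick way to create this file is with a heredoc, which writes everything between the EOF markers into the new file:

cat > operating-systems.csv << 'EOF'
OS, License, Year
Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008
EOF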

We can use grep to search for anything in that file. Let's start with a search for the string Chrome. Notice that even though the string Chrome only appears once, and in one part of a line, grep returns the entire line.

Command:

grep "Chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

Be aware that, by default, grep is case-sensitive, which means a search for the string chrome, with a lower case c, would return no results. Fortunately, grep has an -i option, which means to ignore the case of the search string. In the following examples, grep returns nothing in the first search since we do not capitalize the string chrome. However, adding the -i option results in success:

Command:

grep "chrome" operating-systems.csv

Output:

None.

Command:

grep -i "chrome" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

We can also search for lines that do not match our string using the -v option. We can combine that with the -i option to ignore the string's case. Therefore, in the following example, all lines that do not contain the string chrome are returned:

Command:

grep -vi "chrome" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Sometimes data files, like spreadsheets, contain header columns in the first row. We can use grep to remove the first line of a file by inverting our search and selecting all lines that do not match "OS" at the start of a line. Here the caret ^ is a regex indicating the start of a line. Again, this grep command returns all lines that do not match the string os at the start of a line, ignoring case:

Command:

grep -vi "^os" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

Alternatively, since we know that the string Year comes at the end of the first line, we can use grep to invert search for that. Here the dollar sign $ is a regex indicating the end of a line. Like the above, this grep command returns all lines that do not match the string year at the end of a line, ignoring case. The result, in this specific instance, is exactly the same as the last command:

Command:

grep -vi "year$" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

The man grep page lists many more options; a couple of good ones include:

Get a count of the matching lines with the -c option:

Command:

grep -ic "proprietary" operating-systems.csv

Output:

4

Print only the match and not the whole line with the -o option:

Command:

grep -io "proprietary" operating-systems.csv

Output:

Proprietary
Proprietary
Proprietary
Proprietary

We can simulate a Boolean OR search, printing lines that match one or both strings, using the -E option. We separate the strings with a vertical bar |. This works like a Boolean OR search: as long as at least one of the strings matches, grep returns a result.

Here is an example where only one string returns a true value:

Command:

grep -Ei "(bsd|atari)" operating-systems.csv

Output:

FreeBSD, BSD, 1993

Here's an example where both strings evaluate to true:

Command:

grep -Ei "(bsd|gpl)" operating-systems.csv

Output:

FreeBSD, BSD, 1993
Linux, GPL, 1991

By default, grep will return results where the string appears within a larger word, like OS in macOS.

Command:

grep -i "os" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009
iOS, Proprietary, 2007
macOS, Proprietary, 2001

However, we might want to limit results so that we only return results where OS is a complete word. To do that, we can surround the string with special characters:

Command:

grep -i "\<os\>" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009

Sometimes I find it hard to remember the backslash and angle bracket combinations because they look too much like HTML syntax without being exactly like HTML syntax. Fortunately, grep has a -w option to match whole words:

Command:

grep -wi "os" operating-systems.csv

Output:

OS, License, Year
Chrome OS, Proprietary, 2009

Sometimes we want the context for a result; that is, we might want to print lines that surround our matches. For example, print the matching line plus the two lines after the matching line using the -A NUM option:

Command:

grep -i "linux" -A2 operating-systems.csv

Output:

Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001

Or, print the matching line plus the two lines before the matching line using the -B NUM option:

Command:

grep -i "linux" -B2 operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

We can combine many of the variations. Here I search for the whole word BSD, case insensitive, and print the line before and the line after the match:

Command:

grep -iw -C1 "bsd" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991

We can use another option to stop returning results after some number of hits. Here I use grep to search for the string "proprietary" and stop after the first hit:

Command:

grep -i -m1 "proprietary" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009

We can add the -n option to instruct grep to report the line number of each hit. Below we see that the string "proprietary" is found on lines 2, 5, 6, and 7.

Command:

grep -in "proprietary" operating-systems.csv

Output:

2:Chrome OS, Proprietary, 2009
5:iOS, Proprietary, 2007
6:macOS, Proprietary, 2001
7:Windows NT, Proprietary, 1993

We can use grep to search for patterns in strings instead of literal words. Here we use what are called character classes and repetition to search for five-letter words:

Command:

grep -Eiw "[a-z]{5}" operating-systems.csv

Output:

Linux, GPL, 1991
macOS, Proprietary, 2001

Or four-digit numbers:

Command:

grep -Eiw "[0-9]{4}" operating-systems.csv

Output:

Chrome OS, Proprietary, 2009
FreeBSD, BSD, 1993
Linux, GPL, 1991
iOS, Proprietary, 2007
macOS, Proprietary, 2001
Windows NT, Proprietary, 1993
Android, Apache, 2008

grep can also search for words that begin with one letter and end with another, with a specified number of characters between them. Here we search for words that start with m, end with s, and have three characters in the middle:

Command:

grep -Eiw "m.{3}s" operating-systems.csv

Output:

macOS, Proprietary, 2001

Practice

Let's practice by looking at the auth.log file. This file records all attempts to log in to the system:

First, we change directory to /var/log.

Second, we use less to peruse the auth.log file.

Third, we use grep to find lines recording sessions opened for specific users.

Fourth, we do a simple grep search for the string invalid user and pipe the results through another grep command that extracts the IP addresses, then through commands that count and sort them.

Fifth, we search for a longer string and pipe the results through other commands to extract, count, and sort the usernames.

cd /var/log
less auth.log
# lines recording sessions opened for the users sean or root:
grep -E "session opened for user (sean|root)" auth.log | less
# extract the IP addresses on "invalid user" lines, then count and sort them:
grep "invalid user" auth.log | grep -Eo "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | sort | uniq -c | sort
# extract the usernames (field 11) on "Connection closed" lines, then count and sort them:
grep "Connection closed by invalid user" auth.log | cut -d" " -f11 | sort | uniq -c | sort | less
# same as above, but sorted in reverse (descending) order:
grep "Connection closed by invalid user" auth.log | cut -d" " -f11 | sort | uniq -c | sort -r | less

Conclusion

grep is very powerful, and there are more options listed in its man page.
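
For example, grep can search every file under a directory with the -r (recursive) option, and the -l option lists only the names of the files that contain a match. A quick sketch, using a hypothetical directory of notes:

grep -ril "linux" ~/notes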

Note that I enclose my search strings in double quotes. For example: grep "search string" filename.txt. Quotes are not always required, but they are good practice: if your string contains more than one word or any spaces, an unquoted search will fail.

The Linux (and other Unix-like OSes) command line offers a lot of utilities to examine data, and it's fun to learn and practice them. That said, you do not have to become an advanced grep user. For most cases, simple grep searches work well.

If you want to learn more, there are many grep tutorials on the web.

Install the Koha ILS

Preliminary notes on setting up Koha ILS on Google Cloud.

Helpful documentation and demos:

Pre-setup

These commands update the system and install gnupg2, which we need to handle Koha's signing key. Run them as root, or prefix each command with sudo:

apt-get update
apt-get upgrade
apt-get autoremove -y
apt-get install gnupg2

Prep Koha

Add the Koha repository to our sources list. The signed-by option tells apt to verify this repository against the keyring file we create in the next step:

echo 'deb [signed-by=/usr/share/keyrings/koha-keyring.gpg] http://debian.koha-community.org/koha stable main' | sudo tee /etc/apt/sources.list.d/koha.list

Use the first command to download Koha's signing key and store it in the keyring referenced above; the second, commented-out command is the older apt-key method, listed just in case:

wget -qO - https://debian.koha-community.org/koha/gpg.asc | gpg --dearmor -o /usr/share/keyrings/koha-keyring.gpg
#wget -q -O- https://debian.koha-community.org/koha/gpg.asc | sudo apt-key add -
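
Optionally, we can confirm the key was saved correctly. Recent versions of gpg can display the keys in a keyring file without importing them:

gpg --show-keys /usr/share/keyrings/koha-keyring.gpg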

Install Koha:

apt-get update
apt-get install koha-common
# set the staff interface port:
nano /etc/koha/koha-sites.conf
# add:
# INTRAPORT="8080"
apt-get install mysql-server
# set the MySQL root password (bibliolib1 is just this demo's password):
mysqladmin -u root password bibliolib1
# enable the Apache modules that Koha needs:
a2enmod rewrite
a2enmod cgi
systemctl restart apache2
# create a Koha instance named bibliolib along with its database:
koha-create --create-db bibliolib
# tell Apache to listen on port 8080 for the staff interface:
nano /etc/apache2/ports.conf
# add:
# Listen 8080
systemctl restart apache2
# disable the default Apache site, then enable the bibliolib site:
a2dissite 000-default
a2enmod deflate
a2ensite bibliolib
systemctl reload apache2
systemctl restart apache2
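
At this point we can run a couple of quick sanity checks. The koha-list command (installed with koha-common) should report our new instance, and ss should show Apache listening on ports 80 and 8080:

koha-list --enabled
ss -tlnp | grep apache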

Get username and password

Koha stores the credentials we need for the web installer in the instance's koha-conf.xml file:

nano /etc/koha/sites/bibliolib/koha-conf.xml
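
Alternatively, since the Debian packages store these credentials in <user> and <pass> elements in that file, we can pull them out with the grep skills from the earlier lesson:

grep -E "<user>|<pass>" /etc/koha/sites/bibliolib/koha-conf.xml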

Run the web installer at:

http://IP-ADDRESS:8080

Be sure to follow the instructions there.
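
If curl is installed on the server, we can confirm that the staff interface is responding before opening it in a browser:

curl -I http://localhost:8080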