Bash stuff I recently encountered

20 Jun 2015, by Pang Yan Han

Over the past week, I’ve been working on my provision-ubuntu repository, getting it into better shape to put on GitHub.

Part of what I wanted was a script for starting up everything and, for some reason, instead of using an Ansible module to do it, I decided to write a Bash script to:

Add the PPA for git, provided that it has not already been added.

I broke this down into 2 main steps:

  1. Getting the list of added PPAs
  2. Checking if the git PPA is inside the list of PPAs. If it is not, we add it.

First order of business - Get the list of added PPAs

Some googling led to this very nice answer by stwissel.

I’ll copy and paste the relevant code snippet here:

#!/bin/bash
# listppa Script to get all the PPA installed on a system ready to share for reinstall
for APT in `find /etc/apt/ -name \*.list`; do
    grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
        USER=`echo $ENTRY | cut -d/ -f4`
        PPA=`echo $ENTRY | cut -d/ -f5`
        echo sudo apt-add-repository ppa:$USER/$PPA
    done
done

which looks rather intimidating at first. Let’s break it down part by part.

for APT in `find /etc/apt/ -name \*.list`; do

Uses the find command to look for any file with the extension .list under the /etc/apt/ directory. The for loop then iterates over the filenames, storing the filename at each iteration in the APT variable.
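
To watch that loop on its own without touching /etc/apt/, here is a small sketch that runs it against a throwaway directory. The sandbox directory and the file names inside it are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sandbox standing in for /etc/apt/; the file names are made up.
SANDBOX=$(mktemp -d)
mkdir -p "$SANDBOX/sources.list.d"
touch "$SANDBOX/sources.list" \
      "$SANDBOX/sources.list.d/git-core-ppa-trusty.list" \
      "$SANDBOX/notes.txt"

# Same shape as the loop above: one iteration per file whose name ends in .list.
FOUND=0
for APT in `find "$SANDBOX" -name \*.list`; do
    echo "found: $APT"
    FOUND=$((FOUND + 1))
done
echo "number of .list files: $FOUND"

# Clean up the sandbox.
rm -rf "$SANDBOX"
```

One caveat: the for loop splits the output of find on whitespace, so a path containing a space would be cut into several pieces. That is unlikely to matter for the files under /etc/apt/, but it is worth keeping in mind before reusing the pattern elsewhere.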

Each of the .list files in the /etc/apt/ directory contains lines similar to this:

deb http://sg.archive.ubuntu.com/ubuntu/ trusty universe

You can read more about the format of the lines here. It’s amazing what a Google Search yields.

Moving on, this line:

grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT

Will look for the pattern ^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+ in the file whose name is stored inside the APT variable, and the -o flag for grep ensures that for each matching line, only the part of the line which conforms to the pattern will be printed.

If my description is confusing, fire up a text editor and enter the following line of text into a file named abc.txt:

Hi, my name is John.

Then, run the following 2 commands and observe the difference in their output:

grep 'my name' abc.txt
grep -o 'my name' abc.txt

For grep 'my name' abc.txt, the entire line Hi, my name is John. will be output. However, for grep -o 'my name' abc.txt, only the my name part will be output.
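
The same idea applied to the PPA pattern can be sketched as follows. The sample file is a stand-in for a real .list file, and its three lines are what I assume typical entries look like.

```shell
#!/bin/bash
# Hypothetical sample file standing in for a .list file under /etc/apt/.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
deb http://sg.archive.ubuntu.com/ubuntu/ trusty universe
deb http://ppa.launchpad.net/git-core/ppa/ubuntu trusty main
# deb-src http://ppa.launchpad.net/git-core/ppa/ubuntu trusty main
EOF

# -o keeps only the matched part, so everything after the PPA name is dropped.
# The first line is not a PPA, and the third starts with "#", so neither matches.
MATCHES=$(grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" "$SAMPLE")
echo "$MATCHES"

rm -f "$SAMPLE"
```

Only the middle line matches, and only up to the end of the PPA name, which is the part the rest of the script cares about.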

Now, this:

grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT

will print 0 or more lines. Strictly speaking, there is no data structure being returned here: grep writes the matching lines to its standard output, and the pipe feeds them to the next command. I’m just calling it a list for simplification purposes.

Each of those lines, which is of the format deb http://ppa.launchpad.net/PARTONE/PARTTWO (with PARTONE and PARTTWO being two strings that match the [a-z0-9\-] regex) is fed to this loop:

while read ENTRY ; do
    USER=`echo $ENTRY | cut -d/ -f4`
    PPA=`echo $ENTRY | cut -d/ -f5`
    echo sudo apt-add-repository ppa:$USER/$PPA
done

which processes each of those matching lines, with the currently processed line stored in the ENTRY variable. Now, each line stored in ENTRY is of this format:

deb http://ppa.launchpad.net/PARTONE/PARTTWO

And this code:

USER=`echo $ENTRY | cut -d/ -f4`
PPA=`echo $ENTRY | cut -d/ -f5`

will use the cut command to process each line, treating each line as delimited by the / character, extracting the 4th and 5th fields (field counts start from 1) into the USER and PPA variables respectively. Using the string deb http://ppa.launchpad.net/PARTONE/PARTTWO as an example, the USER variable will contain the string PARTONE and the PPA variable will contain the string PARTTWO.
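
Here is a quick sketch of how the fields line up, using the git PPA in place of PARTONE/PARTTWO. The variable names PPA_USER and PPA_NAME are my own choice for this example: USER is normally an environment variable holding your login name, so the sketch avoids overwriting it.

```shell
#!/bin/bash
# A concrete ENTRY value, with git-core/ppa standing in for PARTONE/PARTTWO.
ENTRY="deb http://ppa.launchpad.net/git-core/ppa"

# Splitting on "/" gives these fields:
#   1: "deb http:"   2: "" (between the two slashes)   3: "ppa.launchpad.net"
#   4: "git-core"    5: "ppa"
FIELD1=$(echo "$ENTRY" | cut -d/ -f1)
FIELD3=$(echo "$ENTRY" | cut -d/ -f3)
PPA_USER=$(echo "$ENTRY" | cut -d/ -f4)
PPA_NAME=$(echo "$ENTRY" | cut -d/ -f5)

echo "field 1: $FIELD1"
echo "field 3: $FIELD3"
echo "field 4: $PPA_USER"
echo "field 5: $PPA_NAME"
```

The empty second field comes from the double slash in http://, which is why the interesting parts only start at field 4.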

And finally, this line:

echo sudo apt-add-repository ppa:$USER/$PPA

Echoes out sudo apt-add-repository ppa:$USER/$PPA, filled in with the values of the USER and PPA variables on each iteration of the while loop. Apparently, the intention of the person who asked the question (I feel very tempted to use ‘asker’ here) was to run the Bash script on an existing system and get it to output a list of sudo apt-add-repository commands that, when run on a brand new system, will add all the PPAs that have been added on the existing system. However, that is not my intention, so some modifications are in order. That said, we’ve solved our first step: Getting the list of added PPAs.

Storing the list of PPAs in some data structure

I immediately thought of using arrays since I recalled reading about them somewhere in the past when I had to use Bash for larger scripts. So I needed to know how to append stuff to an array in Bash. A Google Search yields this answer on Stack Overflow, the relevant code snippet being:

ARRAY=()
ARRAY+=('foo')
ARRAY+=('bar')
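
As a small sketch of what that snippet leaves behind, the following prints the length and contents of the array. The WRONG array is my own addition, there to show why the parentheses matter.

```shell
#!/bin/bash
ARRAY=()
ARRAY+=('foo')
ARRAY+=('bar')

# ${#ARRAY[@]} is the number of elements; indexes start at 0.
echo "length: ${#ARRAY[@]}"
echo "first element: ${ARRAY[0]}"
for ITEM in "${ARRAY[@]}"; do
    echo "item: $ITEM"
done

# Without the parentheses, += appends to the string in element 0
# instead of adding a new element.
WRONG=('foo')
WRONG+='bar'
echo "WRONG has ${#WRONG[@]} element(s): ${WRONG[0]}"
```

Quoting "${ARRAY[@]}" in the loop keeps each element intact even if it contains spaces.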

Ok, so I modified the first code snippet to:

#!/bin/bash

PPA_ARRAY=()

for APT in `find /etc/apt/ -name \*.list`; do
    grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
        USER=`echo $ENTRY | cut -d/ -f4`
        PPA=`echo $ENTRY | cut -d/ -f5`
        PPA_ARRAY+=("$USER/$PPA")
    done
done

Checking if some element is in an array

Our next order of business is to figure out if the git PPA has been added. So I needed to know how to determine if an element is present in a Bash array. A Google Search yields this answer on Stack Overflow, which provides a function to do this:

containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}

along with sample code:

array=("something to search for" "a string" "test2000")
containsElement "a string" "${array[@]}"                   # echo $? returns 0
containsElement "blaha" "${array[@]}"                      # echo $? returns 1

Hmm, ok. So the latest code snippet becomes:

#!/bin/bash

containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}

PPA_ARRAY=()

for APT in `find /etc/apt/ -name \*.list`; do
    grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
        USER=`echo $ENTRY | cut -d/ -f4`
        PPA=`echo $ENTRY | cut -d/ -f5`
        PPA_ARRAY+=("$USER/$PPA")
    done
done

if ! containsElement "git-core/ppa" "${PPA_ARRAY[@]}"
then
    sudo add-apt-repository ppa:git-core/ppa
    sudo apt-get update
fi

So it should work, right? No. It got overzealous and just kept on adding the git PPA even after it was added.

Debugging

So some debugging is needed. How do I print a Bash array? A Google Search yields this answer on Stack Overflow:

printf '%s\n' "${my_array[@]}"
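
A short sketch of how that one-liner behaves, using sample values of my own: printf reuses its format string for every argument, so each element lands on its own line. With an empty array there are no arguments at all, and printf still prints the format once, which produces a single empty line.

```shell
#!/bin/bash
# Sample values, chosen only for illustration.
my_array=("git-core/ppa" "webupd8team/java")

# One line per element, because the format is reused for each argument.
FILLED=$(printf '%s\n' "${my_array[@]}")
echo "$FILLED"

# An empty array expands to no arguments; printf then prints one empty line.
empty_array=()
EMPTY_LINES=$(printf '%s\n' "${empty_array[@]}" | wc -l | tr -d ' ')
echo "lines printed for an empty array: $EMPTY_LINES"
```

So a lone blank line from this printf is a hint that the array being printed is empty.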

Ok. So I modified the code to include these debugging printfs, as well as commenting out some stuff:

#!/bin/bash

containsElement () {
  local e
  for e in "${@:2}"; do [[ "$e" == "$1" ]] && return 0; done
  return 1
}

PPA_ARRAY=()

for APT in `find /etc/apt/ -name \*.list`; do
    grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
        USER=`echo $ENTRY | cut -d/ -f4`
        PPA=`echo $ENTRY | cut -d/ -f5`
        PPA_ARRAY+=("$USER/$PPA")
        printf '%s\n' "${PPA_ARRAY[@]}"
    done
done

if ! containsElement "git-core/ppa" "${PPA_ARRAY[@]}"
then
#    sudo add-apt-repository ppa:git-core/ppa
#    sudo apt-get update
    echo "adding git-core/ppa"
fi

Ok, I expected the size of the Bash array to increase by 1 after each iteration. What I saw instead was just single lines being printed. It was as if PPA_ARRAY was being reset to empty at the start of each iteration of the while loop. So I added another printf statement after the for loop and… there was nothing. It just printed an empty line.

At this point, I was just… shocked. Wtf was happening?

The problem and the solution

I forgot what I googled, but I found this answer by ruakh.

It turns out that this part of the code:

grep -o "^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+" $APT | while read ENTRY ; do
    USER=`echo $ENTRY | cut -d/ -f4`
    PPA=`echo $ENTRY | cut -d/ -f5`
    PPA_ARRAY+=("$USER/$PPA")
    printf '%s\n' "${PPA_ARRAY[@]}"
done

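causes the while loop to run in a subshell, because Bash runs each command in a pipeline in its own subshell. The subshell receives a new copy of the shell’s execution environment, including the PPA_ARRAY variable, so every append modifies that copy, and the copy is thrown away when the pipeline finishes. That explains everything.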

The solution, then, is to use process substitution instead, which keeps the while loop in the current shell, like this:

for APT_LIST in `find /etc/apt/ -name \*.list`; do
  while read ENTRY; do
    USER=`echo $ENTRY | cut -d/ -f4`
    PPA=`echo $ENTRY | cut -d/ -f5`
    PPA_ARRAY+=("$USER/$PPA")
  done < <(grep -o '^deb http://ppa.launchpad.net/[a-z0-9\-]\+/[a-z0-9\-]\+' $APT_LIST)
done
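
The difference between the two forms can be reproduced without touching /etc/apt/. This is a minimal sketch with made-up input (three lines from printf) and counter variables of my own naming, assuming Bash’s default settings for a script.

```shell
#!/bin/bash
# Pipe version: the while loop runs in a subshell, so the increments
# happen on a copy of COUNT_PIPE and are lost when the pipeline ends.
COUNT_PIPE=0
printf 'a\nb\nc\n' | while read LINE; do
    COUNT_PIPE=$((COUNT_PIPE + 1))
done
echo "after pipe: $COUNT_PIPE"

# Process substitution version: the while loop runs in the current shell,
# so the increments are still visible after the loop.
COUNT_SUBST=0
while read LINE; do
    COUNT_SUBST=$((COUNT_SUBST + 1))
done < <(printf 'a\nb\nc\n')
echo "after process substitution: $COUNT_SUBST"
```

The first counter stays at 0 while the second reaches 3, which is the same thing that was happening to PPA_ARRAY.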

You can see the resulting script here.

Conclusion

I always seem to learn something new each time I write a Bash script. Some day, I should compile some of the tips and tricks I’ve picked up along the way.

Anyways, this is my first technical blog post in slightly more than a year. Hurrah!
