Lecture 4, 4/5/10

Administrative stuff

Overloads: anyone waiting? running into trouble?
Grading: Peer grading is a go! Handout Wednesday, who to submit to by tonight.
Homework: It's due Wednesday. How's it going for everyone?

UNIX

The modern command line really began with UNIX, so I'll start by saying a few words about UNIX itself. It's hard to overstate how important UNIX has been to the development of computing. As a random data point, here's a list of some of the modern OSes based on UNIX:

Linux
Mac OSX, NeXTSTEP
FreeBSD, OpenBSD, NetBSD
Solaris, HP-UX, AIX
Android, Maemo, Palm's webOS
...

The name UNIX has a long and sordid history, but nowadays generally refers to anything in this family of operating systems. Here are some operating systems that aren't "direct" descendants:

Windows 7
Windows Vista
Windows XP
Windows 2000
Windows NT
Windows Mobile, Windows Phone 7 Series ...

UNIX succeeded for a lot of reasons, one of which was definitely being in the right place at the right time. However, there's an underlying core philosophy that has made UNIX appealing since its very inception. Doug McIlroy (one of the original UNIX wizards) summed it up as follows:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

It's hard to get all of the UNIX philosophy into three sentences, but this comes darn close. In particular, one of the key things that you should take away from this is the following: a UNIX environment consists of a number of small and easily composable tools.

Some great quotes

The number of UNIX installations has grown to 10, with more expected. -- UNIX Programmer's Manual, 1972
Unix is simple. It just takes a genius to understand its simplicity. -- Dennis Ritchie
UNIX was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things. -- Doug Gwyn
Unix never says "please." -- Rob Pike
Unix is user-friendly. It just isn't promiscuous about which users it's friendly with. -- Steven King
Those who don't understand UNIX are condemned to reinvent it, poorly. -- Henry Spencer

Getting to the command line on your machine

If you're using Linux, just open a terminal.

If you're using Mac OSX, you just open a terminal -- but if you've never done this before, go to Applications -> Utilities -> Terminal.app, and you're good to go.

If you're using Windows, your best bet is to delete it and install something else ... er, I mean, install Cygwin. (I'm mostly kidding -- Windows actually has a number of amazing features. It just doesn't shine when I'm trying to spend an hour talking about using the command line.)

Let's go!

Okay, so now we're all at a command line. In general, the prompt at your terminal will be something complicated -- I've customized mine, and it looks like this:

[craigcitro@sharma ~] $

(If you'd like to play with customizing your prompt later, it's controlled by the environment variable PS1 -- if this doesn't mean anything, either ask me, google it, or just don't worry about it.) But that's way too cumbersome to type, and doesn't look quite like what you'll see -- so I'm going to abbreviate it to just

$

Two quick notes for talking about traversing filesystems: . always refers to the current directory, and .. always refers to the directory one level above where we are. Also, / refers to the root directory -- that is, the root of the filesystem. Here's an example of how this is relevant:

$ pwd
/sage/devel/sage
$ cd .
$ pwd
/sage/devel/sage
$ cd ..
$ pwd
/sage/devel

Play around a little, and this'll feel like second nature soon enough.

The shell is just like any other interpreter you've ever used -- you input commands, and it performs them and reports back as necessary. In the case of a shell, the most common commands are of two types: running programs, or moving around in the file hierarchy.

First, and most importantly, let's talk about how the shell finds the commands you ask it to execute. If you're at the command line, and you type

$ awesome_new_program

the shell will look in a specific list of directories for a file called awesome_new_program that it's allowed to run. That list is stored in an environment variable called PATH, which you can print as follows:

$ echo $PATH
/Users/craigcitro/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/usr/X11/bin:/usr/local/cuda/bin:/Library/Frameworks/HaskellPlatform.framework/bin

(Your shell stores a whole bunch of environment variables for you; you can type env to see them all, or set to see them along with the shell's own variables, and we'll talk more about them at some later point.)

Now, here's an interesting thing that you may have noticed: the current directory (.) didn't appear anywhere on that list. Of course, you can add it, but this is generally deemed a bad idea, mostly for security reasons: you could end up running whatever program happens to be sitting in the directory you just cd'ed into instead of the one you meant. That means that if you're trying to run a program in the current directory, you can't just use its name, even though it's "right there in front of you." You have to be explicit -- like this:

$ ls
foo     foo.c
$ foo
-bash: foo: command not found
$ ./foo
Hello world!

Now, let's say there's more than one program called foo in your path -- how do you tell which one is being run? The shell will go through the entries in PATH in order, so you could check them yourself -- or just ask the shell to tell you:

$ which ls
/bin/ls
$
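If you're curious what "checking them yourself" would look like, here's a minimal sketch in bash. The name find_in_path is made up for this example, and this is not how your shell actually implements lookup (real shells also check aliases and builtins first, and cache what they find). It's just the same left-to-right walk over PATH.

```shell
# find_in_path is a hypothetical helper, not a standard command.
# It walks the directories in PATH from left to right and prints
# the first executable file whose name matches its argument.
find_in_path() {
    local name=$1 dir
    local IFS=:                  # split PATH on colons
    for dir in $PATH; do
        [ -z "$dir" ] && dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1                     # not found anywhere in PATH
}

find_in_path ls
```

On a typical setup this should print the same path that which ls does.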

Ten commands you must know

There are a million UNIX commands out there, and no one ever remembers every single one. Here's my personal list of commands I couldn't go a day without:

ls
cd
cp/rm
man
emacs (or your editor of choice -- vim, pico, ed ...)
cat/less
grep
find/locate/mdfind
ssh/scp
which

and a few more I couldn't last a week without:

wget/curl
ps
kill
top
head/tail
screen
sed/awk
alias
xargs
sort
uniq
apropos

I can't get to all of these today, but if you're curious, I'll give you a lead on finding out more.

RTFM

By far the most important command on that list is the one command that helps you find out more: man, which stands for manual. It's simple to use -- just pick your favorite command, and call man with that as the only argument:

$ man ls
<stuff>
$ man python
<other stuff>
$ man man
<stuff about finding more stuff>

Any time you have a question, or you're confused, the first thing you should do is check the man page.

Filesystem stuff

Here are the basic commands for snooping around the filesystem:

ls: list the files in this directory
cd: change directory
pwd: where am I?
mkdir: make new directory
cp: make a copy of a file
mv: move a file
rm: remove a file
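Here's a short session that exercises each of these in a throwaway directory, so you can try them without touching anything important. The directory and file names (notes, a.txt, and so on) are made up for the example, and touch is just a convenient way to create an empty file.

```shell
# Work in a scratch directory so nothing important gets touched.
scratch=$(mktemp -d)
cd "$scratch"

mkdir notes                   # make a new directory
touch notes/a.txt             # create an empty file to play with
cp notes/a.txt notes/b.txt    # copy: a.txt and b.txt both exist now
mv notes/b.txt notes/c.txt    # move (rename): b.txt is gone, c.txt appears
rm notes/a.txt                # remove: only c.txt is left
ls notes                      # list what remains
pwd                           # print where we are
```

The ls at the end prints only c.txt, and pwd prints the path of the scratch directory.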

These are all pretty straightforward, at least for basic use. Of course, they've all picked up quite a bit of sophistication over the years. For instance, here's the basic ls:

[craigcitro@sharma /sage]  $ /bin/ls
COPYING.txt             install.log             sage-README-osx.txt
README.txt              ipython                 sage-python
data                    local                   spkg
devel                   makefile                test.log
examples                sage                    tmp

and here's what I see on my machine:

[craigcitro@sharma /sage]  $ l
total 14660
   72 COPYING.txt              4 makefile
   12 README.txt               4 sage*
    0 data/                    4 sage-README-osx.txt
    0 devel/                   4 sage-python*
    0 examples/                0 spkg/
14372 install.log            188 test.log
    0 ipython/                 0 tmp/
    0 local/

If you want to know about more options for ls, check out the man page. For the curious, here's what I use: on Mac/BSD, it's ls -sFG, and on Linux, it's ls -BhFvs --color=auto.
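If you'd like a shortcut like the l above, one way to get it is an alias in your shell startup file. This is a sketch that assumes bash and a file like ~/.bashrc; the uname check is my guess at a reasonable way to pick between the two sets of options, so adjust it for your own machines.

```shell
# Pick ls options based on the operating system, then define l as a shortcut.
case "$(uname -s)" in
    Linux) alias l='ls -BhFvs --color=auto' ;;   # GNU ls options
    *)     alias l='ls -sFG' ;;                  # Mac/BSD ls options
esac
```

After starting a new shell, typing l runs ls with those options.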

If you call cd with no arguments, it takes you back to your home directory. I find that I often do this by accident -- but here's a really cool trick: cd - changes back to the directory you were most recently in. This is incredibly useful:

$ pwd
/sage/devel/sage-main/sage/rings/polynomial
$ cd 
$ pwd
/Users/craigcitro
$ cd -
/sage/devel/sage-main/sage/rings/polynomial
$

You'll notice that most UNIX commands are fairly quiet -- they often don't produce output if everything goes right. This is disconcerting at first if you're not expecting it. However, much of UNIX is built on the idea of chaining things together -- if everything produced output, it'd just be too unwieldy.

Composability

As mentioned above, one of the most important parts of the UNIX philosophy is the ability to compose the small programs that each do one thing well. The most fundamental way to compose them is by using pipes, which are written |. Pipes are really just function composition: in principle, saying foo | bar runs the command foo, records the output, and then calls bar -- passing the output from foo as the input to bar. (In practice the shell starts both at once and streams the data from one to the other, but that's the right mental model.)

Here's a simple example, using another wildly useful UNIX utility: grep. The basic use of grep is easy to explain: doing grep "def" myfile.py will show you all lines in the file myfile.py that contain the string def. So let's say you have a program that prints out a bunch of output, and you only care about the lines that contain the string pickle:

$ pantry_contents
 ... lots of stuff ...
$ pantry_contents | grep pickle
1 jar of pickles
1 jar of pickled pig's feet
$
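Since pantry_contents isn't a real program, here's a version you can actually paste into a terminal. It defines a stand-in pantry_contents as a shell function with made-up contents, then pipes it through grep, and then through one more stage.

```shell
# pantry_contents is a stand-in for the hypothetical program above,
# defined here as a shell function so the example runs anywhere.
pantry_contents() {
    printf '%s\n' \
        "3 cans of soup" \
        "1 jar of pickles" \
        "2 boxes of pasta" \
        "1 jar of pickled pig's feet"
}

pantry_contents | grep pickle           # keep only the lines containing "pickle"
pantry_contents | grep pickle | wc -l   # count those lines instead of printing them
```

The first pipeline prints the two pickle lines; the second prints 2 (possibly with some leading spaces, depending on your wc).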

Of course, once you start combining a bunch of these, it gets way more exciting:

[craigcitro@sharma /sage/devel/sage-main]  $ find . | grep -E "(py|pyx)$" | wc -l
    3899
[craigcitro@sharma /sage/devel/sage-main]  $ find . | grep -E "(py|pyx)$" | xargs -n 1 cat | sort | uniq | wc -l
  403291

The first of those is the number of .py and .pyx files in the Sage codebase, and the second is the number of unique lines of code in the Sage library.
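Those pipelines are quick and dirty: the pattern (py|pyx)$ matches any name that merely ends in py, and xargs gets confused by file names containing spaces. Here's a sketch of a more careful variant, run against a tiny made-up tree so you can see exactly what it counts. The -print0 and -0 options are available in the GNU and BSD versions of find and xargs; if yours lacks them, this won't work as written.

```shell
# Build a tiny sample tree (hypothetical file names) to run against.
tree=$(mktemp -d)
mkdir "$tree/pkg"
printf 'import os\nx = 1\n' > "$tree/pkg/a.py"
printf 'import os\ny = 2\n' > "$tree/pkg/b with space.pyx"
printf 'not code\n'         > "$tree/pkg/happy"   # ends in "py" but is not a .py file

cd "$tree"

# Count files: match on the extension itself, and only regular files.
find . -type f \( -name '*.py' -o -name '*.pyx' \) | wc -l

# Count unique lines: -print0 / -0 keep names with spaces intact,
# and sort -u does the job of sort | uniq in one step.
find . -type f \( -name '*.py' -o -name '*.pyx' \) -print0 | xargs -0 cat | sort -u | wc -l
```

On this sample tree the first command prints 2 and the second prints 3, since the import os line appears in both files but is only counted once.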

There are two other related things worth mentioning here. Often, you have a bunch of output spewing from some program, and you want to just save it to a file -- foo >output.txt will redirect the output of foo to the file output.txt. Similarly, doing foo <input.txt will run the program foo, passing the contents of input.txt as input to the program. (The first is insanely useful, and the second is often useful, too.)
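Here's a small example of both kinds of redirection, using sort as the program and made-up file contents. It runs in a scratch directory so it won't clobber anything.

```shell
cd "$(mktemp -d)"                         # scratch directory for the example

printf 'pear\napple\nfig\n' > input.txt   # > sends printf's output into input.txt
sort < input.txt                          # < feeds input.txt to sort as its input
sort < input.txt > output.txt             # both at once: read one file, write another
cat output.txt
```

The last command prints apple, fig, pear, one per line. Be careful with >: it replaces whatever was in the file before. Use >> if you want to append instead.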

References

These are two fairly old, but classic, books on the UNIX environment:

The UNIX Programming Environment, by Kernighan and Pike
The Design of the UNIX Operating System, by Bach
