IV - I/O, Pipes and Processes
Wildcards
Wildcards are special characters that can stand for strings. Wildcards enable you to work with groups of files without needing to type multiple files with similar names.
The asterisk *
can replace zero to unlimited characters except for a leading period.
The question mark ?
replaces exactly one character.
Examples
$ls *.py
$cd array_test
$ls input?.dat
$ls input??.dat
$ls input*.dat
$rm list?.sh
ls
with the pattern you plan to use with rm
.
Standard Streams
Each executable has associated with it three input/output streams: standard input , standard error , and standard output. Normally these streams come from or go to your console (i.e. your terminal).
Most Unix commands read from standard input and/or write to standard output.
These I/O streams are often represented as stdin, stderr, and stdout.
The Unix commands we have studied so far all write to standard output.
$ls -l
produces output to the terminal.
Standard Stream Redirection
The standard streams are normally associated with the console. The command will print to the terminal, or will read input typed into the terminal. Stdin and stout can also be redirected to write to or read from a file.
Redirect standard input with <
$./mycode < params.txt
Redirect standard output with >
$ls –l > filelist.txt
If the file exists, >
will overwrite it. Append with >>
.
$cat file1 >> bigfile.csv
Redirection of standard error depends on the shell and is needed for only a few commands.
For bash
$make >& make.out
redirects both stdout and stderr from the make
command to make.out
.
Pipes
One of the most powerful properties of Unix is that you can pipe the standard output of one command into the standard input of another.
The pipe symbol |
is above the backslash on most US keyboards. Pipes can be chained indefinitely, though most uses need just one.
cmd1 | cmd2 | cmd3
Example
Commands such as ls
have lengthy manpages that will tend to scroll off our terminal window.
$man ls | more
Now we can page through the listing.
Finding and Searching Files
Searching with grep
The grep
command is commonly used in UNIX to filter a file or input, line by line, against a pattern. Patterns can be complex and use regular expressions, but most of the time wildcards are sufficient.
grep [OPTIONS] PATTERN FILENAME
Example
The -i
option stands for “ignore case.”
$grep -i Unix intro_basic-unix.txt
Grep is frequently used with wildcards and pipes.
$grep -i write *f90
$grep weight: *md | grep 100
Example
How many sequences are in a FASTA-formatted file? Each sequence record in a FASTA file has one line of description that always starts with >
, followed by multiple lines of the sequence itself. Each sequence record ends when the next line starts with >
.
$grep -c '>' sequences.fasta
The -c
option returns the number of lines containing the pattern. Please be sure to include the quotes around the >
or the shell will interpret it as a redirection.
A Handy Trick
To find all occurrences of a pattern in all files in a directory, use grep -r
.
$grep -r "print" python_programs
Be careful with the pattern for a recursive search, or the output can be excessive.
Finding Files by Name
The find
command can locate a file if you cannot remember its directory. It can take wildcards, in which case it is best to use quotes around the name pattern.
$find . -name 2col.txt
./shakespeare/2col.txt
$find . -name "people*"
./data/people.txt
The period .
tells find to start at the current working directory.
Find has many options to locate files by name, type, date, and others. See here for examples.
Running Executables
Executables are often also called binaries. The terms are synonymous in most cases.
The shell has a predefined search path. It will look in a sequence of directories for an executable with the specified name, and will invoke the first one it encounters. If the executable is in this search path, you can simply type its name at the prompt.
$gedit hello_world.sh
here gedit
is the name of the executable. Its actual location is /usr/bin/gedit, but /usr/bin is in the default search path.
If the location of the binary is not in your search path, you must type the path to the executable (it can be absolute or relative)
$./hello_world.sh
For security reasons the current directory is not in your default search path. Hence you must explicitly provide the ./
path to execute a binary in the current directory. If you wish to add it, type (for bash)
$export PATH=$PATH:.
PATH
is called an environment variable. It holds the list of paths to be searched by the shell. In this example it is essential to add the first $PATH
or you will lose the default path set by the system.
If you are unsure of the path to the binary you wish to run, the which
command will tell you the path to the binary the shell will start.
$which g++
/apps/software/standard/core/gcc/11.4.0/bin/g++
Process Control
A running executable is a process to the Unix operating system. When it is run at a command line, a process can be running in the foreground, which suppresses a prompt, or in the background, which returns a prompt to the shell.
To start in the background, add an ampersand &
at the end of the command.
$./myexec -o myopt myfile&
Managing Processes
The jobs
command lists your running jobs (processes) with their job index.
The key combination control-z (ctrl-z or ^z) suspends the foreground job. To resume the job in the background, type bg
.
This can be combined with output from jobs
$bg %1 # place the job number 1 into the background
$fg %4 # place the job number 4 back to the foreground
For more general information about processes, use ps
(process status) The -u
option limits it to processes owned by user mst3k
.
$ps -u mst3k
PID TTY TIME CMD
498571 ? 00:00:00 systemd
498575 ? 00:00:00 (sd-pam)
498581 ? 00:00:00 pulseaudio
498582 ? 00:00:00 sshd
498593 pts/3 00:00:00 bash
498665 ? 00:00:00 dbus-daemon
498670 ? 00:00:00 dbus-daemon
498672 ? 00:00:00 dbus-kill-proce
498677 ? 00:00:00 gio
498685 ? 00:00:00 gvfsd
498691 ? 00:00:00 gvfsd-fuse
517189 pts/3 00:00:00 ps
The pid
is the process id.
Killing Processes
You have accidentally started a production job on a loginnode node. What to do?
You can kill your foreground process with Crtl c
.
#oops, I was supposed to run this through Slurm
$./myexe
^c
If you need to kill a background process, you can use jobs
to locate and foreground it. You may also have processes that don’t appear with jobs
. Use ps
to find the PID, then
$kill -9 <pid>
Do not type the angle brackets, just the number. Many processes will ignore the kill command without the -9
option so we routinely include it.
To kill by executable name
$killall -9 <executable name>
The kill command with -9 immediately kills the process without allowing the process to clean up or save data. The killall command can be used to kill all the processes that match a specific name or pattern.
If you find yourself in a jam and do not know what is wrong and you must start over,
$kill -9 -1
kills all your processes, including your login.
Dotfiles
“Dotfiles” are files that describe resources to programs that look for them. They begin with a period or “dot” (hence the name). In general, they are used for configuration of various software packages.
Dotfiles are hidden from ls
, but ls -a
shows them. Sometimes ls
is aliased to ls –a.
Bash has configuration dotfiles: .bash_profile
and .bashrc
.
- if no .bash_profile is present it will read .profile
- .bash_profile is sourced only for login shell
- the default .bash_profile on our HPC system incorporates the .bashrc; otherwise on a general Linux system, .bashrc is not run for a login shell.
Dot “files” may also be, and often are, directories.
$ls -a
. .bash_logout .bashrc .lesshst shakespeare .Xauthority
.. .bash_profile data .mozilla .ssh