NumPy Functions

NumPy provides many built-in functions for array manipulations, mathematical/statistical calculations, and reading files.

Reading and Writing Files

NumPy includes several functions that can simplify reading and writing files. For files with a simple spreadsheet-like structure, loadtxt works well. The first argument can be either a file name or a file handle.

x,y=np.loadtxt(f, delimiter=',', usecols=(0, 2), unpack=True)
v=np.loadtxt(f,delimiter=',',usecols=(1,)) #usecols needs tuple
W=np.loadtxt(fin,skiprows=2)

If unpack is not specified, loadtxt returns the values in a rank-2 array. With unpack=True the array is transposed, so each column can be assigned to its own rank-1 array, as in the first example. Columns may optionally be selected with usecols, and header rows can be skipped with skiprows; other options are available. The loadtxt function assumes the delimiter is whitespace, so any other delimiter must be specified through the delimiter argument.
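As a minimal sketch of these options, the following reads a small comma-separated table held in memory. The data and variable names are invented for illustration, and io.StringIO stands in for a real file handle.

```python
import io
import numpy as np

# Hypothetical data: one header row, then three comma-separated columns.
text = """time,temp,pressure
0.0,20.5,101.3
1.0,21.0,101.1
2.0,21.7,100.9
"""

# Without unpack, the result is a rank-2 array with one row per data line.
table = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1)
print(table.shape)

# With unpack=True the array is transposed, so each selected column
# can be assigned to its own rank-1 array.
time, pressure = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1,
                            usecols=(0, 2), unpack=True)
print(time)
print(pressure)
```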

More generally, the genfromtxt function can be applied. The loadtxt function does not handle files with missing or irregular data, but genfromtxt can, to an extent.

data = np.genfromtxt(infile, delimiter=';', comments='#', skip_footer=1)
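To illustrate how genfromtxt treats a missing entry, here is a small sketch using made-up, semicolon-separated data held in memory. The file contents are hypothetical; the options are the same ones used in the line above.

```python
import io
import numpy as np

# Hypothetical data: ';'-separated, one comment line, one missing entry,
# and a trailing summary line that should not be read.
text = """# station; reading
1;10.5
2;
3;12.0
total;22.5
"""

data = np.genfromtxt(io.StringIO(text), delimiter=';', comments='#',
                     skip_footer=1)
print(data)                  # the missing entry is read as nan
print(np.isnan(data).sum())  # number of missing entries
```

With the default floating-point dtype, missing fields become nan, which can then be located with isnan.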

The counterpart to loadtxt is savetxt, which writes a rank-1 or rank-2 array to a text file.

np.savetxt(outfile,A,delimiter=',')

For a simple data dump and restore we can use the save and load functions. The save function dumps an array in binary format to a .npy file. The first argument is a file name or an open file handle.

np.save(f,A)
#or
np.savez(f,A)  #bundles one or more arrays into an uncompressed .npz archive
#or
np.savez_compressed(f,A)  #same, but compressed

The counterpart to save is load.

A=np.load(f)

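The load function can also read a .npz file generated by savez or savez_compressed. In that case it returns a dictionary-like object; arrays saved without a keyword name are stored under the keys arr_0, arr_1, and so on.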
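A round trip through both formats might look like the following sketch. The file names and the keyword names first and second are arbitrary choices for illustration, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    # A single array goes to a .npy file.
    npy_path = os.path.join(tmp, "A.npy")
    np.save(npy_path, A)
    A_back = np.load(npy_path)

    # Several arrays go to one .npz archive; keyword names become the keys.
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, first=A, second=B)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        B_back = archive["second"]

print(np.array_equal(A, A_back))
print(names)
```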

Some Frequently Used NumPy Functions

Array Manipulation      Mathematical Operations
arange                  abs, cos, sin, tan
array                   average, mean, median, std
argmin, argmax          min, max
all, any, where         ceil, floor
compress                dot, matmul
copy                    sum, prod
ones, zeros, empty      nan, isnan
reduce                  inf, isinf
repeat, reshape         linspace
rollaxis, swapaxes      lstsq
transpose

This is just a sample; the full list is in the NumPy reference manual.
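A few of these functions in action, as a brief sketch on a small made-up array:

```python
import numpy as np

v = np.array([3.0, -1.0, 4.0, -1.5, 5.0, 9.0])

print(v.argmax(), v.argmin())      # indices of the largest and smallest values
clipped = np.where(v < 0, 0.0, v)  # replace negative entries with zero
print(clipped)
print(np.any(v < 0), np.all(v < 10))

m = v.reshape(2, 3)                # the same data arranged as a 2x3 array
print(m.transpose().shape)
col_sums = m.sum(axis=0)           # sum down each column
print(col_sums)

grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values, endpoints included
print(grid)
```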

Random Values

Some NumPy functionality is implemented through subpackages. One of the more widely used subpackages is the random module. Base Python has a random module, but just as the Python math module cannot operate on ndarrays, the base random module cannot return arrays of numbers.

There are now two sets of random functions. The “legacy” functions are methods of the RandomState class and can also be called directly from np.random.

The random_sample function generates uniformly-distributed pseudorandom numbers on the interval [0,1).

import numpy as np
x=np.random.random_sample()  #a single value
y=np.random.random_sample(10) #a one-d array of 10
z=np.random.random_sample((4,5)) #a two-d array of shape 4x5; the shape is passed as a tuple
w=np.random.rand(4,5)  #rand is a wrapper around random_sample

Other functions include

np.random.randint(1,11) #a random integer in [1,11); 11 is not included
np.random.randint(1,11,size=10) #one-d array of 10 random integers
np.random.random_integers(1,10,size=10) #like above but inclusive of upper; deprecated, prefer randint(1,11,size=10)
np.random.choice([2,4,6,8]) #random selection from the sequence
deck=list(range(1,53))
np.random.shuffle(deck) #overwrites its argument
np.random.randn(4,5) #4x5 array of normally-distributed random numbers.

The newer class is the Generator class. Many of the method names are the same, although some differ; for example, integers replaces randint and random replaces random_sample. To use the Generator methods, start by calling the constructor. In this example, PCG64 is the random-number generator algorithm.

from numpy.random import Generator, PCG64
rng = Generator(PCG64())
rng.standard_normal()
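For comparison with the legacy calls above, here is a sketch of rough Generator equivalents. It uses default_rng, NumPy's convenience constructor, rather than building the Generator explicitly; the seed value 12345 is an arbitrary choice made only so the run is reproducible.

```python
import numpy as np

# default_rng returns a Generator; the seed makes the sequence reproducible.
rng = np.random.default_rng(12345)

u = rng.random((4, 5))             # uniform on [0,1), like random_sample
k = rng.integers(1, 11, size=10)   # integers in [1,11), like randint
n = rng.standard_normal((4, 5))    # normally distributed, like randn
pick = rng.choice([2, 4, 6, 8])    # random selection from the sequence
deck = np.arange(1, 53)
rng.shuffle(deck)                  # shuffles its argument in place

print(u.shape, k, pick)
```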

Ufuncs

Functions that operate element by element on arrays, and that also accept scalars, are called ufuncs for “universal functions”. Mathematical functions such as sin and cos are examples of built-in ufuncs. You can easily write your own functions that behave the same way. These functions are subject to some restrictions:

  • The function must not change its input parameters.
  • The function cannot print anything, or do any other form of I/O.
  • The function cannot exit (stop the run).
  • The function must return a single value (no tuples or other compound type).

Functions that adhere to these rules are said to be pure. The prohibition on printing does complicate debugging somewhat. Your functions must be thoroughly debugged for scalar inputs before testing with arrays.

import numpy as np 

def F2C(T):
    return 5.*(T-32.)/9.

TempsF=np.arange(0.,213.,2.)
TempsC=F2C(TempsF)
print(TempsC)
print(F2C(50.))
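One way to gain confidence that such a function behaves the same for scalars and arrays is to compare the two directly. This is only a sketch of that check, reusing F2C from above; the names one_by_one and all_at_once are introduced here for illustration.

```python
import numpy as np

def F2C(T):
    return 5.*(T-32.)/9.

TempsF = np.arange(0., 213., 2.)

# Apply the function one scalar at a time, then to the whole array at once.
one_by_one = np.array([F2C(t) for t in TempsF])
all_at_once = F2C(TempsF)

# The two results should match element by element.
print(np.allclose(one_by_one, all_at_once))

# Spot-check known values: freezing and boiling points of water.
print(F2C(32.), F2C(212.))
```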

Optimization

Python can be quite slow, as is the case for most interpreted languages. Loops are generally the slowest constructs. NumPy array functions are highly optimized and often can eliminate loops.

Example: NumPy's sum over an array or array slice can replace the corresponding loop, and is much faster.

s=0
for e in V:
    s+=e

Use instead

s=V.sum()
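The difference can be measured with the standard timeit module. The following is a rough sketch; the array size and repetition count are arbitrary, and the actual timings depend on the machine.

```python
import timeit
import numpy as np

V = np.arange(100_000, dtype=float)

def loop_sum(values):
    # Explicit Python loop over the elements.
    s = 0.0
    for e in values:
        s += e
    return s

# Both approaches should agree on the result.
print(np.isclose(loop_sum(V), V.sum()))

# Time a few repetitions of each approach.
t_loop = timeit.timeit(lambda: loop_sum(V), number=5)
t_numpy = timeit.timeit(lambda: V.sum(), number=5)
print(f"loop: {t_loop:.4f} s, V.sum(): {t_numpy:.4f} s")
```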

Exercise

Download the bodyfat.csv file. The weight is the third column and the height is the fourth column (in units of pounds and inches). Write a program that contains a ufunc for converting pounds to kg and another to convert inches to meters. Write a ufunc to compute BMI from metric height and weight. Read the bodyfat.csv file and use the ufuncs appropriately to create a new array of the BMI values. Look up the NumPy functions to print the mean and standard deviation of the values as well as the maximum and minimum values.

Example solution

import numpy as np

def inch_to_m(length):
    return length*0.0254

def pound_to_kg(weight):
    return weight*0.453592

def bmi(wt,ht):
    return wt/ht**2

weight,height=np.loadtxt("bodyfat.csv",delimiter=',',usecols=(2,3),skiprows=1,unpack=True)

wt=pound_to_kg(weight)
ht=inch_to_m(height)

bmi_data=bmi(wt,ht)

print(f"The mean is {bmi_data.mean():.1f} and the std is {bmi_data.std():.1f}")
print(f"The max is {bmi_data.max():.1f} and the min is {bmi_data.min():.1f}")

Resources

Essential documentation for NumPy is at its home site.

The documentation includes a beginner’s tutorial.
