Chapter Introductions
“Give the pupils something to do, not something to learn; and doing is of such a nature as to demand thinking;
learning naturally results.”
– John Dewey
Chapter 1: Basic Matrix Operations
Data matrices appear everywhere in daily life. Consider the daily body weight data of the 100 clients of your diet clinic over
the last 21 days. If you put each client’s data on a row, then the row has 21 data entries. This row may be called a data sequence,
or a row vector. You stack the data sequences of all 100 clients one above another, ordered by last name. You have thus
created a rectangular 2-dimensional array of numbers, a 100-by-21 matrix. This data matrix has 100 rows and
21 columns.
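The construction above can be sketched in Python with NumPy. The weights here are synthetic values generated only for illustration, and the names `weights`, `baseline`, and `daily_noise` are assumptions, not data or names from the book.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # synthetic data, for illustration only

n_clients, n_days = 100, 21
# Hypothetical weights in kg: one baseline per client plus small daily changes
baseline = rng.uniform(55.0, 110.0, size=(n_clients, 1))
daily_noise = rng.normal(0.0, 0.4, size=(n_clients, n_days))
weights = baseline + daily_noise  # broadcasting yields a 100-by-21 matrix

print(weights.shape)   # number of rows and columns
row_83 = weights[82]   # the 83rd client's data sequence (Python counts from 0)
print(row_83.shape)    # a row vector with 21 entries
```

Each row is one client's 21-day data sequence, and each column holds all clients' weights on one day.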
You may add client names as row names and dates as column names to the data matrix to form a dataframe. The dataframe is
a commonly used concept in R and Python data analysis. A dataframe presents data neatly and also allows you to refer to the
data in rows and columns by name, such as John Smith’s body weight data, rather than by position, such as the 83rd row.
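As a minimal sketch of the dataframe idea, the following Python example uses pandas, which is our choice here and is not prescribed by the text above. The client names other than John Smith, the dates, and all weight values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt: three clients, four days (all values are invented)
weights = np.array([
    [81.2, 81.0, 80.7, 80.9],
    [64.5, 64.8, 64.3, 64.1],
    [92.3, 92.0, 91.8, 91.5],
])
clients = ["Jane Doe", "Ana Lee", "John Smith"]  # ordered by last name
dates = pd.date_range("2024-01-01", periods=4, freq="D")

# Attach row names and column names to the matrix
df = pd.DataFrame(weights, index=clients, columns=dates)

# Refer to a row by name rather than by its position
john = df.loc["John Smith"]
print(john)
```

Selecting by name keeps the code readable even if rows are later reordered or more clients are added.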
This chapter is limited to (i) describing the basic matrix operations commonly used in your coursework or career, e.g.,
eigenvector decomposition of a matrix; (ii) presenting matrix application examples in data
science, e.g., sub-matrix generation, array-matrix conversion, and data statistics; and (iii) solving linear equations.
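The operations listed above can be sketched with NumPy as follows. The small matrices are hypothetical examples chosen so that the results are easy to check by hand; they are not taken from the chapter.

```python
import numpy as np

# Hypothetical symmetric 2-by-2 system A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)             # solution of the linear equations
eigvals, eigvecs = np.linalg.eigh(A)  # eigen-decomposition (A is symmetric)

# Array-matrix conversion: reshape a 1D array into a 2-by-3 matrix
M = np.arange(1, 7).reshape(2, 3)     # [[1, 2, 3], [4, 5, 6]]
sub = M[:, 1:]                        # sub-matrix: all rows, last two columns
col_means = M.mean(axis=0)            # a simple data statistic per column

print(x, eigvals, sub, col_means, sep="\n")
```

`np.linalg.eigh` is used because the example matrix is symmetric; for a general square matrix, `np.linalg.eig` would be the usual choice.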
Chapter 2: Matrix Theory and Visualization
This chapter covers slightly more advanced matrix theory than the previous chapter. It includes (i) the concept of linear
independence, (ii) spaces spanned by multiple vectors, (iii) rank and other properties of a matrix, (iv) more on singular
value decomposition (SVD), and (v) visualization of matrices and their decomposed vectors and singular values.
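To make independence, span, and rank concrete, here is a small hedged sketch in Python. The three vectors are hypothetical; the third is deliberately built as a combination of the first two, so the three together span only a plane.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2.0 * v2          # a linear combination, hence dependent on v1, v2

M = np.column_stack([v1, v2, v3])   # vectors as the columns of a 3-by-3 matrix
rank = np.linalg.matrix_rank(M)     # dimension of the space the columns span

print(rank)  # 2: only two of the three columns are linearly independent
```

A rank smaller than the number of columns signals that at least one column can be written in terms of the others.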
Many mathematicians regard mathematics as “beautiful,” using terms such as “elegant,” “deep,” and “general.” Our book,
however, pays more attention to being relevant, useful, and modern (RUM) from the perspectives of both the mathematical
sciences and other fields. Relevance means that every matrix and its operations can have an interpretation that a layperson
can easily understand. Usefulness means that each theory or method has non-trivial application examples
in science or engineering. Modernness is reflected in the extensive use of matrices in today’s data science, machine learning,
and R or Python programming, which was not the case two or three decades ago. Matrix visualization is an example of modernness.
This chapter continues to feature the space-time data arrangement, which uses rows of a matrix for spatial locations, and
columns for temporal steps. This is the universal and fundamental information structure of our world. The singular value
decomposition (SVD) helps reveal the spatial and temporal features of climate dynamics as singular vectors and the strength
of their variability as singular values.
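The space-time arrangement and its SVD can be sketched as follows. The data are synthetic: a hypothetical spatial pattern over five locations multiplied by a hypothetical 12-step seasonal cycle, plus a little noise. None of these values come from real climate records.

```python
import numpy as np

n_space, n_time = 5, 40
t = np.arange(n_time)
pattern = np.array([1.0, 0.5, 0.0, -0.5, -1.0])  # hypothetical spatial pattern
signal = np.sin(2.0 * np.pi * t / 12.0)          # hypothetical seasonal cycle

rng = np.random.default_rng(1)
# Rows are spatial locations, columns are time steps
data = np.outer(pattern, signal) + 0.01 * rng.standard_normal((n_space, n_time))

U, s, Vt = np.linalg.svd(data, full_matrices=False)
# Columns of U: spatial modes; rows of Vt: temporal modes; s: their strengths
explained = s**2 / np.sum(s**2)  # fraction of total squared variability per mode

print(explained.round(4))
```

Because the synthetic data contain a single dominant pattern, the first singular value carries nearly all of the variability, and the first column of `U` is parallel to `pattern` up to sign.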
Chapter 3: Matrix Applications to Machine Learning
Machine learning (ML) is a branch of science that uses data and algorithms to mimic how human beings learn. The accuracy of
ML results can be gradually improved with new training data and algorithm updates. For example, a baby learns how to pick an
orange from a fruit plate containing apples, bananas, and oranges, which is analogous to classification. Another baby learns
how to sort different kinds of fruits from a basket into three categories without naming the fruits, which is analogous to
clustering. How, then, does ML work? It is basically a decision process for
clustering, classification, or prediction, based on the input data, decision criteria, and algorithms. It does not stop here.
It further validates the decision results and quantifies errors. The errors and the updated data help update the algorithms
and improve the results.
ML has recently become a very popular method in climate science due to the availability of powerful and convenient computing
resources. It has been used to predict weather and climate, and to develop climate models. This chapter is a brief introduction
to ML and provides basic ideas and examples. Our materials will help readers understand and improve the more complex ML
algorithms used in climate science, so that they can go a step beyond applying ML software packages as black boxes. We
also provide R and Python code for some basic ML algorithms, such as K-means for clustering, support vector machine for the
maximum separation of sets, random forest of decision trees for classification and regression, and neural network training and
predictions.
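As one illustration of the algorithms named above, the following is a minimal K-means sketch written directly in NumPy. It is not the book's code. The function `kmeans`, its parameters, and the two synthetic point clouds are assumptions made for this example, and the initial centers are passed in explicitly only to keep the run reproducible.

```python
import numpy as np

def kmeans(points, init_centers, n_iter=100):
    """Plain Lloyd iterations; returns (labels, centers)."""
    centers = np.asarray(init_centers, dtype=float).copy()
    k = len(centers)
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)  # assign each point to nearest center
        # Move each center to the mean of its points; keep it if cluster is empty
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers

# Synthetic data: two well-separated hypothetical clusters of 20 points each
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(20, 2))
blob_b = rng.normal(5.0, 0.3, size=(20, 2))
points = np.vstack([blob_a, blob_b])

# One starting center taken from each end of the data set
labels, centers = kmeans(points, init_centers=points[[0, -1]])
print(labels)
print(centers.round(2))
```

Practical implementations usually choose the starting centers at random or with a seeding rule, and rerun the algorithm several times, because K-means can settle into a poor local solution.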
Artificial intelligence (AI) broadly refers to enabling a machine to perform tasks that normally require human intelligence.
Machine learning, in which computers learn automatically from past data without being explicitly programmed, is a subset of AI.
This chapter focuses on ML, not on AI in general.
Chapter 4: Matrix Applications to Regression Models
The word “regression” means “a return to a previous and less advanced or worse form, state, condition, or way of behaving,”
according to the Cambridge Dictionary. The first part of the word, “regress,” originates from the Latin “regressus,” past
participle of regredi (“to go back”), from re- (“back”) + gradi (“to go”). Thus, “regress” means “to return, to go back” and is
in contrast to the commonly used word “progress.” Regression in statistical data analysis refers to a process of returning
from irregular and complex data to a simpler and less perfect state, which is called a model and can be expressed as a
curve, a surface, or a function. The function or curve, less complex or less advanced than the irregular data pattern,
describes a way of behaving or a relationship. This chapter covers linear models in both uni- and multivariate regressions,
least-squares estimation of parameters, confidence intervals and inference for the parameters, and fitting of polynomials
and other nonlinear curves. By running diagnostic studies on residuals, we explain the assumptions of a linear regression
model: linearity, homogeneity of variance, independence, and normality. As usual, we use examples of real climate data and
provide both R and Python code.
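The least-squares estimation mentioned above can be sketched in matrix form as follows. The data are synthetic, generated from a hypothetical line with intercept 2 and slope 0.5 plus small noise, and `X`, `beta`, and `residuals` are names chosen for this example.

```python
import numpy as np

# Synthetic data from a hypothetical model y = 2 + 0.5 x + noise
x = np.arange(10, dtype=float)
rng = np.random.default_rng(0)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.1, size=x.size)

# Design matrix: a column of ones for the intercept, then the predictor
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of (intercept, slope): minimizes ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are the starting point for checking the model assumptions
residuals = y - X @ beta

print(beta.round(3))
```

At the least-squares solution the residuals are orthogonal to every column of `X`, which is the matrix statement of the normal equations.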
Appendix A: A Tutorial of R and RStudio
The book uses R and the R Notebook. This appendix explains the installation of R and RStudio and demonstrates some basic uses of R.
Equivalent Python code and the corresponding Jupyter Notebooks may be found at the website
"Climate Mathematics".
Appendix B: Visualization of Matrices
People frequently talk about climate data, and read about or imagine them, yet rarely play with or use them, because
people often think that it takes a computer expert to do that. However, that has changed. With today’s technology, anyone
can use a computer to play with climate data, such as a sequence of temperature values of a weather station at different
observation times; a matrix of data for a station for temperature, air pressure, precipitation, wind speed, and wind direction
at different times; and an array of temperature data on a 5-degree latitude-longitude grid for the entire world for different
months. The first is a vector, the second is a variable-time matrix, and the third is a space-time 3-dimensional array. When
considering temperature variation in time at different air pressure levels and different water depths, we need to add one more
dimension: the altitude. The temperature data for the ocean and atmosphere of the Earth thus form a 4-dimensional array, with 3D
space and 1D time.
This appendix attempts to provide basic statistical and computing methods to describe and visualize some simple climate datasets.
As the book progresses, more complex statistics and data visualization will be introduced.
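The vector, matrix, and higher-dimensional arrays described above can be sketched by their shapes in NumPy. The arrays below are filled with zeros as placeholders, and the number of time steps and vertical levels are hypothetical; only the 5-degree grid size (36 latitude bands by 72 longitude bands) follows from the text.

```python
import numpy as np

n_times = 24    # hypothetical number of observation times or months
n_levels = 10   # hypothetical number of vertical levels

station_temps = np.zeros(n_times)         # vector: one variable at one station
station_data = np.zeros((5, n_times))     # variable-time matrix: 5 variables
n_lat, n_lon = 180 // 5, 360 // 5         # 5-degree grid: 36 by 72 boxes
grid_temps = np.zeros((n_lat, n_lon, n_times))            # space-time 3D array
full_temps = np.zeros((n_levels, n_lat, n_lon, n_times))  # 4D: 3D space + time

# A 3D gridded array can be flattened into a space-time matrix:
# one row per grid box, one column per time step
space_time = grid_temps.reshape(n_lat * n_lon, n_times)

for name, arr in [("vector", station_temps), ("matrix", station_data),
                  ("3D array", grid_temps), ("4D array", full_temps),
                  ("space-time matrix", space_time)]:
    print(name, arr.ndim, arr.shape)
```

Flattening the two spatial dimensions into rows gives the same rows-for-space, columns-for-time layout used for matrix methods such as SVD.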
We use both R and Python code in this book for computing and visualization. Our methods are described in R.
The Python code that follows each piece of R code is shown in a box with a light yellow background. You can also learn the two
computer languages and their applications to climate data from the book “Climate Mathematics: Theory and Applications” (Shen and
Somerville 2019) and its website
"www.climatemathematics.org".
The climate data used in this book are included in the data.zip file downloadable from our book website
"climatemathematics.org". You can also obtain the updated data from the
original data providers, such as
"www.esrl.noaa.gov" and
"www.ncei.noaa.gov".
After studying this appendix, a reader should be able to analyze simple climate datasets, compute data statistics, and plot the
data in various ways.