One major step before building a predictive model with spectral data is to split your dataset into two or three parts. The first part is the calibration dataset, used to train your model. The second part is the validation dataset, on which the model is run to assess its performance. Finally, if you aim to use calibration transfer techniques, you will need a third part, the transfer dataset.
This step matters because the validation dataset needs to be representative of your whole dataset. The script provides several methods for splitting datasets, including random selection and the Kennard-Stone method.
The Kennard-Stone method is very commonly used for spectroscopic data because it assigns samples to calibration and validation according to spectral representativeness. For further details, see Kennard and Stone, 1969.
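To make the idea concrete, here is a minimal Python sketch of the usual Kennard-Stone selection rule. It is not the code from the script. The function name `kennard_stone`, the use of NumPy, and the choice of Euclidean distance on the raw spectra are all assumptions made for illustration. The sketch starts from the two most distant spectra, then repeatedly adds the spectrum that is farthest from its nearest already-selected neighbour.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return the row indices of n_select samples chosen by Kennard-Stone.

    X is an (n_samples, n_wavelengths) array of spectra.
    Hypothetical helper for illustration; Euclidean distance is assumed.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 2 <= n_select <= n_samples:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise Euclidean distance matrix.
    # This needs O(n^2 * p) memory, so it only suits modest dataset sizes.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every sample, track the distance to its closest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -1.0  # already-selected samples can never be picked again

    while len(selected) < n_select:
        # Pick the sample whose closest selected neighbour is farthest away.
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[selected] = -1.0

    return selected
```

The returned indices would form the calibration dataset. The remaining samples, for example `np.setdiff1d(np.arange(len(X)), selected)`, would form the validation dataset.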
It is usually recommended to use 60-80% of your dataset for calibration, and 20-40% for validation.
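For comparison, a random split with these proportions can be sketched as follows. This is an illustrative assumption, not the script's own implementation: the function name `random_split`, the default 70% calibration fraction, and the fixed seed are hypothetical choices.

```python
import numpy as np


def random_split(n_samples, cal_fraction=0.7, seed=0):
    """Randomly split sample indices into calibration and validation sets.

    Hypothetical helper for illustration; cal_fraction=0.7 is one value
    inside the 60-80% range mentioned above.
    """
    if not 0.0 < cal_fraction < 1.0:
        raise ValueError("cal_fraction must be strictly between 0 and 1")

    rng = np.random.default_rng(seed)  # fixed seed keeps the split reproducible
    shuffled = rng.permutation(n_samples)

    n_cal = int(round(cal_fraction * n_samples))
    cal_idx = np.sort(shuffled[:n_cal])
    val_idx = np.sort(shuffled[n_cal:])
    return cal_idx, val_idx
```

With `cal_idx, val_idx = random_split(100)`, the spectra would then be selected as `X[cal_idx]` and `X[val_idx]`, assuming `X` holds one spectrum per row.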