


SplitDataSet


Unit: SDL_datatable
Class: TDataTable
Declaration: function SplitDataSet (SplitMode: TDataSetSplitMode; SelMode: TDataSelMode; SplitSize: integer; CreateAllPairs: boolean; FNameTemplate: string; AllowOverWrite: boolean; var FileNames: TStringList): integer;

The method SplitDataSet splits the dataset into several subsets and stores them as ASC-formatted files, which can be read back by the method ImportASC. The parameter SplitMode controls how the data are split, and the parameter SelMode controls how the rows or columns of the dataset are selected for each subset. The meaning of the parameter SplitSize depends on SplitMode: if SplitMode is splitRows or splitColumns, SplitSize specifies the number of subsets to be created; if SplitMode is splitTstTrn, SplitSize specifies the size of the test set.
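
The rule for SplitSize can be written as a small check. The following Free Pascal sketch mirrors the valid range listed for error code -1 below. TSplitModeSketch, SplitSizeIsValid and SubsetSize are hypothetical names, not part of SDL_datatable, and the even distribution of rows in SubsetSize is an assumption, since this page does not state how SplitDataSet handles a remainder.

```pascal
program SplitSizeSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for TDataSetSplitMode so that the sketch
  // compiles without SDL_datatable; only the three documented modes.
  TSplitModeSketch = (smRows, smColumns, smTstTrn);

// Checks SplitSize against the range documented for error code -1:
// 1..NrOfColumns for a column split, 1..NrOfRows for row-wise splits.
function SplitSizeIsValid(Mode: TSplitModeSketch;
  SplitSize, NrOfRows, NrOfColumns: integer): boolean;
var
  Limit: integer;
begin
  if Mode = smColumns then
    Limit := NrOfColumns
  else
    Limit := NrOfRows;   // splitRows and splitTstTrn both split by rows
  Result := (SplitSize >= 1) and (SplitSize <= Limit);
end;

// ASSUMPTION: items are spread as evenly as possible, the first
// (NrOfItems mod SplitSize) subsets receiving one extra item.
// Subset is a 0-based subset index.
function SubsetSize(Subset, SplitSize, NrOfItems: integer): integer;
begin
  Result := NrOfItems div SplitSize;
  if Subset < NrOfItems mod SplitSize then
    Inc(Result);
end;

var
  k: integer;
begin
  WriteLn(SplitSizeIsValid(smRows, 3, 10, 4));     // TRUE
  WriteLn(SplitSizeIsValid(smColumns, 5, 10, 4));  // FALSE: only 4 columns
  for k := 0 to 2 do
    WriteLn('subset ', k, ': ', SubsetSize(k, 3, 10), ' rows');  // 4, 3, 3
end.
```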

The following example shows the different splitting options for SplitMode = splitRows and SplitSize = 3:

If SplitMode is set to splitTstTrn, the dataset is split row-wise into two subsets, a test set and a training set. The parameter SplitSize determines the size of the test set. The following figure shows, for example, how the two subsets are generated with SplitSize = 7 and random selection:
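
Random selection of a test set can be sketched with a partial Fisher-Yates shuffle. This is a generic technique and not necessarily the one used inside SplitDataSet; RandomTstTrn is a hypothetical name, and the dataset size of 20 rows is assumed for illustration.

```pascal
program TstTrnSketch;
{$mode objfpc}{$H+}

type
  TIntArray = array of integer;

// Picks TestSize distinct 0-based row indices out of NrOfRows at random;
// all remaining indices form the training set.
procedure RandomTstTrn(NrOfRows, TestSize: integer;
  out TestRows, TrainRows: TIntArray);
var
  Perm: TIntArray;
  i, j, tmp: integer;
begin
  // start from the identity permutation 0..NrOfRows-1
  SetLength(Perm, NrOfRows);
  for i := 0 to NrOfRows - 1 do
    Perm[i] := i;
  // partial Fisher-Yates shuffle: randomize only the first TestSize slots
  for i := 0 to TestSize - 1 do
  begin
    j := i + Random(NrOfRows - i);  // random slot in the not yet fixed tail
    tmp := Perm[i];
    Perm[i] := Perm[j];
    Perm[j] := tmp;
  end;
  TestRows := Copy(Perm, 0, TestSize);
  TrainRows := Copy(Perm, TestSize, NrOfRows - TestSize);
end;

var
  Tst, Trn: TIntArray;
  i: integer;
begin
  Randomize;
  RandomTstTrn(20, 7, Tst, Trn);  // 20 rows (assumed), SplitSize = 7
  Write('test rows:');
  for i := 0 to High(Tst) do
    Write(' ', Tst[i]);
  WriteLn;
  WriteLn('training rows: ', Length(Trn));  // 13
end.
```

Because every index is moved at most once into the fixed prefix, the test and training sets are disjoint and together contain every row exactly once.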

Setting the parameter CreateAllPairs to TRUE creates several pairs of test and training datasets, with the test sets being mutually exclusive. Note that this parameter is evaluated only if the dataset is split into test and training sets; for all other split modes it is ignored.
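
One way to obtain mutually exclusive test sets is to cut the rows into consecutive blocks of SplitSize rows and use each block once as the test set, with all other rows forming the training set. The sketch below assumes this block scheme and assumes that leftover rows (when the number of rows is not a multiple of SplitSize) appear only in training sets; the actual assignment in SplitDataSet also depends on SelMode and is not described on this page. PairCount and InTestSet are hypothetical helpers.

```pascal
program AllPairsSketch;
{$mode objfpc}{$H+}

// ASSUMPTION: one pair per complete block of TestSize rows.
function PairCount(NrOfRows, TestSize: integer): integer;
begin
  Result := NrOfRows div TestSize;
end;

// True if the 0-based Row belongs to the test set of the 0-based pair
// PairIdx. Each row maps to exactly one block, so the test sets cannot
// overlap; rows in an incomplete last block are never in a test set.
function InTestSet(Row, PairIdx, TestSize: integer): boolean;
begin
  Result := (Row div TestSize) = PairIdx;
end;

var
  Pair, Row: integer;
begin
  // 10 rows, test sets of 3 rows (SplitSize = 3)
  for Pair := 0 to PairCount(10, 3) - 1 do
  begin
    Write('pair ', Pair, ' test rows:');
    for Row := 0 to 9 do
      if InTestSet(Row, Pair, 3) then
        Write(' ', Row);
    WriteLn;
  end;
end.
```

For 10 rows and SplitSize = 3 the program lists three test sets (rows 0 to 2, 3 to 5 and 6 to 8); under this scheme row 9 ends up in every training set.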

The parameter FNameTemplate specifies the path and filename (without extension) of the datasets to be created. The filename is automatically extended by a running index and/or the substring 'tst' or 'trn' to identify the individual subsets. In addition, the splitting parameters are prepended to the comment of each subset.

If AllowOverWrite is TRUE, the created files overwrite any existing files without confirmation. If AllowOverWrite is FALSE, the split is performed only if no existing files would be overwritten; otherwise the method returns the error code -2. The names of the created subset files (without path) are returned in the string list FileNames.

The method returns the following error codes:

 0 ... everything is OK, split is performed
-1 ... invalid SplitSize (valid range: 1 to NrOfColumns/NrOfRows, depending on the SplitMode parameter)
-2 ... the target directory already contains files which would be overwritten
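
Because SplitDataSet reports failures through its return value, a caller typically checks the result before using FileNames. The helper below maps the documented codes to messages. SplitResultText is a hypothetical name; the call shown in the comment uses placeholder values for DataTable1, SelMode and the file template, and it assumes that the caller creates FileNames beforehand, which this page does not state explicitly.

```pascal
program SplitResultSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Maps the documented return codes of TDataTable.SplitDataSet to text.
// Typical call (DataTable1, SelMode and the path are placeholders;
// FileNames is assumed to be created by the caller):
//   FileNames := TStringList.Create;
//   Res := DataTable1.SplitDataSet(splitRows, SelMode, 3, false,
//            'c:\temp\subset', false, FileNames);
//   if Res <> 0 then WriteLn(SplitResultText(Res));
function SplitResultText(Code: integer): string;
begin
  case Code of
     0: Result := 'split performed';
    -1: Result := 'invalid SplitSize';
    -2: Result := 'existing files would be overwritten (AllowOverWrite = FALSE)';
  else
    Result := 'undocumented return code ' + IntToStr(Code);
  end;
end;

begin
  WriteLn(SplitResultText(-2));
end.
```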


Last Update: 2023-Dec-14