Model Tools

This part of the library is designed for creating and analysing models.

get_score

It returns the requested metric scores for the model.

Parameters	Datatype	Default Value
y_test	1D array
y_pred	1D array
metrics	list	None
average	string	weighted
algo_type	string	clf
verbose	boolean	True

These are the valid keywords for metrics:

algo_type	metrics keyword	sklearn function
clf	acc	accuracy_score
clf	f1	f1_score
clf	hamming	hamming_loss
clf	jaccard	jaccard_score
clf	log	log_loss
clf	mcc	matthews_corrcoef
clf	precision	precision_score
clf	recall	recall_score
clf	zol	zero_one_loss
reg	var	explained_variance_score
reg	max	max_error
reg	abs	mean_absolute_error
reg	sq	mean_squared_error
reg	rsq	root_mean_squared_error
reg	log	mean_squared_log_error
reg	rlog	root_mean_squared_log_error
reg	medabs	median_absolute_error
reg	poisson	mean_poisson_deviance
reg	gamma	mean_gamma_deviance
reg	per	mean_absolute_percentage_error
reg	d2abs	d2_absolute_error_score
reg	d2pin	d2_pinball_score
reg	d2twe	d2_tweedie_score

Attention

The value of average must be one that the corresponding sklearn metric functions accept.

Tip

If metrics is empty, acc is printed out for classification and sq for regression.

Priority (in return)	Returns	Datatype	Condition
1	scores	dictionary	always
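
The keyword lookup described above can be sketched as follows. This is only a minimal illustration, not the library's implementation: score_sketch and CLF_METRICS are hypothetical names, and only three of the classification keywords are covered.

```python
from sklearn.metrics import accuracy_score, f1_score, zero_one_loss

# Hypothetical subset of the keyword table; not the library's own mapping.
CLF_METRICS = {
    "acc": lambda yt, yp, avg: accuracy_score(yt, yp),
    "f1": lambda yt, yp, avg: f1_score(yt, yp, average=avg),
    "zol": lambda yt, yp, avg: zero_one_loss(yt, yp),
}


def score_sketch(y_test, y_pred, metrics=None, average="weighted"):
    """Return {keyword: score} for the requested classification metrics."""
    metrics = metrics or ["acc"]  # documented fallback for classification
    return {m: float(CLF_METRICS[m](y_test, y_pred, average)) for m in metrics}


scores = score_sketch([0, 1, 1, 0], [0, 1, 0, 0], metrics=["acc", "zol"])
print(scores)
```

Only f1 uses average here, because accuracy_score and zero_one_loss do not take that argument.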

do_voting

It performs voting between y_pred arrays that were created by different models for the same test data.

Parameters	Datatype	Default Value
y_pred_list	list
combinations	list
strategy	string	avg

Attention

combinations is a list of index lists, each of which selects entries of y_pred_list. It can be created by using the do_combinations function.

Note

strategy has two valid values: ‘avg’ and ‘mode’. In ‘avg’ mode, it calculates the mean value and rounds it to an integer value for each prediction. In ‘mode’ mode, it chooses the most frequent output for each prediction.

Priority (in return)	Returns	Datatype	Condition
1	results	list	always

Note

It returns the final y_pred arrays inside a list. The nth y_pred in results always corresponds to the nth index list in combinations.
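
The two strategies can be illustrated with a small sketch. It is written under assumptions and is not the library's code: vote_sketch is a hypothetical name, and the rounding and tie-breaking rules are those of Python's built-ins, which may differ from the library's.

```python
from collections import Counter


def vote_sketch(y_pred_list, combinations, strategy="avg"):
    """Combine predictions; one output list per index combination."""
    results = []
    for combo in combinations:
        selected = [y_pred_list[i] for i in combo]
        voted = []
        for preds in zip(*selected):  # all selected predictions for one sample
            if strategy == "avg":
                # Python's round() sends exact halves to the nearest even integer.
                voted.append(round(sum(preds) / len(preds)))
            elif strategy == "mode":
                # On a tie, Counter keeps the value that was seen first.
                voted.append(Counter(preds).most_common(1)[0][0])
            else:
                raise ValueError("strategy must be 'avg' or 'mode'")
        results.append(voted)
    return results


y_pred_list = [[0, 1, 2], [0, 1, 1], [2, 1, 1]]
combos = [(0, 1, 2), (0,)]
print(vote_sketch(y_pred_list, combos, strategy="mode"))
print(vote_sketch(y_pred_list, combos, strategy="avg"))
```

The first entry of each result belongs to the combination (0, 1, 2) and the second to (0,), which matches the ordering rule stated above.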

do_combinations

It creates combinations according to the requested parameters.

Parameters	Datatype	Default Value
indexes	list
min_item	integer
max_item	integer

It generates every selection of min_item items up to max_item items from the given indexes set.

Tip

This function might be useful together with the do_voting function.

Priority (in return)	Returns	Datatype	Condition
1	combinations	list	always
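
A minimal sketch of this kind of selection, built on itertools from the standard library. The name combinations_sketch is hypothetical, and treating max_item as inclusive is an assumption based on the description above.

```python
from itertools import combinations


def combinations_sketch(indexes, min_item, max_item):
    """List every selection of min_item..max_item items (max_item assumed inclusive)."""
    result = []
    for size in range(min_item, max_item + 1):
        result.extend(list(c) for c in combinations(indexes, size))
    return result


combos = combinations_sketch([0, 1, 2], 2, 3)
print(combos)  # [[0, 1], [0, 2], [1, 2], [0, 1, 2]]
```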

examine_time

It measures the training time for the given model.

Parameters	Datatype	Default Value
model	any AI model object that has predict function
X_train	multidimensional array
y_train	1D array
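
Measuring training time can be sketched as below. The source does not say how the result is reported, so returning the elapsed seconds is an assumption; time_fit_sketch and DummyModel are hypothetical names used only for illustration.

```python
import time


class DummyModel:
    """Hypothetical stand-in for any estimator that exposes fit and predict."""

    def fit(self, X, y):
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def time_fit_sketch(model, X_train, y_train):
    """Return the wall-clock seconds spent inside model.fit."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start


elapsed = time_fit_sketch(DummyModel(), [[0], [1], [2]], [1, 1, 0])
print(f"training took {elapsed:.6f} s")
```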

WelkinClassification

It is a classification algorithm designed by the library's developer. For further information, see the Welkin Classification article.

These are the functions inside the model:

__init__(strategy='travel', priority=None, limit=None)
fit(X_train, y_train)
predict(X_test)

Note

strategy has two valid values: ‘travel’ and ‘limit’.

Attention

The datatype of priority is list and the datatype of limit is integer. Before changing the default settings, please read the article mentioned above.

DistRegressor

It is a regression algorithm designed by the library's developer. For further information, see the Dist Regression article.

These are the functions inside the model:

__init__(verbose=True, clf_model=None, clf_params=None, reg_model=None, reg_params=None, efficiency='time', rus=True)
fit(X_train, y_train)
predict(X_test)
is_data_normal(y)

Attention

Before changing the default settings, please read the article mentioned above.

compare_models

It is designed to compare models on the same data according to requested metrics.

Parameters	Datatype	Default Value
algo_type	string	clf
algorithms	list
metrics	list
X_train	multidimensional array
y_train	1D array
X_test	multidimensional array
y_test	1D array
get_result	boolean	False

Note

algo_type has two valid values: ‘clf’ for classification and ‘reg’ for regression.

These are the valid keywords for algorithms:

algo_type	algorithm keyword	class
clf/reg	all	if it is at index zero of the list, the list is treated as containing all keywords
clf	cat	CatBoostClassifier
clf	ada	AdaBoostClassifier
clf	dtr	DecisionTreeClassifier
clf	raf	RandomForestClassifier
clf	lbm	LGBMClassifier
clf	ext	ExtraTreeClassifier
clf	log	LogisticRegression
clf	knn	KNeighborsClassifier
clf	gnb	GaussianNB
clf	rdg	RidgeClassifier
clf	bnb	BernoulliNB
clf	svc	SVC
clf	per	Perceptron
clf	mnb	MultinomialNB
reg	cat	CatBoostRegressor
reg	ada	AdaBoostRegressor
reg	dtr	DecisionTreeRegressor
reg	raf	RandomForestRegressor
reg	lbm	LGBMRegressor
reg	ext	ExtraTreeRegressor
reg	lin	LinearRegression
reg	knn	KNeighborsRegressor
reg	svr	SVR

These are the valid keywords for metrics:

algo_type	metrics keyword	sklearn function
clf	acc	accuracy_score
clf	f1	f1_score
clf	hamming	hamming_loss
clf	jaccard	jaccard_score
clf	log	log_loss
clf	mcc	matthews_corrcoef
clf	precision	precision_score
clf	recall	recall_score
clf	zol	zero_one_loss
reg	var	explained_variance_score
reg	max	max_error
reg	abs	mean_absolute_error
reg	sq	mean_squared_error
reg	rsq	root_mean_squared_error
reg	log	mean_squared_log_error
reg	rlog	root_mean_squared_log_error
reg	medabs	median_absolute_error
reg	poisson	mean_poisson_deviance
reg	gamma	mean_gamma_deviance
reg	per	mean_absolute_percentage_error
reg	d2abs	d2_absolute_error_score
reg	d2pin	d2_pinball_score
reg	d2twe	d2_tweedie_score

Note

The function always prints out the results on the console.

Priority (in return)	Returns	Datatype	Condition
1	results	dict	get_result is True
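
The comparison loop can be sketched as follows for two of the classification keywords. This is an illustration under assumptions rather than the library's implementation: compare_sketch and ALGORITHMS are hypothetical names, every model uses its default settings, and the shape of the returned dictionary is assumed.

```python
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical subset of the algorithm keyword table.
ALGORITHMS = {"dtr": DecisionTreeClassifier, "knn": KNeighborsClassifier}


def compare_sketch(algorithms, X_train, y_train, X_test, y_test):
    """Train each requested model and collect its accuracy on the test data."""
    results = {}
    for key in algorithms:
        model = ALGORITHMS[key]().fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        results[key] = {"acc": float(acc)}
    return results


X_train = [[0], [1], [2], [3], [10], [11], [12], [13]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_test = [[1.5], [11.5]]
y_test = [0, 1]
results = compare_sketch(["dtr", "knn"], X_train, y_train, X_test, y_test)
print(results)
```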

get_best_model

It selects the best model for the requested metric and trains it. This function can be used with the dictionary returned by compare_models.

Parameters	Datatype	Default Value
scores	dict
rel_metric	string
algo_type	string	clf
X_train	multidimensional array
y_train	1D array
behavior	string	min-best
verbose	boolean	True

Note

algo_type has two valid values: ‘clf’ for classification and ‘reg’ for regression.

Note

rel_metric is the decisive metric in the results for choosing the best model.

These are the valid keywords for rel_metric:

algo_type	metrics keyword	sklearn function
clf	acc	accuracy_score
clf	f1	f1_score
clf	hamming	hamming_loss
clf	jaccard	jaccard_score
clf	log	log_loss
clf	mcc	matthews_corrcoef
clf	precision	precision_score
clf	recall	recall_score
clf	zol	zero_one_loss
reg	var	explained_variance_score
reg	max	max_error
reg	abs	mean_absolute_error
reg	sq	mean_squared_error
reg	rsq	root_mean_squared_error
reg	log	mean_squared_log_error
reg	rlog	root_mean_squared_log_error
reg	medabs	median_absolute_error
reg	poisson	mean_poisson_deviance
reg	gamma	mean_gamma_deviance
reg	per	mean_absolute_percentage_error
reg	d2abs	d2_absolute_error_score
reg	d2pin	d2_pinball_score
reg	d2twe	d2_tweedie_score

Note

behavior has two valid values: ‘min-best’ when the minimum score is the best and ‘max-best’ when the maximum score is the best.
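
How behavior affects the choice can be shown with a short sketch. The layout of the scores dictionary ({model keyword: {metric keyword: value}}) and the name pick_best_sketch are assumptions made for illustration, and the numbers are made up.

```python
def pick_best_sketch(scores, rel_metric, behavior="min-best"):
    """Return the model keyword whose rel_metric score is best."""
    if behavior not in ("min-best", "max-best"):
        raise ValueError("behavior must be 'min-best' or 'max-best'")
    chooser = min if behavior == "min-best" else max
    return chooser(scores, key=lambda name: scores[name][rel_metric])


# Made-up scores in an assumed layout.
scores = {
    "dtr": {"acc": 0.81, "zol": 0.19},
    "knn": {"acc": 0.86, "zol": 0.14},
}
print(pick_best_sketch(scores, "zol"))                       # knn
print(pick_best_sketch(scores, "acc", behavior="max-best"))  # knn
print(pick_best_sketch(scores, "acc"))                       # dtr
```

The last call shows why behavior matters: with the default ‘min-best’, a score such as acc would select the weakest model, so ‘max-best’ should be passed for metrics where higher is better.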

subacc

It calculates the accuracy score for each class independently. It can also return the overall accuracy score if requested.

Parameters	Datatype	Default Value
y_train	1D array
y_pred	1D array
get_general	boolean	False

Priority (in return)	Returns	Datatype	Condition
1	accuracies	dict	always
2	score	float	get_general is True
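
A minimal sketch of per-class accuracy, assuming it means the share of samples of each actual class that were predicted correctly. subacc_sketch is a hypothetical name and this is not the library's code.

```python
from collections import defaultdict


def subacc_sketch(y_true, y_pred, get_general=False):
    """Return {class: accuracy}, plus the overall accuracy if requested."""
    totals, hits = defaultdict(int), defaultdict(int)
    for actual, predicted in zip(y_true, y_pred):
        totals[actual] += 1
        hits[actual] += int(actual == predicted)
    accuracies = {label: hits[label] / totals[label] for label in totals}
    if get_general:
        return accuracies, sum(hits.values()) / len(y_true)
    return accuracies


accuracies, score = subacc_sketch([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], get_general=True)
print(accuracies, score)
```

In this example class 0 has one of two samples right and class 1 has two of three right, while the overall score is three of five.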

get_models

It returns requested models in a dictionary after training them.

Parameters	Datatype	Default Value
algorithms	list
X_train	multidimensional array
y_train	1D array

These are the valid keywords for algorithms:

algo_type	algorithm keyword	class
clf/reg	all	if it is at index zero of the list, the list is treated as containing all keywords
clf	cat	CatBoostClassifier
clf	ada	AdaBoostClassifier
clf	dtr	DecisionTreeClassifier
clf	raf	RandomForestClassifier
clf	lbm	LGBMClassifier
clf	ext	ExtraTreeClassifier
clf	log	LogisticRegression
clf	knn	KNeighborsClassifier
clf	gnb	GaussianNB
clf	rdg	RidgeClassifier
clf	bnb	BernoulliNB
clf	svc	SVC
clf	per	Perceptron
clf	mnb	MultinomialNB
reg	cat	CatBoostRegressor
reg	ada	AdaBoostRegressor
reg	dtr	DecisionTreeRegressor
reg	raf	RandomForestRegressor
reg	lbm	LGBMRegressor
reg	ext	ExtraTreeRegressor
reg	lin	LinearRegression
reg	knn	KNeighborsRegressor
reg	svr	SVR

Priority (in return)	Returns	Datatype	Condition
1	models	dict	always

commune_create

It declares how commune classification is applied to the given dataset. For further information, please read the Classification with Commune Technique article.

Parameters	Datatype	Default Value
algorithms	list
X_train	multidimensional array
y_train	1D array
X_val	multidimensional array
y_val	1D array
get_dict	boolean	False

These are the valid keywords for algorithms:

algo_type	algorithm keyword	class
clf	all	if it is at index zero of the list, the list is treated as containing all keywords
clf	cat	CatBoostClassifier
clf	ada	AdaBoostClassifier
clf	dtr	DecisionTreeClassifier
clf	raf	RandomForestClassifier
clf	lbm	LGBMClassifier
clf	ext	ExtraTreeClassifier
clf	log	LogisticRegression
clf	knn	KNeighborsClassifier
clf	gnb	GaussianNB
clf	rdg	RidgeClassifier
clf	bnb	BernoulliNB
clf	svc	SVC
clf	per	Perceptron
clf	mnb	MultinomialNB

Priority (in return)	Returns	Datatype	Condition
1	y_pred	1D array	always
2	declaration	dict	get_dict is True

commune_apply

It predicts the result according to the given declaration.

Parameters	Datatype	Default Value
declaration	dict
X_test	multidimensional array

Attention

declaration can be obtained by using the commune_create function.

Priority (in return)	Returns	Datatype	Condition
1	y_pred	1D array	always

find_deflection

It analyses the difference between prediction and actual values for regression problems and returns a report about how successful the prediction was.

Parameters	Datatype	Default Value
y_test	1D array
y_pred	1D array
arr	boolean	True
avg	boolean	False
gap	integer or float	None
gap_type	string	num
dif_type	string	f-i
avg_w_abs	boolean	True
success_indexes	boolean	False

These are the valid keywords for gap_type:

gap_type	success condition
exact	prediction = actual
num	actual - gap <= prediction <= actual + gap
num+	actual <= prediction <= actual + gap
num-	actual - gap <= prediction <= actual
per	(100 - gap) * actual / 100 <= prediction <= (100 + gap) * actual / 100
per+	actual <= prediction <= (100 + gap) * actual / 100
per-	(100 - gap) * actual / 100 <= prediction <= actual

Priority (in return)	Returns	Datatype	Condition
1	diffs	list	arr is True
2	avg_score	float	avg is True
3	succ	integer	gap is not None
4	indexes	list	success_indexes is True

Note

diffs is the list of the differences between the actual and predicted values.

These are the supported methods for difference calculation:

dif_type	calculation
f-i	final (prediction) - init (actual)
i-f	init (actual) - final (prediction)
abs	absolute

Note

avg_score equals the arithmetic mean of the diffs set.

Note

succ is the number of successful predictions according to the gap condition. indexes holds the indexes of the successful predictions.
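
The gap_type conditions and dif_type calculations from the tables above can be sketched as follows. The names is_success and deflection_sketch are hypothetical, the sketch always returns all three values instead of following the conditional return rules, and it is not the library's implementation.

```python
def is_success(actual, prediction, gap=None, gap_type="num"):
    """Check one prediction against the gap_type condition table."""
    if gap_type == "exact":
        return prediction == actual
    base = gap_type.rstrip("+-")
    if base == "num":
        low, high = actual - gap, actual + gap
    elif base == "per":
        low, high = (100 - gap) * actual / 100, (100 + gap) * actual / 100
    else:
        raise ValueError("unknown gap_type")
    if gap_type.endswith("+"):
        low = actual   # only over-predictions inside the gap count
    elif gap_type.endswith("-"):
        high = actual  # only under-predictions inside the gap count
    return low <= prediction <= high


def deflection_sketch(y_test, y_pred, gap, gap_type="num", dif_type="f-i"):
    """Return (diffs, succ, indexes) for paired actual and predicted values."""
    pairs = list(zip(y_test, y_pred))
    if dif_type == "f-i":
        diffs = [p - a for a, p in pairs]
    elif dif_type == "i-f":
        diffs = [a - p for a, p in pairs]
    elif dif_type == "abs":
        diffs = [abs(p - a) for a, p in pairs]
    else:
        raise ValueError("unknown dif_type")
    indexes = [i for i, (a, p) in enumerate(pairs) if is_success(a, p, gap, gap_type)]
    return diffs, len(indexes), indexes


diffs, succ, indexes = deflection_sketch([10, 20, 30], [11, 17, 30], gap=2)
print(diffs, succ, indexes)  # [1, -3, 0] 2 [0, 2]
```

With gap=2 and gap_type ‘num’, the prediction 17 for the actual value 20 falls outside the range 18 to 22, so only indexes 0 and 2 count as successful.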

get_measure

It provides some other results for the test data in order to measure the quality of the prediction.

Parameters	Datatype	Default Value
y_test	1D array
measures	string list
y_train	1D array	None

Note

measures may only contain these: ‘majority’, ‘minority’, ‘random’ and ‘weighted’.

Measure	Use Case
majority	only the most frequent class
minority	only the least frequent class
random	random selection between classes
weighted	random selection weighted by the frequencies of the classes

Priority (in return)	Returns	Datatype	Condition
1	arrays	dictionary	always
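
The four measures can be sketched as simple baseline predictions. This is an illustration under assumptions: baseline_sketch and its seed parameter are hypothetical, and taking the class frequencies from y_train when it is given (otherwise from y_test) is a guess about the intended behaviour.

```python
import random
from collections import Counter


def baseline_sketch(y_test, measures, y_train=None, seed=0):
    """Return {measure: baseline predictions with the same length as y_test}."""
    reference = y_test if y_train is None else y_train  # assumed fallback
    counts = Counter(reference)
    classes = list(counts)
    rng = random.Random(seed)  # hypothetical seed for repeatable output
    n = len(y_test)
    arrays = {}
    for measure in measures:
        if measure == "majority":
            arrays[measure] = [counts.most_common(1)[0][0]] * n
        elif measure == "minority":
            arrays[measure] = [counts.most_common()[-1][0]] * n
        elif measure == "random":
            arrays[measure] = [rng.choice(classes) for _ in range(n)]
        elif measure == "weighted":
            weights = [counts[c] for c in classes]
            arrays[measure] = rng.choices(classes, weights=weights, k=n)
        else:
            raise ValueError("unknown measure: " + measure)
    return arrays


arrays = baseline_sketch([0, 0, 0, 1, 1, 2], ["majority", "minority", "random", "weighted"])
print(arrays["majority"], arrays["minority"])
```

Scoring a model against arrays like these shows whether it does better than guessing from the class distribution alone.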