Model Tools

This part of the library is designed for creating and analysing models.

get_score

It returns the requested metric scores for the model.

Parameters    Datatype    Default Value
y_test        1D array
y_pred        1D array
metrics       list        None
average       string      weighted
algo_type     string      clf
verbose       boolean     True

These are the valid keywords for metrics:

algo_type    metrics keyword    sklearn function
clf          acc                accuracy_score
clf          f1                 f1_score
clf          hamming            hamming_loss
clf          jaccard            jaccard_score
clf          log                log_loss
clf          mcc                matthews_corrcoef
clf          precision          precision_score
clf          recall             recall_score
clf          zol                zero_one_loss
reg          var                explained_variance_score
reg          max                max_error
reg          abs                mean_absolute_error
reg          sq                 mean_squared_error
reg          rsq                root_mean_squared_error
reg          log                mean_squared_log_error
reg          rlog               root_mean_squared_log_error
reg          medabs             median_absolute_error
reg          poisson            mean_poisson_deviance
reg          gamma              mean_gamma_deviance
reg          per                mean_absolute_percentage_error
reg          d2abs              d2_absolute_error_score
reg          d2pin              d2_pinball_score
reg          d2twe              d2_tweedie_score

Attention

The value of average must be one that sklearn’s metric functions accept.

Tip

If metrics is empty, acc is printed out for classification and sq for regression.

Priority (in return)    Returns    Datatype      Condition
1                       scores     dictionary    always
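
The keyword table for get_score suggests a dispatch from short metric keywords to scoring functions. The block below is a minimal sketch of that idea, not the library's implementation: score_sketch, REGISTRY and DEFAULTS are hypothetical names, and the two hand-written metrics are plain-Python stand-ins for the sklearn functions so the block runs on its own. Only the keywords acc and sq and the fallback for empty metrics come from the text.

```python
# Illustrative sketch only; every name below is hypothetical, not the library's API.

def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label (stand-in for accuracy_score)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences (stand-in for sklearn's mean_squared_error)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Keyword -> function registry, one table per algo_type (only two keywords shown).
REGISTRY = {
    "clf": {"acc": accuracy},
    "reg": {"sq": mean_squared_error},
}
# Metric used when the caller passes no metrics, as described in the Tip.
DEFAULTS = {"clf": ["acc"], "reg": ["sq"]}


def score_sketch(y_test, y_pred, metrics=None, algo_type="clf"):
    """Return {keyword: score}, falling back to the default metric when none is given."""
    if not metrics:
        metrics = DEFAULTS[algo_type]
    table = REGISTRY[algo_type]
    return {name: table[name](y_test, y_pred) for name in metrics}


clf_scores = score_sketch([1, 0, 1, 1], [1, 0, 0, 1])
reg_scores = score_sketch([1.0, 2.0], [1.0, 4.0], algo_type="reg")
print(clf_scores, reg_scores)
```

Keeping the keywords in a dictionary means an unknown keyword fails with a KeyError instead of being silently ignored; how the library handles that case is not stated here.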

do_voting

It performs voting between y_pred arrays that were created by different models for the same test data.

Parameters      Datatype    Default Value
y_pred_list     list
combinations    list
strategy        string      avg

Attention

combinations is a list of index lists, each selecting entries of y_pred_list. It can be created by using the do_combinations function.

Note

strategy has two valid values: ‘avg’ and ‘mode’. With ‘avg’, it calculates the mean of the selected predictions for each sample and rounds it to an integer. With ‘mode’, it chooses the most frequent output for each sample.

Priority (in return)    Returns    Datatype    Condition
1                       results    list        always

Note

It returns the final y_pred arrays inside a list. The nth y_pred in results always corresponds to the nth index list in combinations.
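
The two strategies can be illustrated with a short sketch. It assumes integer class labels, rounds .5 upwards in ‘avg’ mode, and lets the first vote seen win a tie in ‘mode’ mode; the text does not say how the library resolves either case, and vote_sketch is a hypothetical name.

```python
# Illustrative sketch only; vote_sketch is a hypothetical name, not the library's API.
import math
from collections import Counter


def vote_sketch(y_pred_list, combinations, strategy="avg"):
    """Combine the selected prediction arrays, once per index list in combinations."""
    results = []
    for combo in combinations:
        selected = [y_pred_list[i] for i in combo]
        voted = []
        # zip(*selected) walks the chosen arrays sample by sample.
        for votes in zip(*selected):
            if strategy == "avg":
                # Mean of the votes rounded to an integer (assumption: .5 rounds up).
                voted.append(math.floor(sum(votes) / len(votes) + 0.5))
            elif strategy == "mode":
                # Most frequent vote (assumption: the first one seen wins a tie).
                voted.append(Counter(votes).most_common(1)[0][0])
            else:
                raise ValueError("strategy must be 'avg' or 'mode'")
        results.append(voted)
    return results


y_pred_list = [[0, 1, 1, 2], [0, 1, 0, 2], [1, 1, 0, 0]]
combinations = [[0, 1], [0, 1, 2]]
avg_results = vote_sketch(y_pred_list, combinations, strategy="avg")
mode_results = vote_sketch(y_pred_list, combinations, strategy="mode")
print(avg_results)   # [[0, 1, 1, 2], [0, 1, 0, 1]]
print(mode_results)  # [[0, 1, 1, 2], [0, 1, 0, 2]]
```

The last sample of the second combination shows where the strategies differ: the votes 2, 2, 0 average to 1 but have mode 2.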

do_combinations

It creates combinations according to the requested parameters.

Parameters    Datatype    Default Value
indexes       list
min_item      integer
max_item      integer

It generates every selection of min_item up to max_item elements from the given indexes set.

Tip

This function can be useful together with the do_voting function.

Priority (in return)    Returns         Datatype    Condition
1                       combinations    list        always
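
A sketch of the described behaviour using the standard library. It assumes that max_item is inclusive and that selections are unordered; combinations_sketch is a hypothetical name.

```python
# Illustrative sketch only; combinations_sketch is a hypothetical name.
from itertools import combinations as iter_combinations


def combinations_sketch(indexes, min_item, max_item):
    """List every selection of min_item..max_item elements taken from indexes."""
    result = []
    for size in range(min_item, max_item + 1):
        # itertools.combinations yields tuples; convert them to lists.
        result.extend(list(c) for c in iter_combinations(indexes, size))
    return result


combos = combinations_sketch([0, 1, 2], 2, 3)
print(combos)  # [[0, 1], [0, 2], [1, 2], [0, 1, 2]]
```

A list in this shape can be passed as the combinations argument of a voting routine, one index list per ensemble to try.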

examine_time

It measures the training time for the given model.

Parameters    Datatype                                           Default Value
model         any AI model object that has a predict function
X_train       multidimensional array
y_train       1D array
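
Measuring training time usually means wrapping the fit call in a monotonic timer. The sketch below shows this with time.perf_counter; it assumes the model exposes fit, and whether the library returns or prints the duration is not stated in this section. MeanModel and time_training_sketch are hypothetical names.

```python
# Illustrative sketch only; MeanModel and time_training_sketch are hypothetical names.
import time


class MeanModel:
    """Tiny stand-in estimator with fit and predict, used only for the demo."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def time_training_sketch(model, X_train, y_train):
    """Return the wall-clock seconds spent inside model.fit."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start


model = MeanModel()
elapsed = time_training_sketch(model, [[1], [2], [3]], [2.0, 4.0, 6.0])
print(f"training took {elapsed:.6f} s")
```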

WelkinClassification

It is a classification algorithm designed by the developer of the library. For further information, see the Welkin Classification article.

These are the functions inside the model:

  • __init__(strategy='travel', priority=None, limit=None)

  • fit(X_train, y_train)

  • predict(X_test)

Note

strategy has two valid values: ‘travel’ and ‘limit’.

Attention

priority is a list and limit is an integer. Before changing the default settings, please read the article mentioned earlier.

DistRegressor

It is a regression algorithm designed by the developer of the library. For further information, see the Dist Regression article.

These are the functions inside the model:

  • __init__(verbose=True, clf_model=None, clf_params=None, reg_model=None, reg_params=None, efficiency='time', rus=True)

  • fit(X_train, y_train)

  • predict(X_test)

  • is_data_normal(y)

Attention

Before changing the default settings, please read the article mentioned earlier.

compare_models

It compares models on the same data according to the requested metrics.

Parameters    Datatype                  Default Value
algo_type     string                    clf
algorithms    list
metrics       list
X_train       multidimensional array
y_train       1D array
X_test        multidimensional array
y_test        1D array
get_result    boolean                   False

Note

algo_type has two valid values: ‘clf’ for classification and ‘reg’ for regression.

These are the valid keywords for algorithms:

algo_type    algorithm keyword    class
clf/reg      all                  if it is at index zero of the list, the list is treated as containing all keywords
clf          cat                  CatBoostClassifier
clf          ada                  AdaBoostClassifier
clf          dtr                  DecisionTreeClassifier
clf          raf                  RandomForestClassifier
clf          lbm                  LGBMClassifier
clf          ext                  ExtraTreeClassifier
clf          log                  LogisticRegression
clf          knn                  KNeighborsClassifier
clf          gnb                  GaussianNB
clf          rdg                  RidgeClassifier
clf          bnb                  BernoulliNB
clf          svc                  SVC
clf          per                  Perceptron
clf          mnb                  MultinomialNB
reg          cat                  CatBoostRegressor
reg          ada                  AdaBoostRegressor
reg          dtr                  DecisionTreeRegressor
reg          raf                  RandomForestRegressor
reg          lbm                  LGBMRegressor
reg          ext                  ExtraTreeRegressor
reg          lin                  LinearRegression
reg          knn                  KNeighborsRegressor
reg          svr                  SVR

These are the valid keywords for metrics:

algo_type    metrics keyword    sklearn function
clf          acc                accuracy_score
clf          f1                 f1_score
clf          hamming            hamming_loss
clf          jaccard            jaccard_score
clf          log                log_loss
clf          mcc                matthews_corrcoef
clf          precision          precision_score
clf          recall             recall_score
clf          zol                zero_one_loss
reg          var                explained_variance_score
reg          max                max_error
reg          abs                mean_absolute_error
reg          sq                 mean_squared_error
reg          rsq                root_mean_squared_error
reg          log                mean_squared_log_error
reg          rlog               root_mean_squared_log_error
reg          medabs             median_absolute_error
reg          poisson            mean_poisson_deviance
reg          gamma              mean_gamma_deviance
reg          per                mean_absolute_percentage_error
reg          d2abs              d2_absolute_error_score
reg          d2pin              d2_pinball_score
reg          d2twe              d2_tweedie_score

Note

The function always prints the results to the console.

Priority (in return)    Returns    Datatype    Condition
1                       results    dict        get_result is True

get_best_model

It selects the best model for the requested metric and trains it. This function can be used with the dictionary obtained from compare_models.

Parameters    Datatype                  Default Value
scores        dictionary
rel_metric    string
algo_type     string                    clf
X_train       multidimensional array
y_train       1D array
behavior      string                    min-best
verbose       boolean                   True

Note

algo_type has two valid values: ‘clf’ for classification and ‘reg’ for regression.

Note

rel_metric is the metric inside the results that decides which model is the best.

These are the valid keywords for rel_metric:

algo_type    metrics keyword    sklearn function
clf          acc                accuracy_score
clf          f1                 f1_score
clf          hamming            hamming_loss
clf          jaccard            jaccard_score
clf          log                log_loss
clf          mcc                matthews_corrcoef
clf          precision          precision_score
clf          recall             recall_score
clf          zol                zero_one_loss
reg          var                explained_variance_score
reg          max                max_error
reg          abs                mean_absolute_error
reg          sq                 mean_squared_error
reg          rsq                root_mean_squared_error
reg          log                mean_squared_log_error
reg          rlog               root_mean_squared_log_error
reg          medabs             median_absolute_error
reg          poisson            mean_poisson_deviance
reg          gamma              mean_gamma_deviance
reg          per                mean_absolute_percentage_error
reg          d2abs              d2_absolute_error_score
reg          d2pin              d2_pinball_score
reg          d2twe              d2_tweedie_score

Note

behavior has two valid values: ‘min-best’ when the minimum score is the best and ‘max-best’ when the maximum score is the best.
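
The selection step can be sketched as a min or max over one metric. The layout of the scores dictionary is an assumption here (model keyword mapped to a dictionary of metric scores); the text only says that it is the dictionary obtained from compare_models. pick_best_sketch is a hypothetical name and the numbers are made up for illustration.

```python
# Illustrative sketch only; pick_best_sketch and the scores layout are assumptions.

def pick_best_sketch(scores, rel_metric, behavior="min-best"):
    """Return the key of the entry whose rel_metric score is the best."""
    if behavior not in ("min-best", "max-best"):
        raise ValueError("behavior must be 'min-best' or 'max-best'")
    chooser = min if behavior == "min-best" else max
    return chooser(scores, key=lambda name: scores[name][rel_metric])


# Made-up scores for two classifiers.
scores = {
    "dtr": {"acc": 0.81, "zol": 0.19},
    "knn": {"acc": 0.86, "zol": 0.14},
}
best_by_acc = pick_best_sketch(scores, "acc", behavior="max-best")
best_by_loss = pick_best_sketch(scores, "zol", behavior="min-best")
print(best_by_acc, best_by_loss)
```

Score-like metrics such as acc call for ‘max-best’, while loss-like metrics such as zol or sq fit the default ‘min-best’.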

subacc

It calculates the accuracy score for each class independently. It can also return the overall accuracy score if requested.

Parameters     Datatype    Default Value
y_train        1D array
y_pred         1D array
get_general    boolean     False

Priority (in return)    Returns       Datatype    Condition
1                       accuracies    dict        always
2                       score         float       get_general is True
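
One plausible reading of per-class accuracy is the share of samples of each true class that were predicted correctly. The sketch below follows that reading, which is an assumption, as is the name subacc_sketch.

```python
# Illustrative sketch only; subacc_sketch and its per-class definition are assumptions.
from collections import defaultdict


def subacc_sketch(y_true, y_pred, get_general=False):
    """Accuracy per true class, optionally followed by the overall accuracy."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    accuracies = {label: correct[label] / total[label] for label in total}
    if get_general:
        # Overall accuracy: all correct predictions over all samples.
        return accuracies, sum(correct.values()) / len(y_true)
    return accuracies


accuracies, general = subacc_sketch(
    ["a", "a", "a", "b"], ["a", "a", "b", "b"], get_general=True
)
print(accuracies, general)
```

Per-class figures expose weak classes that the overall score hides: here class a is right two times out of three while the overall accuracy is 0.75.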

get_models

It trains the requested models and returns them in a dictionary.

Parameters    Datatype                  Default Value
algorithms    list
X_train       multidimensional array
y_train       1D array

These are the valid keywords for algorithms:

algo_type    algorithm keyword    class
clf/reg      all                  if it is at index zero of the list, the list is treated as containing all keywords
clf          cat                  CatBoostClassifier
clf          ada                  AdaBoostClassifier
clf          dtr                  DecisionTreeClassifier
clf          raf                  RandomForestClassifier
clf          lbm                  LGBMClassifier
clf          ext                  ExtraTreeClassifier
clf          log                  LogisticRegression
clf          knn                  KNeighborsClassifier
clf          gnb                  GaussianNB
clf          rdg                  RidgeClassifier
clf          bnb                  BernoulliNB
clf          svc                  SVC
clf          per                  Perceptron
clf          mnb                  MultinomialNB
reg          cat                  CatBoostRegressor
reg          ada                  AdaBoostRegressor
reg          dtr                  DecisionTreeRegressor
reg          raf                  RandomForestRegressor
reg          lbm                  LGBMRegressor
reg          ext                  ExtraTreeRegressor
reg          lin                  LinearRegression
reg          knn                  KNeighborsRegressor
reg          svr                  SVR

Priority (in return)    Returns    Datatype    Condition
1                       models     dict        always
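
The keyword lookup and the ‘all’ shortcut can be sketched with a small registry. Everything named below is hypothetical: the two stand-in estimators and the keywords maj and first replace the real classes and keywords so that the block runs without third-party packages, and get_models_sketch is not the library's function.

```python
# Illustrative sketch only; the registry, its keywords and the estimators are hypothetical.

class MajorityModel:
    """Stand-in estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class FirstLabelModel:
    """Stand-in estimator that always predicts the first training label."""

    def fit(self, X, y):
        self.label_ = list(y)[0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Keyword -> class registry, playing the role of the keyword table.
REGISTRY = {"maj": MajorityModel, "first": FirstLabelModel}


def get_models_sketch(algorithms, X_train, y_train):
    """Train one fresh model per keyword and return {keyword: fitted model}."""
    # 'all' at index zero expands to every registered keyword.
    if algorithms and algorithms[0] == "all":
        algorithms = list(REGISTRY)
    return {key: REGISTRY[key]().fit(X_train, y_train) for key in algorithms}


models = get_models_sketch(["all"], [[0], [1], [2]], [1, 1, 0])
print(list(models))  # ['maj', 'first']
```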

commune_create

It declares the way of commune classification for the given dataset. For further information, please read the Classification with Commune Technique article.

Parameters    Datatype                  Default Value
algorithms    list
X_train       multidimensional array
y_train       1D array
X_val         multidimensional array
y_val         1D array
get_dict      boolean                   False

These are the valid keywords for algorithms:

algo_type    algorithm keyword    class
clf          all                  if it is at index zero of the list, the list is treated as containing all keywords
clf          cat                  CatBoostClassifier
clf          ada                  AdaBoostClassifier
clf          dtr                  DecisionTreeClassifier
clf          raf                  RandomForestClassifier
clf          lbm                  LGBMClassifier
clf          ext                  ExtraTreeClassifier
clf          log                  LogisticRegression
clf          knn                  KNeighborsClassifier
clf          gnb                  GaussianNB
clf          rdg                  RidgeClassifier
clf          bnb                  BernoulliNB
clf          svc                  SVC
clf          per                  Perceptron
clf          mnb                  MultinomialNB

Priority (in return)    Returns        Datatype    Condition
1                       y_pred         1D array    always
2                       declaration    dict        get_dict is True

commune_apply

It predicts the result according to the given declaration.

Parameters     Datatype                  Default Value
declaration    dict
X_test         multidimensional array

Attention

declaration can be obtained by using the commune_create function.

Priority (in return)    Returns    Datatype    Condition
1                       y_pred     1D array    always

find_deflection

It analyses the difference between prediction and actual values for regression problems and returns a report about how successful the prediction was.

Parameters         Datatype            Default Value
y_test             1D array
y_pred             1D array
arr                boolean             True
avg                boolean             False
gap                integer or float    None
gap_type           string              num
dif_type           string              f-i
avg_w_abs          boolean             True
success_indexes    boolean             False

These are the valid keywords for gap_type:

gap_type    success condition
exact       prediction = actual
num         actual - gap <= prediction <= actual + gap
num+        actual <= prediction <= actual + gap
num-        actual - gap <= prediction <= actual
per         (100 - gap) * actual / 100 <= prediction <= (100 + gap) * actual / 100
per+        actual <= prediction <= (100 + gap) * actual / 100
per-        (100 - gap) * actual / 100 <= prediction <= actual

Priority (in return)    Returns      Datatype    Condition
1                       diffs        list        arr is True
2                       avg_score    float       avg is True
3                       succ         integer     gap is not None
4                       indexes      list        success_indexes is True

Note

diffs is the list of differences between the actual and predicted values.

These are the supported methods for difference calculation:

dif_type    calculation
f-i         final (prediction) - init (actual)
i-f         init (actual) - final (prediction)
abs         absolute

Note

avg_score equals the arithmetic mean of the diffs set.

Note

succ is the number of predictions that succeed according to the gap condition. indexes holds the indexes of the successful predictions.
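
The gap_type conditions and dif_type calculations can be written out directly. The sketch below is a simplified reading: it always returns all four values instead of applying the arr, avg and success_indexes switches, it takes abs to mean the absolute value of the difference, and it leaves out avg_w_abs, whose effect is not described in this section. within_gap and deflection_sketch are hypothetical names.

```python
# Illustrative sketch only; within_gap and deflection_sketch are hypothetical names.

def within_gap(actual, prediction, gap, gap_type="num"):
    """Check one prediction against the success condition of gap_type."""
    if gap_type == "exact":
        return prediction == actual
    if gap_type.startswith("num"):
        low, high = actual - gap, actual + gap
    elif gap_type.startswith("per"):
        low = (100 - gap) * actual / 100
        high = (100 + gap) * actual / 100
    else:
        raise ValueError(f"unknown gap_type: {gap_type}")
    if gap_type.endswith("+"):
        low = actual   # only predictions at or above the actual value count
    elif gap_type.endswith("-"):
        high = actual  # only predictions at or below the actual value count
    return low <= prediction <= high


def deflection_sketch(y_test, y_pred, gap=None, gap_type="num", dif_type="f-i"):
    """Return (diffs, avg_score, succ, indexes) for the given settings."""
    pairs = list(zip(y_test, y_pred))
    if dif_type == "f-i":
        diffs = [p - a for a, p in pairs]
    elif dif_type == "i-f":
        diffs = [a - p for a, p in pairs]
    elif dif_type == "abs":
        diffs = [abs(p - a) for a, p in pairs]
    else:
        raise ValueError(f"unknown dif_type: {dif_type}")
    avg_score = sum(diffs) / len(diffs)
    indexes = []
    if gap is not None:
        indexes = [i for i, (a, p) in enumerate(pairs)
                   if within_gap(a, p, gap, gap_type)]
    return diffs, avg_score, len(indexes), indexes


diffs, avg_score, succ, indexes = deflection_sketch(
    [100, 200, 50], [105, 180, 50], gap=5, gap_type="per"
)
print(diffs, avg_score, succ, indexes)  # [5, -20, 0] -5.0 2 [0, 2]
```

With a 5 percent gap the second prediction (180 against 200) falls outside the 190 to 210 band, so two of the three predictions succeed.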

get_measure

It provides reference prediction arrays for the test data so that the quality of the prediction can be measured against them.

Parameters    Datatype       Default Value
y_test        1D array
measures      string list
y_train       1D array       None

Note

measures may only contain these: ‘majority’, ‘minority’, ‘random’ and ‘weighted’.

Measure     Use Case
majority    only the most frequent class
minority    only the least frequent class
random      random selection between classes
weighted    random selection according to the frequencies of the classes

Priority (in return)    Returns    Datatype      Condition
1                       arrays     dictionary    always
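
The four measures can be sketched with the standard library. Two points are assumptions: class frequencies are taken from y_train when it is given and from y_test otherwise, and the seed parameter exists only to make the demo repeatable. measure_sketch is a hypothetical name.

```python
# Illustrative sketch only; measure_sketch and its seed parameter are hypothetical.
import random
from collections import Counter


def measure_sketch(y_test, measures, y_train=None, seed=0):
    """Build one reference prediction array per requested measure."""
    # Assumption: frequencies come from y_train when given, else from y_test.
    counts = Counter(y_train if y_train is not None else y_test)
    classes = list(counts)
    ranked = counts.most_common()  # most frequent class first
    rng = random.Random(seed)
    n = len(y_test)
    arrays = {}
    for measure in measures:
        if measure == "majority":
            arrays[measure] = [ranked[0][0]] * n
        elif measure == "minority":
            arrays[measure] = [ranked[-1][0]] * n
        elif measure == "random":
            arrays[measure] = [rng.choice(classes) for _ in range(n)]
        elif measure == "weighted":
            weights = [counts[c] for c in classes]
            arrays[measure] = rng.choices(classes, weights=weights, k=n)
        else:
            raise ValueError(f"unknown measure: {measure}")
    return arrays


y_test = [0, 0, 0, 1, 1, 2]
arrays = measure_sketch(y_test, ["majority", "minority", "random", "weighted"])
print(arrays["majority"], arrays["minority"])
```

Scoring each of these arrays against y_test gives floor values; a trained model that does not beat the majority array has learned little beyond the class distribution.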