Model Tools
=============

This part of the library is designed for model creation and analysis.

get_score
__________

It returns the requested metric scores for the model.

========== ======== =============
Parameters Datatype Default Value
========== ======== =============
y_test     1D array -
y_pred     1D array -
metrics    list     None
average    string   weighted
algo_type  string   clf
verbose    boolean  True
========== ======== =============

These are the valid keywords for metrics:

========= =============== ==============================
algo_type metrics keyword sklearn function
========= =============== ==============================
clf       acc             accuracy_score
clf       f1              f1_score
clf       hamming         hamming_loss
clf       jaccard         jaccard_score
clf       log             log_loss
clf       mcc             matthews_corrcoef
clf       precision       precision_score
clf       recall          recall_score
clf       zol             zero_one_loss
reg       var             explained_variance_score
reg       max             max_error
reg       abs             mean_absolute_error
reg       sq              mean_squared_error
reg       rsq             root_mean_squared_error
reg       log             mean_squared_log_error
reg       rlog            root_mean_squared_log_error
reg       medabs          median_absolute_error
reg       poisson         mean_poisson_deviance
reg       gamma           mean_gamma_deviance
reg       per             mean_absolute_percentage_error
reg       d2abs           d2_absolute_error_score
reg       d2pin           d2_pinball_score
reg       d2twe           d2_tweedie_score
========= =============== ==============================

.. attention:: The average value must be valid for sklearn's metric functions.

.. tip:: If metrics is empty, acc is printed out for classification and sq for regression.

==================== ======= ========== =========
Priority (in return) Returns Datatype   Condition
==================== ======= ========== =========
1                    scores  dictionary always
==================== ======= ========== =========

do_voting
__________

It performs voting between the y_pred arrays that different models produce for the same test data.

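do_voting accepts a strategy of 'avg' or 'mode' (both are described in the notes of this section). The plain-Python sketch below illustrates the idea of the two strategies; it is not the library's implementation. The name vote_sketch is hypothetical, and how the library breaks ties (a mean of exactly .5 under 'avg', or equally frequent outputs under 'mode') is not documented here, so the tie handling shown is an assumption.

```python
from collections import Counter


def vote_sketch(y_pred_list, combinations, strategy="avg"):
    """Hypothetical re-creation of the voting idea (not the library's code).

    y_pred_list  : list of prediction lists, one per model
    combinations : list of index lists selecting models from y_pred_list
    strategy     : 'avg' (rounded mean) or 'mode' (most frequent output)
    """
    results = []
    for combo in combinations:
        selected = [y_pred_list[i] for i in combo]
        voted = []
        # zip(*selected) yields the votes of all selected models per sample
        for votes in zip(*selected):
            if strategy == "avg":
                # Assumption: Python's round() decides exact .5 cases
                voted.append(int(round(sum(votes) / len(votes))))
            elif strategy == "mode":
                # Assumption: on a tie, the first value seen wins
                voted.append(Counter(votes).most_common(1)[0][0])
            else:
                raise ValueError("strategy must be 'avg' or 'mode'")
        results.append(voted)
    return results


# Predictions of three hypothetical models for the same four samples
preds = [[0, 1, 2, 2], [0, 1, 0, 2], [1, 1, 0, 2]]

print(vote_sketch(preds, [[0, 1, 2], [0, 1]], "avg"))  # [[0, 1, 1, 2], [0, 1, 1, 2]]
print(vote_sketch(preds, [[0, 1, 2]], "mode"))         # [[0, 1, 0, 2]]
```

The nth list in the output corresponds to the nth index list in combinations, which mirrors the ordering this section describes for the returned results.
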

============ ======== =============
Parameters   Datatype Default Value
============ ======== =============
y_pred_list  list     -
combinations list     -
strategy     string   avg
============ ======== =============

.. attention:: combinations is a list of index lists that select entries of y_pred_list. It can be created with the do_combinations function.

.. note:: strategy has two valid values: 'avg' and 'mode'. With 'avg', it calculates the mean of the selected predictions and rounds it to an integer for each prediction. With 'mode', it chooses the most frequent output for each prediction.

==================== ======= ======== =========
Priority (in return) Returns Datatype Condition
==================== ======= ======== =========
1                    results list     always
==================== ======= ======== =========

.. note:: It returns the final y_pred arrays inside a list. The nth y_pred inside results always corresponds to the nth index list in combinations.

do_combinations
_________________

It creates combinations according to the requested parameters.

========== ======== =============
Parameters Datatype Default Value
========== ======== =============
indexes    list     -
min_item   integer  -
max_item   integer  -
========== ======== =============

It builds selections of min_item up to max_item items from the given set of indexes.

.. tip:: This function might be useful for the do_voting function.

==================== ============ ======== =========
Priority (in return) Returns      Datatype Condition
==================== ============ ======== =========
1                    combinations list     always
==================== ============ ======== =========

examine_time
_____________

It measures the training time for the given model.

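As a rough illustration of what measuring training time involves, the sketch below times a single call to fit with time.perf_counter. It is not the library's implementation: measure_fit_time and MeanModel are hypothetical names, and the sketch assumes the model exposes a fit(X_train, y_train) method. Whether examine_time prints or returns the elapsed time is not stated in this section.

```python
import time


class MeanModel:
    """Hypothetical stand-in model exposing fit and predict."""

    def fit(self, X_train, y_train):
        # "Training" here is just storing the mean of the targets
        self.mean_ = sum(y_train) / len(y_train)
        return self

    def predict(self, X_test):
        return [self.mean_ for _ in X_test]


def measure_fit_time(model, X_train, y_train):
    """Return the wall-clock seconds spent inside model.fit (assumed API)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    return time.perf_counter() - start


elapsed = measure_fit_time(MeanModel(), [[1.0], [2.0], [3.0]], [1.0, 2.0, 3.0])
print(f"training took {elapsed:.6f} seconds")
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.
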

========== ============================================= =============
Parameters Datatype                                      Default Value
========== ============================================= =============
model      any AI model object that has predict function -
X_train    multidimensional array                        -
y_train    1D array                                      -
========== ============================================= =============

WelkinClassification
______________________

It is a classification algorithm designed by the library's developer. For further information, see the :ref:`welkin` article.

These are the functions inside the model:

- __init__(strategy='travel', priority=None, limit=None)
- fit(X_train, y_train)
- predict(X_test)

.. note:: strategy has two valid values: 'travel' and 'limit'.

.. attention:: priority's datatype is list and limit's datatype is integer. Before changing the default settings, please read the article mentioned above.

DistRegressor
_______________

It is a regression algorithm designed by the library's developer. For further information, see the :ref:`dist` article.

These are the functions inside the model:

- __init__(verbose=True, clf_model=None, clf_params=None, reg_model=None, reg_params=None, efficiency='time', rus=True)
- fit(X_train, y_train)
- predict(X_test)
- is_data_normal(y)

.. attention:: Before changing the default settings, please read the article mentioned above.

compare_models
________________

It compares models on the same data according to the requested metrics.

========== ====================== =============
Parameters Datatype               Default Value
========== ====================== =============
algo_type  string                 clf
algorithms list                   -
metrics    list                   -
X_train    multidimensional array -
y_train    1D array               -
X_test     multidimensional array -
y_test     1D array               -
get_result boolean                False
========== ====================== =============

.. note:: algo_type has two valid values: 'clf' for classification and 'reg' for regression.


These are the valid keywords for algorithms:

========= ================= ===============================================================================
algo_type algorithm keyword class
========= ================= ===============================================================================
clf/reg   all               if it is at index zero, the list is presumed to contain all keywords
clf       cat               CatBoostClassifier
clf       ada               AdaBoostClassifier
clf       dtr               DecisionTreeClassifier
clf       raf               RandomForestClassifier
clf       lbm               LGBMClassifier
clf       ext               ExtraTreeClassifier
clf       log               LogisticRegression
clf       knn               KNeighborsClassifier
clf       gnb               GaussianNB
clf       rdg               RidgeClassifier
clf       bnb               BernoulliNB
clf       svc               SVC
clf       per               Perceptron
clf       mnb               MultinomialNB
reg       cat               CatBoostRegressor
reg       ada               AdaBoostRegressor
reg       dtr               DecisionTreeRegressor
reg       raf               RandomForestRegressor
reg       lbm               LGBMRegressor
reg       ext               ExtraTreeRegressor
reg       lin               LinearRegression
reg       knn               KNeighborsRegressor
reg       svr               SVR
========= ================= ===============================================================================

These are the valid keywords for metrics:

========= =============== ==============================
algo_type metrics keyword sklearn function
========= =============== ==============================
clf       acc             accuracy_score
clf       f1              f1_score
clf       hamming         hamming_loss
clf       jaccard         jaccard_score
clf       log             log_loss
clf       mcc             matthews_corrcoef
clf       precision       precision_score
clf       recall          recall_score
clf       zol             zero_one_loss
reg       var             explained_variance_score
reg       max             max_error
reg       abs             mean_absolute_error
reg       sq              mean_squared_error
reg       rsq             root_mean_squared_error
reg       log             mean_squared_log_error
reg       rlog            root_mean_squared_log_error
reg       medabs          median_absolute_error
reg       poisson         mean_poisson_deviance
reg       gamma           mean_gamma_deviance
reg       per             mean_absolute_percentage_error
reg       d2abs           d2_absolute_error_score
reg       d2pin           d2_pinball_score
reg       d2twe           d2_tweedie_score
========= =============== ==============================

.. note:: The function always prints out the results on the console.

==================== ======= ======== ==================
Priority (in return) Returns Datatype Condition
==================== ======= ======== ==================
1                    results dict     get_result is True
==================== ======= ======== ==================

get_best_model
________________

It selects the best model for the requested metric and trains it. This function can be used with the dictionary returned by compare_models.

========== ====================== =============
Parameters Datatype               Default Value
========== ====================== =============
scores     string                 clf
rel_metric list                   -
algo_type  list                   -
X_train    multidimensional array -
y_train    1D array               -
behavior   string                 min-best
verbose    boolean                True
========== ====================== =============

.. note:: algo_type has two valid values: 'clf' for classification and 'reg' for regression.

.. note:: rel_metric is the metric in the results that decides which model is the best.

These are the valid keywords for rel_metric:

========= =============== ==============================
algo_type metrics keyword sklearn function
========= =============== ==============================
clf       acc             accuracy_score
clf       f1              f1_score
clf       hamming         hamming_loss
clf       jaccard         jaccard_score
clf       log             log_loss
clf       mcc             matthews_corrcoef
clf       precision       precision_score
clf       recall          recall_score
clf       zol             zero_one_loss
reg       var             explained_variance_score
reg       max             max_error
reg       abs             mean_absolute_error
reg       sq              mean_squared_error
reg       rsq             root_mean_squared_error
reg       log             mean_squared_log_error
reg       rlog            root_mean_squared_log_error
reg       medabs          median_absolute_error
reg       poisson         mean_poisson_deviance
reg       gamma           mean_gamma_deviance
reg       per             mean_absolute_percentage_error
reg       d2abs           d2_absolute_error_score
reg       d2pin           d2_pinball_score
reg       d2twe           d2_tweedie_score
========= =============== ==============================

.. note:: behavior has two valid values: 'min-best' when the lowest score is the best and 'max-best' when the highest score is the best.

subacc
_________

It calculates the accuracy score for each class independently. It can also return the overall accuracy score if requested.

=========== ======== =============
Parameters  Datatype Default Value
=========== ======== =============
y_train     1D array -
y_pred      1D array -
get_general boolean  False
=========== ======== =============

==================== ========== ======== ===================
Priority (in return) Returns    Datatype Condition
==================== ========== ======== ===================
1                    accuracies dict     always
2                    score      float    get_general is True
==================== ========== ======== ===================

get_models
____________

It trains the requested models and returns them in a dictionary.

========== ====================== =============
Parameters Datatype               Default Value
========== ====================== =============
algorithms list                   -
X_train    multidimensional array -
y_train    1D array               -
========== ====================== =============

These are the valid keywords for algorithms:

========= ================= ===============================================================================
algo_type algorithm keyword class
========= ================= ===============================================================================
clf/reg   all               if it is at index zero, the list is presumed to contain all keywords
clf       cat               CatBoostClassifier
clf       ada               AdaBoostClassifier
clf       dtr               DecisionTreeClassifier
clf       raf               RandomForestClassifier
clf       lbm               LGBMClassifier
clf       ext               ExtraTreeClassifier
clf       log               LogisticRegression
clf       knn               KNeighborsClassifier
clf       gnb               GaussianNB
clf       rdg               RidgeClassifier
clf       bnb               BernoulliNB
clf       svc               SVC
clf       per               Perceptron
clf       mnb               MultinomialNB
reg       cat               CatBoostRegressor
reg       ada               AdaBoostRegressor
reg       dtr               DecisionTreeRegressor
reg       raf               RandomForestRegressor
reg       lbm               LGBMRegressor
reg       ext               ExtraTreeRegressor
reg       lin               LinearRegression
reg       knn               KNeighborsRegressor
reg       svr               SVR
========= ================= ===============================================================================

==================== ======= ======== =========
Priority (in return) Returns Datatype Condition
==================== ======= ======== =========
1                    models  dict     always
==================== ======= ======== =========

commune_create
________________

It declares how commune classification is performed for the given dataset. For further information, please read the :ref:`commune` article.

========== ====================== =============
Parameters Datatype               Default Value
========== ====================== =============
algorithms list                   -
X_train    multidimensional array -
y_train    1D array               -
X_val      multidimensional array -
y_val      1D array               -
get_dict   boolean                False
========== ====================== =============

These are the valid keywords for algorithms:

========= ================= ===============================================================================
algo_type algorithm keyword class
========= ================= ===============================================================================
clf       all               if it is at index zero, the list is presumed to contain all keywords
clf       cat               CatBoostClassifier
clf       ada               AdaBoostClassifier
clf       dtr               DecisionTreeClassifier
clf       raf               RandomForestClassifier
clf       lbm               LGBMClassifier
clf       ext               ExtraTreeClassifier
clf       log               LogisticRegression
clf       knn               KNeighborsClassifier
clf       gnb               GaussianNB
clf       rdg               RidgeClassifier
clf       bnb               BernoulliNB
clf       svc               SVC
clf       per               Perceptron
clf       mnb               MultinomialNB
========= ================= ===============================================================================

==================== =========== ======== ================
Priority (in return) Returns     Datatype Condition
==================== =========== ======== ================
1                    y_pred      1D array always
2                    declaration dict     get_dict is True
==================== =========== ======== ================

commune_apply
_______________

It predicts the result according to the given declaration.

=========== ====================== =============
Parameters  Datatype               Default Value
=========== ====================== =============
declaration dict                   -
X_test      multidimensional array -
=========== ====================== =============

.. attention:: declaration can be obtained by using the commune_create function.

==================== ======= ======== =========
Priority (in return) Returns Datatype Condition
==================== ======= ======== =========
1                    y_pred  1D array always
==================== ======= ======== =========

find_deflection
________________

It analyses the difference between predicted and actual values for regression problems and returns a report about how successful the prediction was.

=============== ================ =============
Parameters      Datatype         Default Value
=============== ================ =============
y_test          1D array         -
y_pred          1D array         -
arr             boolean          True
avg             boolean          False
gap             integer or float None
gap_type        string           num
dif_type        string           f-i
avg_w_abs       boolean          True
success_indexes boolean          False
=============== ================ =============

These are the valid keywords for gap_type:

======== ======================================================================
gap_type success condition
======== ======================================================================
exact    prediction = actual
num      actual - gap <= prediction <= actual + gap
num+     actual <= prediction <= actual + gap
num-     actual - gap <= prediction <= actual
per      (100 - gap) * actual / 100 <= prediction <= (100 + gap) * actual / 100
per+     actual <= prediction <= (100 + gap) * actual / 100
per-     (100 - gap) * actual / 100 <= prediction <= actual
======== ======================================================================

==================== ========= ======== =======================
Priority (in return) Returns   Datatype Condition
==================== ========= ======== =======================
1                    diffs     list     arr is True
2                    avg_score float    avg is True
3                    succ      integer  gap is not None
4                    indexes   list     success_indexes is True
==================== ========= ======== =======================

.. note:: diffs is the list of differences between actual and predicted values.

These are the supported methods for difference calculation:

======== ==================================
dif_type calculation
======== ==================================
f-i      final (prediction) - init (actual)
i-f      init (actual) - final (prediction)
abs      absolute difference
======== ==================================

.. note:: avg_score equals the arithmetic mean of the diffs list.

.. note:: succ is the number of successful predictions according to the gap condition. indexes holds the indexes of the successful predictions.

get_measure
_____________

It provides baseline results for the test data in order to measure the quality of the prediction.

============ ============= ===============
Parameters   Datatype      Default Value
============ ============= ===============
y_test       1D array      -
measures     string list   -
y_train      1D array      None
============ ============= ===============

.. note:: measures may only contain these: 'majority', 'minority', 'random' and 'weighted'.

========== ================================================
Measure    Use Case
========== ================================================
majority   only the most frequent class
minority   only the least frequent class
random     random selection between classes
weighted   random selection according to class frequencies
========== ================================================

==================== ========= ========== =======================
Priority (in return) Returns   Datatype   Condition
==================== ========= ========== =======================
1                    arrays    dictionary always
==================== ========= ========== =======================
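
The four measures can be pictured as simple baseline predictions to compare a model against. The sketch below is a plain-Python illustration, not the library's implementation. The name baseline_arrays and the seed parameter are hypothetical, and it assumes that class frequencies are taken from y_train when it is given and from y_test otherwise, which this section does not confirm.

```python
import random
from collections import Counter


def baseline_arrays(y_test, measures, y_train=None, seed=0):
    """Hypothetical sketch: build one baseline prediction list per measure."""
    # Assumption: frequencies come from y_train if provided, else y_test
    reference = y_train if y_train is not None else y_test
    counts = Counter(reference)
    classes = sorted(counts)
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(y_test)

    arrays = {}
    for name in measures:
        if name == "majority":
            arrays[name] = [counts.most_common(1)[0][0]] * n
        elif name == "minority":
            arrays[name] = [min(counts, key=counts.get)] * n
        elif name == "random":
            # every class is equally likely
            arrays[name] = [rng.choice(classes) for _ in range(n)]
        elif name == "weighted":
            # classes are drawn in proportion to their frequency
            weights = [counts[c] for c in classes]
            arrays[name] = rng.choices(classes, weights=weights, k=n)
        else:
            raise ValueError(f"unknown measure: {name}")
    return arrays


y_test = [0, 0, 0, 1, 1, 2]
baselines = baseline_arrays(y_test, ["majority", "minority"])
print(baselines["majority"])  # [0, 0, 0, 0, 0, 0]
print(baselines["minority"])  # [2, 2, 2, 2, 2, 2]
```

Each resulting list can then be scored against y_test with the same metric as the model, which shows how far the model is above a trivial predictor.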