MIset Package

MIset class

class MIset.MIset(max_features=1, variant='jmim', verbose=False, n_jobs=None)

Bases: object

The main class of the library. Call its methods to perform feature selection on your dataset.

Parameters:

max_features (int) – Maximum number of important features the feature selection method returns. Defaults to 1.
variant (str) – Feature selection method to use. Defaults to 'jmim'. The following options are available:
- 'jmim' : 'Joint Mutual Information Maximization' method as described in this paper.
- 'njmim' : 'Normalized Joint Mutual Information Maximization' method as described in this paper.
- 'jomic' : 'Joint Mutual Information with Class Relevance' method as described in this paper.
verbose (bool, optional) – Whether to print messages showing feature selection progress. A message is printed each time the next most relevant feature is found. Defaults to False.
n_jobs (int, optional) – Number of jobs to run in parallel while computing the feature selection method. Passing -1 uses all processors. Parallelization is done via 'joblib'. Defaults to None.
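
To make the 'jmim' option more concrete, here is a minimal pure-Python sketch of the greedy Joint Mutual Information Maximization criterion as it is commonly stated: pick the feature with the highest mutual information with the class first, then repeatedly add the candidate whose worst-case joint mutual information with an already selected feature is largest. This is an illustration under assumptions, not MIset's implementation: the names mutual_information and jmim_sketch are hypothetical, the data is made up, and the features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def jmim_sketch(features, target, max_features=1):
    """Hypothetical helper: greedy JMIM over a dict of {name: discrete column}."""

    def relevance(f):
        return mutual_information(features[f], target)

    def worst_case_joint(f, selected):
        # min over selected s of I((f, s); C): the candidate's weakest pairing.
        return min(
            mutual_information(list(zip(features[f], features[s])), target)
            for s in selected
        )

    remaining = list(features)
    # First pick: the single feature most informative about the class.
    first = max(remaining, key=relevance)
    selected, scores = [first], {first: relevance(first)}
    remaining.remove(first)

    # Later picks: maximize the worst-case joint mutual information.
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: worst_case_joint(f, selected))
        scores[best] = worst_case_joint(best, selected)
        selected.append(best)
        remaining.remove(best)
    return selected, scores


# Made-up toy data: "helper" is uninformative alone but resolves "strong"'s errors.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "strong": [0, 0, 0, 1, 1, 1, 1, 0],
    "weak": [0, 0, 1, 1, 1, 1, 0, 0],
    "helper": [0, 0, 0, 1, 0, 0, 0, 1],
}
selected, scores = jmim_sketch(columns, target, max_features=2)
print(selected, scores)
```

The toy data shows why a joint criterion can prefer a feature that looks useless on its own: the second pick is judged by what it adds in combination with the features already chosen.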

fit(df, feature_list, class_feature_name)

Fit the feature selection algorithm on your dataset.

Parameters:

df (Pandas DataFrame) – DataFrame containing the feature columns and the target column.
feature_list (list) – List of column names of the DataFrame on which feature selection is to be performed.
class_feature_name (str) – Name of the column containing your target variable.

Returns:

Returns None

Return type:

None

top_features()

Get a list of feature names deemed the most important by the feature selection algorithm.

Each entry in the list is the most important feature selected during the corresponding iteration. For example, the first element is the most important feature found in the first iteration, the second element is the most important feature found in the second iteration, and so on.

Returns: The list of most important features.
Return type: list[str]

feature_scores()

Get a dictionary where the key is the feature name and the value is its feature importance score as computed by your selected algorithm.

Returns: A dictionary of feature scores.
Return type: dict
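
As a small illustration of working with a dictionary of this shape, the sketch below ranks feature names by score. The dictionary is hard-coded with made-up names and values rather than produced by MIset, and whether scores from different iterations are directly comparable depends on the chosen variant.

```python
# Hypothetical scores, shaped like the documented return value of feature_scores().
scores = {"age": 0.42, "income": 0.61, "zip_code": 0.07}

# Sort feature names from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```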

feature_selection_order()

Get a dictionary showing which feature was deemed most important at each iteration.

Each key is an iteration number and each value is the feature that the feature selection algorithm deemed most important in that iteration.

Returns: A dictionary mapping each iteration number to the most important feature found in that iteration.
Return type: dict
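
A dictionary of this shape can be turned back into an ordered list by sorting its keys. The sketch below uses a hard-coded, made-up dictionary and assumes the iteration numbers are integers; the starting value shown here (1) is also an assumption, and sorting works the same if numbering starts at 0.

```python
# Hypothetical mapping, shaped like the documented return value of
# feature_selection_order(): iteration number -> selected feature.
order = {1: "income", 2: "age", 3: "zip_code"}

# Walk the iterations in ascending order to recover the selection sequence.
features_in_order = [order[i] for i in sorted(order)]
print(features_in_order)
```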