Version: 0.5.x

Frequently Asked Questions

Quick answers to common questions about aisp.

General usage

Which algorithm should I choose?

It depends on the type of problem:

Anomaly detection: Use RNSA or BNSA.
- RNSA for problems with continuous data.
- BNSA for problems with binary data.
Classification: Use AIRS, RNSA, or BNSA.
- RNSA and BNSA support multiclass classification.
- AIRS is more robust to noise in the data.
Optimization: Use Clonalg.
- Optimizes an objective function (minimization or maximization).
Clustering: Use AiNet.
- Automatically separates data into groups.
- Does not require a predefined number of clusters.
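The guidance above can be condensed into a small lookup. This is a hypothetical helper, not part of aisp: it only returns the class names mentioned in this FAQ, and it picks between RNSA and BNSA according to whether the data are binary.

```python
from typing import List


def suggest_algorithms(task: str, binary_data: bool = False) -> List[str]:
    """Hypothetical helper: map a problem type to the aisp algorithms suggested in this FAQ."""
    if task == "anomaly_detection":
        # RNSA for continuous data, BNSA for binary data
        return ["BNSA"] if binary_data else ["RNSA"]
    if task == "classification":
        # AIRS copes better with noisy data; RNSA/BNSA are chosen by data type
        return ["AIRS", "BNSA" if binary_data else "RNSA"]
    if task == "optimization":
        return ["Clonalg"]
    if task == "clustering":
        # AiNet does not need a predefined number of clusters
        return ["AiNet"]
    raise ValueError(f"unknown task: {task!r}")
```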

How do I normalize my data to use the `RNSA` algorithm?

RNSA works only with data in the range [0, 1], so data outside this range must be normalized before training. A simple way to do this is with scikit-learn's MinMaxScaler. Apply the same fitted scaler (scaler.transform) to any data you later pass to the model.

Example

In this example, X is the non-normalized input data and y holds the corresponding labels.

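from sklearn.preprocessing import MinMaxScaler
from aisp.nsa import RNSA

# Scale every feature of X to the range [0, 1]
scaler = MinMaxScaler()
x_norm = scaler.fit_transform(X)

# Training the model with normalized data
rnsa = RNSA(N=100, r=0.1)
rnsa.fit(x_norm, y)

# New data must be scaled with the same fitted scaler: scaler.transform(X_new)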
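MinMaxScaler rescales each feature as (x - min) / (max - min), using the minimum and maximum learned from the training data. The NumPy sketch below reproduces that computation so the effect is visible; the function names are hypothetical. New data must be scaled with the training statistics, and values outside the training range would fall outside [0, 1], so the sketch clips them (scikit-learn's MinMaxScaler offers the same behavior through clip=True).

```python
import numpy as np


def fit_min_max(X_train):
    """Return the per-feature minimum and range learned from the training data."""
    X_train = np.asarray(X_train, dtype=float)
    x_min = X_train.min(axis=0)
    x_range = X_train.max(axis=0) - x_min
    # Constant features would divide by zero; use a range of 1 instead
    x_range[x_range == 0] = 1.0
    return x_min, x_range


def transform_min_max(X, x_min, x_range):
    """Scale X to [0, 1] with the training statistics, clipping out-of-range values."""
    X_scaled = (np.asarray(X, dtype=float) - x_min) / x_range
    return np.clip(X_scaled, 0.0, 1.0)


X_train = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
x_min, x_range = fit_min_max(X_train)
print(transform_min_max(X_train, x_min, x_range))       # rows: [0, 0], [0.5, 0.5], [1, 1]
print(transform_min_max([[7.0, 15.0]], x_min, x_range))  # [[1.0, 0.25]]; 7.0 is clipped
```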

Parameter configuration

How do I choose the number of detectors (`N`) in `RNSA` or `BNSA`?

The number of detectors directly affects performance:

A small number of detectors may not adequately cover the non-self space.
A very large number of detectors may increase training time and can cause overfitting.

Recommendations:

Test different values for the number of detectors until you find a suitable balance between training time and model performance.
Use cross-validation to identify the value that consistently yields the best results.
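Cross-validation over candidate values of N can be sketched without depending on any particular model. In this minimal sketch, score_fn is a hypothetical callback you supply: it could, for example, train RNSA(N=n, r=...) on the training fold and return a score such as accuracy on the validation fold. Only the fold splitting and averaging are shown here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[np.ndarray]:
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def select_n_detectors(
    X,
    y,
    candidates: Sequence[int],
    score_fn: Callable[..., float],
    k: int = 5,
    seed: int = 0,
) -> Tuple[int, Dict[int, float]]:
    """Return the candidate N with the best mean validation score, plus all mean scores.

    score_fn(n, X_train, y_train, X_val, y_val) is a hypothetical user-supplied
    callback that trains a model with N=n and returns a score (higher is better).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    X, y = np.asarray(X), np.asarray(y)
    folds = k_fold_indices(len(X), k, seed)
    mean_scores: Dict[int, float] = {}
    for n in candidates:
        fold_scores = []
        for i, val_idx in enumerate(folds):
            # Train on every fold except the i-th, validate on the i-th
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            fold_scores.append(score_fn(n, X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
        mean_scores[n] = float(np.mean(fold_scores))
    best_n = max(mean_scores, key=mean_scores.get)
    return best_n, mean_scores
```

Averaging over folds favors a value of N that performs well consistently rather than on a single lucky split; the training time of each candidate can be recorded in the same loop if that trade-off matters.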

Which radius (`r` or `aff_thresh`) should I use in `RNSA` or `BNSA`?

The detector radius (`r` in RNSA, `aff_thresh` in BNSA) depends on the data distribution:

A very small radius covers little of the non-self space, so anomalies may go undetected.
A very large radius makes candidate detectors overlap the self space, so they are discarded and valid detectors may never be generated.
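The effect of the radius can be seen in a simplified real-valued negative selection loop. This is an illustration under stated assumptions, not aisp's implementation: candidates are drawn uniformly from [0, 1]^d, a candidate within distance r of any self sample is discarded, and generation stops after max_discards consecutive discards. With a radius larger than the diagonal of the unit space, every candidate overlaps the self space and no detector is produced.

```python
import numpy as np


def generate_detectors(X_self, n_detectors, r, max_discards=1000, seed=0):
    """Simplified negative selection sketch (not aisp's implementation).

    A uniform random candidate in [0, 1]^d is kept only if no self sample lies
    within distance r of it. Stops after n_detectors acceptances or after
    max_discards consecutive rejections.
    """
    rng = np.random.default_rng(seed)
    X_self = np.asarray(X_self, dtype=float)
    detectors = []
    discards = 0
    while len(detectors) < n_detectors and discards < max_discards:
        candidate = rng.random(X_self.shape[1])
        if np.min(np.linalg.norm(X_self - candidate, axis=1)) <= r:
            discards += 1  # candidate overlaps the self space
        else:
            detectors.append(candidate)
            discards = 0  # reset the consecutive-discard counter
    return np.array(detectors)


X_self = np.array([[0.5, 0.5], [0.55, 0.5]])
print(len(generate_detectors(X_self, 20, r=0.05)))  # 20: small radius, detectors are found
print(len(generate_detectors(X_self, 20, r=1.5)))   # 0: radius exceeds the unit-square diagonal
```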

What is the `r_s` parameter in `RNSA`?

`r_s` is the radius of each self sample: it defines a region around every training sample that is treated as part of the self space.
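Geometrically, the region described by r_s can be pictured as a ball of radius r_s around each training sample. The hypothetical helper below tests whether a point falls inside that region using Euclidean distance; it only illustrates the definition and does not reproduce how RNSA uses r_s internally.

```python
import numpy as np


def in_self_region(point, X_self, r_s):
    """Hypothetical helper: True if point lies within r_s of at least one self sample."""
    distances = np.linalg.norm(
        np.asarray(X_self, dtype=float) - np.asarray(point, dtype=float), axis=1
    )
    return bool(np.any(distances <= r_s))


X_self = [[0.2, 0.2], [0.8, 0.8]]
print(in_self_region([0.21, 0.2], X_self, r_s=0.05))  # True: close to the first sample
print(in_self_region([0.5, 0.5], X_self, r_s=0.05))   # False: far from both samples
```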

Clonalg: How do I define the objective function?

The objective function must follow the signature defined in the base class: it receives a candidate solution as input and returns a cost (or affinity) value as a float.

from typing import Any

def affinity_function(self, solution: Any) -> float:
    # Return the cost (or affinity) of the given solution
    ...

There are two ways to define the objective function in Clonalg.

Passing the function directly to the class constructor

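import numpy as np
from aisp.csa import Clonalg

def sphere(solution):
    # Sphere function: sum of squared components (minimum 0 at the origin)
    return np.sum(solution ** 2)

clonalg = Clonalg(
    problem_size=2,
    affinity_function=sphere
)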

Using the function registry

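import numpy as np
from aisp.csa import Clonalg

def sphere(solution):
    # Sphere function: sum of squared components (minimum 0 at the origin)
    return np.sum(solution ** 2)

clonalg = Clonalg(
    problem_size=2,
)

# Register the objective function after creating the instance
clonalg.register("affinity_function", sphere)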
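Any function that takes a candidate solution (a NumPy array) and returns a float fits this pattern. The sketch below restates the sphere function with an explicit float return and adds the Rastrigin function, a common multimodal benchmark that is not part of the original text, as a second example objective. Both have their global minimum of 0 at the origin.

```python
import numpy as np


def sphere(solution: np.ndarray) -> float:
    """Sphere function: sum of squared components; minimum 0 at the origin."""
    return float(np.sum(np.asarray(solution, dtype=float) ** 2))


def rastrigin(solution: np.ndarray) -> float:
    """Rastrigin function: many local minima; global minimum 0 at the origin."""
    x = np.asarray(solution, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))


print(sphere([1.0, 2.0]))  # 5.0
```

Either function could be passed as affinity_function or registered with clonalg.register("affinity_function", ...) in the same way as sphere.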

Additional information

Where can I find more examples?

Examples in the documentation.
Examples on GitHub.

How can I contribute to the project?

See the Contribution Guide on GitHub.

Still have questions?

Open an issue on GitHub.
