Skip to main content
Version: 0.5.x

Frequently Asked Questions

Quick solutions for possible questions about aisp.

General usage

Which algorithm should I choose?

It depends on the type of problem:

  • Anomaly detection: Use RNSA or BNSA.
    • RNSA for problems with continuous data.
    • BNSA for problems with binary data.
  • Classification: Use AIRS, RNSA, or BNSA.
    • RNSA and BNSA were implemented to be applied to multiclass classification.
    • AIRS is more robust to noise in the data.
  • Optimization: Use Clonalg.
    • The implementation can be applied to objective function optimization (min/max).
  • Clustering: Use AiNet.
    • Automatically separates data into groups.
    • Does not require a predefined number of clusters.

How do I normalize my data to use the RNSA algorithm?

RNSA works exclusively with data normalized in the range (0.0, 1.0). Therefore, before applying it, the data must be normalized if they are not already in this range. A simple way to do this is by using normalization tools from scikit-learn, such as MinMaxScaler.

Example

In this example, X represents the non-normalized input data.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

x_norm = scaler.fit_transform(X)

# Training the model with normalized data
rnsa = RNSA(N=100, r=0.1)
rnsa.fit(x_norm, y)

Parameter configuration

How do I choose the number of detectors (N) in RNSA or BNSA?

The number of detectors directly affects performance:

  • A small number of detectors may not adequately cover the non-self space.
  • A very large number of detectors may increase training time and can cause overfitting.

Recommendations:

  • Test different values for the number of detectors until you find a suitable balance between training time and model performance.
  • Use cross-validation to identify the value that consistently yields the best results.

Which radius (r or aff_thresh) should I use in BNSA or RNSA?

The detector radius depends on the data distribution:

  • A very small radius may fail to detect anomalies.
  • A very large radius may overlap the self space and never generate valid detectors.

What is the r_s parameter in RNSA?

r_s is the radius of the self sample. It defines a region around each training sample.


Clonalg: How do I define the objective function?

The objective function must follow the pattern of the base class. It must receive a solution as input and return a cost (or affinity) value.

def affinity_function(self, solution: Any) -> float:
pass

There are two ways to define the objective function in Clonalg.

def sphere(solution):
return np.sum(solution ** 2)
  1. Defining the function directly in the class constructor
clonalg = Clonalg(
problem_size=2,
affinity_function=sphere
)
  1. Using the function registry
clonalg = Clonalg(
problem_size=2,
)

clonalg.register("affinity_function", sphere)

Additional information

Where can I find more examples?

How can I contribute to the project?

See the Contribution Guide on GitHub.

Still have questions?