Frequently Asked Questions
Quick answers to common questions about aisp.
General usage
Which algorithm should I choose?
It depends on the type of problem:
- Anomaly detection: Use RNSA or BNSA.
  - RNSA for problems with continuous data.
  - BNSA for problems with binary data.
- Classification: Use AIRS, RNSA, or BNSA.
  - RNSA and BNSA were implemented to be applied to multiclass classification.
  - AIRS is more robust to noise in the data.
- Optimization: Use Clonalg.
  - The implementation can be applied to objective function optimization (min/max).
- Clustering: Use AiNet.
  - Automatically separates data into groups.
  - Does not require a predefined number of clusters.
How do I normalize my data to use the RNSA algorithm?
RNSA works exclusively with data normalized in the range [0, 1]. Therefore, before applying it, the data must be normalized if they are not already in this range. A simple way to do this is by using normalization tools from scikit-learn, such as MinMaxScaler.
Example
In this example, X represents the non-normalized input data and y the corresponding labels.
from sklearn.preprocessing import MinMaxScaler
from aisp.nsa import RNSA

# Scale every feature to the [0, 1] range
scaler = MinMaxScaler()
x_norm = scaler.fit_transform(X)

# Training the model with normalized data
rnsa = RNSA(N=100, r=0.1)
rnsa.fit(x_norm, y)
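A scaler learns its range from the data it was fitted on, so new samples can still land outside [0, 1] after scaling. The sketch below is plain Python, not the aisp or scikit-learn API; the helper names `fit_min_max` and `transform_min_max` are hypothetical. It shows one way to reuse the training bounds on new data and clip the result. Clipping is an assumption about how you want to treat out-of-range values, not a requirement stated by aisp.

```python
def fit_min_max(rows):
    """Return per-feature (min, max) pairs computed from the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, bounds, clip=True):
    """Scale rows with the training bounds; optionally clip to [0, 1]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, bounds):
            # Constant features have hi == lo; map them to 0.0 to avoid dividing by zero.
            z = (value - lo) / (hi - lo) if hi > lo else 0.0
            if clip:
                z = min(1.0, max(0.0, z))
            new_row.append(z)
        scaled.append(new_row)
    return scaled


train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
test = [[15.0, 500.0]]  # second feature exceeds the training maximum

bounds = fit_min_max(train)              # learned from training data only
train_norm = transform_min_max(train, bounds)
test_norm = transform_min_max(test, bounds)
print(test_norm)
```

With scikit-learn, the equivalent is to call `fit_transform` on the training data, `transform` on new data, and optionally construct the scaler with `MinMaxScaler(clip=True)`.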
Parameter configuration
How do I choose the number of detectors (N) in RNSA or BNSA?
The number of detectors directly affects performance:
- A small number of detectors may not adequately cover the non-self space.
- A very large number of detectors may increase training time and can cause overfitting.
Recommendations:
- Test different values for the number of detectors until you find a suitable balance between training time and model performance.
- Use cross-validation to identify the value that consistently yields the best results.
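To make the trade-off concrete, here is a toy sweep over the number of detectors. It is a simplified negative-selection sketch in plain Python, not the aisp implementation: the functions `generate_detectors` and `detection_rate`, the data, and the radius are all illustrative assumptions. With real data you would replace the toy model with RNSA or BNSA and the detection rate with a cross-validated score.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, r, rng, max_tries=100_000):
    """Toy negative selection: keep random points farther than r from every self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, s) > r for s in self_samples):
            detectors.append(candidate)
    return detectors


def detection_rate(detectors, points, r):
    """Fraction of points lying within r of at least one detector."""
    hits = sum(any(math.dist(p, d) <= r for d in detectors) for p in points)
    return hits / len(points)


# Self samples clustered near the centre; toy anomalies placed near the borders.
self_samples = [(0.5 + dx, 0.5 + dy) for dx in (-0.05, 0.0, 0.05) for dy in (-0.05, 0.0, 0.05)]
anomalies = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9), (0.5, 0.05), (0.05, 0.5)]

rates = {}
for n in (5, 20, 80, 320):
    rng = random.Random(42)  # same seed, so larger detector sets extend smaller ones
    detectors = generate_detectors(self_samples, n, r=0.1, rng=rng)
    rates[n] = detection_rate(detectors, anomalies, r=0.1)
    print(f"N={n:>3}  detection rate on toy anomalies: {rates[n]:.2f}")
```

Because every run reuses the same seed, each larger detector set contains the smaller ones, so the detection rate can only stay the same or grow with N. The point where it stops improving is a reasonable place to stop adding detectors.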
Which radius (r or aff_thresh) should I use in BNSA or RNSA?
The detector radius depends on the data distribution:
- A very small radius may fail to detect anomalies.
- A very large radius may overlap the self space, so that no valid detectors can be generated.
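The effect of an oversized radius can be seen with a small experiment. The sketch below is plain Python and only approximates the idea: it assumes a candidate detector is valid when it lies farther than r from every self sample, which may differ from the exact rule aisp applies. The function `count_valid_candidates` and the data are hypothetical.

```python
import math
import random


def count_valid_candidates(self_samples, r, n_candidates=1000, seed=0):
    """Count random candidates in the unit square farther than r from every self sample."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(n_candidates):
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, s) > r for s in self_samples):
            valid += 1
    return valid


self_samples = [(0.4, 0.4), (0.5, 0.5), (0.6, 0.6), (0.4, 0.6), (0.6, 0.4)]

# The same 1000 candidates are tested against each radius.
accepted = {r: count_valid_candidates(self_samples, r) for r in (0.05, 0.2, 0.5, 1.5)}
for r, valid in accepted.items():
    print(f"r={r:<4}  valid candidates out of 1000: {valid}")
```

At r = 1.5 every point of the unit square is within the radius of some self sample, so no candidate is ever accepted. When detector generation stalls or fails, reducing the radius is usually the first thing to try.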
What is the r_s parameter in RNSA?
r_s is the radius of the self sample. It defines a region around each training sample.
Clonalg: How do I define the objective function?
The objective function must follow the pattern of the base class. It must receive a solution as input and return a cost (or affinity) value.
def affinity_function(self, solution: Any) -> float:
    pass
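Any callable that takes a solution and returns a single number fits this pattern. As an illustration, here is the Rastrigin benchmark function written in plain Python. The choice of function is an assumption made for this example and is not taken from the aisp documentation.

```python
import math


def rastrigin(solution):
    """Rastrigin benchmark: receives a sequence of floats, returns a cost (0.0 at the origin)."""
    n = len(solution)
    return 10.0 * n + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x) for x in solution)


print(rastrigin([0.0, 0.0]))  # cost at the global minimum
print(rastrigin([0.5, -0.5]))  # cost away from the minimum
```

A function like this can be supplied wherever an objective function is expected, as long as the solution it receives is a sequence of numbers.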
There are two ways to define the objective function in Clonalg.
- Defining the function directly in the class constructor
import numpy as np
from aisp.csa import Clonalg

def sphere(solution):
    # Sphere function: sum of the squared components
    return np.sum(solution ** 2)

clonalg = Clonalg(
    problem_size=2,
    affinity_function=sphere
)
- Using the function registry
def sphere(solution):
    # Sphere function: sum of the squared components
    return np.sum(solution ** 2)

clonalg = Clonalg(
    problem_size=2,
)
clonalg.register("affinity_function", sphere)
Additional information
Where can I find more examples?
How can I contribute to the project?
See the Contribution Guide on GitHub.
Still have questions?
- Open an Issue on GitHub