Depth-Based Methods for Outlier Detection

Depth-based methods use convex hull analysis to find outliers. The idea is that points lying on the outer boundaries of the data sit at the corners of its convex hull, and such points are more likely to be outliers.

The depth-based algorithm works iteratively.

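In the $k$-th iteration, all points at the corners of the convex hull of the data set are removed from the data set and assigned depth $k$.
These steps are repeated until the data set is empty.
All points with depth at most $r$ can then be reported as outliers. In other words, the depth of a data point can serve as its outlier score, with a smaller depth indicating a stronger outlier.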
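The peeling procedure above can be sketched in a few lines of Python. This is only a minimal illustration, assuming SciPy's `scipy.spatial.ConvexHull` is used to find the hull corners; the function name `convex_hull_depths`, the toy data, and the threshold value `r = 1` are hypothetical choices for this example and do not come from the book.

```python
import numpy as np
from scipy.spatial import ConvexHull


def convex_hull_depths(X):
    """Assign each point the iteration at which it is peeled off the hull.

    Hypothetical helper for illustration: depth 1 is the outermost layer.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    depths = np.zeros(n, dtype=int)
    remaining = np.arange(n)  # indices of points not yet removed
    k = 1
    while remaining.size > 0:
        try:
            hull = ConvexHull(X[remaining])
            corners = remaining[hull.vertices]
        except Exception:
            # Qhull raises an error when too few points are left or the
            # remaining points are degenerate (e.g. collinear). As a
            # simplifying assumption, the leftovers form the last layer.
            corners = remaining
        depths[corners] = k
        remaining = np.setdiff1d(remaining, corners)
        k += 1
    return depths


# Toy data: four corners of a square plus one interior point.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
depths = convex_hull_depths(points)

r = 1  # assumed threshold: points with depth <= r are reported as outliers
is_outlier = depths <= r
print(depths, is_outlier)
```

Using the depth directly as a score avoids having to pick `r` in advance; the threshold is only needed when a hard outlier/inlier label is required.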

Depth-based methods are good at detecting multivariate extreme values such as Point B in the following figure, but they are poor at finding outliers that lie in the inner regions of the data space, such as Point A.

The computational complexity of convex hull methods increases exponentially with dimensionality.
In addition, the number of points at the corners of a convex hull can grow exponentially with the data dimensionality, so as dimensionality increases, a large fraction of the data points ends up at the corners of the convex hull.

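→ The method is computationally impractical in high dimensions, and it is also ineffective there: as the dimension grows, most points crowd into the outermost depth levels, so depth no longer separates outliers from normal points well.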
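The crowding of points at the hull corners can be checked empirically. The sketch below is only an illustration under assumed settings: 200 points drawn from a standard normal distribution, dimensions 2 through 5, and a hypothetical helper named `hull_corner_fraction`. None of these choices come from the book, and the exact fractions depend on the random seed and the distribution.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_corner_fraction(n_points, dim, seed=0):
    """Fraction of random Gaussian points that are convex hull corners."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_points, dim))
    hull = ConvexHull(X)
    return len(hull.vertices) / n_points


# Same number of points, increasing dimensionality.
fractions = {d: hull_corner_fraction(200, d) for d in (2, 3, 4, 5)}
for d, frac in fractions.items():
    print(f"dim={d}: {frac:.1%} of points are hull corners")
```

With the sample size held fixed, the share of points assigned depth 1 should rise as the dimension grows, which is why the depth score loses its discriminative power.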

* This post summarizes Outlier Analysis, Second Edition, by Charu C. Aggarwal.
