[Active Learning] Data Pre-training and Training Progress


Run Active Learning with a model pre-trained on East Asian face data
1. A pre-trained model still has to be found
2. More data will be collected for the dataset
Supplement our own dataset, then run Active Learning
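The post does not say which query strategy the Active Learning step will use. As a minimal sketch, assuming entropy-based uncertainty sampling (one common choice, not necessarily the one used here), the selection step could look like this. The sample ids and probabilities are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k unlabeled samples with the highest entropy.

    predictions: dict mapping a sample id to its predicted class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]


# Hypothetical softmax outputs of the pre-trained model on three unlabeled images.
predictions = {
    "img_a.jpg": [0.98, 0.02],  # confident prediction
    "img_b.jpg": [0.55, 0.45],  # close to a coin flip
    "img_c.jpg": [0.80, 0.20],
}

# The samples the model is least sure about are sent for labeling first.
to_label = select_most_uncertain(predictions, k=2)
print(to_label)
```

The selected images would then be labeled, added to the training set, and the model retrained before the next round.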

Pre-training dataset

Category: female, Age group: 0, Number of images: 20
Category: female, Age group: 10, Number of images: 12799
Category: female, Age group: 20, Number of images: 38118
Category: female, Age group: 30, Number of images: 11162
Category: female, Age group: 40, Number of images: 484
Category: female, Age group: 50, Number of images: 101
Category: female, Age group: 60, Number of images: 38
Category: female, Age group: 70, Number of images: 43
Category: female, Age group: 80, Number of images: 3
Category: male, Age group: 0, Number of images: 14
Category: male, Age group: 10, Number of images: 10336
Category: male, Age group: 20, Number of images: 60009
Category: male, Age group: 30, Number of images: 28315
Category: male, Age group: 40, Number of images: 1597
Category: male, Age group: 50, Number of images: 132
Category: male, Age group: 60, Number of images: 89
Category: male, Age group: 70, Number of images: 17
Category: male, Age group: 80, Number of images: 4
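The counts above are heavily skewed toward the 10s to 30s. The following sketch simply re-enters the numbers from the list and totals them per age group, to make the imbalance explicit. The dictionary layout is an assumption chosen for illustration.

```python
# Image counts copied from the list above: {category: {age group: count}}.
counts = {
    "female": {0: 20, 10: 12799, 20: 38118, 30: 11162, 40: 484,
               50: 101, 60: 38, 70: 43, 80: 3},
    "male": {0: 14, 10: 10336, 20: 60009, 30: 28315, 40: 1597,
             50: 132, 60: 89, 70: 17, 80: 4},
}

# Total number of images across both categories.
total = sum(sum(ages.values()) for ages in counts.values())

# Merge female and male counts per age group.
by_age = {}
for ages in counts.values():
    for age, n in ages.items():
        by_age[age] = by_age.get(age, 0) + n

# Print each age group's count and its share of the whole dataset.
for age in sorted(by_age):
    print(f"age group {age:>2}: {by_age[age]:>6} images ({by_age[age] / total:.2%})")

# Share of the dataset that falls in the 10s, 20s, and 30s.
young_share = (by_age[10] + by_age[20] + by_age[30]) / total
print(f"10s-30s share: {young_share:.2%}")
```

Roughly 98% of the images fall in the 10s to 30s, so the older age groups are the ones that most need additional data.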

<Active Learning Code>
10,000 random images
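The post does not show how the random subset was drawn. A minimal sketch, assuming the images sit in one flat directory and that a fixed seed is wanted for reproducibility (`sample_images` is a hypothetical helper, not part of the script below):

```python
import random
from pathlib import Path


def sample_images(data_dir, n, seed=0):
    """Return up to n randomly chosen .jpg/.png paths from data_dir."""
    # Sort first so the result depends only on the seed, not on directory order.
    paths = sorted(
        p for p in Path(data_dir).iterdir()
        if p.suffix.lower() in {".jpg", ".png"}
    )
    rng = random.Random(seed)
    return rng.sample(paths, min(n, len(paths)))
```

Calling `sample_images(data_dir, 10000)` would return at most 10,000 paths, or every image if the directory holds fewer.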

import os
import torch
import numpy as np
from torchvision import models, transforms
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import SpectralClustering
from PIL import Image
from scipy.spatial.distance import cdist
import shutil

# Image preprocessing pipeline (ImageNet input size and normalization)
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(img_path, model):
    img = Image.open(img_path).convert('RGB')
    img_tensor = preprocess(img)
    img_tensor = img_tensor.unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        features = model(img_tensor).numpy()
    return features

# Extract features for every image in a directory
def extract_all_features(data_dir, model):
    features_list = []
    image_paths = []

    # Sort for a reproducible order; match extensions case-insensitively
    for filename in sorted(os.listdir(data_dir)):
        if filename.lower().endswith(('.jpg', '.png')):
            img_path = os.path.join(data_dir, filename)
            image_paths.append(img_path)
            features = extract_features(img_path, model)
            features_list.append(features)

    return features_list, image_paths

# Clustering helper
def perform_clustering(features_list):
    # Stack the per-image feature vectors into one (n_images, n_features) array
    features_array = np.vstack(features_list)

    # Standardize each feature dimension with StandardScaler
    scaler = StandardScaler()
    features_scaled = scaler.fit_transform(features_array)

    # Run Spectral Clustering to split the images by age group
    # (2 clusters here; set n_clusters = 3 when comparing three age groups)
    n_clusters = 2
    spectral = SpectralClustering(n_clusters=n_clusters, affinity='nearest_neighbors')
    labels = spectral.fit_predict(features_scaled)

    return labels

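# Directory that contains the input image files
data_dir = r'C:\Users\Catholic\Desktop\DL_Project\Cluster_Data\70'

# Load VGG16 with ImageNet pre-trained weights
# (torchvision >= 0.13 deprecates pretrained=True in favor of
#  weights=models.VGG16_Weights.IMAGENET1K_V1)
model = models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])  # drop the last FC layer so the model outputs 4096-dim features
model.eval()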

# Extract image features
features_list, image_paths = extract_all_features(data_dir, model)

# Run clustering
labels = perform_clustering(features_list)

# Print the clustering result
print("Clustering result:")
for idx, label in enumerate(labels):
    print(f"image: {image_paths[idx]}, cluster: {label}")

# Create the folder that will hold the per-cluster image folders
output_dir = r'C:\Users\Catholic\Desktop\DL_Project\Cluster'
os.makedirs(output_dir, exist_ok=True)

# Copy each image into the folder named after its cluster number
for idx, label in enumerate(labels):
    cluster_dir = os.path.join(output_dir, f'cluster_{label}')
    os.makedirs(cluster_dir, exist_ok=True)
    shutil.copy(image_paths[idx], cluster_dir)
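The script stops at copying files, while the results that follow are counts of images per age group. One way to get such counts is to tally age groups per cluster label. This is a sketch under a loud assumption: the age group is taken from a hypothetical file-name prefix such as `20_0001.jpg`, which the post does not confirm.

```python
import os
from collections import Counter, defaultdict


def age_group_of(path):
    """Read the age group from a hypothetical '<age>_<id>.jpg' file name."""
    return os.path.basename(path).split("_")[0]


def cluster_composition(image_paths, labels):
    """Count how many images of each age group landed in each cluster."""
    composition = defaultdict(Counter)
    for path, label in zip(image_paths, labels):
        composition[int(label)][age_group_of(path)] += 1
    return composition


# Made-up paths and labels standing in for the script's image_paths / labels.
demo_paths = ["data/20_001.jpg", "data/20_002.jpg", "data/70_001.jpg", "data/70_002.jpg"]
demo_labels = [0, 0, 0, 1]

composition = cluster_composition(demo_paths, demo_labels)
for cluster in sorted(composition):
    for age, n in sorted(composition[cluster].items()):
        print(f"cluster {cluster}: {n} images in their {age}s")
```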

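Number of 20s images: 3973
Number of 70s images: 3114

Number of 20s images: 100
Number of 70s images: 886

Number of 10s images: 22
Number of 40s images: 27
Number of 70s images: 23

Number of 10s images: 0
Number of 40s images: 0
Number of 70s images: 35

Number of 10s images: 78
Number of 40s images: 73
Number of 70s images: 42

MobileNet

Number of 10s images: 26
Number of 40s images: 52
Number of 70s images: 33

Number of 10s images: 11
Number of 40s images: 9
Number of 70s images: 40

Number of 10s images: 63
Number of 40s images: 39
Number of 70s images: 27

<20 vs 70>
Number of 20s images: 2359
Number of 70s images: 1721

Number of 20s images: 1641
Number of 70s images: 2279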
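To compare the runs with a single number, cluster purity can be computed from the counts above. This sketch assumes each blank-line-separated block is the age composition of one cluster (the per-age totals are consistent with that reading). Purity is a metric swapped in here for illustration; the post itself does not report it.

```python
def purity(clusters):
    """Fraction of images that belong to the majority age group of their cluster."""
    total = sum(sum(c.values()) for c in clusters)
    majority = sum(max(c.values()) for c in clusters)
    return majority / total


# Counts copied from the result blocks above, one dict per assumed cluster.
first_20_vs_70 = [{"20s": 3973, "70s": 3114}, {"20s": 100, "70s": 886}]
second_20_vs_70 = [{"20s": 2359, "70s": 1721}, {"20s": 1641, "70s": 2279}]
first_three_groups = [
    {"10s": 22, "40s": 27, "70s": 23},
    {"10s": 0, "40s": 0, "70s": 35},
    {"10s": 78, "40s": 73, "70s": 42},
]
mobilenet_three_groups = [
    {"10s": 26, "40s": 52, "70s": 33},
    {"10s": 11, "40s": 9, "70s": 40},
    {"10s": 63, "40s": 39, "70s": 27},
]

for name, run in [
    ("first 20 vs 70", first_20_vs_70),
    ("second 20 vs 70", second_20_vs_70),
    ("first 10/40/70", first_three_groups),
    ("MobileNet 10/40/70", mobilenet_three_groups),
]:
    print(f"{name}: purity = {purity(run):.3f}")
```

A purity of 0.5 for two balanced groups (or about 0.33 for three) means the clusters are no better than chance at separating age groups, so values close to those baselines suggest the features are not capturing age well.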

ResNet50

민도리의 공부
