
Reference for hub_sdk/modules/datasets.py

Note

This file is available at https://github.com/ultralytics/hub-sdk/blob/main/hub_sdk/modules/datasets.py. If you spot a problem, please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!



hub_sdk.modules.datasets.Datasets

Bases: CRUDClient

A class representing a client for interacting with Datasets through CRUD operations. This class extends the CRUDClient class and provides specific methods for working with Datasets.

Attributes:

Name | Type | Description
hub_client | DatasetUpload | An instance of DatasetUpload used for interacting with dataset uploads.
id | (str, None) | The unique identifier of the dataset, if available.
data | dict | A dictionary to store dataset data.

Note

The 'id' attribute is set during initialization and can be used to uniquely identify a dataset. The 'data' attribute is used to store dataset data fetched from the API.

Source code in hub_sdk/modules/datasets.py
class Datasets(CRUDClient):
    """
    A class representing a client for interacting with Datasets through CRUD operations. This class extends the
    CRUDClient class and provides specific methods for working with Datasets.

    Attributes:
        hub_client (DatasetUpload): An instance of DatasetUpload used for interacting with model uploads.
        id (str, None): The unique identifier of the dataset, if available.
        data (dict): A dictionary to store dataset data.

    Note:
        The 'id' attribute is set during initialization and can be used to uniquely identify a dataset.
        The 'data' attribute is used to store dataset data fetched from the API.
    """

    def __init__(self, dataset_id: Optional[str] = None, headers: Optional[Dict[str, Any]] = None):
        """
        Initialize a Datasets client.

        Args:
            dataset_id (str): Unique id of the dataset.
            headers (dict, optional): Headers to include in HTTP requests.
        """
        super().__init__("datasets", "dataset", headers)
        self.hub_client = DatasetUpload(headers)
        self.id = dataset_id
        self.data = {}
        if dataset_id:
            self.get_data()

    def get_data(self) -> None:
        """
        Retrieves data for the current dataset instance.

        If a valid dataset ID has been set, it sends a request to fetch the dataset data and stores it in the instance.
        If no dataset ID has been set, it logs an error message.

        Returns:
            (None): The method does not return a value.
        """
        if not self.id:
            self.logger.error("No dataset id has been set. Update the dataset id or create a dataset.")
            return

        try:
            response = super().read(self.id)

            if response is None:
                self.logger.error(f"Received no response from the server for dataset ID: {self.id}")
                return

            # Check if the response has a .json() method (it should if it's a response object)
            if not hasattr(response, "json"):
                self.logger.error(f"Invalid response object received for dataset ID: {self.id}")
                return

            resp_data = response.json()
            if resp_data is None:
                self.logger.error(f"No data received in the response for dataset ID: {self.id}")
                return

            self.data = resp_data.get("data", {})
            self.logger.debug(f"Dataset data retrieved for ID: {self.id}")

        except Exception as e:
            self.logger.error(f"An error occurred while retrieving data for dataset ID: {self.id}, {e}")

    def create_dataset(self, dataset_data: dict) -> None:
        """
        Creates a new dataset with the provided data and sets the dataset ID for the current instance.

        Args:
            dataset_data (dict): A dictionary containing the data for creating the dataset.

        Returns:
            (None): The method does not return a value.
        """
        resp = super().create(dataset_data).json()
        self.id = resp.get("data", {}).get("id")
        self.get_data()

    def delete(self, hard: bool = False) -> Optional[Response]:
        """
        Delete the dataset resource represented by this instance.

        Args:
            hard (bool, optional): If True, perform a hard delete.

        Note:
            The 'hard' parameter determines whether to perform a soft delete (default) or a hard delete.
            In a soft delete, the dataset might be marked as deleted but retained in the system.
            In a hard delete, the dataset is permanently removed from the system.

        Returns:
            (Optional[Response]): Response object from the delete request, or None if delete fails.
        """
        return super().delete(self.id, hard)

    def update(self, data: dict) -> Optional[Response]:
        """
        Update the dataset resource represented by this instance.

        Args:
            data (dict): The updated data for the dataset resource.

        Returns:
            (Optional[Response]): Response object from the update request, or None if update fails.
        """
        return super().update(self.id, data)

    def upload_dataset(self, file: str = None) -> Optional[Response]:
        """
        Uploads a dataset file to the hub.

        Args:
            file (str, optional): The path to the dataset file to upload.

        Returns:
            (Optional[Response]): Response object from the upload request, or None if upload fails.
        """
        return self.hub_client.upload_dataset(self.id, file)

    def get_download_link(self) -> Optional[str]:
        """
        Get dataset download link.

        Returns:
            (Optional[str]): Return download link or None if the link is not available.
        """
        return self.data.get("url")

__init__(dataset_id=None, headers=None)

Initialize a Datasets client.

Parameters:

Name | Type | Description | Default
dataset_id | str | Unique id of the dataset. | None
headers | dict | Headers to include in HTTP requests. | None

Source code in hub_sdk/modules/datasets.py
def __init__(self, dataset_id: Optional[str] = None, headers: Optional[Dict[str, Any]] = None):
    """
    Initialize a Datasets client.

    Args:
        dataset_id (str): Unique id of the dataset.
        headers (dict, optional): Headers to include in HTTP requests.
    """
    super().__init__("datasets", "dataset", headers)
    self.hub_client = DatasetUpload(headers)
    self.id = dataset_id
    self.data = {}
    if dataset_id:
        self.get_data()

create_dataset(dataset_data)

Creates a new dataset with the provided data and sets the dataset ID for the current instance.

Parameters:

Name | Type | Description | Default
dataset_data | dict | A dictionary containing the data for creating the dataset. | required

Returns:

Type | Description
None | The method does not return a value.

Source code in hub_sdk/modules/datasets.py
def create_dataset(self, dataset_data: dict) -> None:
    """
    Creates a new dataset with the provided data and sets the dataset ID for the current instance.

    Args:
        dataset_data (dict): A dictionary containing the data for creating the dataset.

    Returns:
        (None): The method does not return a value.
    """
    resp = super().create(dataset_data).json()
    self.id = resp.get("data", {}).get("id")
    self.get_data()
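
The ID lookup in create_dataset chains two `.get` calls, so a response without a "data" key yields None instead of raising. The following is a minimal sketch of that lookup in isolation. The helper name `extract_dataset_id` and the example payloads are assumptions made for illustration; they are not part of hub_sdk or of any documented HUB response format.

```python
from typing import Any, Dict, Optional


def extract_dataset_id(payload: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper mirroring `resp.get("data", {}).get("id")`."""
    return payload.get("data", {}).get("id")


# Assumed response shapes, for illustration only.
created = {"data": {"id": "abc123"}}
missing_data = {"message": "error"}
empty_data = {"data": {}}

print(extract_dataset_id(created))       # abc123
print(extract_dataset_id(missing_data))  # None
print(extract_dataset_id(empty_data))    # None
```

When the lookup yields None, the subsequent get_data call takes its "no dataset id" branch and only logs an error. Note that the `{}` default applies only when the "data" key is absent: a payload such as `{"data": None}` would raise AttributeError on the second `.get`.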

delete(hard=False)

Delete the dataset resource represented by this instance.

Parameters:

Name | Type | Description | Default
hard | bool | If True, perform a hard delete. | False

Note

The 'hard' parameter determines whether to perform a soft delete (default) or a hard delete. In a soft delete, the dataset might be marked as deleted but retained in the system. In a hard delete, the dataset is permanently removed from the system.

Returns:

Type | Description
Optional[Response] | Response object from the delete request, or None if delete fails.

Source code in hub_sdk/modules/datasets.py
def delete(self, hard: bool = False) -> Optional[Response]:
    """
    Delete the dataset resource represented by this instance.

    Args:
        hard (bool, optional): If True, perform a hard delete.

    Note:
        The 'hard' parameter determines whether to perform a soft delete (default) or a hard delete.
        In a soft delete, the dataset might be marked as deleted but retained in the system.
        In a hard delete, the dataset is permanently removed from the system.

    Returns:
        (Optional[Response]): Response object from the delete request, or None if delete fails.
    """
    return super().delete(self.id, hard)
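
The difference between the two delete modes can be illustrated with a small in-memory model. This is only a sketch of the general soft-delete and hard-delete idea described in the note above; the `InMemoryStore` class and its `deleted` flag are hypothetical and say nothing about how the HUB server actually stores or flags datasets.

```python
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for a server that supports soft and hard deletes."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}

    def add(self, record_id: str, payload: dict) -> None:
        self.records[record_id] = {"payload": payload, "deleted": False}

    def delete(self, record_id: str, hard: bool = False) -> bool:
        """Return True if a record was found and deleted."""
        if record_id not in self.records:
            return False
        if hard:
            # Hard delete: the record is removed permanently.
            del self.records[record_id]
        else:
            # Soft delete: the record is only flagged and stays in storage.
            self.records[record_id]["deleted"] = True
        return True


store = InMemoryStore()
store.add("a", {"name": "first"})
store.add("b", {"name": "second"})
store.delete("a")             # soft delete (the default)
store.delete("b", hard=True)  # hard delete

print("a" in store.records, store.records["a"]["deleted"])  # True True
print("b" in store.records)                                 # False
```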

get_data()

Retrieves data for the current dataset instance.

If a valid dataset ID has been set, it sends a request to fetch the dataset data and stores it in the instance. If no dataset ID has been set, it logs an error message.

Returns:

Type | Description
None | The method does not return a value.

Source code in hub_sdk/modules/datasets.py
def get_data(self) -> None:
    """
    Retrieves data for the current dataset instance.

    If a valid dataset ID has been set, it sends a request to fetch the dataset data and stores it in the instance.
    If no dataset ID has been set, it logs an error message.

    Returns:
        (None): The method does not return a value.
    """
    if not self.id:
        self.logger.error("No dataset id has been set. Update the dataset id or create a dataset.")
        return

    try:
        response = super().read(self.id)

        if response is None:
            self.logger.error(f"Received no response from the server for dataset ID: {self.id}")
            return

        # Check if the response has a .json() method (it should if it's a response object)
        if not hasattr(response, "json"):
            self.logger.error(f"Invalid response object received for dataset ID: {self.id}")
            return

        resp_data = response.json()
        if resp_data is None:
            self.logger.error(f"No data received in the response for dataset ID: {self.id}")
            return

        self.data = resp_data.get("data", {})
        self.logger.debug(f"Dataset data retrieved for ID: {self.id}")

    except Exception as e:
        self.logger.error(f"An error occurred while retrieving data for dataset ID: {self.id}, {e}")
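
The guard clauses in get_data can be exercised without a network connection by passing in stand-in response objects. The sketch below repeats the same checks in a free function. `FakeResponse` and `parse_dataset_response` are hypothetical names introduced here, and the function returns an empty dict on failure, whereas the real method leaves `self.data` unchanged.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("datasets_sketch")


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object with a .json() method."""

    def __init__(self, body: Optional[Dict[str, Any]]) -> None:
        self._body = body

    def json(self) -> Optional[Dict[str, Any]]:
        return self._body


def parse_dataset_response(response: Any, dataset_id: str) -> Dict[str, Any]:
    """Apply the same checks as get_data and return the "data" payload, or {}."""
    if response is None:
        logger.error("Received no response from the server for dataset ID: %s", dataset_id)
        return {}
    if not hasattr(response, "json"):
        logger.error("Invalid response object received for dataset ID: %s", dataset_id)
        return {}
    body = response.json()
    if body is None:
        logger.error("No data received in the response for dataset ID: %s", dataset_id)
        return {}
    # A body without a "data" key falls back to an empty dict.
    return body.get("data", {})


print(parse_dataset_response(FakeResponse({"data": {"name": "demo"}}), "id1"))  # {'name': 'demo'}
print(parse_dataset_response(None, "id1"))                                      # {}
```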

데이터 μ„ΈνŠΈ λ‹€μš΄λ‘œλ“œ 링크 λ°›κΈ°.

λ°˜ν™˜ν•©λ‹ˆλ‹€:

μœ ν˜• μ„€λͺ…
Optional[str]

링크λ₯Ό μ‚¬μš©ν•  수 μ—†λŠ” 경우 λ‹€μš΄λ‘œλ“œ 링크λ₯Ό λ°˜ν™˜ν•˜κ±°λ‚˜ μ—†μŒμœΌλ‘œ ν‘œμ‹œν•©λ‹ˆλ‹€.

의 μ†ŒμŠ€ μ½”λ“œ hub_sdk/modules/datasets.py
def get_download_link(self) -> Optional[str]:
    """
    Get dataset download link.

    Returns:
        (Optional[str]): Return download link or None if the link is not available.
    """
    return self.data.get("url")
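
Because the method is a plain `dict.get("url")`, callers need to handle None, for example on an instance whose `data` is still the empty dict set in `__init__`. A minimal sketch of that handling follows; the helper `describe_link` and the example URL are hypothetical.

```python
from typing import Optional


def describe_link(data: dict) -> str:
    """Hypothetical helper: report whether dataset data carries a "url" entry."""
    link: Optional[str] = data.get("url")
    if link is None:
        return "no download link available"
    return f"download from {link}"


print(describe_link({}))
# no download link available
print(describe_link({"url": "https://example.com/dataset.zip"}))
# download from https://example.com/dataset.zip
```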

update(data)

Update the dataset resource represented by this instance.

Parameters:

Name | Type | Description | Default
data | dict | The updated data for the dataset resource. | required

Returns:

Type | Description
Optional[Response] | Response object from the update request, or None if update fails.

Source code in hub_sdk/modules/datasets.py
def update(self, data: dict) -> Optional[Response]:
    """
    Update the dataset resource represented by this instance.

    Args:
        data (dict): The updated data for the dataset resource.

    Returns:
        (Optional[Response]): Response object from the update request, or None if update fails.
    """
    return super().update(self.id, data)

upload_dataset(file=None)

Uploads a dataset file to the hub.

Parameters:

Name | Type | Description | Default
file | str | The path to the dataset file to upload. | None

Returns:

Type | Description
Optional[Response] | Response object from the upload request, or None if upload fails.

Source code in hub_sdk/modules/datasets.py
def upload_dataset(self, file: str = None) -> Optional[Response]:
    """
    Uploads a dataset file to the hub.

    Args:
        file (str, optional): The path to the dataset file to upload.

    Returns:
        (Optional[Response]): Response object from the upload request, or None if upload fails.
    """
    return self.hub_client.upload_dataset(self.id, file)



hub_sdk.modules.datasets.DatasetList

Bases: PaginatedList

Source code in hub_sdk/modules/datasets.py
class DatasetList(PaginatedList):
    def __init__(self, page_size=None, public=None, headers=None):
        """
        Initialize a Dataset instance.

        Args:
            page_size (int, optional): The number of items to request per page.
            public (bool, optional): Whether the items should be publicly accessible.
            headers (dict, optional): Headers to be included in API requests.
        """
        base_endpoint = "datasets"
        super().__init__(base_endpoint, "dataset", page_size, public, headers)

__init__(page_size=None, public=None, headers=None)

Initialize a DatasetList instance.

Parameters:

Name | Type | Description | Default
page_size | int | The number of items to request per page. | None
public | bool | Whether the items should be publicly accessible. | None
headers | dict | Headers to be included in API requests. | None

Source code in hub_sdk/modules/datasets.py
def __init__(self, page_size=None, public=None, headers=None):
    """
    Initialize a Dataset instance.

    Args:
        page_size (int, optional): The number of items to request per page.
        public (bool, optional): Whether the items should be publicly accessible.
        headers (dict, optional): Headers to be included in API requests.
    """
    base_endpoint = "datasets"
    super().__init__(base_endpoint, "dataset", page_size, public, headers)
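
The `page_size` argument implies that results arrive in fixed-size pages. How PaginatedList requests and tracks pages is not shown on this page, so the following is only a generic sketch of splitting a result set into pages. The `paginate` function and the item names are hypothetical and do not reflect the hub_sdk implementation.

```python
from typing import Iterator, List, Sequence


def paginate(items: Sequence[str], page_size: int) -> Iterator[List[str]]:
    """Yield consecutive pages of at most page_size items."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(items), page_size):
        yield list(items[start:start + page_size])


pages = list(paginate(["d1", "d2", "d3", "d4", "d5"], page_size=2))
print(pages)  # [['d1', 'd2'], ['d3', 'd4'], ['d5']]
```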