API Reference
Core async functions.
download
download(
urls,
file_paths,
*,
chunk_size=CHUNK_SIZE,
limit_per_host=MAX_HOSTS,
timeout=600,
ssl=True,
raise_status=True,
retries=MAX_RETRIES,
)
Download multiple files concurrently by streaming their content to disk.
Parameters:
- urls (list of str) – URLs to download.
- file_paths (list of pathlib.Path) – Paths to save the downloaded files.
- chunk_size (int, default: CHUNK_SIZE) – Size of the chunks to download, by default 1 MB.
- limit_per_host (int, default: MAX_HOSTS) – Maximum number of concurrent connections per host, by default 4.
- timeout (int, default: 600) – Request timeout in seconds, by default 10 minutes.
- ssl (bool or SSLContext, default: True) – SSL configuration for the requests, by default True.
- raise_status (bool, default: True) – Raise an exception if a request fails, by default True. Otherwise, the exception is logged and the function continues.
- retries (int, default: MAX_RETRIES) – Number of retry attempts for transient errors, by default 3.
Raises:
- InputTypeError – If urls and file_paths are not lists of the same size.
- ServiceError – If the request fails or the response cannot be processed.
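To make "streaming their content to disk" concrete, here is a minimal sketch of chunked writing. It is not the library's implementation: `stream_to_disk` is a hypothetical helper, an in-memory `io.BytesIO` stands in for the HTTP response body, and the 1 MB value of `CHUNK_SIZE` is taken from the parameter description above.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO

# Assumed to mirror the documented default of 1 MB.
CHUNK_SIZE = 1024 * 1024


def stream_to_disk(source: BinaryIO, file_path: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy ``source`` to ``file_path`` one chunk at a time and return the bytes written.

    Hypothetical helper: only one chunk is held in memory at any moment,
    which is the point of streaming a large download.
    """
    written = 0
    file_path.parent.mkdir(parents=True, exist_ok=True)
    with file_path.open("wb") as target:
        # iter() with a sentinel stops as soon as read() returns empty bytes.
        for chunk in iter(lambda: source.read(chunk_size), b""):
            target.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    payload = b"x" * 2500  # stand-in for a response body
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "nested" / "payload.bin"
        stream_to_disk(io.BytesIO(payload), out, chunk_size=1024)
        print(out.stat().st_size)
```

A smaller `chunk_size` lowers peak memory use at the cost of more write calls; the documented default of 1 MB is a common middle ground.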
check_downloads
check_downloads(
urls,
file_paths,
*,
limit_per_host=MAX_HOSTS,
timeout=600,
ssl=True,
retries=MAX_RETRIES,
)
Check whether existing downloaded files match the remote file size.
Only files that already exist on disk are checked. Files that
are missing or whose local size matches the remote Content-Length
are not included in the returned dictionary.
Note: Some servers do not provide a Content-Length header. In
those cases the remote size cannot be determined and the
corresponding file is silently skipped (treated as valid).
Parameters:
- urls (str or list of str) – URLs corresponding to the downloaded files.
- file_paths (pathlib.Path, str, or list of those) – Local paths to the downloaded files.
- limit_per_host (int, default: MAX_HOSTS) – Maximum number of concurrent connections per host, by default 4.
- timeout (float, default: 600) – Request timeout in seconds, by default 10 minutes.
- ssl (bool or SSLContext, default: True) – SSL configuration for the requests, by default True.
- retries (int, default: MAX_RETRIES) – Number of retry attempts for transient errors, by default 3.
Returns:
- dict of pathlib.Path to int – Mapping from each file path whose local size does not match the remote size to the expected remote size. An empty dictionary means all existing files are valid.
Raises:
- InputTypeError – If urls and file_paths are not lists of the same size.
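The comparison rules described above (skip missing files, skip files with no known remote size, report the rest only when sizes differ) can be sketched without any network access. `find_size_mismatches` is a hypothetical helper, not part of the documented API, and it assumes the remote sizes have already been read from each response's Content-Length header, with `None` standing in for a missing header.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_size_mismatches(
    file_paths: list[Path], remote_sizes: list[Optional[int]]
) -> dict[Path, int]:
    """Return ``{path: expected_size}`` for existing files whose size differs.

    Hypothetical helper; a plain ValueError stands in for the library's
    own input-validation exception.
    """
    if len(file_paths) != len(remote_sizes):
        raise ValueError("file_paths and remote_sizes must have the same length")
    mismatches: dict[Path, int] = {}
    for path, expected in zip(file_paths, remote_sizes):
        if not path.exists():
            continue  # missing files are not checked
        if expected is None:
            continue  # no Content-Length: skipped and treated as valid
        if path.stat().st_size != expected:
            mismatches[path] = expected
    return mismatches


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        partial = Path(tmp) / "partial.bin"
        partial.write_bytes(b"123")  # 3 bytes on disk, 10 expected
        print(find_size_mismatches([partial], [10]))
```

A caller could pass the keys of the returned dictionary back to a download routine to re-fetch only the incomplete files.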
unique_filename
Generate a unique filename using SHA-256 from a query.
Parameters:
- url (str) – The URL for the request.
- params (dict or MultiDict, default: None) – Query parameters for the request, default is None.
- data (dict or str, default: None) – Data or JSON to include in the hash, default is None.
- prefix (str, default: None) – A custom prefix to attach to the filename, default is None.
- file_extension (str, default: '') – The file extension to append to the filename, default is "".
Returns:
- str – A unique filename with the SHA-256 hash, optional prefix, and the file extension.
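One way such a filename could be derived is sketched below. This is an illustration only: `sketch_unique_filename` is a hypothetical function, and the choices it makes (JSON serialization with sorted keys, the full 64-character hex digest, adding a leading dot to the extension) are assumptions rather than the documented behavior. It handles plain dicts; a MultiDict with repeated keys would need its own serialization.

```python
import hashlib
import json
from typing import Any, Mapping, Optional, Union


def sketch_unique_filename(
    url: str,
    params: Optional[Mapping[str, Any]] = None,
    data: Optional[Union[Mapping[str, Any], str]] = None,
    prefix: Optional[str] = None,
    file_extension: str = "",
) -> str:
    """Build ``<prefix><sha256 hex digest><extension>`` from the query parts."""
    normalized_data = data if data is None or isinstance(data, str) else dict(data)
    # Sorted keys make the hash independent of dict insertion order.
    payload = json.dumps(
        {"url": url, "params": dict(params) if params else None, "data": normalized_data},
        sort_keys=True,
        default=str,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Accept both "json" and ".json" (an assumption made for convenience).
    if file_extension and not file_extension.startswith("."):
        file_extension = f".{file_extension}"
    return f"{prefix or ''}{digest}{file_extension}"


if __name__ == "__main__":
    print(sketch_unique_filename("https://example.com/api", params={"a": 1}, prefix="run_"))
```

Because the name depends only on the query, repeating the same request maps to the same file, which makes this kind of hash suitable as a cache key.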
fetch
fetch(
urls: StrOrURL,
return_type: Literal["text"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> str | None
fetch(
urls: StrOrURL,
return_type: Literal["text"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> str
fetch(
urls: StrOrURL,
return_type: Literal["json"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> dict[str, Any] | None
fetch(
urls: StrOrURL,
return_type: Literal["json"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> dict[str, Any]
fetch(
urls: StrOrURL,
return_type: Literal["binary"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> bytes | None
fetch(
urls: StrOrURL,
return_type: Literal["binary"],
*,
request_method: RequestMethod = "get",
request_kwargs: dict[str, Any] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> bytes
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["text"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> list[str | None]
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["text"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> list[str]
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["json"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> list[dict[str, Any] | None]
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["json"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> list[dict[str, Any]]
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["binary"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[False],
retries: int = ...,
) -> list[bytes | None]
fetch(
urls: Iterable[StrOrURL],
return_type: Literal["binary"],
*,
request_method: RequestMethod = "get",
request_kwargs: Iterable[dict[str, Any]] | None = None,
limit_per_host: int = ...,
timeout: float = ...,
ssl: bool | SSLContext = ...,
raise_status: Literal[True] = True,
retries: int = ...,
) -> list[bytes]
fetch(
urls,
return_type,
*,
request_method="get",
request_kwargs=None,
limit_per_host=MAX_HOSTS,
timeout=TIMEOUT,
ssl=True,
raise_status=True,
retries=MAX_RETRIES,
)
Fetch data from multiple URLs asynchronously.
Parameters:
- urls (str or list of str) – URL(s) to fetch data from.
- return_type ('text', 'json', or 'binary', default: "text") – Desired response format, which can be text, json, or binary.
- request_method ('get' or 'post', default: "get") – HTTP method to use, by default get.
- request_kwargs (dict or list of dict, default: None) – Keyword argument(s) for (each) request, by default None. If provided, must be the same length as urls.
- limit_per_host (int, default: MAX_HOSTS) – Maximum number of concurrent connections per host, by default 4.
- timeout (int, default: TIMEOUT) – Request timeout in seconds, by default 2 minutes.
- ssl (bool or SSLContext, default: True) – SSL configuration for the requests, by default True.
- raise_status (bool, default: True) – Raise an exception if a request fails, by default True. Otherwise, the exception is logged and the function continues; queries that failed return None.
- retries (int, default: MAX_RETRIES) – Number of retry attempts for transient errors, by default 3.
Returns:
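- list of str, list of bytes, or list of dicts – The response data from the requests. As the overloads above show, a single URL returns a single str, bytes, or dict instead of a list, and with raise_status=False a failed request yields None in place of its result.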
Raises:
- InputTypeError – If urls is not a str or iterable; if request_kwargs is provided and its length doesn't match urls; or if request_kwargs is provided and is not a dict or list of dict.
- InputValueError – If request_method is not get or post, or if return_type is not text, json, or binary.
- ServiceError – If the request fails or the response cannot be processed when raise_status=True.
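The interaction between concurrency limits, retries, and raise_status can be illustrated with a small, network-free sketch. Everything here is hypothetical: `gather_with_limit`, `TransientError`, and `make_flaky_request` are illustrative names, the limit is global rather than per host, and "retries" is assumed to mean extra attempts after the first one. It is not the library's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as a dropped connection."""


async def gather_with_limit(
    urls: list[str],
    request: Callable[[str], Awaitable[str]],
    *,
    limit: int = 4,
    retries: int = 3,
    raise_status: bool = True,
) -> list[Optional[str]]:
    """Run ``request`` for every URL, at most ``limit`` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(limit)

    async def one(url: str) -> Optional[str]:
        async with semaphore:
            for attempt in range(retries + 1):
                try:
                    return await request(url)
                except TransientError:
                    if attempt == retries:
                        if raise_status:
                            raise
                        return None  # failed query is reported as None
                    await asyncio.sleep(0)  # a real client would back off here
        return None

    return await asyncio.gather(*(one(url) for url in urls))


def make_flaky_request(failures: dict[str, int]) -> Callable[[str], Awaitable[str]]:
    """Build a fake request that fails the given number of times per URL."""
    remaining = dict(failures)

    async def request(url: str) -> str:
        if remaining.get(url, 0) > 0:
            remaining[url] -= 1
            raise TransientError(url)
        return f"body of {url}"

    return request


if __name__ == "__main__":
    fake = make_flaky_request({"u2": 1, "u3": 10})
    print(asyncio.run(gather_with_limit(["u1", "u2", "u3"], fake, retries=2, raise_status=False)))
```

With raise_status=False the result list keeps the same length and order as the input, so callers can pair each URL with its result and filter out the None entries.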