Developer Interface

This part of the documentation covers all the interfaces of the SDK. Where the SDK depends on external libraries, we document the most important parts here and link to the canonical documentation.

Client Interface

All of the SDK's functionality can be accessed through these classes and methods.

class AzureADServicePrincipalClient(databricks_instance: str, access_token: str, management_token: str = None, resource_id: str = None)[source]

Client that authenticates using the AZURE_AD_SERVICE_PRINCIPAL method.

test_connection()

Tests the connection to Databricks.

class AzureADUserClient(databricks_instance: str, access_token: str, resource_id: str = None)[source]

Client that authenticates using the AZURE_AD_USER method.

test_connection()

Tests the connection to Databricks.

class BaseClient(databricks_instance: str, composer: azure_databricks_sdk_python.client.Composer, config={})[source]

Base Class for API Clients

test_connection()[source]

Tests the connection to Databricks.

class Client[source]

Factory for Clients

static use_azure_ad_service_principal(databricks_instance: str, access_token: str, management_token: str = None, resource_id: str = None)[source]

Returns an azure_ad_service_principal client.

Args:
databricks_instance (str): Databricks instance name (FQDN).
access_token (str): Azure AD access token.
management_token (str): Azure AD management token. Defaults to None.
resource_id (str, optional): Databricks workspace resource ID. Defaults to None. Required only for an admin service principal; a non-admin service principal must be added to the workspace before login.
Returns:
AzureADServicePrincipalClient: azure_ad_service_principal client.
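Example (a minimal sketch; it assumes Client is importable from the package root, and the instance name, tokens, and resource ID below are placeholders):

    from azure_databricks_sdk_python import Client  # assumed top-level export

    client = Client.use_azure_ad_service_principal(
        databricks_instance="adb-1234567890123456.7.azuredatabricks.net",
        access_token="<azure-ad-access-token>",
        management_token="<azure-ad-management-token>",
        # resource_id is only needed for an admin service principal (placeholder value).
        resource_id="/subscriptions/<sub>/resourceGroups/<rg>/providers/"
                    "Microsoft.Databricks/workspaces/<ws>",
    )
    client.test_connection()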
static use_azure_ad_user(databricks_instance: str, access_token: str, resource_id: str = None)[source]

Returns an azure_ad_user client.

Args:
databricks_instance (str): Databricks instance name (FQDN).
access_token (str): Azure AD access token.
resource_id (str, optional): Databricks workspace resource ID. Defaults to None. Required for non-admin users who want to log in as an admin user.
Returns:
AzureADUserClient: azure_ad_user client.
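Example (a minimal sketch under the same import assumption as above; the instance name and token are placeholders):

    from azure_databricks_sdk_python import Client  # assumed top-level export

    client = Client.use_azure_ad_user(
        databricks_instance="adb-1234567890123456.7.azuredatabricks.net",
        access_token="<azure-ad-access-token>",
    )
    client.test_connection()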
static use_personal_access_token(databricks_instance: str, personal_access_token: str)[source]

Returns a personal_access_token client

Args:
databricks_instance (str): Databricks instance name (FQDN).
personal_access_token (str): Databricks personal access token.
Returns:
PersonalAccessTokenClient: personal_access_token client.
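Example (a minimal sketch under the same import assumption as above; the instance name and token are placeholders):

    from azure_databricks_sdk_python import Client  # assumed top-level export

    client = Client.use_personal_access_token(
        databricks_instance="adb-1234567890123456.7.azuredatabricks.net",
        personal_access_token="dapiXXXXXXXXXXXXXXXXXXXXXXXX",  # placeholder PAT
    )
    client.test_connection()  # verifies the token against the workspace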
class Composer[source]

Composer that aggregates API wrappers

compose(args)[source]

Composes self with API wrappers.

Args:
args (dict): configuration dict.
Returns:
Composer: return new composed object.
class PersonalAccessTokenClient(databricks_instance: str, personal_access_token: str)[source]

Client that authenticates using the PERSONAL_ACCESS_TOKEN method.

test_connection()

Tests the connection to Databricks.

class AuthMethods[source]

Enum representing the authentication method.

For now there are three supported auth methods for the API:
- PERSONAL_ACCESS_TOKEN: Databricks personal access tokens [1].
- AZURE_AD_USER: Azure Active Directory access token [2].
- AZURE_AD_SERVICE_PRINCIPAL: Azure Active Directory token using a service principal [3].

[1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication
[2]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/app-aad-token
[3]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token

Tokens Interface

class Tokens(**kwargs)[source]

The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Azure Databricks REST APIs.

create(comment: str = None, lifetime_seconds: int = 7776000)[source]

Create and return a token.

Args:
comment (str, optional): Optional description to attach to the token. Defaults to None.
lifetime_seconds (int, optional): The lifetime of the token, in seconds. If no lifetime is specified, the token remains valid indefinitely. Defaults to 7776000 (90 days).
Returns:
dict: contains token_value and token_info as a PublicTokenInfo.
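Example (a sketch that assumes the composed client exposes this wrapper as client.tokens, the attribute aggregated by the Composer; client is any authenticated client from the factory above):

    # client.tokens is assumed to be the Tokens wrapper composed onto the client.
    result = client.tokens.create(
        comment="ci token",
        lifetime_seconds=3600,  # 1 hour; omit to keep the 90-day default
    )
    # Per the Returns note above, the dict holds the secret value and its metadata.
    print(result["token_value"])
    print(result["token_info"].token_id)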
delete(token_id: str)[source]

Revoke an access token.

Args:
token_id (str): The ID of the token to be revoked.
Returns:
TokenId: returned on success; otherwise an exception is raised.
list()[source]

List all the valid tokens for a user-workspace pair.

Returns:
[PublicTokenInfo]: A list of token information for a user-workspace pair.
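Example (same client.tokens assumption as above; the comment used for filtering is hypothetical):

    # Revoke every valid token that carries a given comment.
    for info in client.tokens.list():
        if info.comment == "ci token":
            client.tokens.delete(token_id=info.token_id)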
class PublicTokenInfo(token_id: str, creation_time: int, expiry_time: int, comment: str)[source]

Public token info: A data structure that describes the public metadata of an access token as defined in [1]. [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/tokens#–public-token-info

class Token(token_value: str, token_info: azure_databricks_sdk_python.types.tokens.PublicTokenInfo)[source]

Token: represents a token. Not official in the API data structures.

class TokenId(token_id: str)[source]

TokenId: represents a token id. Not official in the API data structures.

Clusters Interface

class Clusters(**kwargs)[source]

The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.

create(req: azure_databricks_sdk_python.types.clusters.ClusterAttributes, force: bool = False)[source]

Create a new Apache Spark cluster. This method acquires new instances from the cloud provider if necessary.

Args:
req (ClusterAttributes): Common set of attributes set during cluster creation. This field is required.
force (bool): If True, it only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
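Example (a sketch that assumes the composed client exposes this wrapper as client.clusters; the runtime key and node type are placeholders, see spark_versions() and list_node_types() below):

    from azure_databricks_sdk_python.types.clusters import ClusterAttributes

    # client.clusters is assumed to be the Clusters wrapper composed onto the client.
    attributes = ClusterAttributes(
        cluster_name="sdk-example",
        spark_version="7.3.x-scala2.12",  # placeholder runtime key
        node_type_id="Standard_DS3_v2",   # placeholder node type
        num_workers=2,
        autotermination_minutes=30,
    )
    cluster_id = client.clusters.create(attributes)  # ClusterId on success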
delete(cluster_id)[source]

Terminate a cluster given its ID.

Args:
cluster_id (str): The cluster to be terminated. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
edit(req: azure_databricks_sdk_python.types.clusters.ClusterAttributes, force: bool = False)[source]

Edit the configuration of a cluster to match the provided attributes and size.

Args:
req (ClusterAttributes): Common set of attributes set during cluster creation. This field is required.
force (bool): If True, it only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
events(req: azure_databricks_sdk_python.types.clusters.ClusterEventRequest, force: bool = False)[source]

Retrieve a list of events about the activity of a cluster.

Args:
req (ClusterEventRequest): Cluster event request structure. This field is required.
force (bool): If True, it only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
Returns:
ClusterEventResponse: Cluster event response structure.
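Example (same client.clusters assumption; the cluster ID is a placeholder):

    from azure_databricks_sdk_python.types.clusters import ClusterEventRequest

    # Fetch up to 10 recent events for one cluster and print their type and timestamp.
    req = ClusterEventRequest(cluster_id="0123-456789-abcdefg0", limit=10)
    response = client.clusters.events(req)
    for event in response.events:
        print(event.timestamp, event.type)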
get(cluster_id)[source]

Retrieve the information for a cluster given its identifier. Clusters can be described while they are running or up to 30 days after they are terminated.

Args:
cluster_id (str): The cluster about which to retrieve information. This field is required.
Returns:
ClusterInfo: Metadata about a cluster.
list()[source]

Return information about all pinned clusters, active clusters, up to 70 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days.

Returns:
[ClusterInfo]: A list of clusters.
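Example (same client.clusters assumption):

    # Print a one-line status for every cluster returned by the List API.
    for info in client.clusters.list():
        print(info.cluster_id, info.cluster_name, info.state)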
list_node_types()[source]

Return a list of supported Spark node types. These node types can be used to launch a cluster.

Returns:
[NodeType]: The list of available Spark node types.
permanent_delete(cluster_id)[source]

Permanently delete a cluster.

Args:
cluster_id (str): The cluster to be permanently deleted. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
pin(cluster_id)[source]

Ensure that an all-purpose cluster configuration is retained even after a cluster has been terminated for more than 30 days. Pinning ensures that the cluster is always returned by the List API. Pinning a cluster that is already pinned has no effect.

Args:
cluster_id (str): The cluster to pin. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
resize(req: azure_databricks_sdk_python.types.clusters.ClusterResizeRequest, force: bool = False)[source]

Resize a cluster to have a desired number of workers. The cluster must be in the RUNNING state.

Args:
req (ClusterResizeRequest): Cluster resize request structure. This field is required.
force (bool): If True, it only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
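Example (same client.clusters assumption; the cluster ID is a placeholder):

    from azure_databricks_sdk_python.types.clusters import AutoScale, ClusterResizeRequest

    # Switch a RUNNING cluster to autoscaling between 2 and 8 workers.
    req = ClusterResizeRequest(
        cluster_id="0123-456789-abcdefg0",
        autoscale=AutoScale(min_workers=2, max_workers=8),
    )
    client.clusters.resize(req)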
restart(cluster_id)[source]

Restart a cluster given its ID. The cluster must be in the RUNNING state.

Args:
cluster_id (str): The cluster to be restarted. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
spark_versions()[source]

Return the list of available runtime versions. These versions can be used to launch a cluster.

Returns:
[SparkVersion]: All the available runtime versions.
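Example (same client.clusters assumption); these two calls provide the valid values for ClusterAttributes.spark_version and node_type_id:

    runtimes = client.clusters.spark_versions()     # list of SparkVersion(key, name)
    node_types = client.clusters.list_node_types()  # list of NodeType
    print([v.key for v in runtimes][:5])
    print([n.node_type_id for n in node_types][:5])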
start(cluster_id)[source]

Start a terminated cluster given its ID.

Args:
cluster_id (str): The cluster to be started. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
unpin(cluster_id)[source]

Allows the cluster to eventually be removed from the list returned by the List API. Unpinning a cluster that is not pinned has no effect.

Args:
cluster_id (str): The cluster to unpin. This field is required.
Returns:
ClusterId: returned on success; otherwise an exception is raised.
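Example (same client.clusters assumption; the cluster ID is a placeholder) showing the start/restart and pin/unpin calls together:

    cid = "0123-456789-abcdefg0"  # placeholder cluster ID
    client.clusters.start(cid)    # bring a terminated cluster back up
    client.clusters.restart(cid)  # requires the cluster to be RUNNING
    client.clusters.pin(cid)      # keep it in list() beyond the 30-day window
    client.clusters.unpin(cid)    # allow it to age out again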
class AutoScale(min_workers: int, max_workers: int)[source]

AutoScale: Range defining the min and max number of cluster workers [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#autoscale

class ClusterAttributes(spark_version: str, node_type_id: str, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None, autotermination_minutes: str = None, driver_node_type_id: str = None, cluster_id: str = None, cluster_name: str = None, cluster_source: azure_databricks_sdk_python.types.clusters.ClusterSource = None, enable_elastic_disk: bool = None, ssh_public_keys: List[str] = None, spark_conf: Dict[KT, VT] = None, custom_tags: Dict[KT, VT] = None, cluster_log_conf: azure_databricks_sdk_python.types.clusters.ClusterLogConf = None, init_scripts: List[azure_databricks_sdk_python.types.clusters.InitScriptInfo] = None, docker_image: azure_databricks_sdk_python.types.clusters.DockerImage = None, spark_env_vars: Dict[KT, VT] = None, instance_pool_id: str = None, policy_id: str = None, idempotency_token: str = None)[source]

ClusterAttributes: Common set of attributes set during cluster creation. These attributes cannot be changed over the lifetime of a cluster. [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterattributes

class ClusterCloudProviderNodeInfo(available_core_quota: int = None, total_core_quota: int = None, status: List[azure_databricks_sdk_python.types.clusters.ClusterCloudProviderNodeStatus] = None)[source]

ClusterCloudProviderNodeInfo: Information about an instance supplied by a cloud provider [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustercloudprovidernodeinfo

class ClusterCloudProviderNodeStatus[source]

ClusterCloudProviderNodeStatus: Status of an instance supplied by a cloud provider [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustercloudprovidernodestatus

class ClusterEvent(cluster_id: str, timestamp: int = None, type: azure_databricks_sdk_python.types.clusters.ClusterEventType = None, details: azure_databricks_sdk_python.types.clusters.EventDetails = None)[source]

ClusterEvent: Cluster event information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterevent

class ClusterEventRequest(cluster_id: str, start_time: int = None, end_time: int = None, order: azure_databricks_sdk_python.types.clusters.ListOrder = None, event_types: List[azure_databricks_sdk_python.types.clusters.ClusterEventType] = None, offset: int = None, limit: int = None)[source]

ClusterEventRequest: Cluster event request structure [1] [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/clusters#–request-structure-10

class ClusterEventResponse(events: List[azure_databricks_sdk_python.types.clusters.ClusterEvent] = None, total_count: int = None, next_page: azure_databricks_sdk_python.types.clusters.ClusterEventRequest = None)[source]

ClusterEventResponse: Cluster event response structure [1]. [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/clusters#–response-structure-5

class ClusterEventType[source]

ClusterEventType: Type of a cluster event [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustereventtype

class ClusterId(cluster_id: str)[source]

ClusterId: represents a cluster id. Not official in the API data structures.

class ClusterInfo(creator_user_name: str, cluster_name: str, spark_version: str, node_type_id: str, driver_node_type_id: str, autotermination_minutes: int, enable_elastic_disk: bool, state: azure_databricks_sdk_python.types.clusters.ClusterState, state_message: str, start_time: int, last_state_loss_time: int, default_tags: Dict[KT, VT], cluster_id: str = None, spark_context_id: int = None, jdbc_port: int = None, cluster_memory_mb: int = None, cluster_cores: float = None, cluster_log_status: azure_databricks_sdk_python.types.clusters.LogSyncStatus = None, termination_reason: azure_databricks_sdk_python.types.clusters.TerminationReason = None, terminated_time: int = None, last_activity_time: int = None, instance_pool_id: str = None, spark_env_vars: Dict[KT, VT] = None, docker_image: azure_databricks_sdk_python.types.clusters.DockerImage = None, init_scripts: List[azure_databricks_sdk_python.types.clusters.InitScriptInfo] = None, cluster_log_conf: azure_databricks_sdk_python.types.clusters.ClusterLogConf = None, spark_conf: Dict[KT, VT] = None, driver: azure_databricks_sdk_python.types.clusters.SparkNode = None, executors: List[azure_databricks_sdk_python.types.clusters.SparkNode] = None, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]

ClusterInfo: Metadata about a cluster [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterinfo

class ClusterLogConf(dbfs: azure_databricks_sdk_python.types.clusters.DbfsStorageInfo)[source]

ClusterLogConf: Path to cluster log. [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterlogconf

class ClusterResizeRequest(cluster_id: str, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]

ClusterResizeRequest: represents a resize request. Not official in the API data structures.

class ClusterSize(num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]

ClusterSize: Cluster size specification [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustersize

class ClusterSource[source]

ClusterSource: Service that created the cluster [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustersource

class ClusterState[source]

ClusterState: State of a cluster[1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterstate

class DbfsStorageInfo(destination: str)[source]

DbfsStorageInfo: DBFS storage information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dbfsstorageinfo

class DockerBasicAuth(username: str, password: str)[source]

DockerBasicAuth: Basic authentication information for the Docker repository [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dockerbasicauth

class DockerImage(url: str, basic_auth: azure_databricks_sdk_python.types.clusters.DockerBasicAuth)[source]

DockerImage: Docker image connection information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dockerimage

class EventDetails(user: str = None, reason: azure_databricks_sdk_python.types.clusters.TerminationReason = None, current_num_workers: int = None, target_num_workers: int = None, previous_attributes: azure_databricks_sdk_python.types.clusters.ClusterAttributes = None, attributes: azure_databricks_sdk_python.types.clusters.ClusterAttributes = None, previous_cluster_size: azure_databricks_sdk_python.types.clusters.ClusterSize = None, cluster_size: azure_databricks_sdk_python.types.clusters.ClusterSize = None, cause: azure_databricks_sdk_python.types.clusters.ResizeCause = None)[source]

EventDetails: Cluster event information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#eventdetails

class InitScriptInfo(dbfs: azure_databricks_sdk_python.types.clusters.DbfsStorageInfo)[source]

InitScriptInfo: Path to an init script [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#initscriptinfo

class ListOrder[source]

ListOrder: Generic ordering enum for list-based queries [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#listorder

class LogSyncStatus(last_attempted: int, last_exception: str)[source]

LogSyncStatus: Log delivery status [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#logsyncstatus

class NodeType(node_type_id: str, memory_mb: int, num_cores: float, description: str, instance_type_id: str, is_deprecated: bool, node_info: azure_databricks_sdk_python.types.clusters.ClusterCloudProviderNodeInfo)[source]

NodeType: Description of a Spark node type including both the dimensions of the node and the instance type on which it will be hosted [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#nodetype

class PoolClusterTerminationCode[source]

PoolClusterTerminationCode: Status code indicating why the cluster was terminated due to a pool failure [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#poolclusterterminationcode

class ResizeCause[source]

ResizeCause: Reason why a cluster was resized [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#resizecause

class SparkNode(private_ip: str, public_dns: str, node_id: str, instance_id: str, start_timestamp: int, host_private_ip: str)[source]

SparkNode: Spark driver or executor configuration [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#sparknode

class SparkVersion(key: str, name: str)[source]

SparkVersion: Databricks Runtime version of the cluster [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#sparkversion

class TerminationCode[source]

TerminationCode: Status code indicating why the cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationcode

class TerminationParameter(username: str = None, azure_error_message: str = None, inactivity_duration_min: int = None, instance_id: str = None, azure_error_code: str = None, instance_pool_id: str = None, instance_pool_error_code: str = None, databricks_error_message: str = None)[source]

TerminationParameter: Key that provides additional information about why a cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationparameter

class TerminationReason(code: azure_databricks_sdk_python.types.clusters.TerminationCode, type: azure_databricks_sdk_python.types.clusters.TerminationType, parameters: azure_databricks_sdk_python.types.clusters.TerminationParameter)[source]

TerminationReason: Reason why a cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#TerminationReason

class TerminationType[source]

TerminationType: Reason why the cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationtype

Lower-Level Classes

class API(**kwargs)[source]

Base class for API wrappers. It composes with APIWithAuth API classes.

class APIWithAuth[source]

Base class for API composers. API composers implement auth-specific logic; they inherit from this class, which implements common functionality such as HTTP GET and POST requests and error handling.

class APIWithAzureADServicePrincipal(base_url: str, access_token: str, management_token: str, resource_id: str)[source]

API composers for AzureADServicePrincipal auth

class APIWithAzureADUser(base_url: str, access_token: str, resource_id: str)[source]

API composers for AzureADUser auth

class APIWithPersonalAccessToken(base_url: str, personal_access_token: str)[source]

API composers for PersonalAccessToken auth