Developer Interface¶
This part of the documentation covers all the interfaces of the SDK. Where the SDK depends on external libraries, we document the most important parts right here and provide links to the canonical documentation.
Client Interface¶
All of the SDK's functionality can be accessed through these methods.
-
class
AzureADServicePrincipalClient
(databricks_instance: str, access_token: str, management_token: str = None, resource_id: str = None)[source]¶ Client that authenticates using the AZURE_AD_SERVICE_PRINCIPAL method.
-
test_connection
()¶ Tests the connection to Databricks.
-
-
class
AzureADUserClient
(databricks_instance: str, access_token: str, resource_id: str = None)[source]¶ Client that authenticates using the AZURE_AD_USER method.
-
test_connection
()¶ Tests the connection to Databricks.
-
-
class
BaseClient
(databricks_instance: str, composer: azure_databricks_sdk_python.client.Composer, config={})[source]¶ Base Class for API Clients
-
class
Client
[source]¶ Factory for Clients
-
static
use_azure_ad_service_principal
(databricks_instance: str, access_token: str, management_token: str = None, resource_id: str = None)[source]¶ Returns an azure_ad_service_principal client
- Args:
- databricks_instance (str): Databricks instance name (FQDN).
- access_token (str): Azure AD access token.
- management_token (str, optional): Azure AD management token. Defaults to None.
- resource_id (str, optional): Databricks workspace resource ID. Defaults to None. Required only for an admin service principal; a non-admin service principal must be added to the workspace prior to login.
- Returns:
- AzureADServicePrincipalClient: azure_ad_service_principal client.
-
static
use_azure_ad_user
(databricks_instance: str, access_token: str, resource_id: str = None)[source]¶ Returns an azure_ad_user client
- Args:
- databricks_instance (str): Databricks instance name (FQDN).
- access_token (str): Azure AD access token.
- resource_id (str, optional): Databricks workspace resource ID. Defaults to None. Required for non-admin users who want to log in as an admin user.
- Returns:
- AzureADUserClient: azure_ad_user client.
-
static
use_personal_access_token
(databricks_instance: str, personal_access_token: str)[source]¶ Returns a personal_access_token client
- Args:
- databricks_instance (str): Databricks instance name (FQDN).
- personal_access_token (str): Databricks personal access token.
- Returns:
- PersonalAccessTokenClient: personal_access_token client.
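A minimal usage sketch of the factory methods above. The workspace FQDN and tokens are placeholders, and the calls themselves need a live Azure Databricks workspace, so they are shown commented out:

```python
# Hypothetical workspace FQDN and token values; replace with your own.
instance = "adb-1234567890123456.7.azuredatabricks.net"  # placeholder FQDN
pat = "dapiXXXXXXXXXXXXXXXX"                             # placeholder personal access token

# Requires: pip install azure-databricks-sdk-python and a live workspace.
# from azure_databricks_sdk_python import Client
# client = Client.use_personal_access_token(instance, pat)
# client.test_connection()

# The Azure AD variants take AAD tokens instead of a PAT:
# client = Client.use_azure_ad_user(instance, aad_access_token)
# client = Client.use_azure_ad_service_principal(instance, aad_access_token,
#                                                management_token, resource_id)
```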
-
class
PersonalAccessTokenClient
(databricks_instance: str, personal_access_token: str)[source]¶ Client that authenticates using the PERSONAL_ACCESS_TOKEN method.
-
test_connection
()¶ Tests the connection to Databricks.
-
-
class
AuthMethods
[source]¶ Enum representing the authentication method.
For now there are three supported auth methods for the API:
- PERSONAL_ACCESS_TOKEN: Databricks personal access token [1].
- AZURE_AD_USER: Azure Active Directory access token [2].
- AZURE_AD_SERVICE_PRINCIPAL: Azure Active Directory token using a service principal [3].
[1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication
[2]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/app-aad-token
[3]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token
Tokens Interface¶
-
class
Tokens
(**kwargs)[source]¶ The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Azure Databricks REST APIs.
-
create
(comment: str = None, lifetime_seconds: int = 7776000)[source]¶ Create and return a token.
- Args:
- comment (str, optional): Optional description to attach to the token. Defaults to None.
- lifetime_seconds (int, optional): The lifetime of the token, in seconds. If no lifetime is specified, the token remains valid indefinitely. Defaults to 7776000 (90 days).
- Returns:
- dict: contains token_value and token_info as a PublicTokenInfo.
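As a sketch, the default lifetime corresponds to 90 days. Assuming the client exposes this interface as `client.tokens` (an assumption based on the class structure above), a call would look like:

```python
# The default lifetime_seconds of 7776000 corresponds to 90 days.
DEFAULT_LIFETIME = 90 * 24 * 60 * 60  # 7776000 seconds

# Assuming `client` is a configured client (see the Client factory above);
# the call requires a live workspace, so it is shown commented out:
# result = client.tokens.create(comment="ci-pipeline", lifetime_seconds=DEFAULT_LIFETIME)
# result["token_value"]  -> the secret token string
# result["token_info"]   -> a PublicTokenInfo (token_id, creation_time, expiry_time, comment)
```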
-
-
class
PublicTokenInfo
(token_id: str, creation_time: int, expiry_time: int, comment: str)[source]¶ Public token info: A data structure that describes the public metadata of an access token as defined in [1]. [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/tokens#–public-token-info
Clusters Interface¶
-
class
Clusters
(**kwargs)[source]¶ The Clusters API allows you to create, start, edit, list, terminate, and delete clusters.
-
create
(req: azure_databricks_sdk_python.types.clusters.ClusterAttributes, force: bool = False)[source]¶ Create a new Apache Spark cluster. This method acquires new instances from the cloud provider if necessary.
- Args:
- req (ClusterAttributes): Common set of attributes set during cluster creation. This field is required.
- force (bool, optional): If True, only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
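A hedged sketch of a cluster spec passed as a plain dict (the shape that ClusterAttributes validates); all names and values here are illustrative, not prescriptive:

```python
# Hypothetical cluster spec as a plain dict; field values are placeholders.
cluster_spec = {
    "cluster_name": "example-cluster",
    "spark_version": "7.3.x-scala2.12",  # pick a key from spark_versions()
    "node_type_id": "Standard_DS3_v2",   # pick an ID from list_node_types()
    "num_workers": 2,
    "autotermination_minutes": 60,
}
# With force=True the dict is sent as-is; otherwise pass a ClusterAttributes instance.
# Requires a configured client and a live workspace:
# cluster_id = client.clusters.create(cluster_spec, force=True)
```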
-
delete
(cluster_id)[source]¶ Terminate a cluster given its ID.
- Args:
- cluster_id (str): The cluster to be terminated. This field is required.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
-
edit
(req: azure_databricks_sdk_python.types.clusters.ClusterAttributes, force: bool = False)[source]¶ Edit the configuration of a cluster to match the provided attributes and size.
- Args:
- req (ClusterAttributes): Common set of attributes set during cluster creation. This field is required.
- force (bool, optional): If True, only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
-
events
(req: azure_databricks_sdk_python.types.clusters.ClusterEventRequest, force: bool = False)[source]¶ Retrieve a list of events about the activity of a cluster.
- Args:
- req (ClusterEventRequest): Cluster event request structure. This field is required.
- force (bool, optional): If True, only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
- Returns:
- ClusterEventResponse: Cluster event request response structure.
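A hedged sketch of an event request passed as a plain dict (the shape that ClusterEventRequest validates); the cluster ID is a placeholder:

```python
# Hypothetical event request as a plain dict; the cluster ID is a placeholder.
event_req = {
    "cluster_id": "1234-567890-abcde123",
    "order": "DESC",   # a ListOrder value
    "limit": 25,
}
# Requires a configured client and a live workspace:
# resp = client.clusters.events(event_req, force=True)
# resp.events     -> the list of ClusterEvent entries
# resp.next_page  -> a ClusterEventRequest for the next page, when more events remain
```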
-
get
(cluster_id)[source]¶ Retrieve the information for a cluster given its identifier. Clusters can be described while they are running or up to 30 days after they are terminated.
- Args:
- cluster_id (str): The cluster about which to retrieve information. This field is required.
- Returns:
- ClusterInfo: Metadata about a cluster.
-
list
()[source]¶ Return information about all pinned clusters, active clusters, up to 70 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days.
- Returns:
- [ClusterInfo]: A list of clusters.
-
list_node_types
()[source]¶ Return a list of supported Spark node types. These node types can be used to launch a cluster.
- Returns:
- [NodeType]: The list of available Spark node types.
-
permanent_delete
(cluster_id)[source]¶ Permanently delete a cluster.
- Args:
- cluster_id (str): The cluster to be permanently deleted. This field is required.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
-
pin
(cluster_id)[source]¶ Ensure that an all-purpose cluster configuration is retained even after a cluster has been terminated for more than 30 days. Pinning ensures that the cluster is always returned by the List API. Pinning a cluster that is already pinned has no effect.
- Args:
- cluster_id (str): The cluster to pin. This field is required.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
-
resize
(req: azure_databricks_sdk_python.types.clusters.ClusterResizeRequest, force: bool = False)[source]¶ Resize a cluster to have a desired number of workers. The cluster must be in the RUNNING state.
- Args:
- req (ClusterResizeRequest): Cluster resize request structure. This field is required.
- force (bool, optional): If True, only checks that req is a dict and passes it as-is, with no type validation. Defaults to False.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
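A hedged sketch of a resize request passed as a plain dict (the shape that ClusterResizeRequest validates); the cluster ID is a placeholder:

```python
# Hypothetical resize request as a plain dict; the cluster ID is a placeholder.
resize_req = {"cluster_id": "1234-567890-abcde123", "num_workers": 8}
# Requires a configured client; the cluster must be in the RUNNING state:
# cluster_id = client.clusters.resize(resize_req, force=True)
```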
-
restart
(cluster_id)[source]¶ Restart a cluster given its ID. The cluster must be in the RUNNING state.
- Args:
- cluster_id (str): The cluster to be started. This field is required.
- Returns:
- ClusterId: the cluster ID on success; raises an exception otherwise.
-
spark_versions
()[source]¶ Return the list of available runtime versions. These versions can be used to launch a cluster.
- Returns:
- [SparkVersion]: All the available runtime versions.
-
-
class
AutoScale
(min_workers: int, max_workers: int)[source]¶ AutoScale: Range defining the min and max number of cluster workers [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#autoscale
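A hedged sketch of a cluster spec using an autoscaling range instead of a fixed num_workers; the two nested fields mirror the AutoScale signature above, and all values are illustrative:

```python
# Hypothetical spec with an autoscale range (mutually exclusive with num_workers).
cluster_spec = {
    "cluster_name": "autoscaling-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
# Requires a configured client and a live workspace:
# client.clusters.create(cluster_spec, force=True)
```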
-
class
ClusterAttributes
(spark_version: str, node_type_id: str, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None, autotermination_minutes: str = None, driver_node_type_id: str = None, cluster_id: str = None, cluster_name: str = None, cluster_source: azure_databricks_sdk_python.types.clusters.ClusterSource = None, enable_elastic_disk: bool = None, ssh_public_keys: List[str] = None, spark_conf: Dict[KT, VT] = None, custom_tags: Dict[KT, VT] = None, cluster_log_conf: azure_databricks_sdk_python.types.clusters.ClusterLogConf = None, init_scripts: List[azure_databricks_sdk_python.types.clusters.InitScriptInfo] = None, docker_image: azure_databricks_sdk_python.types.clusters.DockerImage = None, spark_env_vars: Dict[KT, VT] = None, instance_pool_id: str = None, policy_id: str = None, idempotency_token: str = None)[source]¶ ClusterAttributes: Common set of attributes set during cluster creation. These attributes cannot be changed over the lifetime of a cluster. [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterattributes
-
class
ClusterCloudProviderNodeInfo
(available_core_quota: int = None, total_core_quota: int = None, status: List[azure_databricks_sdk_python.types.clusters.ClusterCloudProviderNodeStatus] = None)[source]¶ ClusterCloudProviderNodeInfo: Information about an instance supplied by a cloud provider [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustercloudprovidernodeinfo
-
class
ClusterCloudProviderNodeStatus
[source]¶ ClusterCloudProviderNodeStatus: Status of an instance supplied by a cloud provider [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustercloudprovidernodestatus
-
class
ClusterEvent
(cluster_id: str, timestamp: int = None, type: azure_databricks_sdk_python.types.clusters.ClusterEventType = None, details: azure_databricks_sdk_python.types.clusters.EventDetails = None)[source]¶ ClusterEvent: Cluster event information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterevent
-
class
ClusterEventRequest
(cluster_id: str, start_time: int = None, end_time: int = None, order: azure_databricks_sdk_python.types.clusters.ListOrder = None, event_types: List[azure_databricks_sdk_python.types.clusters.ClusterEventType] = None, offset: int = None, limit: int = None)[source]¶ ClusterEventRequest: Cluster event request structure [1] [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/clusters#–request-structure-10
-
class
ClusterEventResponse
(events: List[azure_databricks_sdk_python.types.clusters.ClusterEvent] = None, total_count: int = None, next_page: azure_databricks_sdk_python.types.clusters.ClusterEventRequest = None)[source]¶ ClusterEventResponse: Cluster event response structure [1] [1]: https://docs.microsoft.com/en-gb/azure/databricks/dev-tools/api/latest/clusters#–response-structure-5
-
class
ClusterEventType
[source]¶ ClusterEventType: Type of a cluster event [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustereventtype
-
class
ClusterId
(cluster_id: str)[source]¶ ClusterId: represents a cluster id. Not official in the API data structures.
-
class
ClusterInfo
(creator_user_name: str, cluster_name: str, spark_version: str, node_type_id: str, driver_node_type_id: str, autotermination_minutes: int, enable_elastic_disk: bool, state: azure_databricks_sdk_python.types.clusters.ClusterState, state_message: str, start_time: int, last_state_loss_time: int, default_tags: Dict[KT, VT], cluster_id: str = None, spark_context_id: int = None, jdbc_port: int = None, cluster_memory_mb: int = None, cluster_cores: float = None, cluster_log_status: azure_databricks_sdk_python.types.clusters.LogSyncStatus = None, termination_reason: azure_databricks_sdk_python.types.clusters.TerminationReason = None, terminated_time: int = None, last_activity_time: int = None, instance_pool_id: str = None, spark_env_vars: Dict[KT, VT] = None, docker_image: azure_databricks_sdk_python.types.clusters.DockerImage = None, init_scripts: List[azure_databricks_sdk_python.types.clusters.InitScriptInfo] = None, cluster_log_conf: azure_databricks_sdk_python.types.clusters.ClusterLogConf = None, spark_conf: Dict[KT, VT] = None, driver: azure_databricks_sdk_python.types.clusters.SparkNode = None, executors: List[azure_databricks_sdk_python.types.clusters.SparkNode] = None, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]¶ ClusterInfo: Metadata about a cluster [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterinfo
-
class
ClusterLogConf
(dbfs: azure_databricks_sdk_python.types.clusters.DbfsStorageInfo)[source]¶ ClusterLogConf: Path to cluster log. [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterlogconf
-
class
ClusterResizeRequest
(cluster_id: str, num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]¶ ClusterResizeRequest: represents a resize request. Not official in the API data structures.
-
class
ClusterSize
(num_workers: int = None, autoscale: azure_databricks_sdk_python.types.clusters.AutoScale = None)[source]¶ ClusterSize: Cluster size specification [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustersize
-
class
ClusterSource
[source]¶ ClusterSource: Service that created the cluster [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clustersource
-
class
ClusterState
[source]¶ ClusterState: State of a cluster[1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#clusterstate
-
class
DbfsStorageInfo
(destination: str)[source]¶ DbfsStorageInfo: DBFS storage information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dbfsstorageinfo
-
class
DockerBasicAuth
(username: str, password: str)[source]¶ DockerBasicAuth: Docker image connection information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dockerbasicauth
-
class
DockerImage
(url: str, basic_auth: azure_databricks_sdk_python.types.clusters.DockerBasicAuth)[source]¶ DockerImage: Docker image connection information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#dockerimage
-
class
EventDetails
(user: str = None, reason: azure_databricks_sdk_python.types.clusters.TerminationReason = None, current_num_workers: int = None, target_num_workers: int = None, previous_attributes: azure_databricks_sdk_python.types.clusters.ClusterAttributes = None, attributes: azure_databricks_sdk_python.types.clusters.ClusterAttributes = None, previous_cluster_size: azure_databricks_sdk_python.types.clusters.ClusterSize = None, cluster_size: azure_databricks_sdk_python.types.clusters.ClusterSize = None, cause: azure_databricks_sdk_python.types.clusters.ResizeCause = None)[source]¶ EventDetails: Cluster event information [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#eventdetails
-
class
InitScriptInfo
(dbfs: azure_databricks_sdk_python.types.clusters.DbfsStorageInfo)[source]¶ InitScriptInfo: Path to an init script [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#initscriptinfo
-
class
ListOrder
[source]¶ ListOrder: Generic ordering enum for list-based queries [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#listorder
-
class
LogSyncStatus
(last_attempted: int, last_exception: str)[source]¶ LogSyncStatus: Log delivery status [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#logsyncstatus
-
class
NodeType
(node_type_id: str, memory_mb: int, num_cores: float, description: str, instance_type_id: str, is_deprecated: bool, node_info: azure_databricks_sdk_python.types.clusters.ClusterCloudProviderNodeInfo)[source]¶ NodeType: Description of a Spark node type including both the dimensions of the node and the instance type on which it will be hosted [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#nodetype
-
class
PoolClusterTerminationCode
[source]¶ PoolClusterTerminationCode: Status code indicating why the cluster was terminated due to a pool failure [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#poolclusterterminationcode
-
class
ResizeCause
[source]¶ ResizeCause: Reason why a cluster was resized [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#resizecause
-
class
SparkNode
(private_ip: str, public_dns: str, node_id: str, instance_id: str, start_timestamp: int, host_private_ip: str)[source]¶ SparkNode: Spark driver or executor configuration [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#sparknode
-
class
SparkVersion
(key: str, name: str)[source]¶ SparkVersion: Databricks Runtime version of the cluster. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#sparkversion
-
class
TerminationCode
[source]¶ TerminationCode: Status code indicating why the cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationcode
-
class
TerminationParameter
(username: str = None, azure_error_message: str = None, inactivity_duration_min: int = None, instance_id: str = None, azure_error_code: str = None, instance_pool_id: str = None, instance_pool_error_code: str = None, databricks_error_message: str = None)[source]¶ TerminationParameter: Key that provides additional information about why a cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationparameter
-
class
TerminationReason
(code: azure_databricks_sdk_python.types.clusters.TerminationCode, type: azure_databricks_sdk_python.types.clusters.TerminationType, parameters: azure_databricks_sdk_python.types.clusters.TerminationParameter)[source]¶ TerminationReason: Reason why a cluster was terminated [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#TerminationReason
-
class
TerminationType
[source]¶ TerminationType: Type of cluster termination [1]. [1]: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#terminationtype
Lower-Level Classes¶
-
class
APIWithAuth
[source]¶ Base class for API composers. API composers implement auth-specific logic by inheriting from this class, which provides common functionality such as HTTP GET and POST requests and error handling.
-
class
APIWithAzureADServicePrincipal
(base_url: str, access_token: str, management_token: str, resource_id: str)[source]¶ API composer for AzureADServicePrincipal auth.