Skip to content

Added functionality to the cml python package

License

Notifications You must be signed in to change notification settings

cloudera/cmlextensions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cmlextensions

This python library has added functionality for Cloudera Machine Learning (CML)'s cml (or legacy cdsw) library. The library is organized in modules and is built on the CML Workers API and other CML functionalities.

Installation

This library can be installed directly from GitHub:

%pip install git+https://github.com/cloudera/cmlextensions.git

Modules

Ray

Ray is a unified framework for scaling AI and Python applications. We can create a cluster on CML infrastructure to scale out Ray processes. This cmlextensions.ray_cluster module abstracts the ray cluster provisioning and operations so users can focus on their application code instead of infrastructure management.

Example usage:

> from cmlextensions.ray_cluster import RayCluster

> cluster = RayCluster(num_workers=2)
> cluster.init()

--------------------
Ray cluster started
--------------------

The Ray dashboard is running at
https://024d0wpuw0eain8r.ml-4c5feac0-3ec.go01-dem.ylcu-atmi.cloudera.site/

To connect to this Ray cluster from this CML Session,
use the following Python code:
  import ray
  ray.init(address='ray://100.100.127.74:10001')

Dask

Dask is a flexible parallel computing library for analytics in Python. We can create a cluster on CML infrastructure to scale out Dask processes. This cmlextensions.dask_cluster module abstracts the dask cluster provisioning and operations so users can focus on their application code instead of infrastructure management.

Example usage:

> from cmlextensions.dask_cluster import DaskCluster

> cluster = DaskCluster(num_workers=2)
> cluster.init()

--------------------
Dask cluster started
--------------------

The Dask dashboard is running at
https://024d0wpuw0eain8r.ml-4c5feac0-3ec.go01-dem.ylcu-atmi.cloudera.site/

To connect to this Dask cluster from this CML Session,
use the following Python code:
  from dask.distributed import Client
  client = Client('tcp://100.100.225.149:8786')

Workers_v2

The cml (or legacy cdsw) library has a workers module already. The v2 module is experimenting with a new management interface for the CML Workers infrastructure. The v2 module has more defaults and a more OOP approach for managing groups of workers. There is no added functionality, the v2 library relies on the functionality available in the orignal version.

Example usage:

> import cmlextensions.workers_v2 as workers
> from cmlextensions.workers_v2 import WorkerGroup

> wg1 = WorkerGroup(1, code="import time;time.sleep(300)")
> wg1.get_workers()
id	status	created_at	running_at	finished_at	duration	ip_address
221pa78rmzau93zf	running	2022-09-09T12:02:14.031Z	2022-09-09T12:02:27.945Z	None	1	100.100.209.35

> workers.get_workers(active=True)
id	status	created_at	running_at	finished_at	duration	ip_address
221pa78rmzau93zf	running	2022-09-09T12:02:14.031Z	2022-09-09T12:02:27.945Z	None	7	100.100.209.35
6tyvg0kuu0wrlcyl	running	2022-09-09T12:01:50.282Z	2022-09-09T12:02:04.387Z	None	30	100.100.127.80

> wg1.stop_workers()

About

Added functionality to the cml python package

Resources

License

Stars

Watchers

Forks

Languages