- Upgraded Mesos to 1.0.0. Note: as part of this upgrade we have switched from depending on the mesos.native egg for Thermos in favor of the stripped down mesos.executor egg. This means users launching Docker tasks with the Mesos DockerContainerizer are no longer required to use images that include all of Mesos's dependencies.
- Scheduler command line behavior has been modified to warn users of the deprecation of `production` attribute in `Job` thrift struct. The scheduler is queried for tier configurations and the user's choice of `tier` and `production` attributes is revised, if necessary. If `tier` is already set, the `production` attribute might be adjusted to match the `tier` selection. Otherwise, `tier` is selected based on the value of `production` attribute. If a matching tier is not found, the `default` tier from tier configuration file (`tiers.json`) is used.
- The `/offers` endpoint has been modified to display attributes of resource offers as received from Mesos. This has affected rendering of some of the existing attributes. Furthermore, it now dumps additional offer attributes including reservations and persistent volumes.
- The scheduler API now accepts both thrift JSON and binary thrift. If a request is sent without a `Content-Type` header, or a `Content-Type` header of `application/x-thrift` or `application/json` or `application/vnd.apache.thrift.json` the request is treated as thrift JSON. If a request is sent with a `Content-Type` header of `application/vnd.apache.thrift.binary` the request is treated as binary thrift. If the `Accept` header of the request is `application/vnd.apache.thrift.binary` then the response will be binary thrift. Any other value for `Accept` will result in thrift JSON.
- Scheduler is now able to launch jobs using more than one executor at a time. To use this feature the `-custom_executor_config` flag must point to a JSON file which contains at least one valid executor configuration as detailed in the configuration documentation.
- Add rollback API to the scheduler and new client command to support rolling back active update jobs to their initial state.
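
The `Content-Type` and `Accept` rules for the scheduler API described above can be sketched as a pair of lookup functions. This is an illustrative model only, not the scheduler's implementation: the function names and the `"thrift-json"` / `"thrift-binary"` labels are hypothetical, and treating a missing `Accept` header as thrift JSON is an assumption.

```python
THRIFT_JSON = "thrift-json"      # hypothetical label, not an Aurora constant
THRIFT_BINARY = "thrift-binary"  # hypothetical label, not an Aurora constant

# Media types named in the release note.
JSON_CONTENT_TYPES = {
    "application/x-thrift",
    "application/json",
    "application/vnd.apache.thrift.json",
}
BINARY_MEDIA_TYPE = "application/vnd.apache.thrift.binary"


def request_protocol(content_type=None):
    """Protocol used to decode the request body.

    Returns None for Content-Type values the release note does not cover.
    """
    if content_type is None or content_type in JSON_CONTENT_TYPES:
        return THRIFT_JSON
    if content_type == BINARY_MEDIA_TYPE:
        return THRIFT_BINARY
    return None


def response_protocol(accept=None):
    """Binary thrift only when Accept is the binary media type, JSON otherwise."""
    return THRIFT_BINARY if accept == BINARY_MEDIA_TYPE else THRIFT_JSON
```

A client that sends binary thrift but omits `Accept` would, under this reading, still receive a thrift JSON response, so both headers need to be set to use binary thrift end to end.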
- The scheduler flag `-zk_use_curator` now defaults to `true` and care should be taken when upgrading from a configuration that does not pass the flag. The scheduler upgrade should be performed by bringing all schedulers down, and then bringing upgraded schedulers up. A rolling upgrade would result in no leading scheduler for the duration of the roll which could be confusing to monitor and debug.
- The job configuration flag `production` is now deprecated. To achieve the same scheduling behavior that `production=true` used to provide, users should elect a `tier` for the job with attributes `preemptible=false` and `revocable=false`. For example, the `preferred` tier in the default tier configuration file (`tiers.json`) matches the above criteria.
- The `ExecutorInfo.source` field is deprecated and has been replaced with a label named `source`. It will be removed from Mesos in a future release.
- The scheduler flag `-zk_use_curator` has been deprecated. If you have never set the flag and are upgrading you should take care as described in the note above.
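
The relationship between `tier` and the deprecated `production` flag can be illustrated with a small resolver. This is a sketch under stated assumptions, not the client's actual code: the JSON layout of `tiers.json` shown here is assumed, only the two tiers described in these notes are listed, `resolve_tier` is a hypothetical helper, and the rule that `production=false` maps to a preemptible, non-revocable tier is an assumption.

```python
import json

# Assumed layout of tiers.json, limited to the tiers these notes describe.
TIERS_JSON = """
{
  "default": "preemptible",
  "tiers": {
    "preferred":   {"preemptible": false, "revocable": false},
    "preemptible": {"preemptible": true,  "revocable": false}
  }
}
"""


def resolve_tier(config, tier=None, production=False):
    """Return the (tier, production) pair after reconciling both attributes."""
    tiers = config["tiers"]
    if tier is None:
        # No explicit tier: derive one from `production` (assumed mapping).
        wanted = {"preemptible": not production, "revocable": False}
        matches = sorted(name for name, attrs in tiers.items() if attrs == wanted)
        # Fall back to the default tier when nothing matches.
        tier = matches[0] if matches else config["default"]
    # Adjust `production` so that it agrees with the selected tier.
    attrs = tiers[tier]
    production = not attrs["preemptible"] and not attrs["revocable"]
    return tier, production


config = json.loads(TIERS_JSON)
```

With this configuration, `resolve_tier(config, production=True)` selects `preferred`, and a job that sets `tier="preferred"` with `production=False` has `production` adjusted to `True`.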
- New scheduler command line argument `-enable_mesos_fetcher` to allow job submissions to contain URIs which will be passed to the Mesos Fetcher and subsequently downloaded into the sandbox. Please note that enabling job submissions to download resources from arbitrary URIs may have security implications.
- Upgraded Mesos to 0.28.2.
- Upgraded Mesos to 0.27.2
- Added a new optional Apache Curator backend for performing scheduler leader election. You can enable this with the new `-zk_use_curator` scheduler argument.
- Added `--nosetuid-health-checks` flag to control whether the executor runs health checks as the job's role's user.
- New scheduler command line argument `-offer_filter_duration` to control the time after which we expect Mesos to re-offer unused resources. A short duration improves scheduling performance in smaller clusters, but might lead to resource starvation for other frameworks if you run multiple ones in your cluster. Uses the Mesos default of 5s.
- New scheduler command line option `-framework_name` to change the name used for registering the Aurora framework with Mesos. The current default value is 'TwitterScheduler'.
- Added experimental support for launching tasks using filesystem images and the Mesos unified containerizer. See that linked documentation for details on configuring Mesos to use the unified containerizer. Note that earlier versions of Mesos do not fully support the unified containerizer. Mesos 0.28.x or later is recommended for anyone adopting task images via the Mesos containerizer.
- Upgraded to pystachio 0.8.1 to pick up support for the new Choice type.
- The `container` property of a `Job` is now a Choice of either a `Container` holder, or a direct reference to either a `Docker` or `Mesos` container.
- New scheduler command line argument `-ip` to control which IP address the scheduler's HTTP server binds to.
- Added experimental support for Mesos GPU resource. This feature will be available in Mesos 1.0 and is disabled by default. Use `-allow_gpu_resource` flag to enable it.

  IMPORTANT: once this feature is enabled, creating jobs with GPU resource will make scheduler snapshot backwards incompatible. Scheduler will be unable to read snapshot if rolled back to previous version. If rollback is absolutely necessary, perform the following steps:
  - Set `-allow_gpu_resource` to false
  - Delete all jobs with GPU resource (including cron job schedules if applicable)
  - Wait until GPU task history is pruned. You may speed it up by changing the history retention flags, e.g.: `-history_prune_threshold=1mins` and `-history_max_per_job_threshold=0`
  - In case there were GPU job updates created, prune job update history for affected jobs from `/h2console` endpoint or reduce job update pruning thresholds, e.g.: `-job_update_history_pruning_threshold=1mins` and `-job_update_history_per_job_threshold=0`
  - Ensure a new snapshot is created by running `aurora_admin scheduler_snapshot <cluster>`
  - Rollback to previous version
- Experimental support for a webhook feature which POSTs all task state changes to a user defined endpoint.
- Added support for specifying the default tier name in tier configuration file (`tiers.json`). The `default` property is required and is initialized with the `preemptible` tier (`preemptible` tier tasks can be preempted but their resources cannot be revoked).
- Deprecated `--restart-threshold` option in the `aurora job restart` command to match the job updater behavior. This option has no effect now and will be removed in a future release.
- Deprecated `-framework_name` default argument 'TwitterScheduler'. In a future release this will change to 'aurora'. Please be aware that depending on your usage of Mesos, this will be a backward incompatible change. For details, see MESOS-703.
- The `-thermos_observer_root` command line arg has been removed from the scheduler. This was a relic from the time when executor checkpoints were written globally, rather than into a task's sandbox.
- Setting the `container` property of a `Job` to a `Container` holder is deprecated in favor of setting it directly to the appropriate (i.e. `Docker` or `Mesos`) container type.
- Deprecated `numCpus`, `ramMb` and `diskMb` fields in `TaskConfig` and `ResourceAggregate` thrift structs. Use `set<Resource> resources` to specify task resources or quota values.
- The endpoint `/slaves` is deprecated. Please use `/agents` instead.
- Deprecated `production` field in `TaskConfig` thrift struct. Use `tier` field to specify task scheduling and resource handling behavior.
- The scheduler `resources_*_ram_gb` and `resources_*_disk_gb` metrics have been renamed to `resources_*_ram_mb` and `resources_*_disk_mb` respectively. Note the unit change: GB -> MB.
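
Dashboards or alert rules that still refer to the old `resources_*_ram_gb` and `resources_*_disk_gb` names need both a rename and a unit conversion. The following is a minimal sketch of that migration for stored samples. The helper `migrate_metric` and the metric name used in the usage note are hypothetical, and the factor of 1024 MB per GB is an assumption that should be checked against how the scheduler reports these values.

```python
import re

# Matches the renamed metric families: resources_*_ram_gb and resources_*_disk_gb.
_OLD_NAME = re.compile(r"^(resources_.*_(?:ram|disk))_gb$")


def migrate_metric(name, value, mb_per_gb=1024):
    """Map an old GB-based sample to its MB-based replacement.

    Metrics outside the renamed families are returned unchanged.
    The default conversion factor is an assumption.
    """
    match = _OLD_NAME.match(name)
    if match is None:
        return name, value
    return match.group(1) + "_mb", value * mb_per_gb
```

For example, a stored sample `("resources_example_ram_gb", 2)` would become `("resources_example_ram_mb", 2048)`, while unrelated metrics pass through untouched.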
- Upgraded Mesos to 0.26.0
- Added a new health endpoint (/leaderhealth) which can be used for load balancer health checks to always forward requests to the leading scheduler.
- Added a new `aurora job add` client command to scale out an existing job.
- Upgraded the scheduler ZooKeeper client from 3.4.6 to 3.4.8.
- Added support for dedicated constraints not exclusive to a particular role. See here for more details.
- Added a new argument `--announcer-hostname` to thermos executor to override hostname in service registry endpoint. See here for details.
- Descheduling a cron job that was not actually scheduled will no longer return an error.
- Added a new argument `-thermos_home_in_sandbox` to the scheduler for optionally changing HOME to the sandbox during thermos executor/runner execution. This is useful in cases where the root filesystem inside of the container is read-only, as it moves PEX extraction into the sandbox. See here for more detail.
- Support for ZooKeeper authentication in the executor announcer. See here for details.
- Scheduler H2 in-memory database is now using MVStore. In addition, scheduler thrift snapshots now support full DB dumps for faster restarts.
- Added scheduler argument `-require_docker_use_executor` that indicates whether the scheduler should accept tasks that use the Docker containerizer without an executor (experimental).
- Jobs referencing an invalid tier name will be rejected by the scheduler.
- Added a new scheduler argument `--populate_discovery_info`. If set to true, Aurora will start to populate the DiscoveryInfo field on TaskInfo of Mesos. This could be used for alternative service discovery solutions like Mesos-DNS.
- Added support for automatic schema upgrades and downgrades when restoring a snapshot that contains a DB dump.
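
The `/leaderhealth` endpoint mentioned above lends itself to a simple probe loop of the kind a load balancer performs. The sketch below is illustrative only: it assumes that the leading scheduler answers with HTTP 200 and that any other status (or no answer) means "not the leader", which these notes do not spell out. The function names are hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # non-2xx answers still carry a status code
        return err.code
    except (URLError, OSError):
        return None


def find_leader(base_urls, probe_fn=probe):
    """Return the first scheduler whose /leaderhealth check answers 200.

    Assumption: 200 identifies the leading scheduler.
    """
    for base in base_urls:
        if probe_fn(base.rstrip("/") + "/leaderhealth") == 200:
            return base
    return None
```

Passing `probe_fn` explicitly keeps the selection logic testable without a running scheduler.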
- Removed deprecated (now redundant) fields:
  - `Identity.role`
  - `TaskConfig.environment`
  - `TaskConfig.jobName`
  - `TaskQuery.owner`
- Removed deprecated `AddInstancesConfig` parameter to `addInstances` RPC.
- Removed deprecated executor argument `-announcer-enable`, which was a no-op in 0.12.0.
- Removed deprecated API constructs related to Locks:
  - removed RPCs that managed locks
    - `acquireLock`
    - `releaseLock`
    - `getLocks`
  - removed `Lock` parameters to RPCs
    - `createJob`
    - `scheduleCronJob`
    - `descheduleCronJob`
    - `restartShards`
    - `killTasks`
    - `addInstances`
    - `replaceCronTemplate`
- Task ID strings are no longer prefixed by a timestamp.
- Changes to the way the scheduler reads command line arguments
  - Removed support for reading command line argument values from files.
  - Removed support for specifying command line argument names with fully-qualified class names.
- Upgraded Mesos to 0.25.0.
- Upgraded the scheduler ZooKeeper client from 3.3.4 to 3.4.6.
- Added support for configuring Mesos role by passing `-mesos_role` to Aurora scheduler at start time. This enables resource reservation for Aurora when running in a shared Mesos cluster.
- Aurora task metadata is now mapped to Mesos task labels. Labels are prefixed with `org.apache.aurora.metadata.` to prevent clashes with other, external label sources.
- Added new scheduler flag `-default_docker_parameters` to allow a cluster operator to specify a universal set of parameters that should be used for every container that does not have parameters explicitly configured at the job level.
- Added support for jobs to specify arbitrary ZooKeeper paths for service registration. See here for details.
- Log destination is configurable for the thermos runner. See the configuration reference for details on how to configure destination per-process. Command line options may also be passed through the scheduler in order to configure the global default behavior.
- Env variables can be passed through to task processes by passing `--preserve_env` to thermos.
- Changed scheduler logging to use logback. Operators wishing to customize logging may do so with standard logback configuration.
- When using `--read-json`, aurora can now load multiple jobs from one json file, similar to the usual pystachio structure: `{"jobs": [job1, job2, ...]}`. The older single-job json format is also still supported.
- `aurora config list` command now supports `--read-json`
- Added scheduler command line argument `-shiro_after_auth_filter`. Optionally specify a class implementing javax.servlet.Filter that will be included in the Filter chain following the Shiro auth filters.
- The `addInstances` thrift RPC now increases job instance count (scale out) based on the task template pointed to by instance `key`.
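
The two JSON shapes accepted with `--read-json` (a `{"jobs": [...]}` wrapper or a single job object) can be normalized into one list as sketched below. This is not the client's loader: `load_jobs` is a hypothetical helper, and the job objects in the usage note are placeholders rather than valid Aurora job definitions.

```python
import json


def load_jobs(text):
    """Parse --read-json style input and always return a list of jobs."""
    data = json.loads(text)
    if isinstance(data, dict) and "jobs" in data:
        # Multi-job format: {"jobs": [job1, job2, ...]}
        return list(data["jobs"])
    # Older single-job format: the document itself is the job.
    return [data]
```

Under this sketch, `load_jobs('{"jobs": [{"name": "a"}, {"name": "b"}]}')` yields two jobs and `load_jobs('{"name": "solo"}')` yields one, so downstream code only has to handle a list.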
- Deprecated `AddInstancesConfig` argument in `addInstances` thrift RPC.
- Deprecated `TaskQuery` argument in `killTasks` thrift RPC to disallow killing tasks across multiple roles. The new safer approach is using `JobKey` with `instances` instead.
- Removed the deprecated field 'ConfigGroup.instanceIds' from the API.
- Removed the following deprecated `HealthCheckConfig` client-side configuration fields: `endpoint`, `expected_response`, `expected_response_code`. These are now set exclusively in like-named fields of `HttpHealthChecker`.
- Removed the deprecated 'JobUpdateSettings.maxWaitToInstanceRunningMs' thrift api field (UpdateConfig.restart_threshold in client-side configuration). This aspect of job restarts is now controlled exclusively via the client with `aurora job restart --restart-threshold=[seconds]`.
- Deprecated executor flag `--announcer-enable`. Enabling the announcer previously required both flags `--announcer-enable` and `--announcer-ensemble`, but now only `--announcer-ensemble` must be set. `--announcer-enable` is a no-op flag now and will be removed in a future version.
- Removed scheduler command line arguments:
  - `-enable_cors_support`. Enabling CORS is now implicit by setting the argument `-enable_cors_for`.
  - `-deduplicate_snapshots` and `-deflate_snapshots`. These features are good to always enable.
  - `-enable_job_updates` and `-enable_job_creation`
  - `-extra_modules`
  - `-logtostderr`, `-alsologtostderr`, `-vlog`, `-vmodule`, and `use_glog_formatter`. Removed in favor of the new logback configuration.
- Upgraded Mesos to 0.24.1.
- Added a new scheduler flag 'framework_announce_principal' to support use of authorization and rate limiting in Mesos.
- Added support for shell-based health checkers in addition to HTTP health checkers. In concert with this change the `HealthCheckConfig` schema has been restructured to more cleanly allow configuring varied health checkers.
- Added support for taking in an executor configuration in JSON via a command line argument `--custom_executor_config` which will override all other command line arguments and default values pertaining to the executor.
- Log rotation has been added to the thermos runner. See the configuration reference for details on how to configure rotation per-process. Command line options may also be passed through the scheduler in order to configure the global default behavior.
- The client-side updater has been removed, along with the CLI commands that used it: 'aurora job update' and 'aurora job cancel-update'. Users are encouraged to take advantage of scheduler-driven updates (see 'aurora update -h' for usage), which has been a stable feature for several releases.
- The following fields from `HealthCheckConfig` are now deprecated: `endpoint`, `expected_response`, `expected_response_code` in favor of setting them as part of an `HttpHealthChecker`.
- The field 'JobUpdateSettings.maxWaitToInstanceRunningMs' (UpdateConfig.restart_threshold in client-side configuration) is now deprecated. This setting was brittle in practice, and is ignored by the 0.11.0 scheduler.
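
The move of `endpoint`, `expected_response` and `expected_response_code` from `HealthCheckConfig` into an `HttpHealthChecker` can be pictured with plain dictionaries. This sketch does not use the real pystachio structs: the nesting keys `health_checker` and `http` are assumptions about where the `HttpHealthChecker` lives, and `migrate_health_check_config` is a hypothetical helper.

```python
# Fields named in the deprecation note.
MOVED_FIELDS = ("endpoint", "expected_response", "expected_response_code")


def migrate_health_check_config(config):
    """Return a copy of config with the deprecated fields moved under
    the (assumed) health_checker -> http section."""
    migrated = {k: v for k, v in config.items() if k not in MOVED_FIELDS}
    moved = {k: config[k] for k in MOVED_FIELDS if k in config}
    if moved:
        checker = dict(migrated.get("health_checker", {}))
        http = dict(checker.get("http", {}))
        for key, value in moved.items():
            # Values already set on the HTTP checker take precedence.
            http.setdefault(key, value)
        checker["http"] = http
        migrated["health_checker"] = checker
    return migrated
```

A configuration such as `{"interval_secs": 10, "endpoint": "/health"}` would come out as `{"interval_secs": 10, "health_checker": {"http": {"endpoint": "/health"}}}`, leaving unrelated settings in place.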
- Upgraded Mesos to 0.23.0. NOTE: Aurora executor now requires openssl runtime dependencies that were not previously enforced. You will need libcurl available on every Mesos slave (or Docker container) to successfully launch Aurora executor. See here for more details on Mesos runtime dependencies.
- Resource quota is no longer consumed by production jobs with a dedicated constraint (AURORA-1457).
- The Python build layout has changed:
  - The `apache.thermos` package has been removed.
  - The `apache.gen.aurora` package has been renamed to `apache.aurora.thrift`.
  - The `apache.gen.thermos` package has been renamed to `apache.thermos.thrift`.
  - A new `apache.thermos.runner` package has been introduced, providing the `thermos_runner` binary.
  - A new `apache.aurora.kerberos` package has been introduced, containing the Kerberos-supporting versions of `aurora` and `aurora_admin` (`kaurora` and `kaurora_admin`).
  - Most BUILD targets under `src/main` have been removed, see here for details.
- Removed the `--root` option from the observer.
- Thrift `ConfigGroup.instanceIds` field has been deprecated. Use `ConfigGroup.instances` instead.
- Deprecated `SessionValidator` and `CapabilityValidator` interfaces have been removed. All `SessionKey`-typed arguments are now nullable and ignored by the scheduler Thrift API.
- Now requires JRE 8 or greater.
- GC executor is fully replaced by the task state reconciliation (AURORA-1047).
- The scheduler command line argument `-enable_legacy_constraints` has been removed, and the scheduler no longer automatically injects `host` and `rack` constraints for production services. (AURORA-1074)
- SLA metrics for non-production jobs have been disabled by default. They can be enabled via the scheduler command line. Metric names have changed from `...nonprod_ms` to `...ms_nonprod` (AURORA-1350).
- A new command line argument was added to the observer: `--mesos-root`. This must point to the same path as `--work_dir` on the mesos slave.
- Build targets for thermos and observer have changed, they are now:
  - `src/main/python/apache/aurora/tools:thermos`
  - `src/main/python/apache/aurora/tools:thermos_observer`