Skip to content
This repository has been archived by the owner on Mar 13, 2022. It is now read-only.

Leader Election issue #434 #206

Merged
merged 1 commit into from
Jan 15, 2021
Merged

Conversation

Invictus17
Copy link
Contributor

@Invictus17 Invictus17 commented Jul 31, 2020

Leader Election issue #434
kubernetes-client/python#434

@k8s-ci-robot
Copy link
Contributor

Welcome @Invictus17!

It looks like this is your first PR to kubernetes-client/python-base 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-client/python-base has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 31, 2020
@Invictus17
Copy link
Contributor Author

/assign @mbohlool

@Invictus17
Copy link
Contributor Author

Hi @mbohlool @roycaihw , can you please review my PR?

@yliaog
Copy link
Contributor

yliaog commented Aug 2, 2020

thanks for the PR.

could you please keep the file naming style consistent with the existing code? e.g. LeaderElection is better to switched to leaderelection

@Invictus17
Copy link
Contributor Author

thanks for the PR.

could you please keep the file naming style consistent with the existing code? e.g. LeaderElection is better to switched to leaderelection

Sure. I've pushed the changes.

@roycaihw
Copy link
Member

roycaihw commented Aug 3, 2020

/assign

@palnabarun
Copy link
Member

/assign

leaderelection/electionconfig.py Show resolved Hide resolved
import sys


class createConfig:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call it LeaderElectionConfig or Config


class createConfig:
# Validate config, exit if an error is detected
def __init__(self, lock, leaseDuration, renewDeadline, retryPeriod, onStartedLeading, onStoppedLeading):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/leaseDuration/lease_duration

if retryPeriod < 1:
sys.exit("retryPeriod must be greater than zero")

self.leaseDuration = leaseDuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same again for code style

leaderelection/leaderelection.py Outdated Show resolved Hide resolved
@@ -0,0 +1,31 @@
import threading
import ctypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very excited if we have to add C compatibility to achieve this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into a different way to kill threads.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use stop flags and let the client check for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to use stop flags and let the client check for it.

Yes, that's similar to the latest update using traces(this commit is outdated now) but I also want to look into https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor as suggested by @yliaog in her review.

self.OnStoppedLeadingThread.start()

# Start to follow
self.follow(scheduler)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your implementation, a program can keep switching between being a leader and a follower. However reading the client-go implementation, when the leader fails to renew the lease, it gives up and doesn't re-join as a follower. Should we keep the client-go behavior and let the user decide what to do when a leader loses the lease?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can change the code to have a leader exit and close if failed to update lease.

def getLatestLeader(self):
getStatus, getResponse = self.electionConfig.lock.Get(name=self.electionConfig.lock.name, namespace=self.electionConfig.lock.namespace)
if getStatus:
return "leader is " + str(ast.literal_eval(getResponse.metadata.annotations[self.electionConfig.lock.LeaderElectionRecordAnnotationKey])['holderIdentity'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need literal_eval?

Copy link
Contributor Author

@Invictus17 Invictus17 Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getResponse.metadata.annotations[self.electionConfig.lock.LeaderElectionRecordAnnotationKey] returns a string.
literal_eval converts that string to a dictionary so that ['holderIdentity'] can be easily accessed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getResponse.metadata.annotations[self.electionConfig.lock.LeaderElectionRecordAnnotationKey] returns a string.
literal_eval converts that string to a dictionary so that ['holderIdentity'] can be easily accessed.

Would json.loads() work or is it not JSON?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getResponse.metadata.annotations[self.electionConfig.lock.LeaderElectionRecordAnnotationKey] returns a string.
literal_eval converts that string to a dictionary so that ['holderIdentity'] can be easily accessed.

Would json.loads() work or is it not JSON?

No, it's not JSON.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 7, 2020
@Invictus17
Copy link
Contributor Author

Hi @roycaihw , Thanks for your review. Based on your comments I've pushed the suggested changes.

changes:

  1. Coding style - Updated variable and function names.
  2. Threading with traces - This implementation does not depend on ctypes.
  3. Leader exits - The leader does not become a follower if it fails to update lease. It now exits after running onStoppedLeading().

from datetime import timedelta

# Authenticate using config file
config.load_kube_config(config_file=r"D:\Kubernetes open source - Python client\Go example\kubeconfig.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually the default config file is located at ~/.kube/config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I forgot to leave that out. Updated now.

# , if the default callback function will be used is a callback is not provide
config = electionconfig.Config(ConfigMapLock(lock_name, lock_namespace, candidate_id), lease_duration=17, renew_deadline=15, retry_period=5, onstarted_leading=example_func, onstopped_leading=None)

leaderelection.LeaderElection(config).run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually more than one is run to see how leaderelection is working. a simple README would be helpful.

Go example:
https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/client-go/examples/leader-election

lock_status, lock_response = self.election_config.lock.get(self.election_config.lock.name, self.election_config.lock.namespace)

# create a default Election record for this candidate
self.leader_election_record = self.leaderelector_record(self.election_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why creating it here instead of in the 'else:' below? it is not used in the "if lock_status:"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing it out. I'll update it in the following commit.


# If a lock is already created with that name
if lock_status:
print(self.election_config.lock.identity, "is a follower")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to use 'logging' instead of 'print'



# citing source: https://www.geeksforgeeks.org/python-different-ways-to-kill-a-thread/
class thread_with_trace(threading.Thread):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be simpler to use https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor than working with threads directly.

it is added in python 3, but i guess it is ok since python 2 is deprecated anyway, you don't have to support python 2.

# Make sure thread that runs onstopped_leading callback is stopped
if self.onstopped_leadingthread:
self.onstopped_leadingthread.stop()
self.onstopped_leadingthread.join()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think you need to 'stop', then 'join', i.e., wait for it to stop.

it's better to implement a simpler leaderelection:

  1. right at the start, try to acquire lease to be the leader
  2. if not yet a leader, periodically check if it can acquire the lease to be leader
  3. if it becomes the leader, call the hook on started_leading
  4. try to maintain the leadership by renewing the lease
  5. if fail to renew the lease, call the hook on_stopped_leading
  6. done with the leaderelection, return

NOTE: the lifecycle of one leaderelection run has two possibilities:
a) it always blocks, waiting to be a leader, but never succeeds
b) it becomes a leader, lead for sometime, then stop leading and return

It is simpler because during one leaderelection run, it can be a leader at most once. if become a leader, then somehow lose the leadership, then the whole leaderelection returns. it leave to the caller of the leaderelection to decide what to do after that, the caller may exit completely, or the caller may choose to run another leaderelection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing this out. I should have removed this piece of code when I updated the code for a leader to not follow after losing leadership and exit. Other than that, the leader election logic is identical to what you've described. I'll also make sure that the program let's the user decide whether to exit or run for election again. I'll handle these changes in the next commit.

# Point of entry to Leader election
def run(self):
# Try to create/ acquire a lock
self.try_acquire_or_renew()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to check out the java implementation, https://github.com/kubernetes-client/java/blob/master/extended/src/main/java/io/kubernetes/client/extended/leaderelection/LeaderElector.java

It uses only three threads, or more precisely, three threadpools, but each pool has only a single thread.

from kubernetes.client.api_client import ApiClient


class ConfigMapLock:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not now, but unittests need to be added before merging the PR

self.leaderfunction_thread = None

# onstopped_leadingthread contains the thread object for onstopped_leading
self.onstopped_leadingthread = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on_stopped_leading should not be run in a separate thread

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 12, 2020
@Invictus17
Copy link
Contributor Author

Hi @roycaihw @yliaog ,
Thank you for your review. I have updated my code based on your comments and discussions. The updates are:

  1. Not killing threads and making the logic simpler, similar to the initial Java client.
  2. OnStoppedLeading() is not run in a sub-thread.
  3. Added a README.
  4. Using logging and minor changes pointed out during review.


# updatelease_schedulerId variable stores the scheduler object id for the update_lease schedule that is repeated
# every retry_period seconds by the leader
self.updatelease_schedulerId = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scheduler_leader = sched.scheduler(time.time, time.sleep)
self.lead(scheduler_leader)

def transition_follower_to_leader(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to distinguish transition_follower_to_leader and lead, same for follow and try_acquire. These complexity might be useful if we wanted to convert a leader back to a follower



# keep checking for lease updates every retry_period seconds
self.followerlease_checkscheduler = scheduler.enter(int(self.election_config.retry_period), 1, self.check_lease_updates, (scheduler,))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously I thought the scheduler was like the wait.Until method that client-go uses, where the scheduling and the real logic are isolated. I found this implementation (scheduling the real logic recursively) hard to follow. What's the benefit against using a simple while loop + sleep like the java client?

Copy link
Contributor Author

@Invictus17 Invictus17 Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it has a benefit over the other. Would you suggest using a while + sleep?

### Command to run
```python example.py```

Now kill the existing leader. You will see from the terminal outputs that one of the remaining two processes will be elected as the new leader.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you assuming 3 in total? (it says the remaining two processes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I'll update it.


# Default callback for when the current candidate if a leader, stops leading
def on_stoppedleading_callback(self):
print(self.lock.identity, "stopped leading")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use logging?

:return: 'True, None' if object is created else 'False, error' if failed
"""
body = client.V1ConfigMap(
metadata={"name": name, "annotations": {self.leader_electionrecord_annotationkey: str(election_record)}}) # V1ConfigMap | Name is a necessary metadata for a configmap object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why having the comment at the end of the line before? name is required for any k8s object, it is not special for configmap

# If a lock is already created with that name
if lock_status:
logging.info("{} is a follower".format(self.election_config.lock.identity))
scheduler_follower = sched.scheduler(time.time, time.sleep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a total of 4 of these: sched.scheduler(time.time, time.sleep) in the code. would one scheduler be sufficient? would it be better to create just one in the class init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a total of 4 of these: sched.scheduler(time.time, time.sleep) in the code. would one scheduler be sufficient? would it be better to create just one in the class init?

Yes, one would do, I'll update it. @roycaihw had a different opinion about using the sched module. Do you have similar thoughts about it?
#206 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i agree with his comment. i think it's easier to understand to have the program structure like this:

  1. try to acquire lease to be leader in main thread
  2. if get lease, then run lead in a separate thread
  3. keep renewing lease in main thread
  4. if failed renewing lease, then stop leading in main thread

@codecov-commenter
Copy link

Codecov Report

Merging #206 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #206   +/-   ##
=======================================
  Coverage   92.37%   92.37%           
=======================================
  Files          13       13           
  Lines        1613     1613           
=======================================
  Hits         1490     1490           
  Misses        123      123           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54d188f...9fa58e0. Read the comment docs.

@Invictus17
Copy link
Contributor Author

Hi @roycaihw @yliaog ,
Thank you for your review. I have updated my code based on your comments and discussions. The updates are:

  1. Switched from using sched module to a while + sleep approach.
  2. Made the code consistent with the other clients.

@Invictus17 Invictus17 requested a review from yliaog August 24, 2020 01:56

# If a lock is already created with that name
if lock_status:
old_election_record = ast.literal_eval(lock_response.metadata.annotations[self.election_config.lock.leader_electionrecord_annotationkey])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock_response needs to be validated, i.e. it may not have annotations, the annotation may not have the key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also ast.literal_eval could throw an exception, better to catch it



# If This candidate is not the leader and lease duration is yet to finish
if str(self.election_config.lock.identity) != self.observed_record['holderIdentity'] and self.observed_time_milliseconds + self.election_config.lease_duration*1000 > int(time.time()*1000):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.observed_time_milliseconds is from this candidate, but it should be from the election record, no?

also self.observed_time_milliseconds is updated to the current time at line 117, so it will always be the current time, and the lease duration will never expire

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.observed_time_milliseconds is from this candidate, but it should be from the election record, no?
No, the java and go clients are also keeping a local record of 'observed_time_milliseconds' and not referring to the election record
https://github.com/kubernetes-client/java/blob/master/extended/src/main/java/io/kubernetes/client/extended/leaderelection/LeaderElector.java#L269-L315

also self.observed_time_milliseconds is updated to the current time at line 117, so it will always be the current time, and the lease duration will never expire
It is only updated if a follower identifies that the lock object has been updated by a leader.
if old_election_record != self.observed_record:
# Update self.observed_time_milliseconds & self.observed_record

In case a leader fails to update the lock, self.observed_time_milliseconds will not be updated to current time & after a period of 'leaseDuration' a follower will try to update the lock.
https://github.com/kubernetes-client/java/blob/master/extended/src/main/java/io/kubernetes/client/extended/leaderelection/LeaderElector.java#L267-L279

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, the "> int(time.time()*1000):" in the above line is incorrect. time.time() would return the current time, what should be used instead is the time at the start of the function try_acquire_or_renew (https://github.com/kubernetes-client/java/blob/6a2a60ad2ad75fb127874797ea910b90e4a80651/extended/src/main/java/io/kubernetes/client/extended/leaderelection/LeaderElector.java#L236)

# If this candidate is the Leader
if str(self.election_config.lock.identity) == self.observed_record['holderIdentity']:
# Leader sets acquireTime
leader_election_record['acquireTime'] = self.observed_record['acquireTime']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.observed_record['acquireTime'] is not updated, it is still the old_election_record's acquireTime

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is correct I think. It prevents leader_election_record['acquireTime'] to be "now"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. i was thinking about setting the 'acquireTime' at the time when the lease is acquired, currently, the 'acquireTime' is set at the line: 104: leader_election_record = self.leaderelector_record(self.election_config)

it probably does not matter though, as there is no wait or loop inside the function try_acquire_or_renew

return True

# A lock is not created with that name, try to create one
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check the api exception. We can create a lock only if it's a 404

leader_election_record = self.leaderelector_record(self.election_config)

# If a lock is already created with that name
if lock_status:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse the condition to have less indentation in the code


if old_election_record != self.observed_record:
self.observed_record = old_election_record
self.observed_time_milliseconds = int(time.time()*1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capture the current time (e.g. now = time.time()) at the beginning of this function, and keep using now, instead of calling time.time() multiple times.

# If this candidate is the Leader
if str(self.election_config.lock.identity) == self.observed_record['holderIdentity']:
# Leader sets acquireTime
leader_election_record['acquireTime'] = self.observed_record['acquireTime']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is correct I think. It prevents leader_election_record['acquireTime'] to be "now"


# Update object with latest election record
lock_response.metadata.annotations[self.election_config.lock.leader_electionrecord_annotationkey] = str(leader_election_record)
update_status, update_response = self.election_config.lock.update(self.election_config.lock.name, self.election_config.lock.namespace, lock_response)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not generalized for different types of locks. Better to make leader_election_record a class or a dict, and pass it directly to the lock methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the leaderelection logic doesn't need to know about LeaderElectionRecordAnnotationKey. It's a implementation detail for the lock


# If this candidate is the Leader
if str(self.election_config.lock.identity) == self.observed_record['holderIdentity']:
# Leader sets acquireTime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Leader updates renewTime, but keeps acquireTime unchanged" may be more accurate here

election_record=leader_election_record)

if create_status is False:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a log for the create failure


return self.update_lock(lock_response, leader_election_record)

def update_lock(self, lock_response, leader_election_record):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock_response can be a field stored in the lock, since it was returned from lock.get() and never changed before we passed it back to lock.update(). See client-go as an example.

Comment on lines 129 to 144
# A lock exists with that name
# Validate lock_record
if lock_record is None:
# try to update lock with proper annotation and election record
return self.update_lock(lock_response, leader_election_record)

# check for any key, value errors in the record
try:
old_election_record = ast.literal_eval(lock_record)
if (old_election_record['holderIdentity'] == '' or old_election_record['leaseDurationSeconds'] == ''
or old_election_record['acquireTime'] == '' or old_election_record['renewTime'] == ''):
# try to update lock with proper annotation and election record
return self.update_lock(lock_response, leader_election_record)
except:
# try to update lock with proper annotation and election record
return self.update_lock(lock_response, leader_election_record)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to assume lock_record is a class object, instead of a string here. Each lock type can have its own unmarshalling implementation (e.g. ConfigMap and Endpoints unmarshal their annotations, while Lease unmarshalls its spec).

This will also allow the short-circuit logic here to be as simple as client-java.

Copy link
Member

@roycaihw roycaihw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for generalizing the leader_election_record. Overall the implementation looks good to me. Please add a test.

leaderelection/resourcelock/configmaplock.py Show resolved Hide resolved

# A lock is not created with that name, try to create one
if not lock_status:
if ast.literal_eval(lock_response.body)['code'] != 404:
logging.info("Error retrieving resource lock {} as {}".format(self.election_config.lock.name, lock_response.reason))
if json.loads(old_election_record.body)['code'] != 404:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid magic constant 404, instead use NOT_FOUND

logging.info("{} successfully acquired lease".format(self.election_config.lock.identity))

# Start leading and call OnStartedLeading()
threading.Thread(target=self.election_config.onstarted_leading, daemon=True).start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some documentation to the class to describe the threading behavior. i.e. the thread is running until onstarted_leading returns if it does return, it's not safe to run this leader election more than once in a process, etc.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 25, 2020
@codecov-io
Copy link

Codecov Report

Merging #206 (8793d25) into master (54d188f) will decrease coverage by 0.08%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #206      +/-   ##
==========================================
- Coverage   92.37%   92.29%   -0.09%     
==========================================
  Files          13       13              
  Lines        1613     1635      +22     
==========================================
+ Hits         1490     1509      +19     
- Misses        123      126       +3     
Impacted Files Coverage Δ
config/kube_config_test.py 95.60% <0.00%> (-0.28%) ⬇️
config/kube_config.py 83.40% <0.00%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54d188f...8793d25. Read the comment docs.

@Invictus17
Copy link
Contributor Author

@yliaog @roycaihw In my latest push I've added the tests but the travis ci report is indicating a few failures. Any idea why some of these tests are failing?
test result: https://travis-ci.org/github/kubernetes-client/python-base/builds/751412124

@yliaog
Copy link
Contributor

yliaog commented Jan 6, 2021

#222 fixed the configmap test failure

onstopped_leading=on_stopped_leading_A)

# Enter leader election
leaderelection.LeaderElection(config_A).run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since run() blocks until the leader election ends, B won't start until A hits renew_count_max. Could you make the two clients run in parallel to make sure the leader election handles concurrency properly?

"try update record",
"update record",
"try update record",
"try update record"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some trouble understanding what this test (test_Leader_election_with_renew_deadline ) does. Could you add some comments to explain how it's related to renew_deadline and what's the expected behavior? Thanks

Copy link
Contributor Author

@Invictus17 Invictus17 Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @roycaihw, sure I'll add some comments. The expected behavior is to check if the leader stops leading if it fails to update the lock within the renew_deadline.

on update: zzz s
on try update: 3s
on update: zzz s
on try update: 4.5s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this try, the timeout was set to be 4.5 + renew_deadline = 6.5s. After two failed tries (w/ sleep), the leader exited at 4.5+1.5*2=7.5s.

@roycaihw
Copy link
Member

LGTM. Please squash the commits into reasonable parts.

self.name = name
self.namespace = namespace
self.identity = str(identity)
self.lock = threading.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this lock per MockResourceLock? I'd expect it to be shared between all MockResourceLocks in one test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it should be shared. I'll update it. Thanks!

changed file naming style consistent with the existing go client code

Update example.py

Changed file and folder names

Rename LeaderElection.py to leaderelection.py

Rename threadingWithException.py to threadingwithexception.py

Rename ConfigMapLock.py to configmaplock.py

LeaderElection to leaderelection

Added boiler plate headers, updated variable and function names consistent with the guidelines, removed the ctypes dependency by using traces to kill threads, changed logic for leader now it gives up and doesn't re-join as a follower if it fails to update lease

added correct boiler plate year

Rename threadingWithTrace.py to threadingwithtrace.py

Update leaderelection.py

Update example.py

Changes based on review - logging, OnStoppedLeading is not killed abruptly, OnStartedLeading is not run in a separate thread, adding README

Update example.py

updated comments

set threads as daemon

Update README.md

Code made consistent with other clients.

Update example.py

Update leaderelection.py

Error & exception handling for the annotation, reduced indentation

Adding serializing functions for serializing & de-serializing locks, leader_election_record as a class

Adding a test

Adding boilerplate header

Rename leaderelectiontest.py to leaderelection_test.py

Updated boiler plates

handling imports for pytest

handling 'HTTP not found' compatibility with python 2 & 3, & handling relative imports

Update leaderelection.py

to check tests for tox

assertEquals -> assertEqual

Update leaderelection_test.py

making Threading compatible for Python 2

changing datetime.timestamp for backward compatibility with Python 2.7

Adding comments for test_Leader_election_with_renew_deadline & making
candidates run in parallel for test_leader_election

remove redundant daemon = True reassignment

common thread lock for MockResourceLock
@roycaihw
Copy link
Member

/lgtm
/approve

Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 15, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Invictus17, roycaihw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 15, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4bf72d7 into kubernetes-client:master Jan 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants