Adds proposal for auto partitioning in LSO #486
Conversation
/assign @jsafrane
For performance reasons the user will want to use partitions.
On NVMe devices, some devices are capable of delivering very high IOPS. An OSD tends to bottleneck long before it reaches anywhere close to the device's IOPS limit.
So with a single OSD per NVMe device, there will be a big gap between the measured or advertised capabilities of these devices and what you can get with Ceph/OCS on top of them.
What is an "OSD"? Why does the LSO or even OpenShift care about these things?
What I'm saying is that you need to phrase this in a way that presents a broader use-case that is not specific to OCS.
- Policy for wiping devices to clean the partitions
- Mixed methods of PV creation are not supported, i.e., using some devices entirely and some with partitions in a single LocalVolumeSet is not supported.

### Risks and Mitigations
Not sure I understand the point of the risks section. You are pointing out that we might partition the wrong disk if the user gives the wrong device filter? This risk doesn't seem any different from the risks the LocalVolumeSet already has, so maybe we can just refer back to that doc if it has a risks section.
// TotalProvisionedPartitionCount is the count of the total partitions over which PVs have been provisioned.
// This will be different from TotalProvisionedDeviceCount, as TotalProvisionedDeviceCount is
// the count of the actual devices.
TotalProvisionedPartitionCount *int32 `json:"totalProvisionedPartitionCount,omitempty"`
How will this be updated? If each node updates the status, we could get conflicts. Was `totalProvisionedDeviceCount` implemented? It would have the same issue.
`totalProvisionedDeviceCount` is implemented; looking into the implementation details.
Also, updating this across daemon restart may be complicated.
I would ask maybe a stupid question: why is this value important? How is it different to a user whether LSO matches a disk with pre-existing partitions and only consumes them, or LSO actually partitions an empty disk?
Okay, will remove this part.
As per the above example, the administrator wants partitions of size 30G.
Let's assume there are 5 devices of 100G that are detected and fit for partitioning, so each device will be partitioned into 3 parts of 30G each, and 10G of each device will remain unpartitioned.
Right now we are keeping it simple by partitioning by size and will move to an advanced partitioning policy in future releases to avoid having unusable disk space.
I would think the admins would most often carve up the devices in sizes that make sense for their disks, so this may not even be a common issue. Sounds good for a future release though if it's common enough.
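To make the arithmetic above concrete, here is a minimal sketch of the size-only math (a hypothetical helper for illustration, not code from the proposal):

```go
package main

import "fmt"

// layout computes how many whole partitions of partSize fit on a device
// and how much space is left unpartitioned. Sizes are in GiB for brevity.
// Hypothetical helper for illustration only.
func layout(deviceSize, partSize int64) (count, leftover int64) {
	if partSize <= 0 || partSize > deviceSize {
		return 0, deviceSize
	}
	return deviceSize / partSize, deviceSize % partSize
}

func main() {
	// The example from the proposal: a 100G device with 30G partitions.
	count, leftover := layout(100, 30)
	fmt.Printf("%d partitions of 30G, %dG left unpartitioned\n", count, leftover)
	// Prints: 3 partitions of 30G, 10G left unpartitioned
}
```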
type PartitioningSpec struct {
// PartSize determines the exact size of each partition.
PartSize *resource.Quantity `json:"partSize"`
I feel like we need to have a `Count` as well. It looks like a waste to use the entire device all the time.
If you're going to partition the disk, seems like you'd want to partition it all at the same time since the unpartitioned space wouldn't be usable. There could certainly be other options to indicate how partitions should be created, but I want to be clear about the scenarios to keep it simple.
> If you're going to partition the disk, seems like you'd want to partition it all at the same time since the unpartitioned space wouldn't be usable. There could certainly be other options to indicate how partitions should be created, but I want to be clear about the scenarios to keep it simple.

@travisn
I didn't understand what you mean by "partition it all at the same time". Do you suggest that we should not keep `count` for the current scenario?
I am in favor of `count` as it is also a simple parameter for partitioning the disks.
I guess the rationale is: why should we use the entire device? If the device is an NVMe, it could certainly be used by other applications. I don't think we (can?) restrict disks just for Ceph.
So if we decide that Ceph needs 3 partitions of 40GiB (making this up) then the rest can still be used by other applications.
The result of the partitioning will be that we have a PV created on each partition. We don't know who the consumer is, it might be Ceph or it might be some other app, depending on who requests the volumes. So why wouldn't we partition the whole disk? If you're partitioning it for PVs, do we need to allow apps to use it for non-PVs? I didn't think so, but deciding not to partition the whole device can certainly be part of the policy.
For the parameters, is this the suggested policy in this thread?
- If only `size` is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
- If both `count` and `size` are specified, only create that number of partitions and leave the remaining disk unpartitioned.
That seems reasonable and simple to me.
> For the parameters, is this the suggested policy in this thread?
>
> * If only `size` is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
> * If both `count` and `size` are specified, only create that number of partitions and leave the remaining disk unpartitioned.
>
> That seems reasonable and simple to me.

My understanding of `count` is to create `count` number of partitions, i.e., each of size (total device size / `count`). For example, if we have a device with a total capacity of 128G and the value of `count` is 4, then 4 partitions of 32G each will be created.
Ah, I was stuck thinking about the size as being a required setting. What about this?
- If only `size` is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
- If only `count` is specified, divide the disk up into that many partitions and calculate the size.
- If both `count` and `size` are specified, only create that number of partitions of the desired size and leave the remaining disk unpartitioned.
Got it
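For reference, the spec extended with the field discussed in this thread might look like the following sketch (the package name is assumed, and `Count` is the suggestion from this thread, not a final API):

```go
package v1alpha1

import "k8s.io/apimachinery/pkg/api/resource"

// PartitioningSpec sketches the policy agreed above. PartSize comes from
// the proposal; Count is the optional field suggested in this thread.
// Both are pointers so that "unset" can be distinguished from zero.
type PartitioningSpec struct {
	// PartSize determines the exact size of each partition.
	// +optional
	PartSize *resource.Quantity `json:"partSize,omitempty"`
	// Count determines how many partitions to create on each matching device.
	// +optional
	Count *int32 `json:"count,omitempty"`
}
```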
### Workflow for Local PV creation via `LocalVolumeSet CR`
- The admin decides they want to create partitions on some devices in the cluster
- They create a LocalVolumeSet CR that includes the partitioning policy
Probably: LocalVolumeSet matching the disk they want to partition.
## Partitioning policy

* Size: If only size is specified, create partitions for the whole disk all of that size. If there is space remaining, create a PV for the remaining amount.
I would make it consistent with "If both count and size are specified" - only create partitions of the desired size and leave the remaining disk unpartitioned.
Do you mean if only `size` is specified, then create partitions of the desired size and leave the remaining disk unpartitioned?
Yes
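Putting the semantics agreed across these threads together, the resolution logic could look roughly like this (a sketch with hypothetical names, sizes in GiB for readability, not the actual LSO implementation):

```go
package main

import "fmt"

// plan resolves the partitioning policy for a single device:
//   - size only:  as many partitions of that size as fit; the remainder
//     stays unpartitioned (per the revision agreed just above)
//   - count only: divide the disk into count equal partitions
//   - both:       exactly count partitions of the desired size; the rest
//     of the disk stays unpartitioned
func plan(deviceSize int64, size *int64, count *int32) (n int32, partSize int64, err error) {
	if size != nil && *size <= 0 {
		return 0, 0, fmt.Errorf("size must be positive")
	}
	if count != nil && *count <= 0 {
		return 0, 0, fmt.Errorf("count must be positive")
	}
	switch {
	case size != nil && count != nil:
		if int64(*count)*(*size) > deviceSize {
			return 0, 0, fmt.Errorf("%d partitions of %dG exceed the %dG device", *count, *size, deviceSize)
		}
		return *count, *size, nil
	case size != nil:
		return int32(deviceSize / *size), *size, nil
	case count != nil:
		return *count, deviceSize / int64(*count), nil
	default:
		return 0, 0, fmt.Errorf("either size or count must be set")
	}
}

func main() {
	size, count := int64(30), int32(4)
	fmt.Println(plan(100, &size, nil))    // 3 30 <nil>: three 30G partitions, 10G unused
	fmt.Println(plan(128, nil, &count))   // 4 32 <nil>: four 32G partitions
	fmt.Println(plan(100, &size, &count)) // 0 0 error: 4 x 30G exceeds 100G
}
```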
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. If this issue is safe to close now please do so with `/close`. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`. /close
@openshift-bot: Closed this PR.
/reopen
@rohan47: Reopened this PR.
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`. /close
@openshift-bot: Closed this PR.
This PR adds a design proposal for the automatic partitioning of local devices and provisioning PVs for those partitions in LSO.
Signed-off-by: rohan47 [email protected]