Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944
I'd be fine with that. @annelo-msft any thoughts on this?
Would the principle here be that Track 2 cognitive services libraries sit at the … I'm not sure I understand enough about what assumptions could be violated and how pipelines are set up to make a call. Can you help me understand this better?
Our repos largely follow the pattern of sdk/[service]/[module]. So when it comes to writing automation, keeping to that pattern makes things easier (e.g. finding and processing READMEs). I'm not sure if anything will break right now, since I haven't tested, but I'm trying to cut off the variance before it becomes "another pattern" :) In terms of the Track 1 to Track 2 progression, we might end up splitting out all of cog services as it exists for consistency, but text analytics would obviously be the first.
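To illustrate why a fixed depth matters, here is a minimal sketch of the kind of automation mentioned above, assuming a hypothetical step that discovers package READMEs by globbing exactly sdk/<service>/<module>/README.md. The function name find_package_readmes and the sample directory names are illustrative assumptions; this is not the actual engineering-system code.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_package_readmes(repo_root: Path) -> list[str]:
    # Hypothetical automation step: look exactly two levels below sdk/,
    # i.e. sdk/<service>/<module>/README.md. Each "*" matches one path segment.
    return sorted(
        p.relative_to(repo_root).as_posix()
        for p in repo_root.glob("sdk/*/*/README.md")
    )


def demo() -> list[str]:
    # Build a throwaway tree with illustrative package folders.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        layout = [
            "sdk/storage/Azure.Storage.Blobs/README.md",           # follows the pattern
            "sdk/textanalytics/Azure.AI.TextAnalytics/README.md",  # proposed location
            # One level too deep: sdk/<group>/<service>/<module>/README.md
            "sdk/cognitiveservices/textanalytics/Azure.AI.TextAnalytics/README.md",
        ]
        for rel in layout:
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("# placeholder\n")
        return find_package_readmes(root)


if __name__ == "__main__":
    for readme in demo():
        print(readme)
```

Under this assumption, the README nested under sdk/cognitiveservices is silently skipped rather than reported, which is one way a library at an unexpected depth can fall through automation that assumes the sdk/[service]/[module] pattern.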
@mitchdenny You are right; the DocIndex and DocWarden processes are broken on that right now.
Cool, thanks for clarifying! Given the way they were structured before, as a collection of services under …
I don't think so. In fact, given that they are mostly disjoint, it is probably better to separate them, because in our current structure our main assumption is that everything under a service folder generally ships together.
Also, for completeness, here is a link to our repo structure docs: https://github.com/Azure/azure-sdk/blob/master/docs/policies/repostructure.md.
It looks like we should probably agree on the general pattern across languages, as @mssfang is adding this for Java in PR Azure/azure-sdk-for-java#6161, and we should use the same service folder naming pattern there as well.
Thanks! I spoke with @AlexGhiondea about this as well, and he also prefers to stick with the …
Just for clarity, the other option would be to do … So whatever we choose, I believe we should do so consistently across languages.
Given your point about things under a service folder shipping together, I think it makes sense to use …
So we will remove the collection folder? If so, the Java folder will change to "sdk/textanalytics/azure-ai-textanalytics".
I'm not sure if service folder == shipping cadence makes sense or not, since I can imagine shipping an update to a particular package (e.g. storage-blob or core-http) without shipping new versions of every sibling. /cc @ramya-rao-a
@xirzec we have the flexibility to ship only a subset, but that is generally the exceptional case, as our workflows are optimized for shipping together. If shipping a subset is the norm, as I expect in the cognitive services case, we should consider splitting them up. We can split them up by having multiple pipelines, but it is probably easier to split them up at the service folder level. On top of allowing separate shipping, that also makes things like setting up CODEOWNERS and live test runs easier, given that the set is owned by different folks.
Yeah, to be fair, I think it's not unreasonable to pull Text Analytics out, as Cognitive Services is a bit massive, though it would be unfortunate for the path to no longer match the package name. Core is an interesting edge case for us, since the packages are interdependent, but it's also pretty common that we don't need to touch certain ones for a long time (e.g. core-paging).
Are we proposing that we make this change now for all services, or just for the new Track 2 ones? It seems like it would be confusing to have text analytics libraries show up in two spots, but updating the existing SDKs will likely break reference links in docs and on the package manager pages (last time everything went to a 404).
The proposal is to do it just for track 2 libraries. |
The link problem is a much larger issue; we have filed Azure/azure-sdk#818 to track getting into a better place. While moving might cause some existing links to break, the links are likely already broken in a number of cases because they aren't pointing to the correct places. Hopefully we will get to a more stable place where moving things in the repo doesn't cause issues like broken links.
One thing to keep in mind while we do this is whether the Track 2 library is going to use the same package name or create a new one. If we are re-using the package name, then @chrismsft's point is doubly valid: we cannot have the same package live in two different places. @xirzec's point about the package name not matching the folder name is also valid.
I would vote for …
One counterpoint to that is that Cognitive Services is moving toward separating out the individual services, or at least one has to do this to enable AAD support. It's also worth noting that we are having discussions with our architects around library naming: our guidelines state that we shouldn't use Cognitive in a namespace. If we want to do this, we will need to raise it with our API review board.
When we think about the … One caveat there is that we have management plane and data plane co-mingled in the … The way I think of it is in dimensions. There is the logical service, which forms one dimension, and then there is another dimension, the plane, whose values are management and data. We've chosen to segment our pipelines on the service dimension, and what we are saying here is that text analytics is a logically separate service. The dimensions I see are: …
Storage is a good counterpoint to my proposal here, since you could argue that blob/table/queues/etc. could be separated out. I think this is where the grey area is, but the cog services space is huge and diverse and easier to separate out (and there probably aren't a lot of shared components between them other than core itself).
I would very much prefer not to make the decision on folder structure based on what ships together. Once we are out of the phase of shipping previews and go GA, it is very unlikely that the packages in a folder get shipped together. Among what we have shipped as part of Track 2, not just storage but also keyvault falls into the same counterpoint bucket. We have shipped three different packages: keys, secrets and certificates. It is entirely valid that we add features or make bug fixes that apply to one and not the others. Then we have the other scenario, where some packages in a folder are GA and others are in preview, but all are part of the same pipeline. While this may or may not be a problem in other languages, it has bitten us in JS land, where we add different tags to a published package based on whether it is a preview or not. (Apologies for bringing a JS topic into a discussion in a .NET repo.)
It is hard to come up with a model that suits everyone. The most flexible option is to have a pipeline per output package. So for KeyVault we'd have three or so per mono-repo, and for storage we would have a bunch more. But then you get into the situation where some libraries (like storage) have a common library, and you really want to build the common library and the other libraries together. And if common changes, there is a fairly good chance you would ship an update to all libraries. KeyVault doesn't have that common-library scenario (other than core), so it probably has less pressure to ship together. As I said, it's hard to find a perfect fit for every scenario. Another way of looking at it is what builds together. It probably makes sense that all the KeyVault stuff at least builds together. The same goes for storage, but it's harder to make that case for the various pieces of cog services.
I do worry a little about choosing what is best from a build/release point of view instead of what is most intuitive for visitors to our repository. Our tooling can be made better and more flexible, but users can't be expected to arrive with any such domain knowledge. Ignoring the existence of the 'cognitiveservices' directory for Track 1 client libraries, I would imagine that most visitors to the repo will be more inclined to want to find the totality of all cognitive services offered in the … My position right now is to have /sdk/cognitiveservices/textanalytics, within which users would find Track 1 and Track 2 client libraries side by side, just as they do today for, e.g., /sdk/storage. I appreciate this is another level deeper, but to me it feels like clustering all cognitive services into a group makes sense. I'm not strongly proposing 'cognitiveservices', by the way; this is just the nomenclature used at present. For what it is worth, I don't believe this particular name would violate the "don't use cognitive services in the namespace" requirement (for Java at least), as it is not part of the package naming. I'm happy to be proven wrong. The main point I felt I had to raise was that I want this considered not just from a build/release management perspective, but also from a user intuition perspective.
I don't think that … (edit: when I say we've taken the position, it's something that the EngSys team has discussed in some detail, and we felt that requiring long path support on Windows just adds a burden on folks before cloning.)
I think @JonathanGiles makes a good point. That was the way I had thought about this originally, and why I checked in TextAnalytics where I did. I don't know the right answer here, but if path length is the only concern, what if we changed the name of the "cognitiveservices" folder to "ai" (in keeping with the Azure SDK guidelines around namespaces; not required, but for consistency)? That would add only 3 characters to the path. Are there other engineering considerations beyond this? There's another perspective as well, which is that all of the namespace groupings might eventually be grouped together, much like the cognitive services under something like ai. Having the cognitive services together introduces this convention, and then we have two conventions in place, which is itself confusing. The recommendation from this perspective is that we keep with the current standard …
Sounds like a plan. For future us, I think that there are a few aspects of this conversation to consider (in no particular order):
To clarify (since I made two different points in my last comment), I checked with @mitchdenny, and he's saying "sounds like a plan" to staying with … With regard to the issue of splitting the Track 1 and Track 2 libraries, @mayurid suggested we move the TextAnalytics Track 1 library to the same folder as the Track 2 library. I like this suggestion, since it follows the precedent of other Track 2 libraries being in the same folder as their corresponding Track 1 library.
If we do move the Track 1 libraries, we'll need to check with the docs team. There are several links that would break. I can't remember whether this type of move was also what broke the ref doc generation in JS earlier this year.
I worry about moving the Track 1 library as well, because a new user coming to the repo might then never identify Text Analytics as one of the cognitive services. This seems like a potentially negative thing for Text Analytics and any of the other services we move out. What if we kept Track 1 libraries under …
Per our discussion in stand-up this morning, we will move TextAnalytics to … For consistency across repos, other languages should follow this convention as well. @kristapratico, @xirzec, @samvaity, @mssfang, @mayurid Thanks, all!
Already updated my PR: Azure/azure-sdk-for-js#6433
Java already has this change in the PR as well: Azure/azure-sdk-for-java#6161
@annelo-msft, just wanted to confirm: is this just for Track 2, or are we moving the Track 1 libraries?
@ChrisHMSFT yes, this is just for Track 2 at this time. We may revisit the location of Track 1 before the library GAs, but we know this would cause breaking changes we would have to address, so we are not doing it at this time.
I noticed the new Azure.AI.TextAnalytics library was sitting in sdk/cognitiveservices/textanalytics. This may end up violating some of the assumptions we make in our engineering system about the level in the source tree where our libraries sit. Should we consider breaking out a separate sdk/textanalytics directory and setting up separate pipelines for it? This is especially true as the libraries that sit in the sdk/cognitiveservices path are probably going to ship on different cadences and will ultimately get broken up into separate pipelines anyway. /cc @chidozieononiwu @weshaggard