Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944

Closed
mitchdenny opened this issue Nov 28, 2019 · 36 comments · Fixed by #9108
Closed

Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944

mitchdenny opened this issue Nov 28, 2019 · 36 comments · Fixed by #9108
Labels
Client This issue points to a problem in the data-plane of the library. Cognitive - Text Analytics Cognitive Services

Comments

@mitchdenny
Copy link
Contributor

I noticed the new Azure.AI.TextAnalytics library was sitting in sdk/cognitiveservices/textanalytics. This may end up violating some of the assumptions we make in our engineering system about the level in source tree where our libraries sit.

Should we consider breaking our a seperate sdk/textanalytics directory and setting up seperate pipelines for this. This is especially true as the libraries that sit in the sdk/cognitiveservices path are probably going to ship on different cadences anyway and will ultimately get broken up into seperate pipelines anyway.

/cc @chidozieononiwu @weshaggard

@triage-new-issues triage-new-issues bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 28, 2019
@weshaggard
Copy link
Member

I'd be fine with that. @annelo-msft any thoughts on this?

@annelo-msft
Copy link
Member

Would the principle here be that Track 2 cognitive services libraries sit at the sdk/ level, and Track 1 sit under sdk/cognitiveservices/?

I'm not sure I understand enough about what assumptions could be violated and how pipelines are set up to make a call. Can you help me understand this better?

@mitchdenny
Copy link
Contributor Author

Our repos largely follow the pattern of sdk/[service]/[module]. So when it comes to writing automation keeping to that pattern makes things easier (e.g. finding and processing READMEs etc). I'm not sure if anything will break right now, haven't tested, but I'm trying to cut off the variance before it becomes "another pattern" :)

In terms of the track 1 to track 2 progression, we might end up splitting out all of cog services as it exists for consistency, but text analytics would obviously be the first.

@chidozieononiwu
Copy link
Member

@mitchdenny You are right the DocIndex and DocWarden process is broken on that right now

@annelo-msft
Copy link
Member

annelo-msft commented Dec 3, 2019

Cool, thanks for clarifying!

Given the way they were structured before, as a collection of services under cognitiveservices, I had understood the pattern incorrectly -- it looked like sdk/[collection-name]/[service] to me. Since the pattern is sdk/[service]/[module], for TextAnalytics, this would be sdk/textanalytics/Azure.AI.TextAnalytics. Are there any concerns about separating the cognitive services from one another? Was there automation that relied on them sitting together as a group?

@weshaggard
Copy link
Member

Are there any concerns about separating the cognitive services from one another? Was there automation that relied on them sitting together as a group?

I don't think so in fact given that they are mostly disjoint it is probably better to separate them because in our current structure our main assumption is that everything under a service folder generally ships together.

@weshaggard
Copy link
Member

Also for completeness here is a link to our repo structure docs https://github.com/Azure/azure-sdk/blob/master/docs/policies/repostructure.md.

@weshaggard
Copy link
Member

Looks like we probably should agree on the general pattern across the languages as @mssfang is adding this for Java with PR Azure/azure-sdk-for-java#6161 and we should use the same service folder naming pattern there as well.

@annelo-msft
Copy link
Member

annelo-msft commented Dec 3, 2019

Thanks! I spoke with @AlexGhiondea about this as well, and he also prefers to stick with the sdk/[service]/[module] i.e. sdk/textanalytics/Azure.AI.TextAnalytics naming convention.

@annelo-msft annelo-msft added Client This issue points to a problem in the data-plane of the library. Cognitive Services TextAnalytics labels Dec 3, 2019
@triage-new-issues triage-new-issues bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Dec 3, 2019
@annelo-msft annelo-msft added this to the [2020] January milestone Dec 3, 2019
@weshaggard
Copy link
Member

Just for clarity the other option would be to do sdk/cognitiveservices/Azure.AI.TextAnalytics. Looks like we did that in JS, https://github.com/Azure/azure-sdk-for-js/tree/master/sdk/cognitiveservices/cognitiveservices-textanalytics and python https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/cognitiveservices/azure-cognitiveservices-language-textanalytics.

So whatever we chose I believe we should do so consistently across languages.

@annelo-msft
Copy link
Member

Given your point about things under a service folder shipping together, I think it makes sense to use sdk/textanalytics/Azure.AI.TextAnalytics -- I don't think it makes sense for the cognitive services to ship on the same cadence.

@kristapratico @mayurid @xirzec @samvaity

@mssfang
Copy link
Member

mssfang commented Dec 3, 2019

So we will remove the collection folder? If doing so, java folder will change to "sdk/textanalytics/azure-ai-textanalytics"

/cc @JonathanGiles @samvaity

@xirzec
Copy link
Member

xirzec commented Dec 3, 2019

I'm not sure if service folder == shipping cadence makes sense or not, since I can imagine shipping an update to a particular package (e.g. storage-blob or core-http) without shipping new versions of every sibling.

/cc @ramya-rao-a

@weshaggard
Copy link
Member

@xirzec we have the flexibility to ship only a subset but that is generally the exceptional case as our workflows are optimized for shipping together. If shipping the subset is the norm, like I expect in the cognitive services case we should consider splitting them up. We can split them up by having multiple pipelines but it is probably easier to split them up at service folder level. On top of shipping separately that makes things like setting up CODEOWNERS and live test runs easier as well given that the set is owned by different folks.

@xirzec
Copy link
Member

xirzec commented Dec 3, 2019

Yeah, to be fair, I think it's not unreasonable to pull Text Analytics out as Cognitive Services is a bit massive, though it would be unfortunate to not have the path match the package name anymore.

Core is an interesting edge case for us, since they are interdependent but also it's pretty common that we don't need to touch certain ones for a long time (e.g. core-paging)

@ChrisHMSFT
Copy link
Member

Are we proposing that we make this change now for all services or just for the new track2 ones? Seems like it would be confusing to have text analytics libraries show up in 2 spots but updating the existing sdks will likely break reference links in docs & on the package manager pages (last time everything went to a 404)

@annelo-msft
Copy link
Member

The proposal is to do it just for track 2 libraries.

@weshaggard
Copy link
Member

Seems like it would be confusing to have text analytics libraries show up in 2 spots but updating the existing sdks will likely break reference links in docs & on the package manager pages (last time everything went to a 404)

The link problem is a much larger issue which we have filed an issue at Azure/azure-sdk#818 to track getting into a better place. While moving might cause some existing links to break the links are likely already broken in a number of cases because they aren't linking to the correct places. Hopefully we well get into a more stable place where moving things in the repo causes issues like breaking links.

@ramya-rao-a
Copy link
Contributor

One thing to keep in mind that while we do this is whether the Track 2 library is going to use the same package name or create a new one.

If we are re-using the package name, then @chrismsft's point is doubly valid. We cannot have the same package live in 2 different places. @xirzec's point on the package name not matching the folder name is also valid.

@yangyuan
Copy link
Member

yangyuan commented Dec 5, 2019

I would vote for sdk/cognitiveservices/Azure.AI.TextAnalytics and strongly against to sdk/textanalytics/Azure.AI.TextAnalytics

  • I agree that sdk/[service]/[module] is preferred. But the logic between Text Analytics and Cognitive Services is a little similar to Azure Table and Azure Storage. I prefer to consider TextAnalytics as module instead of an individual service
    • If you are familiar with CognitiveServices, when you create a TextAnalytics resource, it is actually a CognitiveServices account with TextAnalytics "kind".
    • TextAnalytics is managed by CognitiveServices RP. There is Microsoft.Azure.Management.CognitiveServices but no Microsoft.Azure.Management.TextAnalytics
    • There are other CognitiveServices "kinds" which can call TextAnalytics service too.
  • If we are moving TA out of cognitiveservices folder, we should move all. It apparently will create a mess to put everything directly under “sdk/”

@annelo-msft
Copy link
Member

One counterpoint to that is that Cognitive Services is moving toward separating out the individual services, or at least one has to do this to enable AAD support.

It's also worth noting that we are having discussions with our architects around library naming - our guidelines state that we shouldn't use Cognitive in a namespace. If we want to do this we will need to raise this with our API review board.

@mitchdenny
Copy link
Contributor Author

mitchdenny commented Dec 6, 2019

When we think about the sdk/[something] folder we should generally be thinking about things that ship together. So its unlikely that we would ship computer vision and text analytics at the same time. But we are more likely to to ship all of the text analytics components (if there were multiple packages) at the same time.

One caveat there is that we have management plane and data plane co-mingled in the sdk/[dir] path. So insert a hand wavy justification for that here ... because storage data plane and storage management plane will tend to ship with different cadences.

The way I think of it is in dimensions. There is a logical service, which forms one dimension, and then there is another dimension which (plane) of which management and data on.

We've chosen to segment our pipelines on the service dimension. And what we are saying here is that text analytics is a logically separate service.

The dimensions I see are:

  1. Business Group
  2. Service
  3. Plane (management/data)

Storage is a good counterpoint to my proposal here since you could make the point that blob/table/queues/etc could be separated out. I think this is where the grey area is ... but the cog services space is huge and diverse and easier to separate out (and there probably isn't a lot of shared components between them other than core itself).

@ramya-rao-a
Copy link
Contributor

I would very much prefer not to make the decision on folder structure based on what ships together. Once we are out of the phase of shipping previews and go GA, it is very unlikely that the packages in a folder get shipped together.

Among what we have shipped as part of Track 2, not just storage, but keyvault also falls under the same counterpoint bucket. We have shipped 3 different packages keys, secrets and certificates. It is very much valid that we add features or make bug fixes that are applicable for 1 and not the others.

Then, we have the other scenario where some packages in a folder are GA, others in preview, but all are part of the same pipeline. While this may or may not be a problem in other languages, this has bit the JS land, where we add different tags to a published package based on whether it is preview or not. (Apologies for bringing a JS topic into a discussion in a .Net repo)

@mitchdenny
Copy link
Contributor Author

It is hard to come up with a model that suits everyone. The most flexible option is to have a pipeline per output package. So for KeyVault we'd have three or so per mono-repo.

For storage we would have a bunch more. But then you get into this situation where some libraries (like storage) have a common library and you really want to build the common library and the other libraries together. And if common changes there is a fairly good chance you would ship an update to all libraries.

KeyVault doesn't have that common library (other than core) scenario and so probably has less potential pressure for shipping together.

As I said its hard to find a perfect fit for every scenario. Another way of looking at it is what builds together. It probably makes sense that all the KeyVault stuff at least builds together. Same for storage, but its harder to make that case for the various pieces of cog services.

@JonathanGiles
Copy link
Member

JonathanGiles commented Dec 8, 2019

I do worry a little about choosing what is best from a build / release point of view, instead of what is most intuitive for visitors to our repository. Our tooling can be made better and more flexible, but users can't be expected to arrive with any such domain knowledge.

Ignoring the existence of the 'cognitiveservices' directory for track one client libraries, I would imagine that most visitors to the repo will be more inclined to want to find the totality of all cognitive services offered in the azure-sdk-for-<language>, rather than to be looking specifically through a very long directory list under /sdk for services of interest. Given that the track one 'cognitiveservices' does exist, and will do for some time, it would be my expectation that most people would go into this directory far more intuitively than they would look through the /sdk directory for /sdk/textanalytics. They would arrive at older, less high-quality libraries and potentially never go beyond this.

My position right now is to have /sdk/cognitiveservices/textanalytics, within which users would find track one and track two client libraries side by side, just as they do today for, e.g. /sdk/storage. I appreciate this is another level deeper, but to me it feels like clustering all cognitive services into a group makes sense. I'm not strongly proposing 'cognitiveservices' by the way - this is just the nomenclature used at present. For what it is worth, I don't believe this particular name would be violating the "don't use cognitive services in the namespace" requirement (for Java at least), as this is not part of the package naming.

I'm happy to be proven wrong. My main point that I felt I had to raise was that I want this considered not just from a build / release management perspective, but also from a user intuition perspective too.

@mitchdenny
Copy link
Contributor Author

mitchdenny commented Dec 8, 2019

I don't think that /sdk/cognitiveservices/textanalytics will work due to path length issues (we already hit it with just /sdk/congnitiveservices. We've taken the position that we don't want people to have to enable long path support in Windows (since its a bit spotty anyway).

(edit: when I say we've taken the position, its something that EngSys team has discussed in some details and we felt that requring long path support on Windows just adds a burden on folks before cloning).

@annelo-msft
Copy link
Member

annelo-msft commented Dec 9, 2019

I think @JonathanGiles makes a good point. That was the way I had thought about this originally, and why I checked in TextAnalytics where I did. I don't know the right answer here, but if path length is the only concern, what if we changed the name of the "cognitiveservices" folder to "ai" (in keeping with the Azure SDK guidelines around namespaces - not required, but for consistency). That would add only 3 characters to the path. Are there other engineering considerations beyond this?

There's another perspective as well, which is that all of the namespace groupings might eventually be grouped together, much like the cognitive services under something like ai. Having the cognitive services together introduces this convention, and then we have two conventions in place, which is confusing itself. The recommendation from this perspective is that we keep with the current standard sdk/textanalytics/ and if/when we eventually go over to the namespace groupings, we move cognitive to their grouping as part of that. I think the challenge here is that the track one libraries for cognitive have always been part of the second convention, so moving it in line with the first breaks what people are used to. The important question I think should be what is the right thing going forward?

@mitchdenny
Copy link
Contributor Author

Sounds like a plan. For future us, I think that there are a few aspects of this conversation to consider (in no particular order):

  • Potential path length issues
  • Taxonomy: Brand, Service Grouping, Service
  • Release cadence of the grouping we choose

@annelo-msft
Copy link
Member

annelo-msft commented Dec 10, 2019

To clarify (since I made two different points in my last comment), I checked with @mitchdenny, and he's saying "sounds like a plan" to staying with sdk/textanalytics and thinking about the listed considerations if/when we revisit this issue down the road.

With regard to the issue of splitting the Track1 and Track2 libraries, @mayurid suggested we move the TextAnalytics Track1 library to the same folder as the Track2 library. I like this suggestion, since it follows the precedent of other Track2 libraries being in the same folder as their corresponding Track1 library.

@ChrisHMSFT
Copy link
Member

if we do move the Track1 Libraries we'll need to check with the docs team. There are several links that would break. I can't remember if this type of move was what broke the ref doc generation in js earlier this year as well

@annelo-msft
Copy link
Member

I worry about moving the Track 1 library as well, because a new user coming to the repo might then never identify Text Analytics as one of the cognitive services -- this seems like a potentially negative thing for Text Analytics and any of the other services we move out. What if we kept Track 1 libraries under cognitiveservices/, moved Track 2 libraries to sdk/servicename/packagename and added a README under cognitiveservices at the same level as and alphabetized to sit beside the track 1 library to indicate that track 2 libraries are at sdk/servicename/packagename? It seems like this addresses all the immediate concerns and will let us move forward on this. Thoughts?
@mayurid @ChrisHMSFT @mitchdenny

@annelo-msft
Copy link
Member

Per our discussion in stand-up this morning, we will move TextAnalytics to sdk/textanalytics/[module]

For consistency across repos, other languages should follow this convention as well. @kristapratico, @xirzec, @samvaity, @mssfang, @mayurid

Thanks, all!

@xirzec
Copy link
Member

xirzec commented Dec 11, 2019

Already updated my PR: Azure/azure-sdk-for-js#6433

@mssfang
Copy link
Member

mssfang commented Dec 11, 2019

Java already has this change in the PR as well: Azure/azure-sdk-for-java#6161

@ChrisHMSFT
Copy link
Member

@annelo-msft just wanted to confirm would this be just for track 2 or are we moving the track1 libraries?

@annelo-msft
Copy link
Member

annelo-msft commented Dec 12, 2019

@ChrisHMSFT yes this is just for track 2 at this time. We may revisit the location of track 1 before the library GAs but we know this would cause breaking changes we would have to address, and so are not doing this at this time.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. Cognitive - Text Analytics Cognitive Services
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants