Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944

mitchdenny · 2019-11-28T23:27:36Z

I noticed the new Azure.AI.TextAnalytics library was sitting in sdk/cognitiveservices/textanalytics. This may end up violating some of the assumptions we make in our engineering system about the level in source tree where our libraries sit.

Should we consider breaking our a seperate sdk/textanalytics directory and setting up seperate pipelines for this. This is especially true as the libraries that sit in the sdk/cognitiveservices path are probably going to ship on different cadences anyway and will ultimately get broken up into seperate pipelines anyway.

/cc @chidozieononiwu @weshaggard

The text was updated successfully, but these errors were encountered:

weshaggard · 2019-12-02T21:57:08Z

I'd be fine with that. @annelo-msft any thoughts on this?

annelo-msft · 2019-12-02T22:07:46Z

Would the principle here be that Track 2 cognitive services libraries sit at the sdk/ level, and Track 1 sit under sdk/cognitiveservices/?

I'm not sure I understand enough about what assumptions could be violated and how pipelines are set up to make a call. Can you help me understand this better?

mitchdenny · 2019-12-03T02:29:37Z

Our repos largely follow the pattern of sdk/[service]/[module]. So when it comes to writing automation keeping to that pattern makes things easier (e.g. finding and processing READMEs etc). I'm not sure if anything will break right now, haven't tested, but I'm trying to cut off the variance before it becomes "another pattern" :)

In terms of the track 1 to track 2 progression, we might end up splitting out all of cog services as it exists for consistency, but text analytics would obviously be the first.

chidozieononiwu · 2019-12-03T09:58:25Z

@mitchdenny You are right the DocIndex and DocWarden process is broken on that right now

annelo-msft · 2019-12-03T12:54:26Z

Cool, thanks for clarifying!

Given the way they were structured before, as a collection of services under cognitiveservices, I had understood the pattern incorrectly -- it looked like sdk/[collection-name]/[service] to me. Since the pattern is sdk/[service]/[module], for TextAnalytics, this would be sdk/textanalytics/Azure.AI.TextAnalytics. Are there any concerns about separating the cognitive services from one another? Was there automation that relied on them sitting together as a group?

weshaggard · 2019-12-03T18:38:12Z

Are there any concerns about separating the cognitive services from one another? Was there automation that relied on them sitting together as a group?

I don't think so in fact given that they are mostly disjoint it is probably better to separate them because in our current structure our main assumption is that everything under a service folder generally ships together.

weshaggard · 2019-12-03T18:38:47Z

Also for completeness here is a link to our repo structure docs https://github.com/Azure/azure-sdk/blob/master/docs/policies/repostructure.md.

weshaggard · 2019-12-03T18:50:52Z

Looks like we probably should agree on the general pattern across the languages as @mssfang is adding this for Java with PR Azure/azure-sdk-for-java#6161 and we should use the same service folder naming pattern there as well.

annelo-msft · 2019-12-03T18:52:01Z

Thanks! I spoke with @AlexGhiondea about this as well, and he also prefers to stick with the sdk/[service]/[module] i.e. sdk/textanalytics/Azure.AI.TextAnalytics naming convention.

weshaggard · 2019-12-03T18:56:54Z

Just for clarity the other option would be to do sdk/cognitiveservices/Azure.AI.TextAnalytics. Looks like we did that in JS, https://github.com/Azure/azure-sdk-for-js/tree/master/sdk/cognitiveservices/cognitiveservices-textanalytics and python https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/cognitiveservices/azure-cognitiveservices-language-textanalytics.

So whatever we chose I believe we should do so consistently across languages.

annelo-msft · 2019-12-03T19:01:12Z

Given your point about things under a service folder shipping together, I think it makes sense to use sdk/textanalytics/Azure.AI.TextAnalytics -- I don't think it makes sense for the cognitive services to ship on the same cadence.

@kristapratico @mayurid @xirzec @samvaity

mssfang · 2019-12-03T19:15:36Z

So we will remove the collection folder? If doing so, java folder will change to "sdk/textanalytics/azure-ai-textanalytics"

/cc @JonathanGiles @samvaity

xirzec · 2019-12-03T19:19:46Z

I'm not sure if service folder == shipping cadence makes sense or not, since I can imagine shipping an update to a particular package (e.g. storage-blob or core-http) without shipping new versions of every sibling.

/cc @ramya-rao-a

weshaggard · 2019-12-03T19:27:42Z

@xirzec we have the flexibility to ship only a subset but that is generally the exceptional case as our workflows are optimized for shipping together. If shipping the subset is the norm, like I expect in the cognitive services case we should consider splitting them up. We can split them up by having multiple pipelines but it is probably easier to split them up at service folder level. On top of shipping separately that makes things like setting up CODEOWNERS and live test runs easier as well given that the set is owned by different folks.

xirzec · 2019-12-03T21:39:12Z

Yeah, to be fair, I think it's not unreasonable to pull Text Analytics out as Cognitive Services is a bit massive, though it would be unfortunate to not have the path match the package name anymore.

Core is an interesting edge case for us, since they are interdependent but also it's pretty common that we don't need to touch certain ones for a long time (e.g. core-paging)

ChrisHMSFT · 2019-12-05T19:15:50Z

Are we proposing that we make this change now for all services or just for the new track2 ones? Seems like it would be confusing to have text analytics libraries show up in 2 spots but updating the existing sdks will likely break reference links in docs & on the package manager pages (last time everything went to a 404)

annelo-msft · 2019-12-05T19:48:07Z

The proposal is to do it just for track 2 libraries.

weshaggard · 2019-12-05T21:24:30Z

Seems like it would be confusing to have text analytics libraries show up in 2 spots but updating the existing sdks will likely break reference links in docs & on the package manager pages (last time everything went to a 404)

The link problem is a much larger issue which we have filed an issue at Azure/azure-sdk#818 to track getting into a better place. While moving might cause some existing links to break the links are likely already broken in a number of cases because they aren't linking to the correct places. Hopefully we well get into a more stable place where moving things in the repo causes issues like breaking links.

ramya-rao-a · 2019-12-05T21:29:53Z

One thing to keep in mind that while we do this is whether the Track 2 library is going to use the same package name or create a new one.

If we are re-using the package name, then @chrismsft's point is doubly valid. We cannot have the same package live in 2 different places. @xirzec's point on the package name not matching the folder name is also valid.

yangyuan · 2019-12-05T22:30:45Z

I would vote for sdk/cognitiveservices/Azure.AI.TextAnalytics and strongly against to sdk/textanalytics/Azure.AI.TextAnalytics

I agree that sdk/[service]/[module] is preferred. But the logic between Text Analytics and Cognitive Services is a little similar to Azure Table and Azure Storage. I prefer to consider TextAnalytics as module instead of an individual service
- If you are familiar with CognitiveServices, when you create a TextAnalytics resource, it is actually a CognitiveServices account with TextAnalytics "kind".
- TextAnalytics is managed by CognitiveServices RP. There is Microsoft.Azure.Management.CognitiveServices but no Microsoft.Azure.Management.TextAnalytics
- There are other CognitiveServices "kinds" which can call TextAnalytics service too.
If we are moving TA out of cognitiveservices folder, we should move all. It apparently will create a mess to put everything directly under “sdk/”

annelo-msft · 2019-12-05T23:56:02Z

One counterpoint to that is that Cognitive Services is moving toward separating out the individual services, or at least one has to do this to enable AAD support.

It's also worth noting that we are having discussions with our architects around library naming - our guidelines state that we shouldn't use Cognitive in a namespace. If we want to do this we will need to raise this with our API review board.

mitchdenny · 2019-12-06T00:22:16Z

When we think about the sdk/[something] folder we should generally be thinking about things that ship together. So its unlikely that we would ship computer vision and text analytics at the same time. But we are more likely to to ship all of the text analytics components (if there were multiple packages) at the same time.

One caveat there is that we have management plane and data plane co-mingled in the sdk/[dir] path. So insert a hand wavy justification for that here ... because storage data plane and storage management plane will tend to ship with different cadences.

The way I think of it is in dimensions. There is a logical service, which forms one dimension, and then there is another dimension which (plane) of which management and data on.

We've chosen to segment our pipelines on the service dimension. And what we are saying here is that text analytics is a logically separate service.

The dimensions I see are:

Business Group
Service
Plane (management/data)

Storage is a good counterpoint to my proposal here since you could make the point that blob/table/queues/etc could be separated out. I think this is where the grey area is ... but the cog services space is huge and diverse and easier to separate out (and there probably isn't a lot of shared components between them other than core itself).

ramya-rao-a · 2019-12-06T01:40:47Z

I would very much prefer not to make the decision on folder structure based on what ships together. Once we are out of the phase of shipping previews and go GA, it is very unlikely that the packages in a folder get shipped together.

Among what we have shipped as part of Track 2, not just storage, but keyvault also falls under the same counterpoint bucket. We have shipped 3 different packages keys, secrets and certificates. It is very much valid that we add features or make bug fixes that are applicable for 1 and not the others.

Then, we have the other scenario where some packages in a folder are GA, others in preview, but all are part of the same pipeline. While this may or may not be a problem in other languages, this has bit the JS land, where we add different tags to a published package based on whether it is preview or not. (Apologies for bringing a JS topic into a discussion in a .Net repo)

mitchdenny · 2019-12-06T02:08:55Z

It is hard to come up with a model that suits everyone. The most flexible option is to have a pipeline per output package. So for KeyVault we'd have three or so per mono-repo.

For storage we would have a bunch more. But then you get into this situation where some libraries (like storage) have a common library and you really want to build the common library and the other libraries together. And if common changes there is a fairly good chance you would ship an update to all libraries.

KeyVault doesn't have that common library (other than core) scenario and so probably has less potential pressure for shipping together.

As I said its hard to find a perfect fit for every scenario. Another way of looking at it is what builds together. It probably makes sense that all the KeyVault stuff at least builds together. Same for storage, but its harder to make that case for the various pieces of cog services.

JonathanGiles · 2019-12-08T21:01:42Z

I do worry a little about choosing what is best from a build / release point of view, instead of what is most intuitive for visitors to our repository. Our tooling can be made better and more flexible, but users can't be expected to arrive with any such domain knowledge.

Ignoring the existence of the 'cognitiveservices' directory for track one client libraries, I would imagine that most visitors to the repo will be more inclined to want to find the totality of all cognitive services offered in the azure-sdk-for-<language>, rather than to be looking specifically through a very long directory list under /sdk for services of interest. Given that the track one 'cognitiveservices' does exist, and will do for some time, it would be my expectation that most people would go into this directory far more intuitively than they would look through the /sdk directory for /sdk/textanalytics. They would arrive at older, less high-quality libraries and potentially never go beyond this.

My position right now is to have /sdk/cognitiveservices/textanalytics, within which users would find track one and track two client libraries side by side, just as they do today for, e.g. /sdk/storage. I appreciate this is another level deeper, but to me it feels like clustering all cognitive services into a group makes sense. I'm not strongly proposing 'cognitiveservices' by the way - this is just the nomenclature used at present. For what it is worth, I don't believe this particular name would be violating the "don't use cognitive services in the namespace" requirement (for Java at least), as this is not part of the package naming.

I'm happy to be proven wrong. My main point that I felt I had to raise was that I want this considered not just from a build / release management perspective, but also from a user intuition perspective too.

mitchdenny · 2019-12-08T22:37:56Z

I don't think that /sdk/cognitiveservices/textanalytics will work due to path length issues (we already hit it with just /sdk/congnitiveservices. We've taken the position that we don't want people to have to enable long path support in Windows (since its a bit spotty anyway).

(edit: when I say we've taken the position, its something that EngSys team has discussed in some details and we felt that requring long path support on Windows just adds a burden on folks before cloning).

annelo-msft · 2019-12-09T16:33:07Z

I think @JonathanGiles makes a good point. That was the way I had thought about this originally, and why I checked in TextAnalytics where I did. I don't know the right answer here, but if path length is the only concern, what if we changed the name of the "cognitiveservices" folder to "ai" (in keeping with the Azure SDK guidelines around namespaces - not required, but for consistency). That would add only 3 characters to the path. Are there other engineering considerations beyond this?

There's another perspective as well, which is that all of the namespace groupings might eventually be grouped together, much like the cognitive services under something like ai. Having the cognitive services together introduces this convention, and then we have two conventions in place, which is confusing itself. The recommendation from this perspective is that we keep with the current standard sdk/textanalytics/ and if/when we eventually go over to the namespace groupings, we move cognitive to their grouping as part of that. I think the challenge here is that the track one libraries for cognitive have always been part of the second convention, so moving it in line with the first breaks what people are used to. The important question I think should be what is the right thing going forward?

mitchdenny · 2019-12-10T01:48:01Z

Sounds like a plan. For future us, I think that there are a few aspects of this conversation to consider (in no particular order):

Potential path length issues
Taxonomy: Brand, Service Grouping, Service
Release cadence of the grouping we choose

annelo-msft · 2019-12-10T04:19:01Z

To clarify (since I made two different points in my last comment), I checked with @mitchdenny, and he's saying "sounds like a plan" to staying with sdk/textanalytics and thinking about the listed considerations if/when we revisit this issue down the road.

With regard to the issue of splitting the Track1 and Track2 libraries, @mayurid suggested we move the TextAnalytics Track1 library to the same folder as the Track2 library. I like this suggestion, since it follows the precedent of other Track2 libraries being in the same folder as their corresponding Track1 library.

ChrisHMSFT · 2019-12-10T16:14:05Z

if we do move the Track1 Libraries we'll need to check with the docs team. There are several links that would break. I can't remember if this type of move was what broke the ref doc generation in js earlier this year as well

annelo-msft · 2019-12-11T16:59:41Z

I worry about moving the Track 1 library as well, because a new user coming to the repo might then never identify Text Analytics as one of the cognitive services -- this seems like a potentially negative thing for Text Analytics and any of the other services we move out. What if we kept Track 1 libraries under cognitiveservices/, moved Track 2 libraries to sdk/servicename/packagename and added a README under cognitiveservices at the same level as and alphabetized to sit beside the track 1 library to indicate that track 2 libraries are at sdk/servicename/packagename? It seems like this addresses all the immediate concerns and will let us move forward on this. Thoughts?
@mayurid @ChrisHMSFT @mitchdenny

annelo-msft · 2019-12-11T22:10:17Z

Per our discussion in stand-up this morning, we will move TextAnalytics to sdk/textanalytics/[module]

For consistency across repos, other languages should follow this convention as well. @kristapratico, @xirzec, @samvaity, @mssfang, @mayurid

Thanks, all!

xirzec · 2019-12-11T22:12:47Z

Already updated my PR: Azure/azure-sdk-for-js#6433

mssfang · 2019-12-11T22:48:10Z

Java already has this change in the PR as well: Azure/azure-sdk-for-java#6161

ChrisHMSFT · 2019-12-12T14:21:14Z

@annelo-msft just wanted to confirm would this be just for track 2 or are we moving the track1 libraries?

annelo-msft · 2019-12-12T16:47:54Z

@ChrisHMSFT yes this is just for track 2 at this time. We may revisit the location of track 1 before the library GAs but we know this would cause breaking changes we would have to address, and so are not doing this at this time.

triage-new-issues bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Nov 28, 2019

weshaggard mentioned this issue Dec 3, 2019

Text analytics Preview 1 Azure/azure-sdk-for-java#6161

Merged

annelo-msft added Client This issue points to a problem in the data-plane of the library. Cognitive Services TextAnalytics labels Dec 3, 2019

triage-new-issues bot removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Dec 3, 2019

annelo-msft added this to the [2020] January milestone Dec 3, 2019

kristapratico mentioned this issue Dec 3, 2019

Text Analytics v3 Implementation Azure/azure-sdk-for-python#8582

Closed

32 tasks

kristapratico mentioned this issue Dec 4, 2019

[cognitiveservices-textanalytics] Move sdk/cognitiveservices/textanalytics to sdk/textanalytics Azure/azure-sdk-for-python#8993

Closed

3 tasks

samvaity mentioned this issue Dec 6, 2019

Update package name for Text Analytics Azure/azure-sdk-for-java#6686

Closed

annelo-msft mentioned this issue Dec 12, 2019

Move textanalytics folder up a level in the azure-sdk-for-net repo #9108

Merged

annelo-msft closed this as completed in #9108 Dec 12, 2019

annelo-msft mentioned this issue Mar 20, 2020

Form Recognizer initial repo setup Azure/azure-sdk-for-java#9322

Merged

ghost added Cognitive - Text Analytics and removed TextAnalytics labels Jun 24, 2020

github-actions bot locked and limited conversation to collaborators Mar 29, 2023

Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944

Move sdk/cognitiveservices/textanalytics to sdk/textanalytics #8944

Comments

mitchdenny commented Nov 28, 2019

weshaggard commented Dec 2, 2019

annelo-msft commented Dec 2, 2019

mitchdenny commented Dec 3, 2019

chidozieononiwu commented Dec 3, 2019

annelo-msft commented Dec 3, 2019 • edited Loading

weshaggard commented Dec 3, 2019

weshaggard commented Dec 3, 2019

weshaggard commented Dec 3, 2019

annelo-msft commented Dec 3, 2019 • edited Loading

weshaggard commented Dec 3, 2019

annelo-msft commented Dec 3, 2019

mssfang commented Dec 3, 2019 • edited Loading

xirzec commented Dec 3, 2019

weshaggard commented Dec 3, 2019

xirzec commented Dec 3, 2019

ChrisHMSFT commented Dec 5, 2019

annelo-msft commented Dec 5, 2019

weshaggard commented Dec 5, 2019

ramya-rao-a commented Dec 5, 2019

yangyuan commented Dec 5, 2019 • edited Loading

annelo-msft commented Dec 5, 2019

mitchdenny commented Dec 6, 2019 • edited Loading

ramya-rao-a commented Dec 6, 2019

mitchdenny commented Dec 6, 2019

JonathanGiles commented Dec 8, 2019 • edited Loading

mitchdenny commented Dec 8, 2019 • edited Loading

annelo-msft commented Dec 9, 2019 • edited Loading

mitchdenny commented Dec 10, 2019

annelo-msft commented Dec 10, 2019 • edited Loading

ChrisHMSFT commented Dec 10, 2019

annelo-msft commented Dec 11, 2019

annelo-msft commented Dec 11, 2019

xirzec commented Dec 11, 2019

mssfang commented Dec 11, 2019

ChrisHMSFT commented Dec 12, 2019

annelo-msft commented Dec 12, 2019 • edited Loading

annelo-msft commented Dec 3, 2019 •

edited

Loading

annelo-msft commented Dec 3, 2019 •

edited

Loading

mssfang commented Dec 3, 2019 •

edited

Loading

yangyuan commented Dec 5, 2019 •

edited

Loading

mitchdenny commented Dec 6, 2019 •

edited

Loading

JonathanGiles commented Dec 8, 2019 •

edited

Loading

mitchdenny commented Dec 8, 2019 •

edited

Loading

annelo-msft commented Dec 9, 2019 •

edited

Loading

annelo-msft commented Dec 10, 2019 •

edited

Loading

annelo-msft commented Dec 12, 2019 •

edited

Loading