Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New public APIs for UTF8 String Literals #61847

Closed
AlekseyTs opened this issue Jun 13, 2022 · 3 comments · Fixed by #62074
Closed

New public APIs for UTF8 String Literals #61847

AlekseyTs opened this issue Jun 13, 2022 · 3 comments · Fixed by #62074
Assignees
Labels
4 - In Review A fix for the issue is submitted for review. api-approved API was approved in API review, it can be implemented Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. New Feature - Utf8StringLiterals

Comments

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jun 13, 2022

namespace Microsoft.CodeAnalysis
{
    public enum OperationKind
    {
+        /// <summary>Indicates an <see cref="IUTF8StringOperation"/>.</summary>
+        UTF8String = 0x7c,
    }
}

namespace Microsoft.CodeAnalysis.Operations
{
+    /// <summary>
+    /// Represents a UTF8 encoded byte representation of a string.
+    /// <para>
+    ///   Current usage:
+    ///   (1) C# UTF8 string literal expression.
+    /// </para>
+    /// </summary>
+    /// <remarks>
+    /// <para>This node is associated with the following operation kinds:</para>
+    /// <list type="bullet">
+    /// <item><description><see cref="OperationKind.UTF8String"/></description></item>
+    /// </list>
+    /// <para>This interface is reserved for implementation by its associated APIs. We reserve the right to
+    /// change it in the future.</para>
+    /// </remarks>
+    public interface IUTF8StringOperation : IOperation
+    {
+        /// <summary>
+        /// The underlying string value.
+        /// </summary>
+        string Value { get; }
+    }

    public abstract partial class OperationVisitor
    {
+        public virtual void VisitUTF8String(IUTF8StringOperation operation) => DefaultVisit(operation);
    }

    public abstract partial class OperationVisitor<TArgument, TResult>
    {
+        public virtual TResult? VisitUTF8String(IUTF8StringOperation operation, TArgument argument) => DefaultVisit(operation, argument);
    }
}

+Microsoft.CodeAnalysis.CSharp.SyntaxKind.UTF8StringLiteralExpression = 8756 -> Microsoft.CodeAnalysis.CSharp.SyntaxKind
+Microsoft.CodeAnalysis.CSharp.SyntaxKind.UTF8StringLiteralToken = 8520 -> Microsoft.CodeAnalysis.CSharp.SyntaxKind
+Microsoft.CodeAnalysis.CSharp.SyntaxKind.UTF8SingleLineRawStringLiteralToken = 8521 -> Microsoft.CodeAnalysis.CSharp.SyntaxKind
+Microsoft.CodeAnalysis.CSharp.SyntaxKind.UTF8MultiLineRawStringLiteralToken = 8522 -> Microsoft.CodeAnalysis.CSharp.SyntaxKind
@AlekseyTs AlekseyTs added Concept-API This issue involves adding, removing, clarification, or modification of an API. Area-Compilers New Feature - Utf8StringLiterals labels Jun 13, 2022
@AlekseyTs AlekseyTs self-assigned this Jun 13, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged Issues and PRs which have not yet been triaged by a lead label Jun 13, 2022
@AlekseyTs AlekseyTs removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jun 13, 2022
@AlekseyTs AlekseyTs added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jun 13, 2022
@Youssef1313
Copy link
Member

Youssef1313 commented Jun 13, 2022

Minor naming thing that you may consider discussing is UTF8 vs Utf8. The runtime used Utf8 for recent type (https://docs.microsoft.com/en-us/dotnet/api/system.text.json.utf8jsonwriter?view=net-6.0)

I'm aware that some types in .NET uses UTF all capitalized, so I referred to the more recent type.

From https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/framework-design-guidelines-digest.md#naming-guidelines

DO use PascalCasing or camelCasing for any acronyms over two
characters long. For example, use HtmlButton rather than HTMLButton, but
System.IO instead of System.Io.

@333fred
Copy link
Member

333fred commented Jun 16, 2022

API Review

  • User-facing documentation should use UTF-8
  • Could it be ILiteralOperation?
    • It's not a constant, like other literals
    • Couldn't expose the string value from the operation
  • Need to sync with the Framework Design folks on the proper spelling of Utf8 in type names (BCL is inconsistent here).

Conclusion: Approved, modulo naming of the types

@333fred 333fred added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 16, 2022
@AlekseyTs
Copy link
Contributor Author

From Framework Design folks:

From a framework naming perspective, “Utf8” is preferred, because it’s a 3 letter abbreviation (for Unicode Transformation Format), and 3+ letter abbreviations should be PascalCased, like Html and Uri.

From a documentation/conceptual perspective, the preferred spelling of the Unicode Consortium seems to be “UTF-8” (all caps, with a hyphen). To wit: https://www.unicode.org/faq/utf_bom.html

AlekseyTs added a commit to AlekseyTs/roslyn that referenced this issue Jun 22, 2022
From a framework naming perspective, “Utf8” is preferred, because it’s a 3 letter abbreviation (for Unicode Transformation Format), and 3+ letter abbreviations should be PascalCased, like Html and Uri.

From a documentation/conceptual perspective, the preferred spelling of the Unicode Consortium seems to be “UTF-8” (all caps, with a hyphen).

Closes dotnet#61847.
@AlekseyTs AlekseyTs added the 4 - In Review A fix for the issue is submitted for review. label Jun 22, 2022
AlekseyTs added a commit that referenced this issue Jun 27, 2022
…tation (#62074)

From a framework naming perspective, “Utf8” is preferred, because it’s a 3 letter abbreviation (for Unicode Transformation Format), and 3+ letter abbreviations should be PascalCased, like Html and Uri.

From a documentation/conceptual perspective, the preferred spelling of the Unicode Consortium seems to be “UTF-8” (all caps, with a hyphen).

Closes #61847.

Also fixes a typo and closes #61846.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - In Review A fix for the issue is submitted for review. api-approved API was approved in API review, it can be implemented Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. New Feature - Utf8StringLiterals
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants