Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TextBuffer::GenHTML/RTF to read the buffer directly #16377

Merged
merged 58 commits into from
Jan 29, 2024

Conversation

tusharsnx
Copy link
Contributor

@tusharsnx tusharsnx commented Nov 26, 2023

TextBuffer::GenHTML and TextBuffer::GenRTF now read directly from the TextBuffer.

  • Since we're reading from the buffer, we can now read all the attributes saved in the buffer. Formatted copy now copies most (if not all) font/color attributes in the requested format (RTF/HTML).
  • Use TextBuffer::CopyRequest to pass all copy-related options into text generation functions as one unit.
  • Helper function TextBuffer::CopyRequest::FromConfig() generates a copy request based on Selection mode and user configuration.
  • Both formatted text generation functions now use std::string and fmt::format_to to generate the required strings. Previously, we were using std::ostringstream which is not recommended due to its potential overhead.
  • Reading attributes from ROW's attribute RLE simplified the logic as we don't have to track attribute change between the text.
  • On the caller side, we do not have to rebuild the plain text string from the vector of strings anymore. TextBuffer::GetPlainText() returns the entire text as one std::string.
  • Removed TextBuffer::TextAndColors.
  • Removed TextBuffer::GetText(). TextBuffer::GetPlainText() took its place.

This PR also fixes two bugs in the formatted copy:

  • We were applying line breaks after each selected row, even though the row could have been a Wrapped row. This caused the wrapped rows to break when they shouldn't.
  • We mishandled Unicode text (\uN) within the RTF copy. Every next character that uses a surrogate pair or high codepoint was missing in the copied text when pasted to MSWord. The command \uc4 should have been \uc1, which is used to tell how many fallback characters are used for each Unicode codepoint (\u). We always use one ? character as the fallback.

Closes #16191

References and Relevant Issues

Validation Steps Performed

  • Casual copy-pasting from Terminal or OpenConsole to word editors works as before.
  • Verified HTML copy by copying the generated HTML string and running it through an HTML viewer. Sample
  • Verified RTF copy by copy-pasting the generated RTF string into MSWord.
  • SingleLine mode works (Shift+ copy)
  • BlockSelection mode works (Alt selection)

PR Checklist

  • Tests added/passed

@microsoft-github-policy-service microsoft-github-policy-service bot added Issue-Bug It either shouldn't be doing this or needs an investigation. Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Product-Terminal The new Windows Terminal. labels Nov 26, 2023
@tusharsnx
Copy link
Contributor Author

After lhecker's recommendation and implementing the changes, it seemed to be that it was probably better to create this PR instead of pushing of #16270 mainly because it deals with Copy operations at large.

This already includes all the changes from #16270. So if you guys are okay I suggest we discard the old PR and take this one forward.

@tusharsnx tusharsnx marked this pull request as ready for review December 7, 2023 09:24
@tusharsnx
Copy link
Contributor Author

tusharsnx commented Dec 7, 2023

@lhecker

I'm writing a new tool

Performance measurement is not my strongest suit right now. I use VS profiler that comes built-in, but don't really know how to measure performance in an ideal way (e.g., comparing different implementations of the same function. If I had to do this I would probably be clicking Ctrl+Shift+A, Ctrl+C to copy all text and see how much time that took overall which is so wrong 😂). I'll be happy to try the tool you're building. Lemme know whenever that happens! 😄

And thanks for all the performance measurements you did. They're very insightful.

Edit:
Just came to know about instrumentation, and how it can be used in this situation. VS also recently shipped its instrumentation based profiler for C++: https://learn.microsoft.com/en-us/visualstudio/profiling/instrumentation-overview?view=vs-2022

@lhecker
Copy link
Member

lhecker commented Dec 11, 2023

Sorry for the long wait - the tool still had a lot of rough edges and I used it as a test bed to learn some new programming paradigms. Here it is: #16453

@tusharsnx
Copy link
Contributor Author

thanks for the tool @lhecker. it's a really good way to test performance regression precisely.

On my system, I can see these results:

openconsole (release) on main

image

openconsole (release) this PR

image

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OKAY I made it through the whole review! A couple questions, a couple notes, and a couple nits.

void CopyTextToSystemClipboard(const TextBuffer::TextAndColor& rows, _In_ const bool copyFormatting);
void CopyToSystemClipboard(std::string stringToPlaceOnClip, LPCWSTR lpszFormat);
void CopyTextToSystemClipboard(const std::wstring_view text, const std::optional<std::string> htmlData, const std::optional<std::string> rtfData) const;
void CopyToSystemClipboard(const std::string_view stringToPlaceOnClip, LPCWSTR lpszFormat) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lhecker you're gonna conflict this, and because of that conflict I won't be able to easily backport to 1.19; thoughts? Want to maintain two copies of that PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to split my PR into two to adapt for this one. 🙂

{
if (!_fSelectionVisible)
{
return {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we distinguish this from there being a single selection at 0,0? do we need to?

Copy link
Contributor Author

@tusharsnx tusharsnx Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, we should return something like { { 0, 0 }, { -1, -1 } } to represent an empty selection. Fixed.

This also helped me find a bug in the possible range for CopyRequest's end coordinate. Since it's an inclusive coordinate we should be clamping it within { buffer.width - 1, buffer.height - 1 } ( and not { buffer.width, buffer.height }). This is fixed too.

Thanks.

Comment on lines 297 to 298
std::optional<std::string> html;
std::optional<std::string> rtf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder. it should be a fair assumption that if the string is empty, it didn't exist. do we need the optional?

(Not asking you to change this, just curious)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on user expectations. If I select the empty space after the prompt, I'd expect it to copy an empty string (which got trimmed) and put that into the clipboard. Also, the UI behaves like something is copied. (User has an active selection -> triggers copy action -> selection goes away -> ctrl+v -> should paste empty string). Now, if we couldn't select the empty spaces then this would be a different story 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That case is probably fine even if "empty string" means "not present"! Because the framing for the HTML and RTF documents always includes data (the DOCTYPE/Clipboard header or the RTF preamble), selecting and copying the empty space after the prompt wouldn't result in an empty string.

Copy link
Contributor Author

@tusharsnx tusharsnx Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the HTML and RTF documents always includes data (the DOCTYPE/Clipboard header or the RTF preamble)

Sorry, I missed that part completely.

That means we can use an empty string as an empty RTF/HTML data. 😄

Copy link
Contributor Author

@tusharsnx tusharsnx Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except, we do return an empty string from the three text generation functions, but only when we get a CopyRequest with req.beg > req.end i.e. empty selection range (possibly unreal). Since it's an empty selection, it should be okay to return an empty string I guess.

Should I replace it with assert calls to convey that the req is ill-formed? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to remove the special case handling and let them fill the std::string with just HTML/RTF header. (Have to test whether this works fine or not)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am A-OK with us leaving the optionals, btw! It does make it a lot clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's go ahead with std::optional approach 👍

src/cascadia/TerminalCore/TerminalSelection.cpp Outdated Show resolved Hide resolved
}

const auto bgColor = _renderSettings.GetAttributeColors({}).second;
const auto isIntenseBold = _renderSettings.GetRenderMode(::Microsoft::Console::Render::RenderSettings::Mode::IntenseIsBold);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, we didn't handle this properly before did we? nice!

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved

// note: MS Word doesn't support padding (in this way at least)
htmlBuilder << "padding:";
htmlBuilder << 4; // todo: customizable padding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why on earth did we format this in as a number lol

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
};

if (rows.text.at(row).at(col) == '\r' || rows.text.at(row).at(col) == '\n')
const auto nextX = gsl::narrow_cast<uint16_t>(x + length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i forget, are attribute lengths associated with code units or with columns? Will this cause problems for e.g. single-width surrogate pair characters like U+1F574 MAN IN BUSINESS SUIT LEVITATING?

Copy link
Member

@lhecker lhecker Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes operate in columns. The ROW::_charOffsets array then maps columns back to code units.

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 12, 2023
@DHowett
Copy link
Member

DHowett commented Dec 12, 2023

@tusharsnx you're taking on some of the scary parts of our codebase, and I've gotta commend you for that! Thank you!

@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 13, 2023
@DHowett
Copy link
Member

DHowett commented Dec 13, 2023

(Can't leave a diff comment on mobile!)

Ah, I was thinking we should keep the case labels for the SinglyUnderlined cases but place them directly above/in the fallthrough position for default. Otherwise, static analyzers might think that we didn't cover all the cases :)

Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't really see any flaws whatsoever. If there are any: Let's fix them once we ever find any! Fantastic work!

I've commented a couple nits, but they're not that important.

src/types/UiaTextRangeBase.cpp Show resolved Hide resolved
src/interactivity/win32/Clipboard.cpp Outdated Show resolved Hide resolved
src/host/selectionState.cpp Outdated Show resolved Hide resolved
@DHowett
Copy link
Member

DHowett commented Dec 15, 2023

I'll leave off my review in case you want to think about Leonard's comments! I'll be around tomorrow or so to check up. 😄

@tusharsnx
Copy link
Contributor Author

Resolved merge conflicts generated by #16480

@DHowett
Copy link
Member

DHowett commented Jan 29, 2024

I didn't realize this one didn't merge! And it's blocked on ME??

@DHowett DHowett changed the title Refactor TextBuffer::GenHTML and TextBuffer::GenRTF to read directly from the buffer Refactor TextBuffer::GenHTML/RTF to read the buffer directly Jan 29, 2024
@DHowett DHowett enabled auto-merge (squash) January 29, 2024 21:57
@DHowett DHowett merged commit a3ac337 into microsoft:main Jan 29, 2024
14 of 15 checks passed
@tusharsnx tusharsnx deleted the copypasta2 branch March 18, 2024 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-TerminalControl Issues pertaining to the terminal control (input, selection, keybindings, mouse interaction, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Font variant ignored by copyFormatting
3 participants