-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use full TextAttribute
when copying data into Clipboard
#16270
Conversation
@@ -96,7 +96,7 @@ Clipboard& Clipboard::Instance() | |||
// - cchData - Size of the Unicode String in characters | |||
// Return Value: | |||
// - None | |||
void Clipboard::StringPaste(_In_reads_(cchData) const wchar_t* const pData, | |||
void Clipboard::StringPaste(_In_reads_(cchData) PCWCHAR pData, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this codebase, unless we're interfacing with a Windows API we have a negative preference for the windows typedefs. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah, ok, will revert this.
The declaration is little confusing though. It uses PCWCHAR
, so I thought let's make it consistent. Maybe, we should change PCWCHAR
to const wchar_t* const
in the declaration too?
terminal/src/interactivity/win32/clipboard.hpp
Lines 33 to 34 in 59dcbbe
void StringPaste(_In_reads_(cchData) PCWCHAR pwchData, | |
const size_t cchData); |
src/buffer/out/textBuffer.cpp
Outdated
std::vector<COLORREF> selectionFgAttr; | ||
std::vector<COLORREF> selectionBkAttr; | ||
selectionText.reserve(maxCells); | ||
til::small_rle<TextAttribute> attrs{ maxCells, it->TextAttr() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I am OK with using something memory-inefficient (like a vector for each row with width
capacity, or even a huge height
*width
vector with sub-spans for each row), because this is a short-lived transport format rather than a long-lived permanent storage format.
If RLE is slower in this use case, we can spend memory to save time by not using it. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, I know that til::rle has useful iteration behavior that specifically helps text attribute run production. That is, you can iterate attr-change-by-attr-change... which could be good for HTML/RTF generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If RLE is slower in this use case, we can spend memory to save time by not using it
I don't think RLE is any slower than vector type in this case 😅
you can iterate attr-change-by-attr-change... which could be good for HTML/RTF generation.
I totally agree. Though, I wasn't planning to make this change in this PR, but since you mentioned it, I'll see if that can be done in this PR itself. I'll add this to the PR checklist for now 👍
Relatedly, you might be interested in this code I'm currently working on: terminal/src/buffer/out/textBuffer.cpp Lines 2976 to 3175 in b70dcde
That's how I serialize a TextBuffer to text with VT sequences. It iterates over the backing run length compressed attributes directly and doesn't depend on any of the unwieldy cell iterators. This has the benefit of being massively faster and counter-intuitively being fairly concise code. I believe the function could be made to also produce plain text, HTML or RTF with a couple nested if conditions and switch statements. I think this PR is fine as is though. ^ Just some inspiration. 🙂 |
Exactly what I was thinking of, thanks 😀 |
we were adding newlines after every row! Causing wrapped rows to break unexpectedly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one minor issue (textTrimmedLength
) and a nit (the slice()
usage).
const auto begin = text.rbegin(); | ||
const auto end = text.rend(); | ||
auto it = begin; | ||
for (; it != end; ++it) | ||
{ | ||
const auto chars = cell.Chars(); | ||
selectionText.append(chars); | ||
|
||
if (copyTextColor) | ||
if (*it != L' ') | ||
{ | ||
const auto cellData = cell.TextAttr(); | ||
const auto [CellFgAttr, CellBkAttr] = GetAttributeColors(cellData); | ||
for (size_t j = 0; j < chars.size(); ++j) | ||
{ | ||
selectionFgAttr.push_back(CellFgAttr); | ||
selectionBkAttr.push_back(CellBkAttr); | ||
} | ||
break; | ||
} | ||
} | ||
|
||
++it; | ||
const auto textTrimmedLength = gsl::narrow_cast<uint16_t>(end - it); | ||
colEnd = std::min(colEnd, textTrimmedLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the textTrimmedLength
code is correct... end - it
returns the number of trailing whitespaces so colEnd
should be row.size() - (end - it)
, no? Also, we can simplify this using either find_last_not_of(L' ')
or ROW::MeasureRight()
. That is:
colEnd = std::min(colEnd, row.MeasureRight());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row.size() - (end - it)
That would be correct if were to use .begin()
/.end()
, but here we are using .rbegin()
/.rend()
, isn't?
Also, we can simplify this using either find_last_not_of(L' ') or ROW::MeasureRight()
I faced some problem when I tried to use ROW::MeasureRight()
for getting the trimmed length. IIRC, the row with the cursor was always _wrapForced = true
, which made MeasureRight()
return _columnCount
. However, it seems to work now so I guess I'll update this code to use MeasureRight()
🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to mention that MeasureRight()
is better in the sense that it doesn't trim trailing whitespaces of wrapped rows. This is because, technically, the trailing whitespaces on a wrapped row are not considered trailing whitespace of the line that was wrapped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, end - it
returns the number of characters from the start to the iterator. But that's also wrong, since there isn't a 1:1 mapping between the number of characters and columns.
I'm glad that MeasureRight()
works for you now!
// We apply formatting to rows if the row was NOT wrapped or formatting of wrapped rows is allowed | ||
const auto shouldFormatRow = formatWrappedRows || !GetRowByOffset(iRow).WasWrapForced(); | ||
// save selected text | ||
selectionText = std::wstring{ row.GetText(colBegin, colEnd) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note, BTW: This text copying is among others a reason for why I really want to remove TextAndAttribute
entirely. It'd be much more efficient but just as simple to accumulate the HTML/RTF data right here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean to say accumulate HTML/RTF data for this line, I guess we can do this in these steps:
- Generate HTML/RTF header data.
- Generate HTML/RTF attr-run-by-attr-run by calling
Gen[HTML|RTF]
in a loop.Gen[HTML|RTF]
would then only be responsible for this span of text (or an RTF group in case of RTF). We'll pass a single attribute which needs to be applied for this run of text. - Generate HTML/RTF footer data.
- Final result would be Header + All text spans + Footer.
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is an approach like the one I linked above in this comment: #16270 (comment)
It accumulates the text directly into std::wstring& out
. (FYI it uses an out parameter because that's the only way the caller can control both, the console lock duration, and the batch size for WriteFile
calls. For functions like GenHTML
and GenRTF
, I'd just return the string
.)
I'd basically merge GetText
into each GenHTML
and GenRTF
which then do their own text- and attribute-processing directly on the ROW
s. Only once they're standalone functions without any external helpers, I'd start abstracting smaller bits away as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.. I'm implementing this, just letting you know that we will be calling RenderSettings::GetAttributeColors
twice now, each in GenHTML
and GenRTF
. Would this be a concern?
I'm also looking into having one external helper where we would be calculating the exact text span we're interested in from the selection region. This is what will be responsible for trimming trailing whitespaces. We do this once and pass the result into GenHTML
and GenRTF
.
{ | ||
if (shouldFormatRow) | ||
attrs = std::move(row.Attributes().slice(colBegin, colEnd)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to move return values, since C++11 and later have RVO - return value optimization. (Also, they're in the so called prvalue category (p = pure) and everything in the more general rvalue category behaves as if it's being moved.)
In fact, this code is a tiny bit dangerous, because if slice()
was to return a mutable reference, you might steal the contents from the ROW
instance.
Lastly, I think you can move the attrs
variable from the other scope in here. Since slice()
returns a new instance every time there's no way to reuse memory here anyways.
const int fontHeightPoints, | ||
const std::wstring_view fontFaceName, | ||
const COLORREF backgroundColor) | ||
const COLORREF backgroundColor, | ||
const bool isIntenseBold) | ||
{ | ||
try | ||
{ | ||
std::ostringstream htmlBuilder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: If you ever want to, on my end, you're more than welcome to replace the std::ostringstream
with a std::string
and use regular .append()
or +=
and fmt::format_to(FMT_COMPILE(...), ...)
instead. 😄 sstream
classes have nasty overhead, both for binary size and runtime performance. And they aren't even particularly powerful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see if we can take the alternative approach we're discussing above. If not, I'll come back to this 🙂
Closing in favor of #16377 |
`TextBuffer::GenHTML` and `TextBuffer::GenRTF` now read directly from the TextBuffer. - Since we're reading from the buffer, we can now read _all_ the attributes saved in the buffer. Formatted copy now copies most (if not all) font/color attributes in the requested format (RTF/HTML). - Use `TextBuffer::CopyRequest` to pass all copy-related options into text generation functions as one unit. - Helper function `TextBuffer::CopyRequest::FromConfig()` generates a copy request based on Selection mode and user configuration. - Both formatted text generation functions now use `std::string` and `fmt::format_to` to generate the required strings. Previously, we were using `std::ostringstream` which is not recommended due to its potential overhead. - Reading attributes from `ROW`'s attribute RLE simplified the logic as we don't have to track attribute change between the text. - On the caller side, we do not have to rebuild the plain text string from the vector of strings anymore. `TextBuffer::GetPlainText()` returns the entire text as one `std::string`. - Removed `TextBuffer::TextAndColors`. - Removed `TextBuffer::GetText()`. `TextBuffer::GetPlainText()` took its place. This PR also fixes two bugs in the formatted copy: - We were applying line breaks after each selected row, even though the row could have been a Wrapped row. This caused the wrapped rows to break when they shouldn't. - We mishandled Unicode text (\uN) within the RTF copy. Every next character that uses a surrogate pair or high codepoint was missing in the copied text when pasted to MSWord. The command `\uc4` should have been `\uc1`, which is used to tell how many fallback characters are used for each Unicode codepoint (\u). We always use one `?` character as the fallback. Closes #16191 **References and Relevant Issues** - #16270 **Validation Steps Performed** - Casual copy-pasting from Terminal or OpenConsole to word editors works as before. - Verified HTML copy by copying the generated HTML string and running it through an HTML viewer. [Sample](https://codepen.io/tusharvickey/pen/wvNXbVN) - Verified RTF copy by copy-pasting the generated RTF string into MSWord. - SingleLine mode works (<kbd>Shift</kbd>+ copy) - BlockSelection mode works (<kbd>Alt</kbd> selection)
This PR replaces
TextBuffer::TextAndColor
withTextBuffer::TextAndAttribute
which gives us the opportunity to use all of the text attribute we save within buffer while copying data into the clipboard.We'll use
til::small_rle
for storing the attributes for each row in the selection rect. This helps in minimizing the memory required for storing all the attribute of a row.This PR also fixes two bugs in the formatted copy:
TextBuffer
row, even though the row could have been a wrapped row. This caused wrapped rows to break when they shouldn't. Fix: we now add a newline (<BR>
in HTML,\line
in RTF) only when we see\r
or\n
in the text.\uN
) within RTF copy. Every next character that uses a surrogate pair or high codepoint was missing in the copied text when pasted to Word. Fix: command\uc4
should have been\uc1
, which is used to tell how many characters will be used as a fallback for a unicode character. We always use 1?
character as the fallback (and not 4, as was the case).Validation Steps Performed
echo "This is some Ascii \ {}`nLow code units: á é í ó ú `u{2b81} `u{2b82}`nHigh code units: `u{a7b5} `u{a7b7}`nSurrogates: `u{1f366} `u{1f47e} `u{1f440}"
PR Checklist
GenRTF
andGenHTML
to loop usingTextAttrbute
RLE runs.GetText
and the stuff we do withTextAndColor
/TextAndAttribute
.GenHTML
andGenRTF
read from the buffer directly.