Copying bullets from a Word Doc doesn't create bulleted list #1225

Open
arwagner opened this issue Dec 27, 2016 · 30 comments

@arwagner

Copying and pasting unordered bullets from Word puts a bullet symbol in the editor instead of an actual bullet.

Steps for Reproduction

  1. Open https://www.dropbox.com/s/61gwc7evz398xki/test.docx?dl=0 in Word
  2. Select all in Word, and copy
  3. Visit http://quilljs.com/playground/#autosave
  4. Click into the editor and paste
  5. Click on the word "One"
  6. Click on the "unordered bullets" icon in the toolbar of the editor

Expected behavior:
The bullet gets removed

Actual behavior:
A real bullet gets created, containing the bullet symbol from the clipboard

Platforms:

All

Version:
All

@arwagner
Author

I'd love to get some guidance on a proper approach to fixing this issue.

I've been playing around with creating a custom matcher for clipboard to do this. The matcher essentially ignores any "p.MsoListParagraphCxSpMiddle" or "p.MsoListParagraphCxSpLast" tags (returns a Delta that doesn't do anything), and, for any "p.MsoListParagraphCxSpFirst" iterates through that tag, and its siblings, until it finds the "...SpLast" tag.

But, from there, I'm not sure exactly what the right thing to do is. The deltas that you get from creating a bullet list manually in quill are kind of strange, and I'm not sure that the matcher should be creating them from scratch? Should it be creating deltas from a List blot? I'm a bit confused as to whether or not I'm even on the right track.
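For orientation, the Deltas that look strange here follow one rule: in Quill 1.x, a block format such as list is carried by the newline that ends the line, not by the text itself. The sketch below builds the plain ops for a two-item bullet list. listItemOps is a hypothetical helper used only for illustration, not part of Quill, and real code would normally go through the Delta builder rather than raw objects.

```javascript
// Hypothetical helper (not part of Quill): the ops for one list line.
// The text op carries no list attribute; the trailing newline op does.
function listItemOps(text, listType, indent = 0) {
  const attributes = { list: listType };
  if (indent > 0) attributes.indent = indent;
  return [{ insert: text }, { insert: '\n', attributes }];
}

const ops = [
  ...listItemOps('One', 'bullet'),
  ...listItemOps('Two', 'bullet'),
];

console.log(JSON.stringify(ops, null, 2));
```

Read this way, a matcher for a single Word paragraph only has to produce the text of that paragraph followed by one attributed newline.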

@jhchen
Member

jhchen commented Dec 29, 2016

Have you looked at how the officially supported matchers in the clipboard work? They use the clipboard API the same way a third party would. If so, what specific things have you tried that did not work?

@arwagner
Author

Yes, I've looked at the built-in matchers. http://codepen.io/anon/pen/ENqRdP is what I have so far, but I'm not sure what should go in the "addNodeToDelta" function. None of the built-in matchers quite seem to do what I'm trying to do here, unless I'm misunderstanding them.

@jhchen
Member

jhchen commented Dec 29, 2016

The purpose of a matcher is to return a Delta representing a given node. If you fulfill this contract, the clipboard can build a Delta for the entire pasted tree. By traversing siblings and attempting to return Deltas for them instead, you are not fulfilling this contract. I would also suggest taking a look at Delta documentation. One of the more important takeaways from the Delta docs is not to create them by hand.

@arwagner
Author

Yes, I did read the Delta documentation. But I think, in this case, what I want to do is to construct a single delta with a List blot embedded, which corresponds to all the paragraphs that correspond to bullets. Is that not correct? You say that the contract is a one-to-one correspondence between deltas and nodes, yet there are a number of times in https://github.com/quilljs/quill/blob/develop/modules/clipboard.js where previousSibling, nextSibling, etc. are called. What I can't find an example of is a matcher which results in a particular blot.

@jhchen
Member

jhchen commented Dec 30, 2016

return { ops: [] } is constructing a Delta by hand. When Quill's clipboard looks at siblings, it does so for context about the current Delta.

@jhchen jhchen closed this as completed Dec 30, 2016
@jhchen jhchen changed the title Copying and pasting bullets from a Word Doc doesn't create norma bullets in quill Copying bullets from a Dropbox rendered Word Doc doesn't create bulleted list Dec 30, 2016
@jhchen jhchen changed the title Copying bullets from a Dropbox rendered Word Doc doesn't create bulleted list Copying bullets from a Word Doc doesn't create bulleted list Dec 30, 2016
@jhchen jhchen reopened this Jan 1, 2017
@DavidReinberger

DavidReinberger commented Apr 4, 2017

@arwagner did you have any luck with this issue?

@DavidReinberger

DavidReinberger commented Apr 10, 2017

So after a few hours of work, I have a solution. IMHO it is probably not the most elegant, but it works for unordered lists pasted from MS Word. Unfortunately, it does not work for ordered lists (any hints why?), even though the implementation seems the same as for unordered lists.

const Delta = Quill.import('delta');

const MSwordMatcher = function (node, delta) {
  const _build = [];

  // Walk from the first list paragraph through its siblings.
  // Looping on `node` stops cleanly if the siblings run out before a "...SpLast" paragraph.
  while (node) {
    if (node.tagName === 'P') {
      // [0] index contains bullet or numbers, [1] index contains spaces, [2] index contains item content
      const content = node.querySelectorAll('span');
      const _nodeText = content[2].innerText.trim();
      //const _listType = content[0].innerText.match(/[0-9]/g) ? 'ordered' : 'bullet'; //@TODO: implement ordered lists

      _build.push({ insert: `${_nodeText}\n`, attributes: { 'bullet': true } });

      if (node.className === 'MsoListParagraphCxSpLast') {
        break;
      }
    }

    node = node.nextSibling;
  }

  return new Delta(_build);
};

// The middle and last paragraphs are already consumed by MSwordMatcher, so they produce nothing.
const matcherNoop = (node, delta) => ({ ops: [] });

When initializing Quill:

modules: {
  clipboard: {
    matchers: [
      ['p.MsoListParagraphCxSpFirst', MSwordMatcher],
      ['p.MsoListParagraphCxSpMiddle', matcherNoop],
      ['p.MsoListParagraphCxSpLast', matcherNoop],
    ]
  },
}

ping @arwagner (if you are still interested)

@SamDuvall

I tried to take the example from @DavidReinberger and apply the feedback from @jhchen on this issue. I wanted to preserve bullet vs ordered, indentation as well as allow HTML within each list item. Any feedback / suggestions are welcome.

Note: I am using underscore in the below code, but that could be removed.

const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
];

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = _.map(delta.ops, _.clone);

  // Trim the front of the first op to remove the bullet/number
  let first = _.first(ops);
  first.insert = first.insert.trimLeft();
  let firstMatch = first.insert.match(/^(\S+)\s+/);
  if (!firstMatch) return delta;
  first.insert = first.insert.substring(firstMatch[0].length, first.insert.length);

  // Trim the newline off the last op
  let last = _.last(ops);
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let prefix = firstMatch[1];
  let listType = prefix.match(/\S+\./) ? 'ordered' : 'bullet';

  // Determine the list indent
  let style = (node.getAttribute('style') || '').replace(/\n+/g, ''); // the style attribute may be absent
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}
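To see what the two regular expressions in matchMsWordList do in isolation, here is a minimal sketch that applies them to plain strings. splitListPrefix is a hypothetical name used only for this illustration, and the sample strings are assumptions about what the pasted text looks like, not captured clipboard data.

```javascript
// Mirrors the prefix handling in matchMsWordList above, on a plain string.
function splitListPrefix(text) {
  const trimmed = text.trimStart();
  const match = trimmed.match(/^(\S+)\s+/); // first token plus the whitespace after it
  if (!match) return null;                  // no prefix found: caller leaves the text alone
  const prefix = match[1];
  return {
    // A token ending in a period ("1.", "a.") is treated as an ordered-list marker
    listType: /\S+\./.test(prefix) ? 'ordered' : 'bullet',
    text: trimmed.substring(match[0].length),
  };
}

console.log(splitListPrefix('1.   First item')); // { listType: 'ordered', text: 'First item' }
console.log(splitListPrefix('·    One'));        // { listType: 'bullet', text: 'One' }
console.log(splitListPrefix('Single'));          // null
```

Note that any first word followed by whitespace counts as a prefix, so this logic is only safe on paragraphs already known to be Word list items, which is why the matchers are bound to the MsoListParagraph classes.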

@Subtletree

Thanks @SamDuvall, your matchers are working flawlessly for me.

@Azuf

Azuf commented Dec 20, 2020

Note: I am using underscore in the below code, but that could be removed.

@SamDuvall what's that underscore and what do you mean it can be removed? 😶

@Subtletree

Subtletree commented Dec 20, 2020

@Azuf He's talking about underscore.js library

Here's a vanilla version

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Trim the front of the first op to remove the bullet/number
  let first = ops[0];
  first.insert = first.insert.trimLeft();
  let firstMatch = first.insert.match(/^(\S+)\s+/);
  if (!firstMatch) return delta;
  first.insert = first.insert.substring(firstMatch[0].length, first.insert.length);

  // Trim the newline off the last op
  let last = ops[ops.length-1];
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let prefix = firstMatch[1];
  let listType = prefix.match(/\S+\./) ? 'ordered' : 'bullet';

  // Determine the list indent
  let style = (node.getAttribute('style') || '').replace(/\n+/g, ''); // the style attribute may be absent
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}
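The indent step relies on Word writing the list level into the paragraph's style attribute. The sketch below isolates that step. It assumes the style contains a declaration of the form mso-list:l0 level2 lfo1; the sample string is illustrative rather than copied from a real paste, and indentFromStyle is a hypothetical helper name.

```javascript
// Hypothetical helper: derive Quill's zero-based indent from a Word style string.
function indentFromStyle(style) {
  // Word may wrap long style attributes across lines, so remove newlines first.
  const flattened = (style || '').replace(/\n+/g, '');
  const levelMatch = flattened.match(/level(\d+)/);
  // Word levels start at 1, Quill indents start at 0.
  return levelMatch ? Number(levelMatch[1]) - 1 : 0;
}

console.log(indentFromStyle('text-indent:-.25in;mso-list:l0 level2 lfo1')); // 1
console.log(indentFromStyle('margin-left:1.0in'));                          // 0
console.log(indentFromStyle(null));                                         // 0
```

Falling back to 0 when the attribute is missing keeps the matcher from throwing on paragraphs that have no inline style.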

@darshakeyan

Have you solved this issue, @arwagner?
Can you please help? I have to deliver to a customer ASAP, but I have not been able to find a solution anywhere.

@Subtletree

@darshak369 there are a couple of solutions listed above in this issue

@darshakeyan

Thanks for the reply, @Subtletree. I have tried all of them, and none of them work for me.
If you can, please give me an idea of what to do here. The formatting issue occurs only when copying from the MS Word desktop app.

@Subtletree

Subtletree commented Dec 9, 2021

@darshak369 Hmm sounds frustrating that they are not working!

The following is working for me:

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Trim the front of the first op to remove the bullet/number
  let bulletOp = ops.find((op) => typeof op.insert === 'string' && op.insert.trim().length);
  if (!bulletOp) { return delta; }

  bulletOp.insert = bulletOp.insert.trimLeft();
  // Lazy .*? so that only the leading bullet/number is matched, not the text up to the last period
  let listPrefix = bulletOp.insert.match(/^.*?(^·|\.)/) || bulletOp.insert[0];
  bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length);

  // Trim the newline off the last op
  let last = ops[ops.length-1];
  last.insert = last.insert.substring(0, last.insert.length - 1);

  // Determine the list type
  let listType = listPrefix[0].length === 1 ? 'bullet' : 'ordered';

  // Determine the list indent
  let style = (node.getAttribute('style') || '').replace(/\n+/g, ''); // the style attribute may be absent
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}


const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
  ['p.msolistparagraph', matchMsWordList]
];


// When instantiating a quill editor
let quill = new Quill('#editor', {
  modules: {
    clipboard: { matchers: MSWORD_MATCHERS }
  }
});

When writing this up I found a couple of edge cases that didn't work, so the above should now work for lists with only one bullet and won't strip the first word from each bullet in some cases.

Word
image

Pasted into quill
image

@darshakeyan

darshakeyan commented Dec 10, 2021

Thanks for your solution, @Subtletree. It means a lot.

I tried this solution in the Quill playground, but unfortunately it's not working.

This is the playground code, which I copied and pasted from what you posted above. Judging by the screenshots you shared, the solution is correct and works fine on your side.

https://codepen.io/darshak434/pen/GRMjvwr

The MS Word file I am copying content from:

image

You can find the file here - https://1drv.ms/w/s!AtzwzPKX4hPigSpGIzLT2ezREQiL?e=cUkyth

Open it with the MS Word desktop app, copy the content, and paste it into the Quill editor above.

After copying and pasting this Word content, I get the following result:

image

Can you please share the details of your setup, such as the Word version and operating system, so that I can tell whether this is happening only on my system?

Thanks

@Subtletree

Looks like I don't have permissions to download the word doc. I tested from another word doc into the codepen and it worked ok on:

Windows 10 21H1
Office 365 Word 2111
Chrome 96.0.4664.93
Firefox 95

image

Wonder if it's to do with the specific type of bullets or something, let me know when you've changed those permissions and I'll try with your doc!

@darshakeyan

darshakeyan commented Dec 13, 2021

Hey @Subtletree

Here is a link where you can download the doc file directly -

https://drive.google.com/drive/folders/1txcKIDmrT6tjerPrqy_8THbSaETHrj0f?usp=sharing

Here is the situation: when I test with a new Word doc in which I typed the bullet points myself, it works fine for me as well.
But when we copy content from the doc provided by the customer into Quill, it is not formatted the same way.

I have shared the doc file with you so you can check.

Thanks

@Subtletree

Subtletree commented Dec 13, 2021

Looks like those bullets are nested as a p.MsoNormal class for some reason instead of p.MsoListParagraph etc.

The following works, but I haven't done heaps of testing with it. It's possibly quite brittle: for example, with a non-standard bullet (like arrows) in a p.MsoNormal, the list won't be detected.

const Delta = Quill.import('delta');

function matchMsWordList(node, delta) {
  // Clone the operations
  let ops = delta.ops.map((op) => Object.assign({}, op));

  // Find the first text op with visible content; it holds the bullet/number.
  // The typeof check skips embeds (e.g. images), whose insert is an object.
  let bulletOp = ops.find((op) => typeof op.insert === 'string' && op.insert.trim().length);
  if (!bulletOp) { return delta; }

  // Trim the bullet/number prefix and the whitespace after it
  bulletOp.insert = bulletOp.insert.trimLeft();
  let listPrefix = bulletOp.insert.match(/^.*?(^·|\.)/) || bulletOp.insert[0];
  bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length).trimLeft();

  // Trim the newline off the last op (only when it is text)
  let last = ops[ops.length-1];
  if (typeof last.insert === 'string') {
    last.insert = last.insert.substring(0, last.insert.length - 1);
  }

  // Determine the list type: a one-character prefix is a bullet, anything longer is a number
  let listType = listPrefix[0].length === 1 ? 'bullet' : 'ordered';

  // Determine the list indent (the style attribute may be absent)
  let style = (node.getAttribute('style') || '').replace(/\n+/g, '');
  let levelMatch = style.match(/level(\d+)/);
  let indent = levelMatch ? levelMatch[1] - 1 : 0;

  // Add the list attribute
  ops.push({insert: '\n', attributes: {list: listType, indent}});

  return new Delta(ops);
}

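function maybeMatchMsWordList(node, delta) {
  // Guard against empty deltas and non-text (embed) inserts before inspecting the first character
  const first = delta.ops[0];
  if (first && typeof first.insert === 'string' && first.insert.trimLeft()[0] === '·') {
    return matchMsWordList(node, delta);
  }

  return delta;
}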

const MSWORD_MATCHERS = [
  ['p.MsoListParagraphCxSpFirst', matchMsWordList],
  ['p.MsoListParagraphCxSpMiddle', matchMsWordList],
  ['p.MsoListParagraphCxSpLast', matchMsWordList],
  ['p.MsoListParagraph', matchMsWordList],
  ['p.msolistparagraph', matchMsWordList],
  ['p.MsoNormal', maybeMatchMsWordList]
];

// When instantiating a quill editor
let quill = new Quill('#editor', {
  modules: {
    clipboard: { matchers: MSWORD_MATCHERS }
  },
  placeholder: 'Compose an epic...',
  theme: 'snow'
});
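The lazy quantifier in the prefix pattern above matters whenever a list item contains a period in its text. The sketch below compares a greedy and a lazy version of the same pattern on two made-up list items; the sample strings are assumptions, not captured clipboard data.

```javascript
const greedy = /^.*(^·|\.)/;  // consumes as much as possible, then backtracks to the LAST period
const lazy = /^.*?(^·|\.)/;   // stops at the leading bullet or the FIRST period

const bullet = '· Buy milk. Then eggs';
const numbered = '1. Buy milk. Then eggs';

console.log(JSON.stringify(bullet.match(greedy)[0]));   // "· Buy milk."
console.log(JSON.stringify(bullet.match(lazy)[0]));     // "·"
console.log(JSON.stringify(numbered.match(greedy)[0])); // "1. Buy milk."
console.log(JSON.stringify(numbered.match(lazy)[0]));   // "1."
```

With the greedy form, the whole first sentence would be treated as the list prefix and stripped, and a bullet item would be misclassified as ordered because its prefix is longer than one character. Even the lazy form still misreads a bullet glyph other than · when the item text contains a period, which is part of the brittleness mentioned above.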

@darshakeyan

darshakeyan commented Dec 13, 2021

Thanks @Subtletree for your effort and time.
It's working fine, but as you said it is quite brittle: for example, there are spaces before the bulleted paragraphs, and the spacing between lines is not retained. You can find this document at the same link - https://drive.google.com/drive/folders/1txcKIDmrT6tjerPrqy_8THbSaETHrj0f?usp=sharing

image

Is there any way to retain that as well, similar to what you did for p.MsoNormal in the function above?

It would be very helpful to my customer.

@Subtletree

@darshak369 I've edited my last comment to add a trimLeft on this line:
bulletOp.insert = bulletOp.insert.substring(listPrefix[0].length, bulletOp.insert.length).trimLeft();
Which should fix the spacing at the start (but will also strip any intentional spacing at the start)

The paragraph spacing issue has nothing to do with the bullets really. I think quill doesn't handle before and after paragraph spacing so would need to be handled in a custom way. If you just added a new line after each paragraph instead of using paragraph spacing it would work fine but hard to tell your customer that 😅

@darshakeyan

Thank you very much for this solution, @Subtletree. Everything works as expected, and it means a lot. 👍

The paragraph spacing is not a big issue; we should be fine without it. But yes, it is definitely hard to tell the customer 😂

@Subtletree

Very welcome @darshak369! I've updated our code to use the new changes, so it has helped me too.

@berott

berott commented Dec 24, 2021

Thank you very much for your work!
If I paste my word-list to https://codepen.io/darshak434/pen/GRMjvwr?editors=1111 I get the correct result and the correct p-classes.
<p class="MsoListParagraphCxSpMiddle" style="margin: 0cm 0cm 0cm 216pt; font-size: 12pt; font-family: Calibri, sans-serif; text-indent: -18pt;"><span style="font-size: 24pt; font-family: Wingdings;">§<span style="font-variant-numeric: normal; font-variant-east-asian: normal; font-stretch: normal; font-size: 7pt; line-height: normal; font-family: &quot;Times New Roman&quot;;"> </span></span><span style="font-size: 24pt;">Value&nbsp;&nbsp;&nbsp;&nbsp;<o:p></o:p></span></p>

In my context (with ngx-quill) I get the following node when I do a console.log(node); in my matcher method:
<p><span style="font-size:24.0pt;font-family:Wingdings;mso-fareast-font-family:Wingdings; mso-bidi-font-family:Wingdings"><span style="mso-list:Ignore">§<span style="font:7.0pt &quot;Times New Roman&quot;"> </span></span></span><span style="font-size:24.0pt">Value<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp; </span></span></p>

What could be the reason for this difference? I paste the same content, but I get different nodes (and therefore different deltas) in the matcher methods.
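The second node has no MsoListParagraph class, so none of the class-based selectors in this thread can match it, but it still wraps the bullet glyph in a span styled mso-list:Ignore. One possible fallback, offered only as an untested assumption, is to recognise list items by either signal. The sketch works on HTML strings so that it runs without a DOM; looksLikeWordListItem is a hypothetical name, and the sample strings are shortened versions of the two nodes above.

```javascript
// Hypothetical check: does this paragraph HTML look like a Word list item?
function looksLikeWordListItem(html) {
  const hasListClass = /class="?MsoListParagraph/i.test(html);
  const hasIgnoreMarker = /mso-list:\s*Ignore/i.test(html);
  return hasListClass || hasIgnoreMarker;
}

const withClass = '<p class="MsoListParagraphCxSpMiddle"><span>§</span><span>Value</span></p>';
const withMarker = '<p><span><span style="mso-list:Ignore">§</span></span><span>Value</span></p>';
const plain = '<p>Value</p>';

console.log(looksLikeWordListItem(withClass));  // true
console.log(looksLikeWordListItem(withMarker)); // true
console.log(looksLikeWordListItem(plain));      // false
```

Inside a matcher with DOM access, a comparable check might use node.querySelector('span[style*="mso-list"]'); whether that covers every Word variant has not been verified here.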

@abhinavprasad98

Hi @Subtletree, I am new to Angular. Can you please tell me where to paste the code you shared and how to make it work? I tried pasting it into app.component.ts, where I have written the code for the Quill functionality, changing a few things to adapt it to TypeScript, such as adding this. and removing const.

It compiled with zero errors, but it still just pastes the bullets from MS Word with &nbsp, without creating a list.

Please help me out. It is critical for me.

@abhinavprasad98

Hi @darshak369. I am trying to implement the Quill Editor in Angular. I have implemented Quill editor in the App component itself. I have copied the code by @Subtletree to the app.component.ts and made necessary changes to suit TypeScript.

It compiles successfully, but the problem remains: ordered/unordered lists are still not created when pasting bullets from MS Word. I need your help to make this work, please. It's very critical for my work.

@davidwintermeyer

Hi there,

I know I'm following up on a long running thread. This is a major pain for my work as well. I'm curious, is this bug open because no one has been able to devote time to it, or because it doesn't seem to have a feasible solution?

Thanks!

@Subtletree

Hey!

I think even if we created a proper PR for this fix it probably wouldn't be merged and released as it seems quill is mostly abandoned? #3521 #3359

The code above has fixed the bug in my environment but it seems like the nodes copied from word can vary in other environments. Can't know for sure but if someone put time into finding out why then I think a solution would be feasible.

daniel-eder added a commit to lakesare/memcode that referenced this issue Aug 3, 2022
When pasting lists from Word, the paste matcher accidentally generated non-breaking spaces between <li> and the first character in the line.
trimLeft() on line 15 fixes this.

Intentional spaces before the first character will also be trimmed (I see no way to differentiate), however this should be a super-rare occurrence when pasting from word, and can still be fixed in the text editor after pasting.

See slab/quill#1225 (comment)
@timotheedorand

Thank you @Subtletree This works #1225 (comment)
