Currently, filecheck.py leaves RTF files untouched and only changes their extension to .txt (in File.text()). An RTF file opened as plain text is difficult to read because formatting control codes are mixed in with the text. Ideally, we should be able to extract the content from an RTF file during processing.
Unfortunately, there aren't any great existing solutions for this other than OpenOffice, which would add a dependency we'd prefer not to have. If you don't need 100% compatibility, it's fairly reasonable to write an RTF parser: here is a library that implements most of the behavior we want, which could be a good starting point. However, that code is Python 2 only, and perhaps a little verbose.
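
To give a sense of how small a "good enough" parser can be, here is a minimal Python 3 sketch. It is not the linked library's code and not anything that exists in filecheck.py. The function name `rtf_to_text`, the set of skipped destinations, the cp1252 decoding of `\'hh` escapes, and the fixed one-character fallback after `\uN` are all assumptions made for illustration, and real documents will need more cases handled.

```python
import re

# Destinations whose contents are metadata rather than body text.
# Assumption: a small, non-exhaustive set chosen for illustration.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Control words that correspond to a plain-text character.
SPECIAL_WORDS = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \{ \} \\ \* \~
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|(.)"                      # any other character is literal text
)


def rtf_to_text(rtf: str) -> str:
    """Return the readable text of an RTF string (hypothetical helper)."""
    out = []
    stack = []         # saved skipping state, one entry per open group
    skipping = False   # True while inside a destination we ignore
    pending_skip = 0   # fallback characters to drop after a \uN escape
    for match in TOKEN.finditer(rtf):
        word, arg, hexbyte, symbol, brace, char = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif skipping:
            continue
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif word in SPECIAL_WORDS:
                out.append(SPECIAL_WORDS[word])
            elif word == "u" and arg:
                # \uN is a signed 16-bit code unit followed by a fallback character.
                out.append(chr(int(arg) % 65536))
                pending_skip = 1  # assumption: \uc1, the RTF default
        elif hexbyte or char:
            if pending_skip:
                pending_skip -= 1
            elif hexbyte:
                # Assumption: the document uses the Windows-1252 code page.
                out.append(bytes([int(hexbyte, 16)]).decode("cp1252", errors="replace"))
            else:
                out.append(char)
        elif symbol:
            if symbol == "*":
                skipping = True          # \* marks an ignorable destination
            elif symbol in "{}\\":
                out.append(symbol)       # escaped literal brace or backslash
            elif symbol == "~":
                out.append("\u00a0")     # non-breaking space
    return "".join(out)
```

Something like `rtf_to_text(raw)` could then be called wherever the file contents are read, before the text is written out with a .txt extension. The sketch ignores tables, embedded objects, and per-font code pages, which is the "not 100% compatible" trade-off mentioned above.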