forked from aquasync/ruby-ole
-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
116 lines (84 loc) · 4.36 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
= Introduction
The ruby-ole library provides a variety of functions primarily for
working with OLE2 structured storage files, such as those produced by
Microsoft Office - eg *.doc, *.msg etc.
= Example Usage
Here are some examples of how to use the library functionality,
categorised roughly by purpose.
1. Reading and writing files within an OLE container
The recommended way to manipulate the contents is via the
"file_system" API, whereby you use Ole::Storage instance methods
similar to the regular File and Dir class methods.
ole = Ole::Storage.open('oleWithDirs.ole', 'rb+')
p ole.dir.entries('.') # => [".", "..", "dir1", "dir2", "file1"]
p ole.file.read('file1')[0, 25] # => "this is the entry 'file1'"
ole.dir.mkdir('newdir')
2. Accessing OLE meta data
Some convenience functions are provided for (currently read only)
access to OLE property sets and other sources of meta data.
ole = Ole::Storage.open('test_word_95.doc')
p ole.meta_data.file_format # => "MSWordDoc"
p ole.meta_data.mime_type # => "application/msword"
p ole.meta_data.doc_author.split.first # => "Charles"
3. Raw access to underlying OLE internals
This is probably of little interest to most developers using the
library, but for some use cases you may need to drop down to the
lower level API on which the "file_system" API is constructed,
which exposes more of the format details.
<tt>Ole::Storage</tt> files can have multiple files with the same name,
or with a slash in the name, and other things that are probably
strictly invalid. This API is the only way to access those files.
You can access the header object directly:
p ole.header.num_sbat # => 1
p ole.header.magic.unpack('H*') # => ["d0cf11e0a1b11ae1"]
You can directly access the array of all Dirent objects,
including the root:
p ole.dirents.length # => 5
puts ole.root.to_tree
# =>
- #<Dirent:"Root Entry">
|- #<Dirent:"\001Ole" size=20 data="\001\000\000\002\000...">
|- #<Dirent:"\001CompObj" size=98 data="\001\000\376\377\003...">
|- #<Dirent:"WordDocument" size=2574 data="\334\245e\000-...">
\- #<Dirent:"\005SummaryInformation" size=54788 data="\376\377\000\000\001...">
You can access (through RangesIO methods, or by using the
relevant Dirent and AllocationTable methods) information like where within
the container a stream is located (these are offset/length pairs):
p ole.root["\001CompObj"].open { |io| io.ranges } # => [[0, 64], [64, 34]]
See the documentation for each class for more details.
= Thanks
* The code contained in this project was initially based on chicago's libole
(source available at http://prdownloads.sf.net/chicago/ole.tgz).
* It was later augmented with some corrections by inspecting pole, and (purely
for header definitions) gsf.
* The property set parsing code came from the apache java project POIFS.
* The excellent idea for using a pseudo file system style interface by providing
#file and #dir methods which mimic File and Dir, was borrowed (along with almost
unchanged tests!) from Thomas Sondergaard's rubyzip.
= TODO
== 1.2.12
* internal api cleanup
* add buffering to rangesio so that performance for small reads and writes
isn't so awful. maybe try and remove the bottlenecks of unbuffered first
with more profiling, then implement the buffering on top of that.
* fix mode strings - like truncate when using 'w+', supporting append
'a+' modes etc. done?
* make ranges io obey readable vs writeable modes.
* more RangesIO completion. ie, doesn't support #<< at the moment.
* maybe some oletool doc.
* make sure `rake test' runs tests both with $KCODE='UTF8', and without,
and maybe ensure i don't regress on 1.9 and jruby either now that they're
fixed.
== 1.3.1
* case insensitive open mode would be nice
* fix property sets a bit more. see TODO in Ole::Storage::MetaData
* ability to zero out padding and unused blocks
* better tests for mbat support.
* further doc cleanup
* add in place testing for jruby and ruby1.9
== Longer term
* more benchmarking, profiling, and speed fixes. was thinking vs other
ruby filesystems (eg, vs File/Dir itself, and vs rubyzip), and vs other
ole implementations (maybe perl's, and poifs) just to check its in the
ballpark, with no remaining silly bottlenecks.
* supposedly vba does something weird to ole files. test that.