Releases: lexborisov/myhtml
v4.0.5
- Fixed a parsing problem for the PRE element with CDATA in thread and single mode. #156
- Fixed a problem with parsing chunks when a script tag was present. #154
- Fixed entity parsing; in very rare cases entities were parsed incorrectly. 541219b
- Fixed a segfault when the doctype has no attributes. #151
- Added a link to the Perl 5 wrapper module.
- Minor bug fixes
Special thanks to Kirill Zhumarin for PRs.
v4.0.4
v4.0.2
- Spelling: renamed functions _pasition => _position
- Fixed an infinite loop when the HTML file is too big; the queue round did not work properly. #117
- Added new function myhtml_node_is_void_element to check whether a node is a void element. #119
- Potential loss of the pointer on systems other than x86, x86_64 (Misaligned Integer Pointer)
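As an illustration of what a void-element check involves, here is a minimal sketch in plain C. It does not use or reproduce MyHTML's implementation: the helper name is_void_tag_name is hypothetical, the tag list is taken from the HTML Standard (MyHTML's own list may differ), and tag names are assumed to be lowercase already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Void elements as listed in the HTML Standard (assumption: lowercase names).
 * This table is for illustration only, it is not taken from MyHTML. */
static const char *const void_tag_names[] = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr"
};

/* Hypothetical helper: returns true if tag_name names a void element,
 * i.e. an element that never has content or an end tag. */
bool is_void_tag_name(const char *tag_name)
{
    assert(tag_name != NULL);

    size_t count = sizeof(void_tag_names) / sizeof(void_tag_names[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(tag_name, void_tag_names[i]) == 0)
            return true;
    }
    return false;
}
```

A serializer would typically use such a check to decide whether to emit a closing tag. In MyHTML itself the check is done on a tree node through myhtml_node_is_void_element rather than on a tag name string.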
v4.0.1
- Fix for creating a spinlock on systems without spinlock support #103
- Added two functions for detecting the encoding that also return the found position: myencoding_prescan_stream_to_determine_encoding_with_found and myencoding_extracting_character_encoding_from_charset_with_found #107
- Added automated package build and publication on PackageCloud.io (https://packagecloud.io/modest/myhtml)
- Minor bug fixes
Special thanks to Alexander Fedyashov for help with the automated package build.
v4.0.0
- API breaking changes
- MyHTML is split into MyCORE, MyHTML, and MyENCODING. MyCORE is a base module that includes functions shared by all other modules. Each module can be built without the others if it does not depend on them. This is useful for those who need only URL parsing and no other modules.
- Removed all io functions that print to a file: myhtml_tree_print_by_node, myhtml_tree_print_node_children, myhtml_tree_print_node; use serialization instead
- If you use an encoding enum such as MyHTML_ENCODING_UTF8, it is now MyENCODING_UTF_8, i.e. MyHTML_ENCODING_* => MyENCODING_*
- Functions migrated to MyCORE from MyHTML: myhtml_incoming_buffer_* => mycore_incoming_buffer_*, myhtml_string* => mycore_string*, myhtml_utils* => mycore_utils*
- Fully refactored the build system with GNU Make (Makefile); it now expects generally accepted parameters and rules, like install, clean, library and more
- Tested creating a DLL library for Windows OS
- Support for creating ports for different OSes, or for simply changing how memory, io, and threads are handled (if built with threads, the default)
- Support for adding your own modules to the library build
- All return statuses, such as myhtml_status_t and mycss_status_t, are now changed to the global mystatus_t (unsigned int)
- Added the forgotten '\0' if a text node ends with '\r' #91
- Removed CMakeLists.txt
- Added PKG-CONFIG *.pc file, created after the make command
v3.0.1
New Release v3.0.1
Fixed broken mapping for convert encoding functions in release 3.0.0
Release 3.0.0 removed
- API breaking changes!!! See api_breaking_changes.md file
- Sync with Specification (https://html.spec.whatwg.org/multipage/)
- Fixed a problem with the close token position in the title tag (the inner essence)
- Fixed a problem with detecting the SHIFT_JIS encoding
- Added function myhtml_encoding_prescan_stream_to_determine_encoding to prescan a byte stream to determine its encoding. In other words, it detects the encoding in the meta tag before HTML parsing starts. See example
- Added function myhtml_encoding_name_by_id to get an encoding name by id
- Added function myhtml_encoding_extracting_character_encoding_from_charset
- Added utils/mhash.* for creating a hash table
- Added function myhtml_node_tree to get the current Tree from a node
- Consumes less memory when initializing, 3MB => 1MB, with no negative impact on performance. In the future, even less memory will be consumed.
- MyHTML_INSTALL_HEADER in the cmake options is now set to ON by default
- Fixed broken mapping for convert encoding functions after release 3.0.0
Thanks!
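To make the idea of prescanning a byte stream for its encoding more concrete, here is a deliberately simplified sketch in plain C. It is not MyHTML's code and not the full prescan algorithm from the HTML specification: it only looks for a charset=<label> pair in the raw bytes and copies the label out. The function name sketch_find_charset and its parameters are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of MyHTML: scans the first `size` bytes of
 * `data` for "charset=<label>" and copies the label into `out` (NUL-terminated,
 * truncated to out_size - 1 bytes). Returns true if a non-empty label is found. */
bool sketch_find_charset(const char *data, size_t size, char *out, size_t out_size)
{
    static const char key[] = "charset";
    const size_t key_len = sizeof(key) - 1;

    assert(data != NULL && out != NULL && out_size > 0);

    for (size_t i = 0; i + key_len <= size; i++) {
        /* Case-insensitive match of "charset" starting at position i. */
        size_t k = 0;
        while (k < key_len && tolower((unsigned char)data[i + k]) == key[k])
            k++;
        if (k != key_len)
            continue;

        size_t pos = i + key_len;

        /* Skip whitespace, require '=', then skip whitespace again. */
        while (pos < size && isspace((unsigned char)data[pos]))
            pos++;
        if (pos >= size || data[pos] != '=')
            continue;
        pos++;
        while (pos < size && isspace((unsigned char)data[pos]))
            pos++;

        /* Remember an optional opening quote. */
        char quote = 0;
        if (pos < size && (data[pos] == '"' || data[pos] == '\'')) {
            quote = data[pos];
            pos++;
        }

        /* Copy the label up to the closing quote or an unquoted delimiter. */
        size_t len = 0;
        while (pos < size && len + 1 < out_size) {
            char c = data[pos];
            bool stop = quote
                ? (c == quote)
                : (isspace((unsigned char)c) || c == ';' || c == '>' ||
                   c == '/' || c == '"' || c == '\'');
            if (stop)
                break;
            out[len++] = c;
            pos++;
        }
        out[len] = '\0';

        if (len > 0)
            return true;
    }
    return false;
}
```

A real prescan also has to handle comments, attribute parsing, http-equiv/content pairs, and label normalization, which is why the library functions should be preferred over a sketch like this.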
v2.0.1
v2.0.0
- API Breaking Changes: Remove all functions associated with tag index: myhtml_tree_get_tag_index, myhtml_tag_index_*
- Changes to working with threads
- Removed example replacing_node_attributes_low_level.c. The example is not working correctly; left for the future
- Fix for the myhtml_string_destroy function. Sometimes the resources were not freed.
- Fixed a problem with serialization in UTF-8 (0xC2 0xA0)
- Added AVL-Tree for utils
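For context on the 0xC2 0xA0 byte pair mentioned above: it is the UTF-8 encoding of U+00A0 (no-break space). The release notes do not say what exactly went wrong in serialization, so the following is only a generic sketch of UTF-8 encoding in plain C, with a hypothetical helper name, showing where those two bytes come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not MyHTML's code: encodes code point `cp` as UTF-8
 * into `out`, which must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if `cp` is a surrogate
 * or lies above U+10FFFF. */
size_t sketch_utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                   /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding U+00A0 with this helper yields the two bytes 0xC2 0xA0; a serializer that handles the bytes separately, or treats 0xA0 as a single-byte character, would corrupt the output.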
v1.0.4
- Added the ability to set a user data value for a tree node (#67)
- Fixed gross errors in serialization (b27f9f7)
- Added myhtml_incoming_buffer_split function to split an Incoming Buffer Node
- Added example for replacing node attributes before token processing begins: replacing_node_attributes_low_level
- Renamed functions myhtml_incomming_* => myhtml_incoming_* (my bad english)
- Minor bug fixes
v1.0.3
- [https://github.com/lexborisov/myhtml/commit/873d0fa2cbe8ec4a3b8a50b649506e791f0907c7] Fixed an attribute processing bug: temp values were not cleared after processing an attribute key, which in rare cases led to a segfault
- [https://github.com//issues/63] Fixes for a tokenizer state problem when parsing without building a tree
- [https://github.com//issues/66#issuecomment-247941324] Fixed a problem with appending nodes to "tag index" in the formatting reconstruction algorithm
- [https://github.com//issues/62] Fixed a segfault when built without threads and using the parse flag without process token
- Changed rules for parse flag MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN. Whitespace tokens are now skipped, but not for RCDATA, RAWTEXT, CDATA and PLAINTEXT
- Added tree serialization by specification (see example)
- Tested on 1 billion HTML pages (from commoncrawl.org)
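The whitespace-skipping rule described for MyHTML_TREE_PARSE_FLAGS_SKIP_WHITESPACE_TOKEN can be sketched roughly as follows. This is an illustration in plain C under stated assumptions, not the library's code: the enum sketch_text_mode and both function names are hypothetical, and whitespace is taken to mean the ASCII whitespace set of the HTML Standard (TAB, LF, FF, CR, SPACE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical content modes a text token can come from. */
enum sketch_text_mode {
    SKETCH_MODE_DATA,
    SKETCH_MODE_RCDATA,
    SKETCH_MODE_RAWTEXT,
    SKETCH_MODE_CDATA,
    SKETCH_MODE_PLAINTEXT
};

/* True if the token text is non-empty and consists only of
 * TAB, LF, FF, CR, or SPACE characters. */
bool sketch_is_whitespace_token(const char *text, size_t length)
{
    assert(text != NULL || length == 0);

    if (length == 0)
        return false;

    for (size_t i = 0; i < length; i++) {
        char c = text[i];
        if (c != '\t' && c != '\n' && c != '\f' && c != '\r' && c != ' ')
            return false;
    }
    return true;
}

/* Whitespace-only tokens are skipped in ordinary data, but kept when they
 * come from RCDATA, RAWTEXT, CDATA, or PLAINTEXT content. */
bool sketch_should_skip_token(enum sketch_text_mode mode,
                              const char *text, size_t length)
{
    if (mode != SKETCH_MODE_DATA)
        return false;
    return sketch_is_whitespace_token(text, length);
}
```

Keeping whitespace in the RCDATA, RAWTEXT, CDATA, and PLAINTEXT cases matters because that whitespace is part of the element's content, for example inside a textarea or a script.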