* Whitespace diffs.

Breaking whitespace diffs into a standalone commit.

* Implement "in (...)" as hash table lookups.

Currently, a filtering expression with "x in (a, b, c, ...)" is transformed into "x=a or x=b or x=c or ...". This can be slow for very long sets of a, b, c, especially when x has to be extracted from the event over and over.

Replace this with a set membership test using unordered_set for PT_CHARBUF types.

In filter.cpp, for CO_IN operators check the type of the operand. If it's PT_CHARBUF, loop over the items in the (a, b, c) set and call add_filter_value() for each. Otherwise, the current behavior is kept.

sinsp_filter_check::add_filter_value now saves pointers to the filter values in the unordered_set m_val_storages_members, containing pairs of (pointer, length). Note that these are only pointers; the actual values are still held in m_val_storages.

In sinsp_filter_check::flt_compare, for CO_IN operators simply check the unordered_set and return true if a match is found. This used to be dead code, because all CO_INs were replaced with a sequence of x=a or x=b or ..., but it is used again.

Custom functors g_hash_membuf/g_equal_to_membuf hash and compare the pointed-to data. When compiling with gcc, this simply uses GNU's built-in hash function for a pointer and length, which is quite fast. Otherwise, a standalone function is used.

* Take advantage of string length.

Take advantage of string length when doing set membership tests:

- When comparing strings in the hash equality function, only bother doing the buffer comparison when the string lengths match.
- It can be inefficient to hash a very long string when all members of the set are short strings. To make this case faster, keep track of the minimum and maximum string length across the set members and only bother doing the set lookup if the length of the value being tested falls within that range.

To use this effectively, the length needs to be filled in when sinsp_filter_check::flt_compare() is called, so do a pass over the existing filterchecks and return the actual string length when it's easily known. In flt_compare(), if the type is a string and the provided length is 0, do a strlen() to find it. This should be very rare now that the length is properly passed back.
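
For illustration only, the approach described above can be sketched roughly as follows. This is not the code from the commit: the names membuf, membuf_hash, membuf_equal and member_set are hypothetical stand-ins for m_val_storages_members and the g_hash_membuf/g_equal_to_membuf functors, the FNV-1a hash is a substitute for whichever hash the commit actually uses, and a std::deque is assumed as the backing storage so that the stored pointers stay valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <limits>
#include <string>
#include <unordered_set>
#include <utility>

// A (pointer, length) view of bytes that are owned elsewhere.
using membuf = std::pair<const uint8_t*, uint32_t>;

// Stand-in hash (assumption): FNV-1a over the pointed-to bytes.
struct membuf_hash {
    std::size_t operator()(const membuf& m) const {
        uint64_t h = 14695981039346656037ULL;
        for (uint32_t i = 0; i < m.second; ++i) {
            h ^= m.first[i];
            h *= 1099511628211ULL;
        }
        return static_cast<std::size_t>(h);
    }
};

// Equality: check the lengths first and compare bytes only if they match.
struct membuf_equal {
    bool operator()(const membuf& a, const membuf& b) const {
        return a.second == b.second &&
               std::memcmp(a.first, b.first, a.second) == 0;
    }
};

// Hypothetical container combining the set with the min/max length bounds.
class member_set {
public:
    void add(const std::string& value) {
        // push_back on a std::deque does not move existing elements,
        // so pointers into previously stored strings remain valid.
        m_storage.push_back(value);
        const std::string& s = m_storage.back();
        const uint32_t len = static_cast<uint32_t>(s.size());
        m_members.insert(membuf(reinterpret_cast<const uint8_t*>(s.data()), len));
        m_min_len = std::min(m_min_len, len);
        m_max_len = std::max(m_max_len, len);
    }

    // value must be NUL-terminated when len is 0 (length unknown).
    bool contains(const char* value, uint32_t len = 0) const {
        if (len == 0) {
            // Fallback for callers that did not pass the length back.
            len = static_cast<uint32_t>(std::strlen(value));
        }
        // Skip hashing entirely when no member can have this length.
        if (len < m_min_len || len > m_max_len) {
            return false;
        }
        return m_members.count(
                   membuf(reinterpret_cast<const uint8_t*>(value), len)) > 0;
    }

private:
    std::deque<std::string> m_storage;  // owns the actual values
    std::unordered_set<membuf, membuf_hash, membuf_equal> m_members;
    uint32_t m_min_len = std::numeric_limits<uint32_t>::max();
    uint32_t m_max_len = 0;
};
```

With members "bash" and "zsh", a lookup for a 24-character process name returns false on the length check alone, without hashing the string. A lookup for "fish" passes the length check and is rejected by the set itself.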