Do not search for exhaustive SNIPPET. #697

Merged: 2 commits, May 16, 2022


mgautierfr (Collaborator) commented:

While working on kiwix/libkiwix#769 (which is about suggestions), I found this important improvement to search.

`MSet::snippet` is more complex than it seems.
Xapian uses a fairly complex algorithm to find the best subset of the text to
select. It does this by calculating a score for each term in the text. To do
so, it has to evaluate the terms in the context of the whole MSet and therefore
load a lot of data from the database.

The perfect is the enemy of the good.
By removing the `SNIPPET_EXHAUSTIVE` flag, Xapian evaluates less and returns
far more quickly.
(https://xapian.org/docs/apidoc/html/classXapian_1_1MSet.html#a4797ae2295f88e49a9f76e3b89c21d88aea6a34a9c66720a44d5969ed47ca8edb)
The generated snippet is different, but still valid.

On my computer, with a ZIM file on an external USB drive (for slow I/O) and a clean filesystem cache, a search for "home" dropped from 87 s to 15 s (!!).
The generated snippets are different, but their quality does not seem to be significantly affected:

Original snippets (slow):

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="content-type" />
    <style type="text/css">
     [...]
    </style>
    <title>Search: home</title>
  <link type="root" href=""><link type="text/css" href="/skin/jquery-ui/jquery-ui.min.css?cacheid=e1de77b3" rel="Stylesheet" />
<link type="text/css" href="/skin/jquery-ui/jquery-ui.theme.min.css?cacheid=2a5841f9" rel="Stylesheet" />
<link type="text/css" href="/skin/taskbar.css?cacheid=49365e9c" rel="Stylesheet" />
<script type="text/javascript" src="/skin/jquery-ui/external/jquery/jquery.js?cacheid=1d85f0f3" defer></script>
<script type="text/javascript" src="/skin/jquery-ui/jquery-ui.min.js?cacheid=d927c2ff" defer></script>
<script type="text/javascript" src="/skin/taskbar.js?cacheid=5982280c" defer></script>
</head>
  <body bgcolor="white"><span class="kiwix">
  <span id="kiwixtoolbar" class="ui-widget-header">
    <div class="kiwix_centered">
      <div class="kiwix_searchform">
        <form class="kiwixsearch" method="GET" action="/search" id="kiwixsearchform">
          
          <label for="kiwixsearchbox">&#x1f50d;</label>
          <input autocomplete="off" class="ui-autocomplete-input" id="kiwixsearchbox" name="pattern" type="text" title="Search ''" aria-label="Search ''">
        </form>
      </div>
        <input type="checkbox" id="kiwix_button_show_toggle">
        <label for="kiwix_button_show_toggle"><img src="/skin/caret.png?cacheid=22b942b4" alt=""></label>
        <div class="kiwix_button_cont">
            <a id="kiwix_serve_taskbar_library_button" title="Go to welcome page" aria-label="Go to welcome page" href="/"><button>&#x1f3e0;</button></a>
          
        </div>
    </div>
  </span>
</span>

    <div class="header">
        Results
        <b>
          2-11
        </b> of <b>
          971,150
        </b> for <b>
          "home"
        </b>
      
    </div>

    <div class="results">
      <ul>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Coming_Home">
              Coming Home
            </a>
              <cite>...Coming <b>Home</b> (Faye Wong album), 1992 Coming <b>Home</b> (EP), 1998 EP by Iron Savior Comin' <b>Home</b> (EP), 2014 EP by Jessie James Decker Comin' <b>Home</b> (Larry Coryell album), 1984 Coming <b>Home</b>, 2010 album by Boozoo Bajou Coming <b>Home</b>, 2009 album by Nightmares on Wax Coming <b>Home</b>, 2008 album by Danny Wood Coming <b>Home</b>, 1998 album by Joe Grushecky Coming <b>Home</b>, 1998 album by Yungchen Lhamo Songs "Coming <b>Home</b>" / "Coming <b>Home</b> (To Richmond)" (Alex Lloyd songs), 2003 / 2014 "Coming <b>Home</b>" (Busted song), 2016 "Coming <b>Home</b>"......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">799 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2007_in_home_video">
              2007 in home video
            </a>
              <cite>...Line <b>Home</b> Entertainment DVD Knocked Up Universal Studios <b>Home</b> Entertainment Unrated DVD and HD DVD Next (2007) Paramount <b>Home</b> Entertainment DVD The TV Set 20th Century Fox <b>Home</b> Entertainment DVD Walking Tall: Lone Justice Sony Pictures <b>Home</b> Entertainment Direct-to-video DVD Wayside: The Movie Paramount <b>Home</b> Entertainment DVD October 2 1408 Dimension <b>Home</b> Entertainment DVD and Blu-ray Bratz Babyz: The Movie Lionsgate <b>Home</b> Entertainment DVD re-release Civic Duty 20th Century Fox <b>Home</b> Entertainment......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,873 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Home_Sweet_Home">
              Home Sweet Home
            </a>
              <cite>...<b>Home</b> (2007 video game), a 2007/2008 game for PC and WiiWare <b>Home</b> Sweet <b>Home</b> (2017 video game), a 2017 horror game <b>Home</b> Sweet <b>Home</b>, a historic house and museum in East Hampton, Long Island <b>Home</b> Sweet <b>Home</b>, a novel by Jeanne Betancourt See also "<b>Home</b> Sweet <b>Home</b>/Bittersweet Symphony", a 2005 medley by Limp Bizkit <b>Home</b> Sweet Homer (musical), a notorious Broadway flop "<b>Home</b> Sweet Homeless", an episode of the TV series The Care Bears "<b>Home</b> Sweet Homes", an episode of the TV series Barney &amp; Friends...</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">419 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2006_in_home_video">
              2006 in home video
            </a>
              <cite>...Tail of Two Kitties 20th Century Fox <b>Home</b> Entertainment DVD The Polar Express Warner <b>Home</b> Video HD DVD A Prairie <b>Home</b> Companion New Line <b>Home</b> Entertainment DVD Save the Last Dance 2 Paramount <b>Home</b> Entertainment Direct-to-video DVD Syriana Warner <b>Home</b> Video Blu-ray U2: Rattle and Hum Paramount <b>Home</b> Entertainment Blu-ray Waist Deep Universal Studios <b>Home</b> Entertainment DVD October 17 American Dreamz Universal Studios <b>Home</b> Entertainment DVD Behind Enemy Lines II: Axis of Evil 20th Century Fox <b>Home</b>......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,802 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/List_of_Chi&apos;s_Sweet_Home_chapters">
              List of Chi&apos;s Sweet Home chapters
            </a>
              <cite>...(猫、窺う。, Neko, Ukagau.) <b>home</b> made 30. "A Cat Reports" (猫、報告する。, Neko, Houkoku Suru.) <b>home</b> made 31. "A Cat Uses Her Head" (猫、頭を使う。, Neko, Atama o Tsukau.) <b>home</b> made 32. "A Cat Aggravates" (猫、困らせる。, Neko, Komaraseru.) <b>home</b> made 33. "A Cat Searches" (猫、探す。, Neko, Sagasu.) <b>home</b> made 34. "A Cat Stalks" (猫、追跡する。, Neko, Tsuiseki Suru.) <b>home</b> made 35. "A Cat Goes <b>Home</b>" (猫、お家に帰る。, Neko, OieniKaeru.) <b>home</b> made 36....</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">2,003 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2004_in_home_video">
              2004 in home video
            </a>
              <cite>...Pieces of April MGM <b>Home</b> Entertainment DVD and VHS release Spy Kids 3: Game Over Dimension <b>Home</b> Video DVD and VHS release March 2 Cold Creek Manor Touchstone <b>Home</b> Entertainment DVD and VHS release Duplex Miramax <b>Home</b> Entertainment DVD and VHS release Good Boy! MGM <b>Home</b> Entertainment DVD and VHS release Looney Tunes: Back in Action Warner <b>Home</b> Video DVD and VHS release Lucy Must Be Traded, Charlie Brown Paramount <b>Home</b> Entertainment DVD and VHS release Mad Dog and Glory Universal Studios <b>Home</b>......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,349 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2005_in_home_video">
              2005 in home video
            </a>
              <cite>...Godfather Part II Paramount <b>Home</b> Entertainment DVD The Godfather Part III Pooh's Heffalump Movie Walt Disney <b>Home</b> Entertainment DVD and VHS May 31 Boogeyman Sony Pictures <b>Home</b> Entertainment DVD and VHS Chronicle of the Raven Lions Gate <b>Home</b> Entertainment DVD and VHS Dad (1989) Universal Studios <b>Home</b> Entertainment DVD East of Eden Warner <b>Home</b> Video DVD Fascination (2004) MGM <b>Home</b> Entertainment DVD and VHS The Four Seasons (1981) Universal Studios <b>Home</b> Entertainment DVD Get a Clue Walt Disney <b>Home</b>......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,647 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1998_in_home_video">
              1998 in home video
            </a>
              <cite>...TriStar <b>Home</b> Video VHS The Lost World Sterling <b>Home</b> Entertainment VHS Phantasm IV: Oblivion MGM <b>Home</b> Entertainment VHS October 20 Best of the Best 4: Without Warning Dimension <b>Home</b> Video VHS October 27 Cats Universal Studios <b>Home</b> Video VHS The Lion King II: Simba's Pride Walt Disney <b>Home</b> Video VHS November 3 The First 9½ Weeks Sterling <b>Home</b> Entertainment VHS Richie Rich's Christmas Wish Warner <b>Home</b> Video VHS November 10 Billboard Dad Warner <b>Home</b> Video VHS November 17 An All Dogs Christmas Carol......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,406 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2008_in_home_video">
              2008 in home video
            </a>
              <cite>...Lonely Hearts (2006) Sony Pictures <b>Home</b> Entertainment Blu-ray Pee-wee's Big Adventure Warner <b>Home</b> Video DVD Return to Sleepaway Camp Magnolia <b>Home</b> Entertainment DVD Shrek the Halls DreamWorks Animation <b>Home</b> Entertainment DVD Transsiberian First Look <b>Home</b> Entertainment DVD November 8 Kung Fu Panda DreamWorks Animation <b>Home</b> Entertainment DVD and Blu-ray November 11 The Clique Warner <b>Home</b> Video DVD Christmas Cottage Lionsgate <b>Home</b> Entertainment DVD Hellboy II: The Golden Army Universal Studios <b>Home</b>......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,688 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1996_in_home_video">
              1996 in home video
            </a>
              <cite>...Columbia TriStar <b>Home</b> Video VHS release The Arrival Orion <b>Home</b> Video VHS release October 29 Toy Story Walt Disney <b>Home</b> Video VHS release Heaven's Prisoners Hollywood Pictures <b>Home</b> Video VHS release Eraser Warner <b>Home</b> Video VHS release A Thin Line Between Love and Hate New Line <b>Home</b> Video VHS release Stepmonster New Horizons <b>Home</b> Video VHS release Wild Bill MGM/UA <b>Home</b> Video VHS release November 5 Jane Eyre Miramax <b>Home</b> Entertainment VHS release Last Dance Touchstone <b>Home</b> Video VHS release Spy......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,747 words</div>
          </li>
      </ul>
    </div>

    <div class="footer">
        <ul>
            <li>
              <a href="/search?pattern=home&content=&start=0&pageLength=10">
                ◀
              </a>
            </li>
            <li>
              <a class="selected"
                 href="/search?pattern=home&content=&start=0&pageLength=10">
                1
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=10&pageLength=10">
                2
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=20&pageLength=10">
                3
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=30&pageLength=10">
                4
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=40&pageLength=10">
                5
              </a>
            </li>
            <li>
              <a href="/search?pattern=home&content=&start=971140&pageLength=10">
                ▶
              </a>
            </li>
        </ul>
    </div>
  </body>
</html>

New snippets (fast):

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="content-type" />
    <style type="text/css">
      [...]
    </style>
    <title>Search: home</title>
  <link type="root" href=""><link type="text/css" href="/skin/jquery-ui/jquery-ui.min.css?cacheid=e1de77b3" rel="Stylesheet" />
<link type="text/css" href="/skin/jquery-ui/jquery-ui.theme.min.css?cacheid=2a5841f9" rel="Stylesheet" />
<link type="text/css" href="/skin/taskbar.css?cacheid=49365e9c" rel="Stylesheet" />
<script type="text/javascript" src="/skin/jquery-ui/external/jquery/jquery.js?cacheid=1d85f0f3" defer></script>
<script type="text/javascript" src="/skin/jquery-ui/jquery-ui.min.js?cacheid=d927c2ff" defer></script>
<script type="text/javascript" src="/skin/taskbar.js?cacheid=5982280c" defer></script>
</head>
  <body bgcolor="white"><span class="kiwix">
  <span id="kiwixtoolbar" class="ui-widget-header">
    <div class="kiwix_centered">
      <div class="kiwix_searchform">
        <form class="kiwixsearch" method="GET" action="/search" id="kiwixsearchform">
          
          <label for="kiwixsearchbox">&#x1f50d;</label>
          <input autocomplete="off" class="ui-autocomplete-input" id="kiwixsearchbox" name="pattern" type="text" title="Search ''" aria-label="Search ''">
        </form>
      </div>
        <input type="checkbox" id="kiwix_button_show_toggle">
        <label for="kiwix_button_show_toggle"><img src="/skin/caret.png?cacheid=22b942b4" alt=""></label>
        <div class="kiwix_button_cont">
            <a id="kiwix_serve_taskbar_library_button" title="Go to welcome page" aria-label="Go to welcome page" href="/"><button>&#x1f3e0;</button></a>
          
        </div>
    </div>
  </span>
</span>

    <div class="header">
        Results
        <b>
          2-11
        </b> of <b>
          971,150
        </b> for <b>
          "home"
        </b>
      
    </div>

    <div class="results">
      <ul>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Coming_Home">
              Coming Home
            </a>
              <cite>...Coming <b>Home</b> (Faye Wong album), 1992 Coming <b>Home</b> (EP), 1998 EP by Iron Savior Comin' <b>Home</b> (EP), 2014 EP by Jessie James Decker Comin' <b>Home</b> (Larry Coryell album), 1984 Coming <b>Home</b>, 2010 album by Boozoo Bajou Coming <b>Home</b>, 2009 album by Nightmares on Wax Coming <b>Home</b>, 2008 album by Danny Wood Coming <b>Home</b>, 1998 album by Joe Grushecky Coming <b>Home</b>, 1998 album by Yungchen Lhamo Songs "Coming <b>Home</b>" / "Coming <b>Home</b> (To Richmond)" (Alex Lloyd songs), 2003 / 2014 "Coming <b>Home</b>" (Busted song), 2016 "Coming <b>Home</b>"......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">799 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2007_in_home_video">
              2007 in home video
            </a>
              <cite>...DVD January 2 Beer League Echo Bridge <b>Home</b> Entertainment DVD The Covenant Sony Pictures <b>Home</b> Entertainment DVD Love's Abiding Joy 20th Century Fox <b>Home</b> Entertainment DVD Million Dollar Mystery Anchor Bay Entertainment DVD Snakes on a Plane New Line <b>Home</b> Entertainment DVD Sparkle Warner <b>Home</b> Video DVD Vidocq (2001) Lionsgate <b>Home</b> Entertainment DVD January 9 Bandidas 20th Century Fox <b>Home</b> Entertainment DVD Broken Bridges Paramount <b>Home</b> Entertainment DVD Conversations with Other Women Virgil Films......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,873 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Home_Sweet_Home">
              Home Sweet Home
            </a>
              <cite>...<b>Home</b> (2007 video game), a 2007/2008 game for PC and WiiWare <b>Home</b> Sweet <b>Home</b> (2017 video game), a 2017 horror game <b>Home</b> Sweet <b>Home</b>, a historic house and museum in East Hampton, Long Island <b>Home</b> Sweet <b>Home</b>, a novel by Jeanne Betancourt See also "<b>Home</b> Sweet <b>Home</b>/Bittersweet Symphony", a 2005 medley by Limp Bizkit <b>Home</b> Sweet Homer (musical), a notorious Broadway flop "<b>Home</b> Sweet Homeless", an episode of the TV series The Care Bears "<b>Home</b> Sweet Homes", an episode of the TV series Barney &amp; Friends...</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">419 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2006_in_home_video">
              2006 in home video
            </a>
              <cite>...<b>home</b> video 2006 in <b>home</b> video is considered something of a watershed for <b>home</b> media technology, with VHS being phased out as Blu-ray fought to replace the presently dominant DVD format. 2006 marks the end of the VHS era with the release of A History of Violence, the last VHS release for a major Hollywood film. Major retailers are switching to DVD-only sales while tapes are being sent to discount stores.[1] This time marks the beginning of a major format war between Blu-ray and HD DVD which would......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,802 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/List_of_Chi&apos;s_Sweet_Home_chapters">
              List of Chi&apos;s Sweet Home chapters
            </a>
              <cite>...Rikai suru.) <b>home</b> made 8. "A Cat Remembers" (猫、思い出す。, Neko, Omoidasu.) <b>home</b> made 9. "A Cat Dreams" (猫、夢を見る。, Neko, Yume o Miru.) <b>home</b> made 10. "A Cat is Fired Up" (猫、興奮する。, Neko, Koufun suru.) <b>home</b> made 11. "A Cat Plays" (猫、遊ぶ。, Neko, Asobu.) <b>home</b> made 12. "A Cat is Lost... Again" (猫、再び迷子になる。, Neko, Futatabi Maigo ni Naru.) <b>home</b> made 13. "A Cat Fights!" (猫、けんかする。, Neko,Kenka Suru.) <b>home</b> made 14. "A Cat Goes......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">2,003 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2004_in_home_video">
              2004 in home video
            </a>
              <cite>2004 in <b>home</b> video The following events occurred in the year 2004 in <b>home</b> video. Years in <b>home</b> video: 2001 2002 2003 2004 2005 2006 2007 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2001 2002 2003 2004 2005 2006 2007 Events November November 22 - Dixons Retail in the UK announces it will stop selling VHS tapes, after 26 years, because of the DVD boom.[1] Movie releases The following movies were released......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,349 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2005_in_home_video">
              2005 in home video
            </a>
              <cite>...<b>home</b> video The following events occurred in the year 2005 in <b>home</b> video. Years in <b>home</b> video: 2002 2003 2004 2005 2006 2007 2008 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2002 2003 2004 2005 2006 2007 2008 Industry milestones June June - Target and Walmart in the United States and several other retailers announce plans to phase out the VHS format entirely by early 2006, in favor of the more popular DVD......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,647 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1998_in_home_video">
              1998 in home video
            </a>
              <cite>...<b>home</b> video 1998 is nearing the end of the dominance of the VHS format with the DVD overtaking tape sales by the early 2000s.[1] The so-called format wars are almost over with Sony's Betamax format ending production at about this same time.[2] The VHS format does not die out quickly because of its recording function, so many homes were adding a DVD player rather than replacing their VCRs. 1998 is a boom time for the brick and mortar video rental industry.[3] Years in <b>home</b> video: 1995 1996 1997......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,406 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2008_in_home_video">
              2008 in home video
            </a>
              <cite>...in <b>home</b> video The following events occurred in the year 2008 in <b>home</b> video. Years in <b>home</b> video: 2005 2006 2007 2008 2009 2010 2011 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2005 2006 2007 2008 2009 2010 2011 Industry milestones March March 18 – Toshiba announces the HD DVD format will be dropped.[1] October October 28 – The last standalone JVC VHS-only player is produced.[2] October 31 – The last......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,688 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1996_in_home_video">
              1996 in home video
            </a>
              <cite>...TriStar <b>Home</b> Video VHS release January 16 The Indian in the Cupboard Columbia TriStar <b>Home</b> Video VHS release; part of the Columbia TriStar Family Collection Die Hard with a Vengeance 20th Century Fox <b>Home</b> Entertainment VHS release Nine Months VHS release Lord of Illusions MGM/UA <b>Home</b> Video VHS release January 23 Jade Paramount <b>Home</b> Video VHS release Waterworld MCA/Universal <b>Home</b> Video VHS release Mortal Kombat New Line <b>Home</b> Video VHS re-release Speechless MGM/UA <b>Home</b> Video VHS release Tom Thumb......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,747 words</div>
          </li>
      </ul>
    </div>

    <div class="footer">
        <ul>
            <li>
              <a href="/search?pattern=home&content=&start=0&pageLength=10">
                ◀
              </a>
            </li>
            <li>
              <a class="selected"
                 href="/search?pattern=home&content=&start=0&pageLength=10">
                1
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=10&pageLength=10">
                2
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=20&pageLength=10">
                3
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=30&pageLength=10">
                4
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=40&pageLength=10">
                5
              </a>
            </li>
            <li>
              <a href="/search?pattern=home&content=&start=971140&pageLength=10">
                ▶
              </a>
            </li>
        </ul>
    </div>
  </body>
</html>

codecov bot commented May 16, 2022

Codecov Report

Merging #697 (b994697) into master (189a4d0) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #697   +/-   ##
=======================================
  Coverage   84.61%   84.61%           
=======================================
  Files          98       98           
  Lines        4308     4310    +2     
  Branches     1873     1869    -4     
=======================================
+ Hits         3645     3647    +2     
  Misses        662      662           
  Partials        1        1           
Impacted Files             Coverage Δ
src/search_iterator.cpp    86.95% <100.00%> (+0.23%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 189a4d0...b994697.

kelson42 (Contributor) commented May 16, 2022

Great! This deserves a dedicated (patch) release IMO.

kelson42 added this to the 7.2.2 milestone on May 16, 2022.
kelson42 (Contributor) left a comment:

Good catch!

It seems that `SNIPPET_BACKGROUND_MODEL` has an even bigger impact on snippet generation speed.
`SNIPPET_BACKGROUND_MODEL` asks Xapian to also compute scores for non-query
terms. By removing it, we compute scores only for query terms, which is
far faster.
mgautierfr (Collaborator, Author) commented May 16, 2022

Removing `SNIPPET_BACKGROUND_MODEL` as well makes it even faster.
The search page is now generated in less than 3 seconds (yes, you read that right).
The snippets are still pretty good (if not better):

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="content-type" />
    <style type="text/css">
     [...]
    </style>
    <title>Search: home</title>
  <link type="root" href=""><link type="text/css" href="/skin/jquery-ui/jquery-ui.min.css?cacheid=e1de77b3" rel="Stylesheet" />
<link type="text/css" href="/skin/jquery-ui/jquery-ui.theme.min.css?cacheid=2a5841f9" rel="Stylesheet" />
<link type="text/css" href="/skin/taskbar.css?cacheid=49365e9c" rel="Stylesheet" />
<script type="text/javascript" src="/skin/jquery-ui/external/jquery/jquery.js?cacheid=1d85f0f3" defer></script>
<script type="text/javascript" src="/skin/jquery-ui/jquery-ui.min.js?cacheid=d927c2ff" defer></script>
<script type="text/javascript" src="/skin/taskbar.js?cacheid=5982280c" defer></script>
</head>
  <body bgcolor="white"><span class="kiwix">
  <span id="kiwixtoolbar" class="ui-widget-header">
    <div class="kiwix_centered">
      <div class="kiwix_searchform">
        <form class="kiwixsearch" method="GET" action="/search" id="kiwixsearchform">
          
          <label for="kiwixsearchbox">&#x1f50d;</label>
          <input autocomplete="off" class="ui-autocomplete-input" id="kiwixsearchbox" name="pattern" type="text" title="Search ''" aria-label="Search ''">
        </form>
      </div>
        <input type="checkbox" id="kiwix_button_show_toggle">
        <label for="kiwix_button_show_toggle"><img src="/skin/caret.png?cacheid=22b942b4" alt=""></label>
        <div class="kiwix_button_cont">
            <a id="kiwix_serve_taskbar_library_button" title="Go to welcome page" aria-label="Go to welcome page" href="/"><button>&#x1f3e0;</button></a>
          
        </div>
    </div>
  </span>
</span>

    <div class="header">
        Results
        <b>
          2-11
        </b> of <b>
          971,150
        </b> for <b>
          "home"
        </b>
      
    </div>

    <div class="results">
      <ul>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Coming_Home">
              Coming Home
            </a>
              <cite>...<b>Home</b>" (Leon Bridges song), 2015 "Coming <b>Home</b>" (Sasha song), 2006 "Coming <b>Home</b>" (Sjonni's Friends song), 2011 "Coming <b>Home</b>" (The Soldiers song), 2009 "Coming <b>Home</b>" (Sigma and Rita Ora song), 2015 "Coming <b>Home</b>" (Sheppard song), 2017 "Comin' <b>Home</b>" (City and Colour song), 2006 "Comin' <b>Home</b>" (Hum song), 1998 "Comin' <b>Home</b>" (The Radiators song), 1979 "Coming <b>Home</b> (Jeanny Part 2)" by Falco, 1986 "Major Tom (Coming <b>Home</b>)", by Peter Schilling "Coming <b>Home</b>", by Alex Band from Alex Band EP "Coming <b>Home</b>", by......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">799 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2007_in_home_video">
              2007 in home video
            </a>
              <cite>...<b>Home</b> Entertainment DVD Million Dollar Mystery Anchor Bay Entertainment DVD Snakes on a Plane New Line <b>Home</b> Entertainment DVD Sparkle Warner <b>Home</b> Video DVD Vidocq (2001) Lionsgate <b>Home</b> Entertainment DVD January 9 Bandidas 20th Century Fox <b>Home</b> Entertainment DVD Broken Bridges Paramount <b>Home</b> Entertainment DVD Conversations with Other Women Virgil Films &amp; Entertainment DVD Crank Lionsgate <b>Home</b> Entertainment DVD Idiocracy 20th Century Fox <b>Home</b> Entertainment DVD The Illusionist (2006) DVD Kid Monk......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,873 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/Home_Sweet_Home">
              Home Sweet Home
            </a>
              <cite><b>Home</b> Sweet <b>Home</b> Look up <b>home</b> sweet <b>home</b> in Wiktionary, the free dictionary. <b>Home</b> Sweet <b>Home</b> may refer Film <b>Home</b>, Sweet <b>Home</b> (1914 film), a film about the life of John Howard Payne <b>Home</b> Sweet <b>Home</b> (1917 film), a British silent film <b>Home</b> Sweet <b>Home</b> (1926 film), a silent film drama <b>Home</b>, Sweet <b>Home</b> (1933 film), a British film starring Richard Cooper <b>Home</b> Sweet <b>Home</b> (1945 film), a British comedy film starring Frank Randle <b>Home</b> Sweet <b>Home</b> (1970 film), a Taiwanese film awarded a Golden Horse Award for......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">419 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2006_in_home_video">
              2006 in home video
            </a>
              <cite>...<b>home</b> video 2006 in <b>home</b> video is considered something of a watershed for <b>home</b> media technology, with VHS being phased out as Blu-ray fought to replace the presently dominant DVD format. 2006 marks the end of the VHS era with the release of A History of Violence, the last VHS release for a major Hollywood film. Major retailers are switching to DVD-only sales while tapes are being sent to discount stores.[1] This time marks the beginning of a major format war between Blu-ray and HD DVD which would......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,802 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/List_of_Chi&apos;s_Sweet_Home_chapters">
              List of Chi&apos;s Sweet Home chapters
            </a>
              <cite>...<b>home</b> made 8. "A Cat Remembers" (猫、思い出す。, Neko, Omoidasu.) <b>home</b> made 9. "A Cat Dreams" (猫、夢を見る。, Neko, Yume o Miru.) <b>home</b> made 10. "A Cat is Fired Up" (猫、興奮する。, Neko, Koufun suru.) <b>home</b> made 11. "A Cat Plays" (猫、遊ぶ。, Neko, Asobu.) <b>home</b> made 12. "A Cat is Lost... Again" (猫、再び迷子になる。, Neko, Futatabi Maigo ni Naru.) <b>home</b> made 13. "A Cat Fights!" (猫、けんかする。, Neko,Kenka Suru.) <b>home</b> made 14. "A Cat Goes to the Vet......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">2,003 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2004_in_home_video">
              2004 in home video
            </a>
              <cite>...<b>home</b> video The following events occurred in the year 2004 in <b>home</b> video. Years in <b>home</b> video: 2001 2002 2003 2004 2005 2006 2007 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2001 2002 2003 2004 2005 2006 2007 Events November November 22 - Dixons Retail in the UK announces it will stop selling VHS tapes, after 26 years, because of the DVD boom.[1] Movie releases The following movies were released on video......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,349 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2005_in_home_video">
              2005 in home video
            </a>
              <cite>...<b>home</b> video The following events occurred in the year 2005 in <b>home</b> video. Years in <b>home</b> video: 2002 2003 2004 2005 2006 2007 2008 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2002 2003 2004 2005 2006 2007 2008 Industry milestones June June - Target and Walmart in the United States and several other retailers announce plans to phase out the VHS format entirely by early 2006, in favor of the more popular DVD......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,647 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1998_in_home_video">
              1998 in home video
            </a>
              <cite>...<b>home</b> video 1998 is nearing the end of the dominance of the VHS format with the DVD overtaking tape sales by the early 2000s.[1] The so-called format wars are almost over with Sony's Betamax format ending production at about this same time.[2] The VHS format does not die out quickly because of its recording function, so many homes were adding a DVD player rather than replacing their VCRs. 1998 is a boom time for the brick and mortar video rental industry.[3] Years in <b>home</b> video: 1995 1996 1997......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,406 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/2008_in_home_video">
              2008 in home video
            </a>
              <cite>...in <b>home</b> video The following events occurred in the year 2008 in <b>home</b> video. Years in <b>home</b> video: 2005 2006 2007 2008 2009 2010 2011 Centuries: 20th century · 21st century · 22nd century Decades: 1970s 1980s 1990s 2000s 2010s 2020s 2030s Years: 2005 2006 2007 2008 2009 2010 2011 Industry milestones March March 18 – Toshiba announces the HD DVD format will be dropped.[1] October October 28 – The last standalone JVC VHS-only player is produced.[2] October 31 – The last......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">11,688 words</div>
          </li>
          <li>
            <a href="/wikipedia_en_all_maxi_2020-08/A/1996_in_home_video">
              1996 in home video
            </a>
              <cite>...<b>Home</b> Video VHS release January 16 The Indian in the Cupboard Columbia TriStar <b>Home</b> Video VHS release; part of the Columbia TriStar Family Collection Die Hard with a Vengeance 20th Century Fox <b>Home</b> Entertainment VHS release Nine Months VHS release Lord of Illusions MGM/UA <b>Home</b> Video VHS release January 23 Jade Paramount <b>Home</b> Video VHS release Waterworld MCA/Universal <b>Home</b> Video VHS release Mortal Kombat New Line <b>Home</b> Video VHS re-release Speechless MGM/UA <b>Home</b> Video VHS release Tom Thumb VHS......</cite>
              <div class="book-title">from Wikipedia</div>
              <div class="informations">5,747 words</div>
          </li>
      </ul>
    </div>

    <div class="footer">
        <ul>
            <li>
              <a href="/search?pattern=home&content=&start=0&pageLength=10">
                ◀
              </a>
            </li>
            <li>
              <a class="selected"
                 href="/search?pattern=home&content=&start=0&pageLength=10">
                1
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=10&pageLength=10">
                2
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=20&pageLength=10">
                3
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=30&pageLength=10">
                4
              </a>
            </li>
            <li>
              <a 
                 href="/search?pattern=home&content=&start=40&pageLength=10">
                5
              </a>
            </li>
            <li>
              <a href="/search?pattern=home&content=&start=971140&pageLength=10">
                ▶
              </a>
            </li>
        </ul>
    </div>
  </body>
</html>
