Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to search by type signature on the docs #12866

Closed
reem opened this issue Mar 13, 2014 · 11 comments
Closed

Add ability to search by type signature on the docs #12866

reem opened this issue Mar 13, 2014 · 11 comments
Labels
E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@reem
Copy link
Contributor

reem commented Mar 13, 2014

One of hoogle's most useful features is the ability to search by type signatures. It makes finding functions whose behavior you know but whose names you don't know much easier.

We should add this ability to the search feature in the rust docs.

@huonw
Copy link
Member

huonw commented Mar 13, 2014

I really like Hoogle too. (This would be most useful as a global search, currently rustdoc's search is local to a single crate.)

@thehydroimpulse
Copy link
Contributor

/cc me

@mihneadb
Copy link
Contributor

mihneadb commented Dec 8, 2014

I'm starting to research on what is needed to get this working. Any tips are appreciated :). I'll post my questions here as soon as I have them.

@mihneadb
Copy link
Contributor

Here's what I have after some code exploration:

There is already a search index that is being built by rustdoc at generation time. This contains a list of items and paths (an item can be a function, a struct, a field etc.). There is quite some search logic written to query this index.

Since we have access to the AST while building the index, I'm thinking that we should also build a mapping from function type (probably represented as string) to [item id] (where such an id is probaby an index in the items list in the existing index. This mapping would be stored in the types field (sibling of items and paths).

Then, when the search query is performed, if the query contains -> we look in the types dict. Otherwise, the old flow is preserved.

What do you guys think?

@ftxqxd
Copy link
Contributor

ftxqxd commented Dec 15, 2014

I think that that’s generally a good way of doing it, although I’m not sure that a string-based signature mapping would be a good idea; I think there are a few subtle cases that need to be handled here that might not interact so well with a text-based system. Certainly, not all of them need to be done straight away for a first implementation, but they’d still be nice in the long term:

  • A function like fn foo<T>(x: T) -> T should show up when someone searches for (for example) fn(int) -> int, but also for the more general fn<T>(T) -> T.
  • There needs to be a certain amount of non-textual fuzzy matching: for example, lifetimes and trait bounds should be able to be omitted, references and other pointers should be elidable, etc.. However, more closely matching things should be favoured over others: if a user searched for fn(T) -> T, results like Float::sqrt (fn(f64) -> f64) should show up before results like Clone::clone (fn<T>(&T) -> T).
  • Type parameter names should be interchangeable. fn<T>(T) -> T is the same as fn<U>(U) -> U.
  • Traits’ methods would ideally be searchable as generic functions. So Iterator::next would be fn<Self, A>(&mut Self) -> Option<A>. Inherent methods are the same: fn map<T, U>(Option<T>, |T| -> U) -> Option<U> It would also be nice if methods could also be searched using self, implicitly meaning any type (so fn<T>(self) -> T could satisfy Option::unwrap, for example).
  • Implicit single-letter type parameters would also be nice, so that fn(self) -> T could find Option::unwrap.
  • Paths should be able to be qualified partially or fully: so one could write std::option::Option, option::Option or simply Option.

Obviously, all of this shouldn’t have to be done for a first implementation; in fact, none of it really has to be done. But I think it’d be best to design a scheme with this in mind such that adding these features in the future will not require a complete restructuring of the system.

This is quite a tricky task, though, because half of the searching functionality is done on the client, in Javascript. I think that verbose, fully-expanded-with-lots-of-details signatures should be output to the index, and then the Javascript code would have to parse the query as a function signature, and if successful, search the index for things that roughly match the inputted function’s signature. I think doing this textually might not be such a good idea, but I’m not sure. Later, the function signature from the search query could be varied in a number of ways to take into account some of the things I listed above (and maybe others).

I think performance is a bit of a worry, though—loading the existing index already takes a long time. On my system, clearing the cache and reloading the standard library’s docs takes three seconds to download the index alone, which is ~1 MB in size.

@huonw
Copy link
Member

huonw commented Dec 15, 2014

(To add to the list, argument order shouldn't matter, so fn(u8, uint) appears in a search for fn(uint, u8).)

@mihneadb
Copy link
Contributor

Thanks for all the input! Great points indeed.

I'd rather get the format right (or an approximation of it) and basic search (say for simple types, no generics / lifetimes) such that other contributors may have a starting point.

How's this for the index representation?

Given: fn foo<'a, T: Show>(x: 'a T, y: 'a T) -> 'a T

JSON:

{
  "annotations": [
    {
      "type": "lifetime",
      "name": "a"
    },
    {
      "type": "type",
      "name": "T",
      "traits": [
        "Show"
      ]
    }
  ],
  "arguments": [
    {
      "name": "x",
      "lifetime": "a",
      "type": "T"
    },
    {
      "name": "x",
      "lifetime": "a",
      "type": "T"
    }
  ],
  "return": {
    "type": "T",
    "lifetime": "a"
  }
}

@mihneadb
Copy link
Contributor

Also, I'm considering gzipping everything and extracting it in JS since the computation cost should be small but it would help with size.

@dbp
Copy link
Contributor

dbp commented Dec 30, 2014

Neal Mitchell has written some things about the various iterations of Hoogle (and lessons learned), which would probably be useful as reference.

In writing the (now outdated to the point of uselessness) simple version of this at https://github.com/dbp/rustle , one thing that turned out to be important to get useful ordering of results was to progressively generalize the query and order the results based on that.

So uint -> uint is searched directly, then T -> T and then T -> U (though since the latter can't be fulfilled by a useful function, perhaps don't generalize that far).

Also, instead of searching for T, come up with some representation of type variables that is easier to match. What rustle did, which I was never all that happy with (but is better than nothing), was to treat them as ordinals, ranked by frequency (so a T -> U -> T function would be 1 -> 2 -> 1 essentially, and a U -> T -> U function would turn into the same thing), with ties constructing multiple entries.

@ajtulloch
Copy link

FWIW, I took a look yesterday at dumping a stylized type signature from RustDoc that Hoogle can then read in, which allows us to reuse all the fancy type search algorithms Hoogle uses (retrieval, selective generalization, canonicalization, scoring, etc). It seems to work OK as a demo with even just a simple implementation (~200 lines in librustdoc, ~200 lines in Hoogle).

example

https://github.com/ajtulloch/roogle has some more documentation on the approach, usage instructions, and examples (along with the Hoogle changes), and https://github.com/ajtulloch/rust has the librustdoc changes.

The main missing features (relative to Hoogle for Haskell) are:

  • links to doc.rust-lang.org for search results
  • more complete rustdoc integration, so all function/methods/traits/aliases are exported to Hoogle
  • more precise translation of the Rust type system
    • Handling of mut T vs T
    • &T vs T
    • T: Eq + Num vs unbounded generics, etc.

@steveklabnik
Copy link
Member

I'm pulling a massive triage effort to get us ready for 1.0. As part of this, I'm moving stuff that's wishlist-like to the RFCs repo, as that's where major new things should get discussed/prioritized.

This issue has been moved to the RFCs repo: rust-lang/rfcs#658

bors added a commit that referenced this issue Mar 14, 2015
This adds search by type (for functions/methods) support to Rustdoc. Target issue is at rust-lang/rfcs#658.

I've described my approach here: rust-lang/rfcs#658 (comment). I'll copy the text in here as well:

---

Hi, it took me longer than I wished, but I have implemented this in a not-too-complex way that I think can be extended to support more complex features (like the ones mentioned [here](#12866 (comment))).

The idea is to generate a JSON representation of the types of methods/functions in the existing index, and then make the JS understand when it should look by type (and not by name).

I tried to come up with a JSON representation that can be extended to support generics, bounds, ref/mut annotations and so on. Here are a few samples:

Function:

```rust
fn to_uppercase(c: char) -> char
```

```json
{
    "inputs": [
        {"name": "char"}
    ],
    "output": {
        "name": "char",
    }
}
```

Method (implemented or defined in trait):

```rust
// in struct Vec
// self is considered an argument as well
fn capacity(&self) -> usize
```

```json
{
    "inputs": [
        {"name": "vec"}
    ],
    "output": {
        "name": "usize"
    }
}
```

This simple format can be extended by adding more fields, like `generic: bool`, a `bounds` mapping and so on.

I have a working implementation in master...mihneadb:rustdoc-search-by-type. You can check out a live demo [here](http://data.mihneadb.net/doc/std/index.html?search=charext%20-%3E%20char).

![screenshot from 2015-02-28 00 54 00](https://cloud.githubusercontent.com/assets/643127/6422722/7e5374ee-bee4-11e4-99a6-9aac3c9d5068.png)


The feature list is not that long:
- search by types (you *can* use generics as well, as long as you use the exact name - e.g. [`vec,t -> `](http://data.mihneadb.net/doc/std/index.html?search=vec%2C%20t%20-%3E))
- order of arguments does not matter
- `self` is took into account as well (e.g. search for `vec -> usize`)
- does not use "complex" annotations (e.g. you don't search for `&char -> char` but for `char -> char`)

My goal is to get a working, minimal "base" merged so that others can build upon it. How should I proceed? Do I open a PR (badly in need of code review since this is my first non "hello world"-ish rust code)?

---
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants