Improving the Search API

As some of you might know, in last year's GSoC I created the Search API module, which has gained some fame since then. It's a highly flexible search solution for Drupal 7 that, amongst other things, already supports Views and faceted search out of the box.
Still, like all software, it isn't perfect: there are a number of known shortcomings, as well as other room for improvement. For this Google Summer of Code, I therefore propose to fix some of these shortcomings and add a few new features.
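The tasks I have in mind are the following (in order of importance):

  • Provide ways to index data other than entities
  • Add search notifications
  • Add an autocompletion feature
  • Add some small multi-language features
  • Extend test coverage
  • Extend documentation

The last two are, of course, the buffers for any time remaining at the end: even when I'm done with the others, there will always be numerous ways to improve tests and documentation.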

Detailed tasks

Provide ways to index data other than entities

This is one of the module's longest-standing shortcomings that still isn't fixed, so it's really time to tackle it. Right now, only things defined as Drupal entities (nodes, comments, users, taxonomy terms, etc., as well as entity types from contrib modules) are available for indexing. On the one hand, this is a big improvement over previous search solutions, which allowed only nodes (and node-related data) to be searched. On the other hand, if you want to index whole pages or even external data, you're out of luck. You would have to define this other data as an entity, with all its information. That works, but it is still a workaround, and it doesn't help in cases where you, e.g., only want to search external data rather than index it.
The solution would probably be to add another abstraction layer between the searches and the entity layer, which would allow things other than normal entities to be included. Basically, indexes would then have a datasource-specific implementation class (like the backend-specific service classes for servers), which would take care of item retrieval, property extraction, reacting to the creation, update and deletion of items, etc.
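To make the proposed layer a bit more concrete, here is a minimal sketch in Python. The real module is Drupal/PHP code, and Python is used here purely for illustration. All class and method names (`Datasource`, `DictDatasource`, `Index`, `load_items`, `extract_properties`, `item_inserted_or_updated`, ...) are hypothetical and not part of the Search API. The only point is that the index delegates item loading, field extraction and change tracking to a pluggable datasource instead of assuming entities.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List, Set


class Datasource(ABC):
    """Hypothetical datasource controller: one implementation per item type."""

    @abstractmethod
    def load_items(self, ids: Iterable[str]) -> Dict[str, Any]:
        """Return the raw items for the given IDs, keyed by ID."""

    @abstractmethod
    def extract_properties(self, item: Any, fields: List[str]) -> Dict[str, Any]:
        """Return the values of the requested fields for one item."""


class DictDatasource(Datasource):
    """Illustrative non-entity datasource, e.g. rows fetched from an external system."""

    def __init__(self, rows: Dict[str, Dict[str, Any]]):
        self.rows = rows

    def load_items(self, ids: Iterable[str]) -> Dict[str, Any]:
        return {i: self.rows[i] for i in ids if i in self.rows}

    def extract_properties(self, item: Any, fields: List[str]) -> Dict[str, Any]:
        return {f: item.get(f) for f in fields}


class Index:
    """The index only talks to its datasource, never to the entity layer directly."""

    def __init__(self, datasource: Datasource, fields: List[str]):
        self.datasource = datasource
        self.fields = fields
        self.dirty: Set[str] = set()  # IDs that need (re)indexing
        self.documents: Dict[str, Dict[str, Any]] = {}  # stand-in for the server

    # Change notifications that the datasource would forward to the index.
    def item_inserted_or_updated(self, item_id: str) -> None:
        self.dirty.add(item_id)

    def item_deleted(self, item_id: str) -> None:
        self.dirty.discard(item_id)
        self.documents.pop(item_id, None)

    def index_dirty_items(self) -> int:
        """Load all dirty items via the datasource, store their fields, return the count."""
        items = self.datasource.load_items(sorted(self.dirty))
        for item_id, item in items.items():
            self.documents[item_id] = self.datasource.extract_properties(item, self.fields)
        self.dirty.clear()
        return len(items)
```

With this split, supporting entities would just mean writing an entity-based `Datasource` implementation, much as servers already delegate to backend-specific service classes.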
This is far from trivial, as the assumption that "searched items are entities" is baked into the Search API in numerous places, on a rather basic level. Finding a way to resolve all these dependencies won't be easy. Also, an upgrade path has to be provided for current users.

However, Robin Barre recently created an issue in which he tries to solve this problem. So this might already be solved by the time GSoC starts, or my main work might merely be to review Robin's solution and help him complete it.

Add search notifications

Add a feature that allows users to save a search and be automatically notified of any new results showing up. This is a common feature, especially on eCommerce sites like eBay. It remains to be determined how best to implement this without harming performance too much, but it would probably just work by executing the saved searches on (some) cron runs. Things like the update frequency, the maximum result count (so users don't save searches like "the"), the number of saved searches per user, etc., could be set by the admin, and the update frequency also by the user.
The main problem here is how best to store the results a user has already seen, so she doesn't see them again (at least not in the notification mail). One idea is to create a Views filter, which would both allow the user to restrict the base search to new results and solve the problem of how to display the results in the mail. Of course, this still wouldn't solve the problem of storing this data in the first place.
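The cron-based approach described above could look roughly like the following Python sketch. It assumes the simplest possible storage scheme: a set of already-notified result IDs kept per saved search. `SavedSearch`, `run_saved_searches` and the `execute` callback are hypothetical names. For brevity, the maximum result count is enforced at run time here rather than when the search is saved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SavedSearch:
    user: str
    keys: str
    interval: int  # seconds between checks (admin default, user-adjustable)
    last_run: int = 0
    seen: Set[str] = field(default_factory=set)  # result IDs already notified


def run_saved_searches(
    searches: List[SavedSearch],
    execute: Callable[[str], List[str]],
    now: int,
    max_results: int,
) -> Dict[str, List[str]]:
    """Run all due saved searches; return, per user, the result IDs not seen before."""
    notifications: Dict[str, List[str]] = {}
    for search in searches:
        if now - search.last_run < search.interval:
            continue  # not due yet on this cron run
        results = execute(search.keys)
        search.last_run = now
        if len(results) > max_results:
            continue  # too broad (think "the"), so don't mail a huge list
        new = [r for r in results if r not in search.seen]
        search.seen.update(new)
        if new:
            notifications.setdefault(search.user, []).extend(new)
    return notifications
```

The `seen` set grows without bound, which is exactly the storage problem mentioned above. A cheaper alternative might be to store only the time of the last notification and compare it against an indexed creation or change date, if the indexed items provide one.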
This will very probably be an additional contrib module, not contained in the Search API base project.

Add autocompletion feature

While this is rarely a critical requirement, autocompletion for search keywords is definitely nice to have. Google's search box is a well-known example.
Since there currently is no way to search for partial hits with the Search API, this feature will very probably be a Solr-specific extension, or at least a backend-dependent feature for which I would only provide a Solr implementation.
This will probably be a contrib module of its own, based on the existing Apache Solr Autocomplete module.
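To illustrate what "searching for partial hits" means here, the following is a backend-neutral, in-memory Python sketch of prefix completion over indexed terms. It ranks matches by how often each term occurs. It is not how the Solr integration or the Apache Solr Autocomplete module works: a Solr implementation would delegate this lookup to the server. `TermSuggester` and `suggest` are hypothetical names.

```python
import bisect
from collections import Counter
from typing import Iterable, List, Tuple


class TermSuggester:
    """Suggest indexed terms that start with a typed prefix, most frequent first."""

    def __init__(self, documents: Iterable[str]):
        # Naive tokenization: lowercase and split on whitespace.
        self.counts = Counter(word for doc in documents for word in doc.lower().split())
        self.terms: List[str] = sorted(self.counts)

    def suggest(self, prefix: str, limit: int = 5) -> List[Tuple[str, int]]:
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)  # first term >= prefix
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can match either
            matches.append((term, self.counts[term]))
        matches.sort(key=lambda tc: (-tc[1], tc[0]))
        return matches[:limit]
```

For example, `TermSuggester(["Search API for Drupal", "search facets", "Solr search server"]).suggest("se")` returns `[("search", 3), ("server", 1)]`.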

Add some small multi-language features

While some basic requirements for i18n were included in the Search API right from the start, very little is provided on the frontend in that respect so far. Therefore, a few helpful features should be added (all of them in the main project, each in the appropriate module):

  • Add an option list to the "Item language" field, so that, e.g., facets display the language name instead of its ID.
  • Add a data alteration for indexing only items in a certain language.
  • (Maybe) Add an index setting that specifies which language to use when retrieving data from translatable fields during indexing.
  • Improve the "Item language" Views filter to add a "Current language" option.
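The first two items are simple enough to sketch in a few lines of Python. The sketch assumes each item carries its language code under a `language` key. `LANGUAGE_OPTIONS`, `language_label` and `filter_by_language` are hypothetical names and are not part of the Search API.

```python
from typing import Dict, List

# Hypothetical option list: language code -> human-readable name, as a facet would show it.
LANGUAGE_OPTIONS: Dict[str, str] = {"en": "English", "de": "German"}


def language_label(code: str) -> str:
    """Return the display name for a language code, falling back to the raw code."""
    return LANGUAGE_OPTIONS.get(code, code)


def filter_by_language(items: List[dict], allowed: List[str]) -> List[dict]:
    """Data alteration: keep only items whose language is one of the allowed codes."""
    return [item for item in items if item.get("language") in allowed]
```

Items without a language code are dropped by this filter. Whether they should instead be kept is a policy decision the real data alteration would have to make configurable.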

While still leaving open a number of problems for multi-language sites, these would at least solve several frequent use cases, and should all be pretty simple to implement.

Extend test coverage

There are already several tests, for both the UI and the database backend. (Testing the Solr backend is almost impossible to do cleanly, as this would require setting up a test Solr server.) Still, most additional modules (Views, Facets, …) are untested, and there is also always room for improving the existing tests.
Also, all other tasks worked on during this project should have some (or, better, extensive) test coverage.

Extend documentation

Basically, the same holds as for test coverage. I already consider the documentation pretty good (compared to many other popular modules), but there is always room for improvement. E.g., handbook pages for both users and developers could be added, as well as advanced_help integration and even (additional) tutorial videos.

Additional notes

As you can see, these are quite a few tasks, and I'd consider the first two rather hard. Still, I'm confident that I can complete all of them (except testing and documentation, which by their nature are never really finished) during the three months of GSoC, and I would love to have that sort of incentive to work on these problems and features.

About me

I'm a 24-year-old CS Master's student living in Vienna, Austria, and already a bit of a GSoC veteran, as this would be my fourth Summer of Code project for Drupal. In 2008, I provided Views with pluggable data backends and implemented one for the apachesolr module.
In 2009, I created the apachesolr_rdf module, which was a bit like a much weaker version of the Search API, centered on RDF instead of entities and using only Solr.
And, as mentioned, last year I created the Search API module.

