Wikipedia:Wikipedia Signpost/2005-04-18/Lucene search

Lucene search

Internal search function returns to service

For the first time in several months, it became possible again last week to search Wikipedia without using an external search engine.

Developer Brion Vibber reported on April 10 that a new search server had been set up.[1] The server runs a program that Kate originally wrote last December, built on the Lucene text search engine.

Kate indicated that in covering the search load from the English Wikipedia, Lucene was doing "a bit more than MySQL can manage on even the fastest database servers." The new search function is capable of suggesting spelling corrections and close title matches, and also provides a score evaluating the relevance of individual results to the search.

On Tuesday, search was taken offline again after an apparent memory leak in the server. After some additional debugging, the developers were able to restore it, and they added a second server to share the workload.

On Thursday, Vibber indicated that search had now been activated for several additional languages. He added that while the search index did not yet update automatically as pages changed, he hoped this feature would be available shortly.

The search function has been active at various times over the past few years, but has been mostly unavailable for many months. Even when it was operating, it was often disabled during peak traffic hours. Alternative external search boxes, first one using Google search and later a similar one using Yahoo! search, were made available during periods when internal searching was down.

As noted above, Kate wrote the current program in December. It was briefly deployed, but was soon taken down due to concerns that Sun's Java virtual machine, which it required, was not free software. The program has since been rewritten to use GCJ, the GNU Compiler for Java.