Open Source Endeca in 250 Lines or Less
Casey Durfee
The Seattle Public Library
casey.durfee@spl.org
Use the source, Luke
code:
http://extranet.spl.org/code/code4lib2007.zip
demo:
http://catalog.spl.org/catalog/
A more accurate description
I will detail how you can create an OPAC with features comparable to N-Dekka (Endeca) or Ockwa Bowser (AquaBrowser) search products (faceted browsing, relevancy ranking, fuzzy searching) using the open-source Apache Solr search engine and my favorite programming language (rather than your favorite web programming language).
I will present a catalog with most of N-Dekka's features in not very many lines of code (rather than 250 lines of code or less) and discuss performance/scalability concerns and common pitfalls when using Solr.
[put talk spiel here ]
For Sticklers
There is a "for sticklers" version you can download that has everything in 250 lines or less, but DO NOT STARE DIRECTLY AT IT. It is so obfuscated and awful it may blind you.
Why count lines?
Number of Bugs ~ (Lines of Code)^1.5
(according to The Mythical Man-Month)
A 2500-line program has on average about 32x as many bugs as a 250-line program.
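To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch. The function name and the proportionality constant of 1 are assumptions for illustration only; the exponent 1.5 is the one quoted on the slide. Only the ratio between two program sizes is meaningful.

```python
def relative_bugs(lines_of_code, exponent=1.5):
    """Relative bug count under the 'bugs ~ LOC^1.5' rule of thumb.

    The constant of proportionality is unknown, so the absolute value
    means nothing; only ratios between two programs are useful.
    """
    return lines_of_code ** exponent


small = relative_bugs(250)
large = relative_bugs(2500)

# Ten times the code gives 10 ** 1.5 times the bugs.
ratio = large / small
print(f"A 2500-line program has about {ratio:.1f}x the bugs of a 250-line one")
```

Because the constant cancels out, the ratio depends only on how many times larger the program is: ten times the lines works out to 10^1.5, or roughly 32 times the bugs.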
Search results
This is what our search results screen will basically look like.
Giant Legos
by Sean Kenney
This here is a picture of a giant Lego block made up of 5,000 smaller blocks. I think of
Solr and the other tools I use as giant Lego blocks that somebody else has already put together for you.
They're extremely modular -- easy to stick on to anything else you might be building -- and they let you
make really big, really Enterprise stuff really fast. Four of these gigantic bricks take as long to put together
as four of the tiny bricks that make them up. That's why you can build a webapp -- with features that would cost you $50,000 or $100,000 to
buy from some vendor who put together 5,000 smaller blocks themselves -- with not a whole lot of work.
Solr shortcuts
- Results in Python Format (no XML)
- No Database
- Search Syntax = Lucene Syntax
[spiel here]
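A minimal sketch of the first and third shortcuts: ask Solr for its Python response format with wt=python, pass a Lucene-syntax query straight through in q, and turn the reply into a dict with no XML parsing. This is not the code from the talk's download. The base URL, the field names (title, format) and the sample response are assumptions for illustration, and ast.literal_eval is used here as a safer stand-in for a bare eval().

```python
import ast
from urllib.parse import urlencode


def build_solr_url(base_url, query, facet_fields=(), rows=10):
    """Build a Solr select URL that asks for Python-format results."""
    params = [("q", query), ("wt", "python"), ("rows", str(rows))]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)


def parse_python_response(text):
    """Turn a wt=python response body into a dict without executing code."""
    return ast.literal_eval(text)


def facet_pairs(flat_counts):
    """Solr returns facet counts as [value, count, value, count, ...]."""
    return list(zip(flat_counts[0::2], flat_counts[1::2]))


# Hypothetical response body, shaped like Solr's Python writer output.
SAMPLE_RESPONSE = (
    "{'responseHeader':{'status':0,'QTime':3},"
    "'response':{'numFound':2,'start':0,'docs':["
    "{'id':'1','title':'Giant Legos'},{'id':'2','title':'Lego Castles'}]},"
    "'facet_counts':{'facet_queries':{},"
    "'facet_fields':{'format':['book',2,'dvd',0]}}}"
)

url = build_solr_url(
    "http://localhost:8983/solr/select",  # assumed local Solr instance
    "title:lego AND format:book",         # plain Lucene query syntax
    facet_fields=["format"],
)
result = parse_python_response(SAMPLE_RESPONSE)
titles = [doc["title"] for doc in result["response"]["docs"]]
formats = facet_pairs(result["facet_counts"]["facet_fields"]["format"])
```

In a live application the response text would come from fetching url (for example with urllib.request.urlopen) rather than from SAMPLE_RESPONSE; everything after that point stays the same.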
Django features
- "MTV" (model-template-view) architecture encourages good design; developer- and designer-friendly
- ORM
- Authentication
- Automatic admin site
- i18n
- Syndication framework
- Caching
- Templating
- Clean URLs
[spiel here]