Why using FTSearch in LotusScript is usually a bad idea

Well it’s New Year’s Day so the obvious subject that is on everyone’s lips is database full text searching. Isn’t it?

Yesterday I tweeted about having to remove a load (and I mean a load) of database.FTSearch calls from an application I’ve inherited recently. Several people asked why I would want to do that, so I thought I’d explain.

On the face of it using the Full Text Index to find a collection of documents is an ideal way of working, it saves you having to create a suite of views that will only be used by your LotusScript and we all know that less views is a good thing. But, in my view, there is almost no situation when you should use an FTSearch unless you are looking for a word or phrase across *all fields* in *all documents* in your database. If you find you are writing query syntax that uses the FIELD or CONTAINS keywords then you should probably be thinking again about your code design.

The main issue with relying on the Full Text Index is that it is normally out of date. As a rule of thumb you can’t rely on any changes that have happened in the last 30 minutes to show up in your search results. So in my case yesterday, the agents were being run almost immediately after a new document is added to the database and then expecting to find that document and ones related to it. When you’re relying on luck rather than judgement then you know you have a problem.

So, what to do? Generally the answer is to design your database views carefully. You can build a suite of views that can be used across the database that will allow you to build collections of documents that match your requirements. Remember, you can have multiple sorted columns if needed and pass an array as the key parameter in you getAllDocumentsByKey call, or build composite keys in the first column of your views.

In fact I’d go as far as saying that if your document collection isn’t too big that it may be better to build a larger collection than you need and loop through it to find the exact documents you need, rather than performing an FTSearch. The other benefit of using views, of course, is that you can build a NotesViewEntryCollection which will perform a lot better than a document collection and you get the added benefit of being able to use the sorting from the view design rather than having to apply your own quick sort after you’ve built whatever output array you’re trying to get to.

As with all of these types of general design principals, your mileage may vary, if you have a good reason to use the Full Text Index then go ahead and use it. But please make sure you understand what limitations you face when using the index rather than views.

Join the Conversation

3 Comments

  1. I’ve used FTSearch before and still do when searching across multiple dbs that may or may not be FT indexed. As I’m sure you know, FTSearch will build a temporary FT index if one does not already exist. My biggest problem with FTSearch is that if you exceed the maximum number of results (5000 by default), no documents are returned and you may never know.

    Like

  2. Hey Daddy Matt! Found this when I googled "NotesViewEntryCollection FTSearch" because I had just made a bit of a breakthrough using FTSearch to perform complex queries like this:

    [Market] = ("EAST") AND [SalesArea] = ("SOUTHEAST", "NORTHEAST") AND [SalesRep] CONTAINS ("Mary Jones", "Bill Smith") AND [RenewType] = ("AutoRenew", "One Time")

    The problem I found is that the results were highly inaccurate for some reason, with many more documents being returned than expected. Still not 100% sure why but I suspect the CONTAINS calls (necessary because the SalesRep field is canonical) get treated as some kind of OR call, thus expanding the results in weird ways. Rather than address that though I hit on another solution.

    The trick is to use multiple, simpler FTSearch calls against a NotesViewEntryCollection, starting with full set of docs and reducing the collection with each iteration. By keeping each FTSearch call as simple as possible, the accuracy is more assured, and it’s still pretty fast.

    The use case here is a filtering tool much like you might see when searching on Amazon or Ebay. We happen to be running this on local replicas in the Notes client, so we have to factor in creation and updates to the FT Index. Right now, we have it set that upon DB Open we update the FTIndex (and auto create it if it doesn’t exist), or right after using a special flag action that exists in the results folder into which we throw our resultant view entry collections. Aside from that flag most if not all of the other fields available to filter on are not user-modifiable, so even if users do make modifications to documents it generally doesn’t invalidate the search results.

    Like

Leave a comment