Archive for March, 2007

fish4bikes database has grown significantly through deal with NXGN

March 23, 2007

fish4bikes now has over 6,000 used motor bikes through a deal with NXGN.

Advertisements

fish4jobs now allows users to upload CVs

March 23, 2007

fish4jobs users are now able to upload CVs in .doc, .docx, .rtf, .pdf and .txt format. This is fantastic news and already the signs are users are responding very well to this new capability!

fish4jobs already has one of the freshest CV databases and this will only help grow it significantly!

fish4jobs now supports OpenSearch initiative

March 18, 2007

fish4jobs now supports OpenSearch. If you are using IE7 or Firefox 2 checkout the search section in the toolbar. In IE7 it will glow orange when you are on the fish4jobs home page. This allows users to perform fish4jobs searches even when they are not on the fish4jobs website. None of fish4jobs competitors offer this functionality yet.

AMD product planning blunder allows Intel to take technical lead

March 18, 2007

It is amazing to think that AMD which had been regularly beating Intel to technical milestones would make such a product planning blunder as to let Intel be first to market with quad core processors. Such was AMD’s perceived technology lead that even Dell was forced to introduce AMD based products. Intel needs a worthy competitor to ensure the continued x86/x64 innovation. AMD have been extremely good at this for the last 4 years. Let’s hope they have learnt from this mistake.

Chennai visit

March 16, 2007

Today is my last day in Chennai, India. I have been in Chennai since arriving in the very early hours of Sunday morning. This is my second visit to the Infosys team based off-shore in Chennai. I will hopefully be back later in the year with a couple of my direct reports to see the team again. Thanks for your hospitality guys! Special thanks for Niaz for meeting me at the airport with my driver on Sunday.

Automatically document your SQL Server, Oracle, MySQL, Postgres and Access databases

March 15, 2007

We came across SqlSpec from Elsasoft a couple of months ago but never had time to look at it. Well this week whilst in India I found the time to check it out.

I have used it to document our SQL Server 2005 databases and it seems pretty comprehensive. The current release version does not document Partition Schemas, Partition Functions, Full-Text Catalogs and Full-Text Indexes so I emailed the developer on Sunday and within 12 hours he had responded with a new test version which contained support for those items. I have since emailed he with approximately 15 other suggestions which he plans to add to the product.

We are very happy with it so far.

Microsoft SQL Server 2005 Full-Text Search (FTS) ramblings…

March 13, 2007

The following are notes we have put together on FTS for a feedback session with the Microsoft SQL Server product managers…

Z_Search_Freetext table contains the following 

Id         AdId     Title     Freetext1          11         kljlkj      khk2          12         test       test 

Once indexed run the following 

drop table #jobtmp9create table #jobtmp9(AdId numeric(10,0) not null, score int not null)insert into #jobtmp9 select F.AdId, i.Rank as score fromZ_Search_Freetext F inner join containstable(Z_Search_Freetext, (Title, [Freetext]), ‘(kljlkj) or (khk)’) as i on F.AdId = i.[Key]select * from #jobtmp9 

This returns 

AdId     Score11         32 

Now if I change OR to AND…. 

drop table #jobtmp9create table #jobtmp9(AdId numeric(10,0) not null, score int not null)insert into #jobtmp9 select F.AdId, i.Rank as score fromZ_Search_Freetext F inner join containstable(Z_Search_Freetext, (Title, [Freetext]), ‘(kljlkj) and (khk)’) as i on F.AdId = i.[Key]select * from #jobtmp9 

nothing is returned 

Surely kljlkj and khk don’t need to exist together in either Title or Freetext fields? 

This issue / feature looks even worse when using AND NOT

 

select * from containstable(Z_Search_Freetext, *, ‘kljlkj AND NOT khk’) 

The following is returned 

AdId     Score11         32 

It clearly shows that the evaluation is done on a field by field basis and if one of the field comparisons is true the record is returned. 

Imagine a scenario where you have a CV which is split into Title, Summary, Experience, Education and Other. You want to be able to allow the user to be able to search any one of these fields or combined by the selecting All, Title, Summary, Experience, Education or Other from a drop down list. Given the way FTS appears to work you can’t use the field wildcard (*) and would have to create another field which is an amalgamation, i.e. All = Title + ‘ ‘ + Summary + ‘ ‘ + Experience + ‘ ‘ + Education + ‘ ‘ + Other. This is hugely wasteful both from storage and processing perspective. 

Whether you consider these as bugs or features they need to be clearly documented with examples.

 

In reality we actually need (kljlkj) to be searched against Title only and (khk) to be searched against Freetext only and have a blended score. 

Is this the only way it can be achieved? 

drop table #jobtmp9create table #jobtmp9(AdId numeric(10,0) not null, Score int not null)insert into #jobtmp9 select (F.AdId) as AdId, (i1.Rank + i2.Rank) as Score fromZ_Search_Freetext F inner join containstable(Z_Search_Freetext, Title, ‘kljlkj’) as i1 on F.AdId = i1.[Key]inner join containstable(Z_Search_Freetext, [Freetext], ‘khk’) as i2 on F.AdId = i2.[Key]select * from #jobtmp9 

This isn’t necessarily ideal and FTS should really support blended scores. This could have a massive impact on the return order of search results. One advantage of the current implementation is that it allows the developer to apply a different weighting to each although the need for multiple joins is surely inefficient. 

 

Thesaurus 

Per Instance Only 

It appears that it is only possible to have one Thesaurus per instance of FTS. This is not very flexible and it should be possible to have one Thesaurus per Full-Text Index. If a Thesaurus is not present for a Full-Text Index, FTS should fallback to default for the instance of FTS. 

Weighting 

Using the Thesaurus it is not possible to weight expansions / replacements differently. For example

 

Windows

 

Expanded to

 

Windows 95

Windows NT4

Windows 98

Windows Millenium

Windows 2000

Windows XP

Windows
Vista

 

Windows XP arguably to date has been the most successful of these operating systems although using the FTS Thesaurus capability it is not possible to weight Windows XP higher than the other expansions.

 

 

Noise / Stop Words 

Per Instance Only 

It appears that it is only possible to have one Noise Word File per instance of FTS. This is not very flexible and it should be possible to have one Noise Word File per Full-Text Index. If a Noise Word File is not present for a Full-Text Index, FTS should fallback to the default for the instance of FTS. 

Search v Index 

FTS filters noise words at time of indexing, however, it does not for searching. For instance if a document contained the word ‘jobs’ and ‘jobs’ was present in the noise word file ‘jobs’ would not be included in the index. However if a search contains ‘jobs’, e.g. ‘Java AND jobs’, FTS does not ignore ‘AND jobs’ rather an empty result set is returned and an information message (not error) is displayed

 

Informational: The full-text search condition contained noise word(s). 

It would be handy if FTS had an option to ignore noise words present in searches, e.g. treat ‘Java AND jobs’ as ‘Java’

 

SP2 BOL states the following…

 

If a noise word is used in a single word search, SQL Server returns an error message indicating that the query contains only noise words. 

However it doesn’t state what will happen if it isn’t a single word search.

 

 

Words Containing Punctuation / Symbols 

FTS does not index the following properly 

Source

Indexed As
C++ C
.NET NET
ASP.NET ASP NET
C# C
COM+ COM
B&Q B Q

 

It seems insane that FTS can’t even index the names of some of Microsoft’s own technologies properly! FAST ESP has an override list. FTS should be able to index such words and it should be configurable per Full-Text Index. If not present for a Full-Text Index, FTS should fallback to the default for the instance of FTS 

 

Matching Against Sentence / Paragraphs 

With FTS there is no way to specify that words or phrases must appear in the same sentence or paragraph. For instance if you are searching for documents mentioning Microsoft’s CEO you cannot improve the relevance of results returned by saying the terms Microsoft and CEO must appear in the same sentence or paragraph. FAST ESP supports sentence and paragraph level searching.

 

 

NEAR 

Although FTS supports the NEAR operator the documentation does not state how near is determined and it is not possible to override the default by specify a number of words. FAST ESP supports NEAR and allows the developer to specify the number of words.

 

 

NOT 

Although FTS supports AND NOT it does not support NOT, i.e. search for any document that does not contain a term or phrase specified. The following produce errors

 

select * from containstable(Z_Search_Freetext, *, ‘AND NOT khk’)

select * from containstable(Z_Search_Freetext, *, ‘NOT khk’) 

FAST ESP supports both ANDNOT and NOT.

 

 

Inclusion / Exclusion Operators 

The Inclusion operators (+) and Exclusion operators (-) are not supported. In the UK Google is 78% of the market so users are reasonably used to these.

 

 

Wildcards 

FTS only supports * at the end of words or phrases, e.g.

 

select * from containstable(Z_Search_Freetext, *, ‘”admin*”‘) 

However FTS does not support 

select * from containstable(Z_Search_Freetext, *, ‘”*text”‘)select * from containstable(Z_Search_Freetext, *, ‘”admin*or”‘) 

FTS also does not support a single wildcard character (?), e.g.

 

select * from containstable(Z_Search_Freetext, *, ‘”te?t”‘) 

FAST ESP supports all of these.

 

 

Copy Database Functionality 

The copy database functionality available in Microsoft SQL Server 2005 up to SP1 does not copy Full-Text indexes. Therefore if you copy a database containing Free-Text Indexes to the same machine with a different name they share the same Free-Text Indexes. The result is gradual corruption and hanging of the Full-Text Search service. The issue is not obvious and can take some time to show itself. We have not tested this with SP2 but am sure this related to SP1. 

 

Books OnLine (BOL) 

BOL prior to SP1 (not sure about SP1 / 2) 

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/fulltxt9/html/3ef96a63-8a52-45be-9a1f-265bff400e54.htm#Feedback 

stated that the Thesaurus is located in 

SQL_Server_install_path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\ 

This is true if it is not an upgrade. If it is an upgrade it is located in 

SQL_Server_install_path\Microsoft SQL Server\MSSQL\FTDATA 

A lot of instances will be upgrades so should state

 

 

Architecture 

Distributed Architecture 

It does not appear to be possible to run FTS on another server or even distribute across a number of servers. This means have to scale the database server and storage which isn’t the most cost effective way.

 

CPU Utilisation 

Unlike the SQL Server service itself it does not appear to be possible to set CPU affinity for the FTS service. It also does not appear to be possible throttle CPU usage by Index

 

Index / Query Pipelines 

FTS does not have Index or Query Pipelines. Therefore it is not possible to pre-process a document at Index time or pre-process a query entered by a user before it is executed unless it is done outside of FTS. Just as with Noise Words you might want to strip out of expand words of terms before the document is actually indexed, e.g. you might only be interested in the first 1,000 characters. Ideally it should be possible to write pipeline stages using the .net framework. 

FAST ESP supports both types of pipelines mentioned albeit the pipeline stages are typically written in Python. 

 

Product Positioning 

Ultimately FTS should be its own standalone product. A product that SQL Server just happens to integrate with. It should be possible to install, configure and query FTS without the need for SQL Server. It should be possible to execute FTS queries using Attributes rather than just free-text, e.g.

 

salary > 20000 and salary < 30000 and type = ‘perm’ and workinghours = ‘fulltime’ and category in (1,3,5) and title = ‘”SQL Server” and not Oracle’

 

 

Keyword / Expansion / Replacement Result Highlighting 

FTS does not return the snippet of text or texts that matched the search criteria. Increasingly users are demanding that the results returned contain the snippet(s) of text that matched and that the keywords / expansions / replacements are highlighted. FAST ESP has this capability. 

 

Parser 

If you wish to present your users with a text box in which they can enter their search criteria you are more than likely going to have to write a search parser. If your application is internally facing you might, although unlikely, be able to get away with teaching your users the basic syntax but if your application is public facing you will have no choice really but to develop a parser which interprets the input and converts it to the format FTS understands. 

For example if a user were to enter 

SQL Server 

FTS would return an error. FTS is expecting SQL Server to be enclosed in double quotes or a Boolean operator such as OR, AND, AND NOT or NEAR to be present between the two words, e.g. 

SQL AND Server 

The following is a more advanced example. If the user were to enter 

ASP.NET (C# or VB.NET) “SQL Server”
London
 

It would need to be converted to 

ASP.NET and (C# or VB.NET) and “SQL Server” and
London
 

This is surely such a common requirement that FTS should support this out of the box without developers having to write any code. The above examples would work if there was an implied
AND.
 

FAST ESP has a basic parser known as “Simple Query Language Support”. It supports Inclusion (+), Exclusion (-) and Phrases. This really isn’t that much use and appears to be a subset of the functionality available in previous versions. 

 

More like this… 

FTS does not have a “give me more documents like this one” capability. FAST ESP does.

 

 

Force Indexing of a Record or Document 

It would be nice to be able to force FTS to index a single record or document on demand. For example if you had a file system which contained millions of CV documents and you knew exactly when a document was uploaded why wait for a crawler to seek out new or updated documents before they are added to the index.

 

FAST ESP supports this.

 

 

Gartner’s Information Retrieval Magic Quadrant 

FTS has not appeared in Gartner’s Information Retrieval Magic Quadrant for the last two years or more.

Summary

FTS is a good product but these changes will make it a great product.

For a short time only…

March 11, 2007

For a short time only you can post a job advert on fish4jobs for £149 (+VAT) compared to the usual £199 (+VAT)

Yet more Dell supplied Maxtor disk failures

March 11, 2007

Last weekend we had two more Dell supplied Maxtor disk failures. Dell are blaming the firmware on the Maxtor 300GB SAS drives. Dell support told us we must upgrade the firmware on the disk drives ASAP otherwise they might not send out replacements in the future. My operations team subscribe to Dell product updates but tell me that this does not include firmware or BIOS upgrades. Come on Dell!!! It turns out there were various other firmware, BIOS and driver updates necessary and the new data centre has only been live a few months!

Whilst upgrading the firmware on the disk drives we noticed that the drives were up to 8 months old when we bought them new from Dell!

Given the recent failure rate we should find out pretty soon whether the firmware upgrade has worked.

To be a Generalist or Specialist

March 1, 2007

A quote from my boss Joe here…

http://www.onrec.com/content2/news.asp?ID=15623