Hi Guys, for my University coursework I am doing the coding for a group assignment to build a website for a fictional airline. The airline only flies around the UK to five large cities, Manchester, Glasgow, Dublin, Bristol and Newcastle but the website must still have a flight-booker and searcher as well as a full customer support system and back-end!
At the moment I am working on a search suggestions program which will correctly suggest the destination you had in mind if you accidentally misspell something. I thought I would write up about the problems and solutions I have been experiencing so far with this system in case anyone is interested or has suggestions to improve on the errors I'm currently getting!
Search correction method 1
This program is written in PHP but it should be fairly easy to read.
function didYouMean($string){
$manchester = array("manchestur", "menchester", "maanchestr", "manchestr", "man", "chester", "mancseter", "mancherst");
$bristol = array("brizzle", "bristul", "bristil", "bistol", "bristal", "brstil", "ristl", "ristul", "bristul");
$newcastle = array("castle", "newcasul", "newcasl", "newcasle", "nyewcastle", "noocastle", "newcaste");
$dublin = array("dublon", "dubin", "dubblin", "dublinn", "dubblinn");
$glasgow = array("gasgow", "glassgo", "glassgow", "glasgo", "glarsgo", "glahsgow", "glasgou", "glassgou");
$suggestion = '';
if(in_array($string, $manchester)){
$suggestion = $uggestion . "MANCHESTER";
}
if(in_array($string, $bristol)){
$suggestion = $uggestion . "BRISTOL";
}
if(in_array($string, $newcastle)){
$suggestion = $uggestion . "NEWCASTLE";
}
if(in_array($string, $dublin)){
$suggestion = $uggestion . "DUBLIN";
}
if(in_array($string, $glasgow)){
$suggestion = $uggestion . "GLASGOW";
}
if($suggestion != ''){
$suggestion = "<span class = 'keyword'>Did you mean " . $suggestion . "?</span>";
}
return $suggestion;
}It takes an unknown destination input and checks it against a list of common misspellings I've thought up. If it matches, the correct city will be displayed to the user and they can continue. This works perfectly fine... so long as your misspelling is listed. Unfortunately something I've not thought of can easily slip by and no search will be returned. It also has to be updated by hand, meaning it will make adding other flight destinations harder in the future. So I tried idea 2:
Search correction method 2
This method is currently written in Scratch, which is great for quick mockups of program performance and is fast to debug. It takes all the letters from your entered search term and then works out what percentage of the letters of each city match the ones in your search. Manchestur, for example, returns as a 90% match to Manchester. The highest percentage match is selected and displayed. This method always returns a search suggestion and will easily update to allow other destinations. It also caters for all misspellings rather than just the ones I can think of.
Here's an image of the script at work:
(link to image) (link to image of script)As you can see, there is a MAJOR flaw. Typing "The worst place on earth" returns a 60% match to Manchester! This is something that a search suggestion shouldn't do. It looks very unprofessional if you get a message saying "We have no flights to the worst place on earth. Did you mean Manchester?". So despite always returning a result, it will easily play up. Chocolate returns as Bristol.
Search correction method 3
This method is the same as the one above, but it only returns a result if the percentage similarity is higher than a set point. If I set it at 65%, I no longer get a result for "gaspode the wonder hound"
(link to image) (link to script with search cap at the end)
This will now cut off unruly searches but set the cap too high and sensible searches will still not get through! For example, "Brizzle", a slang word for Bristol would still return Bristol without the cap, but at only 57% match it can easily be filtered out.
___________________________________________________________________________________
I hope people have found this interesting to read, and that some may be able to suggest better search systems for use with the site
Last edited by sparks (2012-02-20 11:02:11)
Offline
I would do a combination.
1. Run search method 1 (including Brizzle, etc. as a misspelling). If one matches, return it.
2. Run search method 3. Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.
At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish. I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.

Offline
Greenatic wrote:
I would do a combination.
1. Run search method 1 (including Brizzle, etc. as a misspelling). If one matches, return it.
2. Run search method 3. Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.
At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish. I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.
I like the idea, Greenatic! A balance is probably a good plan
It will lose the self-updating thing though, since new common misspellings won't be added automatically.
Interestingly with methods 2 and 3, anagrams still return as 100%.
Last edited by sparks (2012-02-20 11:14:35)
Offline
sparks wrote:
Greenatic wrote:
I would do a combination.
1. Run search method 1 (including Brizzle, etc. as a misspelling). If one matches, return it.
2. Run search method 3. Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.
At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish. I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.I like the idea, Greenatic! A balance is probably a good plan
It will lose the self-updating thing though, since new common misspellings won't be added automatically.
Not necessarily! If you make the screen for "no city found" ask "What city did you mean?" and record the result, then you can add in that functionality. You'll have to watch for trolls, though. "Efovrgbuoernionipnionogniorgnioniorg! I meant Manchester!"

Offline
sparks wrote:
That's true... I suppose it could be included as an input for bored staff to fill out in the back-end too
![]()
I wouldn't charge anyone with having to do that
I would probably add in the functionality, but put another Method 3 scan on the way out, but with a lower cap. That should prevent total insanity on the part of any malicious users...

Offline
Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?
Offline
Magnie wrote:
Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?
That's what methods 2 and 3 are doing. They check to see what percentage of the letters in your search match the letters in each city. The one with the highest percentage of matching letters is chosen.
For some reason abcdefghijklmnopqrstuvwxyz doesn't return 100% which is good. Not sure why it does that but it's good!
Offline
sparks wrote:
Magnie wrote:
Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?
That's what methods 2 and 3 are doing. They check to see what percentage of the letters in your search match the letters in each city. The one with the highest percentage of matching letters is chosen.
For some reason abcdefghijklmnopqrstuvwxyz doesn't return 100% which is good. Not sure why it does that but it's good!
Ah, sorry, didn't read all the way through.
Offline
Magnie wrote:
Ah, sorry, didn't read all the way through.
![]()
It is a pretty long post. It deleted itself the first time too about two thirds of the way through. I had to start again.
The website can been seen here, by the way: WebAir Home Page
Currently it is employing method 1 for its search system.
Offline
So, in summary, I think this solution is best:
1. Method 1.
2. Method 3, high cap.
If no city found:
3. Ask what city was intended, from a list.
4. Compare the original search to the intended search, like Method 3 does. Use a medium cap.
5. If it passes, add it to the list of misspellings. If it doesn't, ignore it.

Offline
roijac wrote:
what about common misspelling? like double letters, s/c, s/z, k/q etc?
it probably boost it up, assuming you use something faster than scratch![]()
Methods 2 and 3 check for the % of correct letters, so they should take care of that.
Last edited by Greenatic (2012-02-20 11:45:52)

Offline
roijac wrote:
they don't give bonus points if it's just s instead of c etc. ^^
Yeah, but the other letters would be correct, so it shouldn't really be necessary.
EDIT: My post count is now 1111!
Last edited by Greenatic (2012-02-20 11:52:21)

Offline
How do you know it's 1111? It doesn't display anymore... it just says >1000. I was on 1850 last time I looked and was really looking forward to hitting 2000 and now I don't get to see it!
mmaanncchheesstteerr will still return 100% for manchester
Offline
sparks wrote:
How do you know it's 1111? It doesn't display anymore... it just says >1000. I was on 1850 last time I looked and was really looking forward to hitting 2000 and now I don't get to see it!
mmaanncchheesstteerr will still return 100% for manchester
Click on "Profile" and look.

Offline
You should notice that people that are typing in Brizzle are either incredibly misinformed or just on the website for a joke, the likelihood of anyone that wants to fly there and spelling it like that is so low that it isn't really worth bothering to consider it. Anyone in the category will also realise that your search engine isn't that clever (it doesn't need to be) and type bristol in correctly.
Edit: also remember that © will give you a copyright symbol. Also, nice site
Last edited by rookwood101 (2012-02-20 12:30:29)
Offline
In method 1 you have $uggestion. XD
Check if the length of the input is closer than 6 away from any of them, then run method 1, then 3.
Offline
When designing for coursework, it's sometimes more about showing off your skills than catering for idiots
Also, Brizzle is just an example, I'm sure there are other misspellings that would fall below the 65% mark.
I didn't notice the copyright symbol was missing on the flight booker page, good spot!
Scimonster, yeah, it says $uggestion which I have now removed entirely since it was left over from an earlier code where the "did you mean" was joined individually within each if statement
I also like your idea of not even bothering to search if the length is drastically different to any of the results! As long as we don't ever fly to Milton-Keynes or Box
Offline
sparks wrote:
When designing for coursework, it's sometimes more about showing off your skills than catering for idiots
Also, Brizzle is just an example, I'm sure there are other misspellings that would fall below the 65% mark.
I didn't notice the copyright symbol was missing on the flight booker page, good spot!
Scimonster, yeah, it says $uggestion which I have now removed entirely since it was left over from an earlier code where the "did you mean" was joined individually within each if statement![]()
I also like your idea of not even bothering to search if the length is drastically different to any of the results! As long as we don't ever fly to Milton-Keynes or Box![]()
Is there really a city called Box?
Offline
sparks wrote:
Yes, Box is a town in Wiltshire, England
lol I live in wiltshire and even I didn't know that
Offline