This is a read-only archive of the old Scratch 1.x Forums.
Try searching the current Scratch discussion forums.

#1 2012-02-20 10:55:46

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

chocolate? Did you mean Bristol?

Hi Guys, for my University coursework I am doing the coding for a group assignment to build a website for a fictional airline. The airline only flies around the UK to five large cities, Manchester, Glasgow, Dublin, Bristol and Newcastle but the website must still have a flight-booker and searcher as well as a full customer support system and back-end!

At the moment I am working on a search suggestions program which will correctly suggest the destination you had in mind if you accidentally misspell something. I thought I would write up about the problems and solutions I have been experiencing so far with this system in case anyone is interested or has suggestions to improve on the errors I'm currently getting!

Search correction method 1

This program is written in PHP but it should be fairly easy to read.

Code:

function didYouMean($string){
    $manchester = array("manchestur", "menchester", "maanchestr", "manchestr", "man", "chester", "mancseter", "mancherst");
    $bristol = array("brizzle", "bristul", "bristil", "bistol", "bristal", "brstil", "ristl", "ristul", "bristul");
    $newcastle = array("castle", "newcasul", "newcasl", "newcasle", "nyewcastle", "noocastle", "newcaste");
    $dublin = array("dublon", "dubin", "dubblin", "dublinn", "dubblinn");
    $glasgow = array("gasgow", "glassgo", "glassgow", "glasgo", "glarsgo", "glahsgow", "glasgou", "glassgou");
    $suggestion = '';
    if(in_array($string, $manchester)){
        $suggestion = $uggestion . "MANCHESTER";
    }
    if(in_array($string, $bristol)){
        $suggestion = $uggestion . "BRISTOL";
    }
    if(in_array($string, $newcastle)){
        $suggestion = $uggestion . "NEWCASTLE";
    }
    if(in_array($string, $dublin)){
        $suggestion = $uggestion . "DUBLIN";
    }
    if(in_array($string, $glasgow)){
        $suggestion = $uggestion . "GLASGOW";
    }
    
    if($suggestion != ''){
        $suggestion = "<span class = 'keyword'>Did you mean " . $suggestion . "?</span>";
    }
    return $suggestion;
}

It takes an unknown destination input and checks it against a list of common misspellings I've thought up. If it matches, the correct city will be displayed to the user and they can continue. This works perfectly fine... so long as your misspelling is listed. Unfortunately something I've not thought of can easily slip by and no search will be returned. It also has to be updated by hand, meaning it will make adding other flight destinations harder in the future. So I tried idea 2:

Search correction method 2

This method is currently written in Scratch, which is great for quick mockups of program performance and is fast to debug. It takes all the letters from your entered search term and then works out what percentage of the letters of each city match the ones in your search. Manchestur, for example, returns as a 90% match to Manchester. The highest percentage match is selected and displayed. This method always returns a search suggestion and will easily update to allow other destinations. It also caters for all misspellings rather than just the ones I can think of.

Here's an image of the script at work:
http://dl.dropbox.com/u/22935223/screenshotNoCap.gif
(link to image) (link to image of script)As you can see, there is a MAJOR flaw. Typing "The worst place on earth" returns a 60% match to Manchester! This is something that a search suggestion shouldn't do. It looks very unprofessional if you get a message saying "We have no flights to the worst place on earth. Did you mean Manchester?". So despite always returning a result, it will easily play up. Chocolate returns as Bristol.

Search correction method 3

This method is the same as the one above, but it only returns a result if the percentage similarity is higher than a set point. If I set it at 65%, I no longer get a result for "gaspode the wonder hound"

http://dl.dropbox.com/u/22935223/screenshotWithCap.gif(link to image) (link to script with search cap at the end)

This will now cut off unruly searches but set the cap too high and sensible searches will still not get through! For example, "Brizzle", a slang word for Bristol would still return Bristol without the cap, but at only 57% match it can easily be filtered out.
___________________________________________________________________________________

I hope people have found this interesting to read, and that some may be able to suggest better search systems for use with the site  smile

Last edited by sparks (2012-02-20 11:02:11)


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#2 2012-02-20 11:09:46

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

I would do a combination.

1.  Run search method 1 (including Brizzle, etc. as a misspelling).  If one matches, return it.
2.  Run search method 3.  Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.

At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish.  I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.

Offline

 

#3 2012-02-20 11:12:53

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

Greenatic wrote:

I would do a combination.

1.  Run search method 1 (including Brizzle, etc. as a misspelling).  If one matches, return it.
2.  Run search method 3.  Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.

At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish.  I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.

I like the idea, Greenatic! A balance is probably a good plan  smile  It will lose the self-updating thing though, since new common misspellings won't be added automatically.

Interestingly with methods 2 and 3, anagrams still return as 100%.

Last edited by sparks (2012-02-20 11:14:35)


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#4 2012-02-20 11:15:13

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

sparks wrote:

Greenatic wrote:

I would do a combination.

1.  Run search method 1 (including Brizzle, etc. as a misspelling).  If one matches, return it.
2.  Run search method 3.  Put a high cap on it, because other ones with low % ratings, like Brizzle, should be filtered out by step 1.

At some point, you have to assume that the user isn't incredibly stupid, and that they aren't typing in total gibberish.  I think this solution would work well; "worst place on earth" would return no city, but "Manchisterr" would.

I like the idea, Greenatic! A balance is probably a good plan  smile  It will lose the self-updating thing though, since new common misspellings won't be added automatically.

Not necessarily!  If you make the screen for "no city found" ask "What city did you mean?" and record the result, then you can add in that functionality.  You'll have to watch for trolls, though.  "Efovrgbuoernionipnionogniorgnioniorg! I meant Manchester!"

Offline

 

#5 2012-02-20 11:22:06

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

That's true... I suppose it could be included as an input for bored staff to fill out in the back-end too  tongue


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#6 2012-02-20 11:25:33

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

sparks wrote:

That's true... I suppose it could be included as an input for bored staff to fill out in the back-end too  tongue

I wouldn't charge anyone with having to do that  tongue   I would probably add in the functionality, but put another Method 3 scan on the way out, but with a lower cap.  That should prevent total insanity on the part of any malicious users...   smile

Offline

 

#7 2012-02-20 11:28:58

Magnie
Scratcher
Registered: 2007-12-12
Posts: 1000+

Re: chocolate? Did you mean Bristol?

Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?

Offline

 

#8 2012-02-20 11:33:11

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

Magnie wrote:

Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?

That's what methods 2 and 3 are doing. They check to see what percentage of the letters in your search match the letters in each city. The one with the highest percentage of matching letters is chosen.

For some reason abcdefghijklmnopqrstuvwxyz doesn't return 100% which is good. Not sure why it does that but it's good!


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#9 2012-02-20 11:37:23

Magnie
Scratcher
Registered: 2007-12-12
Posts: 1000+

Re: chocolate? Did you mean Bristol?

sparks wrote:

Magnie wrote:

Maybe check each letter to see if it's in one of the names of the cities? "Manchesterr" would return "Manchester" since they both contain the same letters. Does that make sense?

That's what methods 2 and 3 are doing. They check to see what percentage of the letters in your search match the letters in each city. The one with the highest percentage of matching letters is chosen.

For some reason abcdefghijklmnopqrstuvwxyz doesn't return 100% which is good. Not sure why it does that but it's good!

Ah, sorry, didn't read all the way through.  tongue

Offline

 

#10 2012-02-20 11:41:10

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

Magnie wrote:

Ah, sorry, didn't read all the way through.  tongue

It is a pretty long post. It deleted itself the first time too about two thirds of the way through. I had to start again.

The website can been seen here, by the way: WebAir Home Page

Currently it is employing method 1 for its search system.


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#11 2012-02-20 11:42:47

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

So, in summary, I think this solution is best:

1.  Method 1.
2.  Method 3, high cap.
If no city found:
3.  Ask what city was intended, from a list.
4.  Compare the original search to the intended search, like Method 3 does.  Use a medium cap.
5.  If it passes, add it to the list of misspellings.  If it doesn't, ignore it.

Offline

 

#12 2012-02-20 11:44:13

roijac
Scratcher
Registered: 2010-01-19
Posts: 1000+

Re: chocolate? Did you mean Bristol?

what about common misspelling? like double letters, s/c, s/z, k/q etc?
it probably boost it up, assuming you use something faster than scratch  roll

Offline

 

#13 2012-02-20 11:45:34

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

roijac wrote:

what about common misspelling? like double letters, s/c, s/z, k/q etc?
it probably boost it up, assuming you use something faster than scratch  roll

Methods 2 and 3 check for the % of correct letters, so they should take care of that.   smile

Last edited by Greenatic (2012-02-20 11:45:52)

Offline

 

#14 2012-02-20 11:46:45

roijac
Scratcher
Registered: 2010-01-19
Posts: 1000+

Re: chocolate? Did you mean Bristol?

they don't give bonus points if it's just s instead of c etc. ^^

Offline

 

#15 2012-02-20 11:51:34

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

roijac wrote:

they don't give bonus points if it's just s instead of c etc. ^^

Yeah, but the other letters would be correct, so it shouldn't really be necessary.   smile

EDIT:  My post count is now 1111!   lol

Last edited by Greenatic (2012-02-20 11:52:21)

Offline

 

#16 2012-02-20 12:17:42

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

How do you know it's 1111? It doesn't display anymore... it just says >1000. I was on 1850 last time I looked and was really looking forward to hitting 2000 and now I don't get to see it!

mmaanncchheesstteerr will still return 100% for manchester


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#17 2012-02-20 12:18:50

roijac
Scratcher
Registered: 2010-01-19
Posts: 1000+

Re: chocolate? Did you mean Bristol?

it's good, isn't it? ^^
and you can click on posts and count, good luck  tongue

Offline

 

#18 2012-02-20 12:23:20

Greenatic
Scratcher
Registered: 2009-05-03
Posts: 1000+

Re: chocolate? Did you mean Bristol?

sparks wrote:

How do you know it's 1111? It doesn't display anymore... it just says >1000. I was on 1850 last time I looked and was really looking forward to hitting 2000 and now I don't get to see it!

mmaanncchheesstteerr will still return 100% for manchester

Click on "Profile" and look.   tongue

Offline

 

#19 2012-02-20 12:25:47

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

WOAH! 3955!! I forgot it was nearly 4000!

[/offtopic]


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#20 2012-02-20 12:27:49

rookwood101
Scratcher
Registered: 2011-07-29
Posts: 500+

Re: chocolate? Did you mean Bristol?

You should notice that people that are typing in Brizzle are either incredibly misinformed or just on the website for a joke, the likelihood of anyone that wants to fly there and spelling it like that is so low that it isn't really worth bothering to consider it. Anyone in the category will also realise that your search engine isn't that clever (it doesn't need to be) and type bristol in correctly.

Edit: also remember that &copy; will give you a copyright symbol. Also, nice site  smile

Last edited by rookwood101 (2012-02-20 12:30:29)


http://i.imgur.com/zeIZW.png

Offline

 

#21 2012-02-20 12:30:28

scimonster
Community Moderator
Registered: 2010-06-13
Posts: 1000+

Re: chocolate? Did you mean Bristol?

In method 1 you have $uggestion. XD
Check if the length of the input is closer than 6 away from any of them, then run method 1, then 3.

Offline

 

#22 2012-02-20 12:40:42

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

When designing for coursework, it's sometimes more about showing off your skills than catering for idiots  tongue  Also, Brizzle is just an example, I'm sure there are other misspellings that would fall below the 65% mark.
I didn't notice the copyright symbol was missing on the flight booker page, good spot!

Scimonster, yeah, it says $uggestion which I have now removed entirely since it was left over from an earlier code where the "did you mean" was joined individually within each if statement  tongue

I also like your idea of not even bothering to search if the length is drastically different to any of the results! As long as we don't ever fly to Milton-Keynes or Box  wink


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#23 2012-02-20 12:47:30

scimonster
Community Moderator
Registered: 2010-06-13
Posts: 1000+

Re: chocolate? Did you mean Bristol?

sparks wrote:

When designing for coursework, it's sometimes more about showing off your skills than catering for idiots  tongue  Also, Brizzle is just an example, I'm sure there are other misspellings that would fall below the 65% mark.
I didn't notice the copyright symbol was missing on the flight booker page, good spot!

Scimonster, yeah, it says $uggestion which I have now removed entirely since it was left over from an earlier code where the "did you mean" was joined individually within each if statement  tongue

I also like your idea of not even bothering to search if the length is drastically different to any of the results! As long as we don't ever fly to Milton-Keynes or Box  wink

Is there really a city called Box?

Offline

 

#24 2012-02-20 13:15:26

sparks
Community Moderator
Registered: 2008-11-05
Posts: 1000+

Re: chocolate? Did you mean Bristol?

Yes, Box is a town in Wiltshire, England


http://img541.imageshack.us/img541/7563/scratchbetabanner.png

Offline

 

#25 2012-02-20 13:46:51

rookwood101
Scratcher
Registered: 2011-07-29
Posts: 500+

Re: chocolate? Did you mean Bristol?

sparks wrote:

Yes, Box is a town in Wiltshire, England

lol I live in wiltshire and even I didn't know that  smile


http://i.imgur.com/zeIZW.png

Offline

 

Board footer