I'm writing an interactive fiction kind of game in Python, just because a) I can and b) I have nothing better to do. Anyway, I have to decode the user's input into a proper command. A command is given basically as a grammatical predicate, like go to Alcatraz, kick Darth Vader, or eat blue cheese in Switzerland. It is translated to a structure which basically corresponds to (I) eat (blue) cheese (in Switzerland), where "I" is implicit (you only control yourself), and adjectives should ideally be ignored.
English works very simply like this RegEx:
verb object? (preposition, noun)
e.g. eat blue cheese (in, Switzerland)
So I am basically asking for an elegant way to convert a list of words into a structure like this:
{ verb: __, object: __ or None/null for verbs like JUMP or GO which have no object, prepositions: [(preposition, noun), (preposition, noun) ...] }
Assume I have a list of prepositions, nouns, and verbs handy. Also, wrong syntax should throw an error, so you can't say "in cheese eat Switzerland".
Finally, each preposition-noun pair satisfies a function, which could be any of the following:
* Subject -- not needed in command
* Object -- no preposition needed, special case
* Instrument -- I hit the baseball with the bat
* Possession -- I hit my baseball -- not needed for commands
* Reason -- I hit it to prove a point -- not needed for commands
* Location -- I hit it into the next state
So actually I should have a dictionary of prepositions based on function.
Last edited by Hardmath123 (2012-11-09 10:47:19)
Offline
I ate Switzerland in Cheese once : /
Anyway, I don't know python but you should be able to break the words down into items in a list with [ ] as breaker (a space). Then, you would iterate through the list and find important words, like blue, cheese, switzerland etc.
Offline
Right, but how do I turn that into a piece of pure code? I'm planning to pass the rest of the arguments to a method of the instance of a class which is identified by the object, if that makes sense. So if I have "kick table", I would find the "table" in the current world/environment/room, and try to run a method called "kick", if you see what I mean.
"kick chair in Switzerland" <=> chair.kick(in="Switzerland")
Last edited by Hardmath123 (2012-11-09 10:50:55)
Offline
Hardmath123 wrote:
Right, but how do I turn that into a piece of pure code? I'm planning to pass the rest of the arguments to a method of the instance of a class which is identified by the object, if that makes sense. So if I have "kick table", I would find the "table" in the current world/environment/room, and try to run a method called "kick", if you see what I mean.
"kick chair in Switzerland" <=> chair.kick(in="Switzerland")
Hmmm. I suppose during the list iteration, after it's made a valid function name it stores it in a variable, and then the next one be the place? I don't know python, so I don't know if such a function exists.
Offline
I need to digest your post more fully; but here are some quick pointers:
First, parsing natural-language sentences is (as I'm sure you're aware) a [u]really hard problem[/b]. What you're trying to do sounds less like regex and more like defining a formal grammar for the subset of English commands that you support (which might be possible with a formal parser like lex/yacc, but they're designed for programming languages and such, not English).
DigiTechs has the right idea, I think: you need to tokenize the input into a sequence of words/punctuation characters — something like ['I', "'m", 'writing', 'a', 'game', 'in', 'Python', ',', 'just', 'because', '.'] (or maybe you should strip all punctuation completely?). Then you parse the tokens according to your grammar to get the structure. If you're not familiar with the idea — a shift/reduce parser such as PLY works by the successive application of rules like:
S -> NP VP PP -> P NP NP -> Det N | Det N PP | 'I' VP -> V NP | VP PP Det -> 'an' | 'my' N -> 'elephant' | 'pajamas' V -> 'shot' P -> 'in'
When a rule is found that matches the sequence of tokens on the right, it is reduced to the symbol on the left. When none of them match, another token is shifted onto the stack (read from the input). By successively applying the rules, the sequence of tokens (words) is reduced to a tree structure (representing the sentence).
Formal parsers for natural language do get complicated very quickly, though — particularly when words can have multiple interpretations (is "fish" a verb or a noun?)
I think that's what you're trying to do. Once you've parsed your input to get the structure, you can then do useful things with it: filter out adjectives, find the object in the current context the noun refers to, and so on.
Definitely have a look at nltk, which has functions for tokenizing (tokenising? ) sentences, building parse trees — you could even use WordNet if you really wanted to...! The nltk book is an excellent read, and discusses all this stuff (such as different parser strategies).
Also check out Link Grammar (I think the project moved to AbiWord?) — although that might be overkill
Does that help?
Oh, and on an unrelated note: try ipython, if you haven't already! It makes Python coding rather more wonderful (autocomplete, persistent command history, better copy/paste, can reload modules without restarting the interpreter...)
Edit: can't preposition-noun pairs (prepositional phrases?) be considered arguments, and verbs as functions? (Only insofar as your analysis is concerned.)
Last edited by blob8108 (2012-11-09 11:43:16)
Offline
Ok, now who has to do the digesting?
I'm aware natural language is a very slippery concept, especially when your natural language is as unnatural as English (I believe Latin is more coder-friendly, but I'm not gonna play interactive fiction in Latin). So I picked an easy subset to consider. All my commands should be of the above mentioned RegEx "verb object? (preposition, noun)*", except Regex only works for strings. Though this reduces the variety of possible inputs, it does provide a simple framework to understand the parsed code, too: verb + object + list of prepositional phrases (preposition-noun-tuples). Easy.
Just in case you didn't get it, try out this Hitchhikers' Guide-based game, which is exactly what I'm trying to replicate. Commands should go like this:
open drawer take out sock put on sock -- special case! it basically means "put sock on", but with word-order reversed put on shoe open door go to store buy milk ...
All the libraries you mentioned seem like overkill, especially considering the limited subset of inputs I will be considering. Also, I want to learn some parsing tricks along the way, not just tap into a library. To be honest, I haven't really given myself any formal computer science training, because though it is infinitely cooler and more satisfying, at the end of the day, you can't do it off Google on your own. That's gotta wait till college, I guess. Thanks for the help, though, I appreciate it.
Prepositional phrases are in fact the arguments, and the verbs are functions, in my above analogy of Swedish cheese. The object of the sentence corresponds to an instance of a class. Of course, I'll probably have a dictionary of lambdas or something instead of a separate function for each verb, because I'm planning to store the scene as an XML document (like ZIL for Zork). The theory remains the same, though. Probably a tad less sensible.
P.S. IPython looks cool, but I'm content with my Xcode-Terminal-Finder workflow which works like a dream on OSX Lion via full-screen apps + 4-finger-swiping.
Last edited by Hardmath123 (2012-11-09 12:11:35)
Offline
nouns = set(['i', 'you', 'table', 'state', 'monster']) # Lowercase so case doesn't matter in messages. prepositions = set(['into']) verbs = set(['kick', 'attack']) # Might need to figure out past and future tense. punctuation = '.?!' while True: user_input = raw_input("Say something: ").lower() user_input = user_input.split(" ") actions = [] action = {'subject' : None, 'verb' : None, 'preps' : None, 'object' : None} i = 0 finished = False while i < len(user_input): word = user_input[i] if word[-1] in punctuation: finished = True word = word[:-1] print word if word in nouns: if i == 0: action['subject'] = word else: action['object'] = word elif word in verbs: action['verb'] = word elif word in prepositions: action['preps'] = (word, user_input[i + 1]) i += 1 # Because we've gone through two words of input. i += 1 if finished: if action['subject'] == None: # The "invisible 'you'" for commands. action['subject'] = 'you' response = [] if action['subject'] == 'i': response.append('You') elif action['subject'] == 'you': response.append('I') response.append('will') response.append(action['verb']) response.append(action['object']) if action['preps'] != None: response.append(action['preps'][0]) response.append(action['preps'][1]) print response print ' '.join(response) + '.' actions.append(action)
It bugs me when I'm writing a large post that takes an hour and someone else decides to post while I'm writing it.
Anyways, hopefully the above is what you are looking for.
Edit: Forgot to mention that the responses are a little terrible, but hopefully you don't need anything to respond other than through an action on the screen.
Last edited by Magnie (2012-11-09 12:24:44)
Offline
I'm not sure what that does, so I'm gonna just plug it into Python and see what happens...
EDIT: Didn't scroll down, sorry (stupid new Lion scrollbar fits into the background perfectly). I see now. I'll make sense of it tomorrow.
Last edited by Hardmath123 (2012-11-09 12:20:41)
Offline
Hardmath123 wrote:
I'm not sure what that does, so I'm gonna just plug it into Python and see what happens...
EDIT: Didn't scroll down, sorry (stupid new Lion scrollbar fits into the background perfectly). I see now. I'll make sense of it tomorrow.
Basically it goes through every word in a sentence and provides a response.
Here's a "module" version of it since you probably don't need the response part:
nouns = set(['i', 'you', 'table', 'state', 'monster']) # Lowercase so case doesn't matter in messages. prepositions = set(['into']) verbs = set(['kick', 'attack']) # Might need to figure out past and future tense. punctuation = '.?!' def get_actions(user_input): user_input = user_input.lower() # Make everything lowercase. user_input = user_input.split(" ") # Divide all the words. actions = [] action = {'subject' : None, 'verb' : None, 'preps' : None, 'object' : None} i = 0 finished = False while i < len(user_input): word = user_input[i] if word[-1] in punctuation: finished = True word = word[:-1] print word if word in nouns: if i == 0: action['subject'] = word else: action['object'] = word elif word in verbs: action['verb'] = word elif word in prepositions: action['preps'] = (word, user_input[i + 1]) i += 1 # Because we've gone through two words of input. i += 1 if finished: actions.append(action) return actions if __name__ == '__main__': while True: user_input = raw_input("Say something: ") actions = get_actions(user_input) for action in actions: if action['subject'] == None: # The "invisible 'you'" for commands. action['subject'] = 'you' response = [] if action['subject'] == 'i': response.append('You') elif action['subject'] == 'you': response.append('I') response.append('will') response.append(action['verb']) response.append(action['object']) if action['preps'] != None: response.append(action['preps'][0]) response.append(action['preps'][1]) print response print ' '.join(response) + '.'
get_actions() will find the subject, verb, object, and preposition of a sentence.
Offline
Hardmath123 wrote:
All my commands should be of the above mentioned RegEx "verb object? (preposition, noun)*", except Regex only works for strings. Though this reduces the variety of possible inputs, it does provide a simple framework to understand the parsed code, too: verb + object + list of prepositional phrases (preposition-noun-tuples). Easy.
Though "regex but for words" sounds just like a formal grammar parser...
All the libraries you mentioned seem like overkill, especially considering the limited subset of inputs I will be considering.
Link grammar probably is — I'm not convinced nltk wouldn't be useful. Is it not even worth using nltk's tokenize function?
Also, I want to learn some parsing tricks along the way, not just tap into a library.
Yeah, I can understand that
Prepositional phrases are in fact the arguments, and the verbs are functions, in my above analogy of Swedish cheese. The object of the sentence corresponds to an instance of a class. Of course, I'll probably have a dictionary of lambdas or something instead of a separate function for each verb, because I'm planning to store the scene as an XML document (like ZIL for Zork). The theory remains the same, though. Probably a tad less sensible.
I'm not sure I understand what you want. Something like this? (Not that I mean to write your code for you; just trying to clarify! )
PREPOSITIONS = ['in', 'on', 'with', 'to', 'into'] def parse(words): verb = words.pop(0) object = None pps = [] phrase = [] for word in words: if word in PREPOSITIONS: if phrase: if object is None: object = phrase else: pps.append(phrase) phrase = [] phrase.append(word) if phrase: if object is None: object = phrase else: pps.append(phrase) return (verb, object, pps) >>> parse("hit my ball into the gutter with a stick".split(" ")) ('hit', ['my', 'ball'], [['into', 'the', 'gutter'], ['with', 'a', 'stick']])
P.S. IPython looks cool, but I'm content with my Xcode-Terminal-Finder workflow which works like a dream on OSX Lion via full-screen apps + 4-finger-swiping.
I do something similar with Alt+Tab But I'm just running `ipython` in Terminal instead of `python`. Seriously, it's much more lovely. (coloured exception stack traces, even! ...)
Last edited by blob8108 (2012-11-09 12:29:15)
Offline
Magnie wrote:
It bugs me when I'm writing a large post that takes an hour and someone else decides to post while I'm writing it.
This. -_-
Last edited by blob8108 (2012-11-09 12:31:48)
Offline
blob8108 wrote:
Hardmath123 wrote:
All my commands should be of the above mentioned RegEx "verb object? (preposition, noun)*", except Regex only works for strings. Though this reduces the variety of possible inputs, it does provide a simple framework to understand the parsed code, too: verb + object + list of prepositional phrases (preposition-noun-tuples). Easy.
Though "regex but for words" sounds just like a formal grammar parser...
All the libraries you mentioned seem like overkill, especially considering the limited subset of inputs I will be considering.
Link grammar probably is — I'm not convinced nltk wouldn't be useful. Is it not even worth using nltk's tokenize function?
Also, I want to learn some parsing tricks along the way, not just tap into a library.
Yeah, I can understand that
Prepositional phrases are in fact the arguments, and the verbs are functions, in my above analogy of Swedish cheese. The object of the sentence corresponds to an instance of a class. Of course, I'll probably have a dictionary of lambdas or something instead of a separate function for each verb, because I'm planning to store the scene as an XML document (like ZIL for Zork). The theory remains the same, though. Probably a tad less sensible.
I'm not sure I understand what you want. Something like this? (Not that I mean to write your code for you; just trying to clarify! )
Code:
PREPOSITIONS = ['in', 'on', 'with', 'to', 'into'] def parse(words): verb = words.pop(0) object = None pps = [] phrase = [] for word in words: if word in PREPOSITIONS: if phrase: if object is None: object = phrase else: pps.append(phrase) phrase = [] phrase.append(word) if phrase: if object is None: object = phrase else: pps.append(phrase) return (verb, object, pps) >>> parse("hit my ball into the gutter with a stick".split(" ")) ('hit', ['my', 'ball'], [['into', 'the', 'gutter'], ['with', 'a', 'stick']])P.S. IPython looks cool, but I'm content with my Xcode-Terminal-Finder workflow which works like a dream on OSX Lion via full-screen apps + 4-finger-swiping.
I do something similar with Alt+Tab But I'm just running `ipython` in Terminal instead of `python`. Seriously, it's much more lovely. (colored exception stack traces, even! ...)
That is practically what I need, actually, except I'll need to modify a lot of it for error-checking and special cases. Will yours handle "put sock on" and "put on sock"? I don't think it would. Also, it seems to keep trying to add an object; but what if there is not object? ("go to field" or "jump up")
Offline
Hardmath123 wrote:
That is practically what I need, actually, except I'll need to modify a lot of it for error-checking and special cases.
Good!
Will yours handle "put sock on" and "put on sock"? I don't think it would.
Sort of... I'm sure "put on" is some other grammatical construct... is that still a preposition?
Maybe you have to special-case "on".
Also, it seems to keep trying to add an object; but what if there is not object? ("go to field" or "jump up")
Good point! Fixed.
PREPOSITIONS = ['in', 'on', 'with', 'to', 'into'] def parse(words): verb = words.pop(0) object = None pps = [] phrase = [] def reset_phrase(phrase=phrase, object=object, pps=pps): # nonlocals hack pass for word in words: if word in PREPOSITIONS: if phrase: if object is None: object = phrase else: pps.append(phrase) phrase = [] phrase.append(word) if phrase: if phrase[0] in PREPOSITIONS: pps.append(phrase) else: object = phrase return (verb, object, pps) >>> parse("hit my ball into the gutter with a stick".split(" ")) ('hit', ['my', 'ball'], [['into', 'the', 'gutter'], ['with', 'a', 'stick']]) >>> parse("go to field".split(" ")) ('go', None, [['to', 'field']])
Offline
Well, I just realized "put on sock" is actually slang for "put sock on foot". So ideally "put on sock" should throw an error but "put sock on" should ask "on what?" and you should say "foot".
Offline
Hardmath123 wrote:
Well, I just realized "put on sock" is actually slang for "put sock on foot". So ideally "put on sock" should throw an error but "put sock on" should ask "on what?" and you should say "foot".
In the former, "put on" is actually a phrasal verb (with an adverbial particle). The only real way to parse these is to have a list of them. You could then provide simple transformations for each phrasal verb to yield a normal sentence. Keep in mind, though, the particle is often moved to the end of the sentence when the object is light:
"put on the sock" vs.
"put it on"
Last edited by nXIII (2012-11-10 10:54:54)
Offline
Ah. That's annoying.
Offline
I know it's overkill, but I just wanted to point out that linkgrammar does work:
It seems to handle phrasal verbs, too:
In [9]: while 1: print p.parse_sent(raw_input("? "))[0].constituent_phrases_flat ...: ? put on sock [<S>, <VP: put>, <PRT: on>, <NP: sock>] ? put sock on [<S>, <VP: put>, <NP: sock>, <PRT: on>] ? put on the sock [<S>, <VP: put>, <PRT: on>, <NP: the, sock>] ? put it on [<S>, <VP: put>, <NP: it>, <PRT: on>] ? jump up [<S>, <VP: jump>, <PRT: up>] ? go to field [<S>, <VP: go>, <S>, <VP: to>, <VP: field>]
Extracting the prepositional phrases or whatever should then be relatively simple. And installation was just `brew install link-grammar`, `sudo pip install pylinkgrammar`. Just a thought
Offline
I'm a minimalist, that doesn't really appeal. I'll make that my plan B.
Offline