Cherry pick an Australian address from a page of text - php

I am trying to parse a prose paragraph for anything that might resemble an address. I have a database of addresses I am matching against, and these are the only addresses I am interested in. I'm using a LAMP server, but technology-specific answers aren't what I need right now. This is more a question of how to approach it.
Can anyone provide ideas? Perhaps regex? Or perhaps I should use a database of cities, states, etc.?
Thanks.

It looks like this question hasn't gotten answered because it's entirely unclear what the problem parameters are. If you want a more specific answer to a problem, please describe your problem more fully.
In general, I would suggest approaching a problem like this by starting from some piece of known data, such as a small collection of words or formats that delineate an address, then matching on the context around those words to see whether they really flesh out to a full address.
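The keyword-then-context idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the street-type list, the function names (findAddressCandidates, normalizeAddress, matchKnownAddresses) and the idea that the database stores just the street line are all hypothetical, and the pattern will miss many real address formats.

```php
<?php
// Hypothetical, incomplete list of street-type keywords used as anchors.
const STREET_TYPES = 'street|st|road|rd|avenue|ave|lane|ln|drive|dr|court|ct|place|pl|parade|pde|highway|hwy';

// Abbreviation => full form, used only so both sides compare consistently.
const STREET_ABBREVIATIONS = [
    'st' => 'street', 'rd' => 'road', 'ave' => 'avenue', 'ln' => 'lane', 'dr' => 'drive',
    'ct' => 'court', 'pl' => 'place', 'pde' => 'parade', 'hwy' => 'highway',
];

// Find spans shaped like "<number> <1-3 capitalised words> <street type>".
function findAddressCandidates(string $text): array
{
    $pattern = '/\b(?:\d+\/)?\d+[A-Za-z]?\s+(?:[A-Z][A-Za-z\']*\s+){1,3}(?i:' . STREET_TYPES . ')\b/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Lower-case, strip punctuation, and expand a trailing abbreviation.
function normalizeAddress(string $address): string
{
    $address = preg_replace('/[^a-z0-9\/ ]+/', ' ', strtolower($address));
    $words = preg_split('/\s+/', trim($address));
    $last = count($words) - 1;
    $words[$last] = STREET_ABBREVIATIONS[$words[$last]] ?? $words[$last];
    return implode(' ', $words);
}

// Keep only the candidates that match an address we already know about.
function matchKnownAddresses(string $text, array $knownAddresses): array
{
    $known = [];
    foreach ($knownAddresses as $address) {
        $known[normalizeAddress($address)] = $address;
    }
    $hits = [];
    foreach (findAddressCandidates($text) as $candidate) {
        $key = normalizeAddress($candidate);
        if (isset($known[$key])) {
            $hits[] = $known[$key];
        }
    }
    return $hits;
}
```

Because only addresses already in the database count, the pattern can afford to be loose: false candidates simply fail the lookup. In practice the known addresses would come from a database query rather than an array.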

Related

Can reCaptcha image be restricted to numbers - i.e. no text?

I know there is a message saying this cannot be done. However, I don't know which year the question was asked and reCaptcha may have been modified since then. Sometimes a numeric image is displayed already. I'd like to make all images similar.
You can look at the following URLs. They might help you:
http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-just-got-easier-but-only-if.html
http://code.google.com/p/recaptcha/wiki/HowToSetUpRecaptcha

Search for a whole sentence in phpBB

I have a phpBB forum, and its search works word by word. When I search for "magento tutorial", it always separates the words and searches for "magento" and "tutorial". Is it possible to search for the whole sentence as written?
I need sentence search on my website. If anyone knows how to do this, please help me.
Is there a setting for this, or some code?
If you Google this particular problem, one of the first results (https://www.phpbb.com/community/viewtopic.php?p=12772555) contains the answer to your question. That is, if I am understanding your question correctly, as "word wise" is not a very descriptive term in regard to phpBB.

How to extract addresses in PHP, from a source with no standard format of writing addresses?

I have a bunch of messages (from Twitter) that include addresses. They come in as many different forms as you would expect from a random sampling of people entering an address. The city is always known, so people normally just give a road name and a number or area.
Is there any library out there to extract these? I've tried looking but found nothing.
If not, any suggestions as to how I could do this? At the moment I am just extracting things like [previous word + [rd/ave/street/lane/blvd]], but it isn't that accurate.
Any ideas?
Thanks
I know of no library that does this, but a crazy idea came to mind while reading your question.
Use the Google Maps Geocoding API to find the latitude and longitude for your address.
Then use the reverse geocoding API to find the address from that latitude and longitude, since it will come back neatly formatted in a JSON object.
Quite messy, but it is the best I can come up with. (It has the upside that you then already have the coordinates of your address. :)
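As a rough sketch of that round trip, the helpers below build the two request URLs and pull the interesting fields out of a JSON response. The function names are hypothetical, the API key is a placeholder, and actually fetching the URL (for example with file_get_contents or cURL), quota handling and error handling are left out.

```php
<?php
// JSON endpoint of the Google Geocoding web service.
const GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json';

// URL for a forward lookup: free-text address => coordinates.
function buildGeocodeUrl(string $address, string $apiKey): string
{
    return GEOCODE_ENDPOINT . '?' . http_build_query(['address' => $address, 'key' => $apiKey]);
}

// URL for a reverse lookup: coordinates => formatted address.
function buildReverseGeocodeUrl(float $lat, float $lng, string $apiKey): string
{
    return GEOCODE_ENDPOINT . '?' . http_build_query(['latlng' => $lat . ',' . $lng, 'key' => $apiKey]);
}

// Extract the first result from a response body, or null if nothing usable came back.
function parseGeocodeResponse(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['status'] ?? '') !== 'OK' || empty($data['results'])) {
        return null;
    }
    $first = $data['results'][0];
    return [
        'formatted_address' => $first['formatted_address'] ?? null,
        'lat' => $first['geometry']['location']['lat'] ?? null,
        'lng' => $first['geometry']['location']['lng'] ?? null,
    ];
}
```

Note that the forward lookup response already carries a formatted_address field next to the coordinates, so the second (reverse) request may turn out to be unnecessary.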

Check for misspelt words using PHP

I am using this code to create an instant search for my site...
http://woorkup.com/2010/09/13/how-to-create-your-own-instant-search/
Some of the phrases in our database are very complex and can easily be misspelt, so on top of this I wanted to use spelling suggestions.
Does anyone know of any ways to offer correct spellings based on a string provided?
Any help would be greatly appreciated.
Yes, there is a jQuery plugin called After the Deadline.
If someone searches for a phrase, doesn't click any of the results, then searches again with a similar new phrase (check out levenshtein()) and does click a result, write the original phrase and the new phrase to your database.
Record each time this happens. If the phrase is already matched, increment a counter for that phrase.
Then, if someone searches for a phrase that matches one of your possibly incorrect phrases (perhaps have a threshold using your counter), you can display a Did you mean to search for...? as well as the results (if any) for the incorrect phrase.
This isn't a spell check per se, but I think it would be useful to pick up on common mistakes. Unfortunately though, you probably don't have as many people to help you build an index like Google's Did you mean?
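A minimal in-memory sketch of that logging idea is shown below. The class name CorrectionLog, its methods and the default distance and threshold values are all made up for illustration; a real version would persist the counters in the database instead of an array. It assumes PHP 7.3 or later for array_key_first().

```php
<?php
// Records "searched X, then clicked a result for Y" pairs and suggests Y for X later on.
class CorrectionLog
{
    /** @var array<string, array<string, int>> counts[original][corrected] */
    private $counts = [];

    // Store the pair only if the phrases differ and are similar enough (edit distance).
    public function record(string $original, string $corrected, int $maxDistance = 3): bool
    {
        $original = self::normalize($original);
        $corrected = self::normalize($corrected);
        $distance = levenshtein($original, $corrected);
        if ($distance === 0 || $distance > $maxDistance) {
            return false;
        }
        $this->counts[$original][$corrected] = ($this->counts[$original][$corrected] ?? 0) + 1;
        return true;
    }

    // Return the most frequently recorded correction once it has reached the threshold.
    public function suggest(string $query, int $threshold = 2): ?string
    {
        $candidates = $this->counts[self::normalize($query)] ?? [];
        if ($candidates === []) {
            return null;
        }
        arsort($candidates); // highest counter first
        $best = array_key_first($candidates);
        return $candidates[$best] >= $threshold ? (string) $best : null;
    }

    private static function normalize(string $phrase): string
    {
        return strtolower(trim($phrase));
    }
}
```

Calling record() whenever a user retries a search and then clicks a result, and suggest() before rendering the results page, gives a "Did you mean...?" line once enough users have made the same correction.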
Peter Norvig has written (and explained) a fairly basic spelling corrector, which makes for a very interesting read. It's in Python, but his explanations are invaluable. (He does work for Google, and this is a very bare-bones representation of the Google "Did you mean" algorithm.)

How to detect nonsensical text in PHP?

I have comments enabled on my site, and I require users to enter at least 30 characters to publish a comment (just to get some value, because they usually just submitted "I like it").
But some users now use a simple technique to get around this and enter, e.g.:
"I like it. asdsdf dfdsfsdf tt erretrt re"
As you can see, the rest of the text is nonsense. Is there a way (an algorithm) to filter these comments out in PHP?
Get a dictionary of English words from the net. Check the post has a certain % (maybe 50%? maybe 70%?) of words that are in the dictionary. You can't look for 100%, or names and technical jargon will not be found.
Users will get around this by entering:
I like it ....................................................
So then add logic to parse out punctuation.
Then users will get around it with:
I like it. the the the the the the the the
Then you will need to parse it for proper English grammar.
Then no one will be able to post on your site because it has too many rules.
Better suggestion: Add comment moderation. Dumb posts get downvoted and go away. Good posts stay.
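For what it is worth, the dictionary percentage check from the start of this answer could look roughly like the sketch below. The function names and the 0.5 default are illustrative assumptions, and the word list would normally be loaded from a file. As the answer points out, repeating a real word such as "the" still passes.

```php
<?php
// Fraction of the words in $comment that appear in $dictionary (0.0 when there are no words).
function dictionaryWordRatio(string $comment, array $dictionary): float
{
    $lookup = array_flip(array_map('strtolower', $dictionary));
    // Pulling out letter runs also drops punctuation such as "........".
    preg_match_all("/[a-z']+/", strtolower($comment), $matches);
    $words = $matches[0];
    if (count($words) === 0) {
        return 0.0;
    }
    $known = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $known++;
        }
    }
    return $known / count($words);
}

// True when at least $minRatio of the words are dictionary words.
function looksMeaningful(string $comment, array $dictionary, float $minRatio = 0.5): bool
{
    return dictionaryWordRatio($comment, $dictionary) >= $minRatio;
}
```

With a toy dictionary containing "i", "like" and "it", the sample comment from the question scores 3 known words out of 8, which is below the 50% threshold.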
Check out the Akismet PHP5 class.
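// Assumes the Akismet PHP5 class has already been loaded (e.g. via require).
$WordPressAPIKey = 'KEYHERE'; // replace with your own API key
$MyBlogURL = 'http://www.example.com/blog/';
$akismet = new Akismet($MyBlogURL, $WordPressAPIKey);
// Describe the comment that should be checked.
$akismet->setCommentAuthor($name);
$akismet->setCommentAuthorEmail($email);
$akismet->setCommentAuthorURL($url);
$akismet->setCommentContent($comment);
$akismet->setPermalink('http://www.example.com/blog/alex/someurl/');
if ($akismet->isCommentSpam()) {
    // Flagged as spam: hold the comment for moderation or discard it.
} else {
    // Not flagged: store the comment as usual.
}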
You can use a naive Bayesian filter for this. http://www.paulgraham.com/better.html
There are probably existing libraries for this kind of thing. Check out SpamAssassin.
I'd do a simple check on consecutive consonants or vowels. If there are more than four of either in a row, then there is a high probability of nonsense. Furthermore, check for more than two repetitions of the same character. When looking at some nonsense text, I'm sure you'll find some pragmatic recipes ;-)
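One way to sketch the consonant/vowel heuristic above is with three small regular expressions. The function name and the default limits are assumptions taken from the numbers in the answer, "y" is counted as a consonant here, and genuine words such as "strengths" will trip the check, so treat the result as a hint rather than a verdict.

```php
<?php
// Heuristic only: true if the text contains a suspicious run of letters.
function looksLikeGibberish(string $text, int $maxRun = 4, int $maxRepeat = 2): bool
{
    $text = strtolower($text);
    $minRun = $maxRun + 1;

    // More than $maxRun consonants in a row ("y" counted as a consonant).
    if (preg_match('/[bcdfghjklmnpqrstvwxyz]{' . $minRun . ',}/', $text)) {
        return true;
    }
    // More than $maxRun vowels in a row.
    if (preg_match('/[aeiou]{' . $minRun . ',}/', $text)) {
        return true;
    }
    // The same letter more than $maxRepeat times in a row, e.g. "ooo".
    if (preg_match('/([a-z])\1{' . $maxRepeat . ',}/', $text)) {
        return true;
    }
    return false;
}
```

The sample comment from the question is caught by the first rule, because "asdsdf" contains five consonants in a row.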
Personally, I would say there's not much you can do about it. Even if you had a dictionary and parser, what if I were to leave a comment: "I like it. As do I like your car." Depending on what they're leaving a comment for, that could be complete nonsense. Best I can say is have an edit available for each comment so that you or a mod or whomever can edit it. Sorry that this isn't of any help.
I had this same issue when trying to create password restrictions. Words couldn't be used, so we needed to use a dictionary, but there is never a comprehensive dictionary. And the biggest thing was eliminating l33t speak. :)
Unfortunately not. Your best bet is to modify something like this: Get Spelling Corrections From Google. When messages are close to the 80 character limit, you could look up each word individually and, if it doesn't have a direct hit, boot out the input.
