I attempting what I thought would be a simple exercise, but unless I’m missing a trick, it seems anything but simple.
Im attempting to clean up user input into a form before saving it. The particular problem I have is with hyphenated town names. For example, take Bourton-on-the-Water. Assume the user has Caps lock on or puts spaces next to the hyphens of any other screw up that might come to mind. How do I, within reason, turn it into what it’s meant to be?
You can use trim() to remove whitespace (or other characters) from the beginning and end of a string. You can also use explode() to break strings into parts by a specified character and then recreate your string as you like.
I think the only way you can really accomplish this is by improving the way the user inputs their data.
For example use a postcode lookup system that enters an address based on what they type.
Or have a autocomplete from a predefined list of towns (similar to how Facebook shows towns).
To consider every possible permutation of Bourton On The Water / Bourton-On-The-Water etc... is pretty much impossible.
Related
I would like to create a regex which validates a name of a person. These should be allowed:
Letters (uppercase and lowercase)
-
spaces
This is pretty easy to create a regex for. The problem is that some people also use special characters in their names. For example, assume a user named gûnther or François. There are a lot of characters like û and ç available and it's hard to list all of these.
Is there an easy way to check for correct human names?
Is there an easy way to check for correct human names?
This has been discussed several times. I'm fairly certain that the only thing that people can agree on is that in order to exist a name cannot be a empty string, thus:
^.+$
(Yes, I am aware that this is probably not what OP is looking for. I'm just summarizing earlier Q&As.)
/^\pL[\pL '-]*\z/ should do the trick
The short answer is no, there is no easy way. You have touched on the biggest issue. There are so many special cases of accents and extra things hanging of letters that it will become a mess to deal with. Additionally, the expression with break down to something like this
^[CAPITAL_LETERS][ALL_LETERS_AND_SYMBOLS]*$
That is not that helpful because "Abcd" fits that and you have no way to know if someone is incorrectly entering info into the field or if it was a crazy Hollywood parent that actually named their kid that or something like Sandwich or Umbrella.
^.+$
Checked #jensgram answer, but that regex only accepts all strings, so it doesn't solve problem, because string needs to be name, in this case it can be anything.
^[A-Z][a-z]+$
My regex only accepts string where first char is uppercase and following chars are letters in lowercase. Also looking through other answers, this seems to be shortest regex and also simpliest.
I don't know exactly what you are trying to do (validate user name input?) but basically I would keep it simple - fail the validation if the text contains numbers. And even that's probably pretty shaky.
I had the same problem. First I came up with something like
preg_match("/^[a-zA-Z]{1,}([\s-]*[a-zA-Z\s\'-]*)$/", $name))
but then realized that UTF-8 chars of countries like Sweden, China etc. for example Õ å would not be allowed which was important to my site since it's an international site and don't want to force users not being able to enter their real name.
I though it might be an easier solution instead of trying to figure out how to allow names like O'Malley and Brooks-Schneider and Õsmar (made that one up :) to rather catch chars that you don't want them to enter. For me it was basically to avoid xss JS code being entered. So I use the following regex to filter out all chars that might be harmful.
preg_match("/[~!##\$%\^&\*\(\)=\+\|\[\]\{\};\\:\",\.\<\>\?\/]+/", $name)
That way they can enter any name they want except chars that really aren't part of any name. Hope this might be useful.
Is it possible to display the strings that match a regular expression?
Example:
Take the expression /^AD\d{3}/
and display AD999
What I'm doing is validating a string that is pretty simple either containing all numbers, a few characters maybe, and maybe a '-'. I am validating a postal code on form submit against a database of all countries that use a postal code.
I could perform it in Javascript or PHP, if that makes any difference.
No. That sort of feature is not available.
You can try to implement it yourself, but I don't think that's the solution for you. Simply write the messages normally. Not everything must always be dynamic.
I like your way of thinking though.
It is possible. The developers of PEX figured it out.
Don't get your hopes up, I don't know of any javascript implementation.
There is one for javascript now: http://fent.github.io/randexp.js/.
I have understood your problem a little better from your additional comments.
Since your data is only postal codes, I suggest that it would possible to work in the other direction and store a picture in the database and automatically generate a regex from that.
For instance, UK postcodes look like AA?99? 9AA | AA?9A 9AA which is easily converted to a regex (using a regex!).
What is the best way to validate a string as not gibberish using PHP?
For example, if I get a string input from a user that must be at least 250 characters long, how can I tell whether they entered legitimate text (e.g. real words) or just gibberish to comply with the minimum characters (e.g. asdlfkjefksjlfkjldskfjelkef)?
I've thought about counting the number of words as one option, but the user could still space out their gibberish (e.g. asdlf kjef ksjlf kjl dskfje lkef), so it needs another kind of check on top of that.
Is there any way to check that at least half of a string contains real dictionary words, or something to that effect?
What is the best solution to this problem?
Thanks.
You cannot do that properly because Colorless green ideas sleep furiously.
You could try a Bloom filter
You can walk through your dictionary and delete all dictionary words from user input and then check the length of the rest
You could look at Markov Chains. Simply put the idea is this algorithm determines whether sequences of characters look like they belong together. It won't necessarily tell you it's not gibberish, but it should catch out things like "ksjhglah etc".
See Markov text generators
I'm trying to parse a plain text document in PHP but have no idea how to do it correctly.
I want to separate each word, assign them an ID and save the result in JSON format.
Sample text:
"Hello, how are you (today)"
This is what im doing at the moment:
$document_array = explode(' ', $document_text);
json_encode($document_array);
The resulting JSON is
[["Hello,"],["how"],["are"],["you"],["(today)"]]
How do I ensure that spaces are kept in-place and that symbols are not included along with the words...
[["Hello"],[", "],["how"],[" "],["are"],[" "],["you"],[" ("],["today"],[")"]]
I’m sure some sort of regex is required... but have no idea what kind of pattern to apply to deal with all cases... Any suggestions guys?
This is actually a really complex problem, and one that's subject to a fair amount of academic reaserch. It sounds so simple (just split on whitespace! with maybe a few rules for punctuation...) but you quickly run into issues. Is "didn't" one word or two? What about hyphenated words? Some might be one word, some might be two. What about multiple successive punctuation characters? Possessives versus quotes? etc etc. Even determining the end of a sentence is non-trivial. (It's just a full stop right?!)
This problem is one of tokenisation and a topic that search engines take very seriously. To be honest you should really look at finding a tokeniser in your language of choice.
Maybe this:?
array_filter(preg_split('/\b/', $document_text))
the 'array_filter', removes the empty values at the first and/or last index of the resulting array, which will appear if your string start or ends with a word boundary (\b see: http://php.net/manual/en/regexp.reference.escape.php)
I am building a string to detect whether filename makes sense or if they are completely random with PHP. I'm using regular expressions.
A valid filename = sample-image-25.jpg
A random filename = 46347sdga467234626.jpg
I want to check if the filename makes sense or not, if not, I want to alert the user to fix the filename before continuing.
Any help?
I'm not really sure that's possible because I'm not sure it's possible to define "random" in a way the computer will understand sufficiently well.
"umiarkowany" looks random, but it's a perfectly valid word I pulled off the Polish Wikipedia page for South Korea.
My advice is to think more deeply about why this design detail is important, and look for a more feasible solution to the underlying problem.
You need way to much work on that. You should make an huge array of most-used-word (like a dictionary) and check if most of the work inside the file (maybe separated by - or _) are there and it will have huge bugs.
Basically you will need of
explode()
implode()
array_search() or in_array()
Take the string and look for a piece glue like "_" or "-" with preg_match(); if there are some, explode the string into an array and compare that array with the dictionary array.
Or, since almost every words has alternate vowel and consonants you could make an huge script that checks whatever most of the words inside the file name are considered "not-random" generated. But the problem will be the same: why do you need of that? Check for a more flexible solution.
Notice:
Consider that even a simple-and-friendly-file.png could be the result of a string generator.
Good luck with that.