simple regex for italian words

I can't work out a simple regex that I need for a check in a preg_match statement:
- I need to allow any words, including the white space between them, accented letters, and apostrophes
so something like: sjsjsjjsjsjs òòòò èèèèè ddddd ''' eerfk jefrkj sdc should be accepted
so I wrote something like: [a-zA-Z\xE0\xE8\xE9\xF9\xF2\xEC\x27\s]*
which accepts letters plus some hex codes for the accented letters and the apostrophe, but I can't work out how to combine it with:
[^\r\n]
I'd like to reject anything after an end of line or a carriage return. Punctuation too, but that seems to be already handled by my regex
so something like:
adjnasdnjadsija adokasmdoasdmoa admoadsoasodoas END
sddaadsasd òòò
should be accepted up to the word END
Is this the right code? I ran several tests but got no result!
I tested my regex on http://regex101.com/

Set the locale appropriately:
setlocale(LC_ALL,"it_IT");
Now you can use a much simpler regex:
/[\w\s]*/
This is because \w is locale-aware ^_^
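If the input is UTF-8, a locale-independent alternative is a Unicode letter class with the u modifier. This is a swapped-in technique, not the locale approach above. Keep in mind that \w also matches digits and the underscore, and \s also matches line breaks, which the question wanted to reject. The sketch below assumes the source file and the input are UTF-8, and the function name is hypothetical:

```php
<?php
// Hypothetical helper: accepts a single line made of letters (any script),
// apostrophes, spaces and tabs. Assumes UTF-8 input.
function isItalianLine(string $text): bool
{
    // \p{L} = any Unicode letter; \r and \n are not in the class, so line breaks fail.
    // The D modifier stops $ from matching just before a trailing newline.
    return preg_match("/^[\\p{L}' \\t]*$/uD", $text) === 1;
}

var_dump(isItalianLine("sjsjsjjsjsjs òòòò èèèèè ddddd ''' eerfk")); // bool(true)
var_dump(isItalianLine("first line\nsecond line"));               // bool(false)
```

Digits and punctuation are rejected as well, because they are not listed in the class.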

Related

preg_replace add links around words starting with some-part-of-a-word

I've got this preg_replace in PHP which almost correctly turns every word starting with 'exploit' into a link:
preg_replace('#[\b]?(exploit([^ ]*))[\b]?#', '<a>$1</a>', 'My exploits are exploitable.');
I get this:
My <a>exploits</a> are <a>exploitable.</a>
Which is half wrong: the full stop should not be linked on the second word. I know I need to replace the [^ ] part above with something like [^\b], but it doesn't work.
I know I can always do e.g. [^ .], but it would only work on words ending with a space or a full stop, not commas for example.

\b(exploit[a-zA-Z]*) should do the trick.
Aside: This website is pretty handy for trying out preg_replace: http://www.fullonrobotchubby.co.uk/random/preg_tester/preg_tester.php

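\bexploit(\w*)\b is simpler. Note that the group here captures only the part after "exploit", so use $0 in the replacement to get the whole word.
Have fun
regexrDemo
EDIT
As commented, if you want only letters:
/\bexploit([a-z]*)\b/i
regexrDemo only letters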

If you want to deal with unicode, you could do:
preg_replace('#\b(exploit\pL*)#u', '<a>$1</a>', 'My exploits are exploitable.');
Where \pL stands for any letter.
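A minimal sketch of the Unicode variant above, wrapped in a function so it can be tried on several inputs. The function name is hypothetical, and the input is assumed to be valid UTF-8:

```php
<?php
// Hypothetical wrapper around the preg_replace call from the answer above.
function linkExploitWords(string $text): string
{
    // \b anchors at the start of the word; \pL* takes only letters,
    // so trailing punctuation stays outside the link.
    return preg_replace('#\b(exploit\pL*)#u', '<a>$1</a>', $text);
}

echo linkExploitWords('My exploits are exploitable.'), "\n";
// My <a>exploits</a> are <a>exploitable</a>.
```

Because the suffix is limited to letters, commas and other punctuation after the word are left alone as well.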

Building a regex expression for PHP

I am stuck trying to create a regex that will allow for letters, numbers, and the following chars: _ - ! ? . ,
Here is what I have so far:
/^[-\'a-zA-Z0-9_!?,.\s]+$/ //not escaping the ?
and this version too:
/^[-\'a-zA-Z0-9_!\?,.\s]+$/ //attempting to escape the ?
Neither of these seems to be able to match the following:
"Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?"
Can somebody point out what I am doing wrong? I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space.
Thanks!
UPDATE:
Thanks to Lix's advice, this is what I have so far:
/^[-\'a-zA-Z0-9_!\?,\.\s]+$/
However, it's still not working??
UPDATE2
Ok, based on input this is what's happening.
User inputs string, then I run the string through following functions:
$comment = preg_replace('/\s+/', '',
htmlspecialchars(strip_tags(trim($user_comment_orig))));
So in the end, user input is just a long string of chars without any spaces. Then that string of chars is run using:
preg_match("#^[-_!?.,a-zA-Z0-9]+$#",$comment)
What could possibly be causing trouble here?
FINAL UPDATE:
Ended up using this regex:
"#[-'A-Z0-9_?!,.]+#i"
Thanks all! lol, y'all are going to kill me once you find out where my mistake was!
Ok, so I had this piece of code:
if(!preg_match($pattern,$comment) || strlen($comment) < 2 || strlen($comment) > 60){
GEEZ!!! I never bothered to look at the strlen part of the code. Of course it was going to fail every time...I only allowed 60 chars!!!!

When in doubt, it's always safe to escape non alphanumeric characters in a class for matching, so the following is fine:
/^[\-\'a-zA-Z0-9\_\!\?\,\.\s]+$/
When run through a regular expression tester, this finds a match with your target just fine, so I would suggest you may have a problem elsewhere if that doesn't take care of everything.
I assume you're not including the quotes you used around the target when actually trying for a match? You didn't build double-quote matching in.
You also wrote: "I must point out that my script takes the user input (the paragraph in quotes in this case) and strips all white space so actual input has no white space."
In that case you don't need the \s if the stripping is working correctly.

I got the following code to work as expected (running PHP 5):
<?php
$pattern = "#[-'A-Z0-9_?!,.\s]+#i";
$string = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?";
$results = array();
preg_match($pattern, $string, $results);
echo '<pre>';
print_r($results);
echo '</pre>';
?>
The output from print_r($results) was as follows:
Array
(
[0] => Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least. Hey look, an underscore_ I wonder if it will match this time around?
)
Tested on http://writecodeonline.com/php/.

It's not necessary to escape most characters inside []. Note that \s does keep its meaning inside a character class in PCRE, so your pattern is fine as written. If you want to list the white space explicitly, you have two options: either manually expand (/^[-\'a-zA-Z0-9_!?,. \t\n\r]+$/) or use alternation (/^(?:[-\'a-zA-Z0-9_!?,.]|\s)+$/).
Note that I left the \ before the ' because I'm assuming you're putting this in a PHP string and I wouldn't want to suggest a syntax error.

The only characters with a special meaning within a character class are:
the dash (since it can be used as a delimiter for ranges), except if it is used at the beginning (since in this case it is not part of any range),
the caret, but only as the first character (where it negates the class),
the closing bracket,
the backslash.
In "pure regex parlance", your character class can be written as:
[-_!?.,a-zA-Z0-9\s]
Now, you need to escape whatever needs to be escaped according to your language and how strings are written. Given that this is PHP, you can take the above sample as is. Note that \s is interpreted in character classes as well, so this will match anything which is matched by \s outside of a character class.
While some manuals recommend using escapes for safety, knowing the general regex rules for character classes and applying them leads to shorter and easier to read results ;)
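Since the asker's real problem turned out to be the length check rather than the pattern, it can help to report the two checks separately. The sketch below uses a minimally escaped class, with the apostrophe added so the sample text passes. The function name is hypothetical, and the 2 to 60 limits mirror the question's code:

```php
<?php
// Hypothetical validator: reports the character check and the length check separately.
function checkComment(string $comment): array
{
    // "-" is first in the class, so it is literal; only \s needs a backslash.
    $charsOk  = preg_match("/^[-'_!?.,a-zA-Z0-9\\s]+$/D", $comment) === 1;
    // Same limits as the question's code: between 2 and 60 bytes.
    $lengthOk = strlen($comment) >= 2 && strlen($comment) <= 60;
    return ['chars' => $charsOk, 'length' => $lengthOk];
}

$sample = "Oh why, oh why is this regex not working! It's getting pretty frustrating? Frustrating - that is to say the least.";
print_r(checkComment($sample)); // chars passes, length fails (the sample is longer than 60)
```

With the results split like this, a failing strlen condition is visible immediately instead of looking like a regex problem.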

Regular expression to check characters in a string

I have a text to validate using regex,
The text field may contain the characters a-z and 0-9, except for the letters i, o, and q.
I tried something like this but cannot get it to work: '/[^(oOiIqQ)]/'

A simple way for exclusions like this is to use negative lookahead. State what you want:
/^(?:[a-z0-9])+\z/i
Then exclude the items you don't want:
/^(?:(?![ioq])[a-z0-9])+\z/i
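A short sketch of the lookahead pattern above in use. The function name is hypothetical; the pattern itself is copied from the answer:

```php
<?php
// Hypothetical helper name; the pattern is the one given in the answer above.
function isAllowedCode(string $text): bool
{
    // (?![ioq]) rejects i, o and q (either case, because of the i flag)
    // before each character is consumed by [a-z0-9].
    return preg_match('/^(?:(?![ioq])[a-z0-9])+\z/i', $text) === 1;
}

var_dump(isAllowedCode('ABC123'));  // bool(true)
var_dump(isAllowedCode('Hello'));   // bool(false): contains "o"
var_dump(isAllowedCode('abc def')); // bool(false): space is not in a-z0-9
```

The advantage over spelling out the ranges by hand is that the excluded letters are listed in exactly one place.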

You cannot use parentheses for grouping inside [ ... ]; they are matched as literal characters there.
You have to use something like '/[0-9a-hj-npr-zA-HJ-NPR-Z]/'
If you want to be sure your text only has those characters, use:
'/^[0-9a-hj-npr-zA-HJ-NPR-Z]+$/'
So you can match a string containing any number of those characters, and only those.

Maybe something like this: /^[0-9a-hj-npr-zA-HJ-NPR-Z]+$/

I would make a small change to yours and try:
^[^oOiIqQ]+$

This might work: [a-hj-npr-z]. You can add the i flag at the end of your regexp for case insensitivity.
(Yours will allow EVERY character except those you specified.)
if (preg_match('#^[0-9a-hj-npr-z]+$#i', $string)) {
//string OK
}

There is a very simple solution for this that uses a negated character class (which makes the regex shorter and much more readable). Note that it accepts any character other than these six, not only a-z and 0-9:
[^ioqIOQ]+

Positive look ahead regex confusing

I'm building this regex with a positive lookahead in it. Basically it must select all text in the line up to the last period that precedes a ":" and add a "|" to the end to delimit it. Some sample text is below. I am testing this in gskinner and EditPad Pro, which apparently has full grep regex support, so if I could get the answers in that form I'd appreciate it.
The regex below works to a degree but I am unsure if it is correct. Also it falls down if the text contains brackets.
Finally I would like to add another ignore rule like the one that ignores but includes "Co." in the selection. This second ignore rule would ignore but include periods that have a single Capital letter before them. Sample text below too. Thanks for all the help.
^(?:[^|]+\|){3}(.*?)[^(?:Co)]\.(?=[^:]*?\:)
121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.
122| O' Toole, H.Y. |2004. |(Note on the regex). Pages 90-91 In: Ryan, A. & Toole, B.L. (Editors) Guide to the regex functionality in php. Timmy, Tommy& Stewie, Quohog. * Produced for Family Guy in Quohog.

I don't think I understand what you want to do. But this part [^(?:Co)] is definitely not correct.
With the square brackets you are creating a character class, because of the ^ it is a negated class. That means at this place you don't want to match one of those characters (?:Co), in other words it will match any other character than "?)(:Co".
Update:
I don't think it's possible. How should I distinguish between L. Co. or something similar and the end of the sentence?
But I found another error in your regex. The last part (?=[^:]*?\:) should be (?=[^.]*?\:) if you want to match the last dot before the ":"; with your expression it will match on the first dot.
See it here on Regexr

This seems to do what you want.
(.*\.)(?=[^:]*?:)
It quite simply matches all text up to the last full stop that occurs before the colon.
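As a rough sketch, the pattern above can be used with preg_replace to append the "|" delimiter the question asks for. The function name is hypothetical, and the pattern is anchored at the start of the line so that only one replacement is made:

```php
<?php
// Hypothetical helper name; the sample line is taken from the question.
function delimitBeforeColon(string $line): string
{
    // Greedy .* backtracks to the last full stop that still has a colon after it;
    // the lookahead checks for that colon without consuming it.
    return preg_replace('/^(.*\.)(?=[^:]*:)/', '$1|', $line, 1);
}

$line = '121| Ryan, T.N. |2001. |I like regex. But does it like me (2) 2: 615-631.';
echo delimitBeforeColon($line), "\n";
// 121| Ryan, T.N. |2001. |I like regex.| But does it like me (2) 2: 615-631.
```

A line without any colon is returned unchanged, because the lookahead can never succeed.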

Regular Expressions find and replace

I am having problems with RegEx in PHP and can't seem to find the answer.
I have a string which is 3 letters, all caps, e.g. COS.
The letters will change but will always be 3 chars long and in caps; it will also be in the middle of another string, surrounded by commas.
I need a regex to find the 3 capital letters inside a string and change them from COS to 'COS'
(I'm doing this to amend an SQL insert string)
I can't seem to find the regex unless I use specific letters, but the letters will change.
I need something along the lines of
[A-Z]{3} then replace with '[A-Z]' (I know this isn't anywhere near correct, just shorthand)
Anyone any suggestions?
Cheers
EDIT:
Just wanted to add, in case anyone comes across this question at a later date:
the SQL insert string (provided by an external source and FTP'd to my server daily)
contained the 3-capital-letter string twice, once with quotes and once without,
so I also had to remove the doubled quotes added by the first regex
$sqlString = preg_replace('/([A-Z]{3})/', "'$1'", $sqlString);
$sqlString = preg_replace('/\'\'([A-Z]{3})\'\'/', "'$1'", $sqlString);
Thanks everyone

You were actually very close. You could use:
echo preg_replace('/([A-Z]{3})/', "'$1'", 'COS'); //will output 'COS'
For MySQL statements I would advise using prepared statements (or, in legacy code, mysql_real_escape_string(), which was removed in PHP 7) though.

$string = preg_replace('/([A-Z]{3})/', "'$1'", $string);
http://php.net/manual/en/function.preg-replace.php

Assuming it's like you said, "three capital letters surrounded by commas", e.g.
Foo bar,COS,Foo Bar
You can use look-ahead and look-behinds and find the letters:
(?<=,)([A-Z]{3})(?=,)
Then a simple replace to surround with single quotes will be adequate:
'$1'
All together, here it is working.
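Put together in PHP, the lookaround approach might look like the sketch below. The function name is hypothetical, and it assumes the code is delimited by a comma on both sides:

```php
<?php
// Hypothetical helper name; assumes the code has a comma directly before and after it.
function quoteThreeLetterCode(string $sql): string
{
    // (?<=,) and (?=,) assert the surrounding commas without including them in the match.
    return preg_replace('/(?<=,)([A-Z]{3})(?=,)/', "'\$1'", $sql);
}

echo quoteThreeLetterCode("Foo bar,COS,Foo Bar"), "\n";   // Foo bar,'COS',Foo Bar
echo quoteThreeLetterCode("Foo bar,'COS',Foo Bar"), "\n"; // unchanged: already quoted
```

Because an already quoted code is preceded by a quote, not a comma, it is left alone, which avoids the doubled quotes mentioned in the question's edit.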

preg_replace('/(^|\b)([A-Z]{3})(\b|$)/', "'\${2}'", $string); // the $ is escaped so PHP does not interpolate ${2}
