PHP - quick regular expression question

I am trying to match a word in a wall of text and return a few words before and after the match. Everything is working, but I would like to ask whether there is any way to modify it so that it also matches similar words. Let me show you an example:
preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*(pripravená)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);
This code returns a match, but I would like to modify it so that
preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*(pripravena)(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);
would also return a match. The text is in Slovak. I tried a range of Unicode characters and also \p{Sk} (and a few others), but to no avail. Maybe I just put it in the wrong place, I don't know...
Is something like this possible?
Any help is appreciated

I don't know if there is an "ignore accent" switch. But you could replace your search query with something like:
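$query = 'pripravená';
$query = preg_replace(
array('=[áàâa]=iu','=[óòôo]=iu','=[úùûu]=iu'),
array( '[áàâa]' , '[óòôo]' , '[úùûu]' ),
$query
);
preg_match_all('/(?:\b(\w+\s+)\{1,5})?.*('.$query.')(?:(\s+){1,2}\b.{1,10})?/u', $item, $res[$file]);
That would convert your 'pripravená' query into 'pripr[áàâa]ven[áàâa]', since every a-like letter is expanded. Note the u modifier on the replacement patterns: without it the character classes would match single bytes rather than whole UTF-8 characters.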

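You could use strtr() to strip out the accents. See the PHP manual page for a good example: http://php.net/manual/en/function.strtr.php
$addr = strtr($addr, array('ä' => 'a', 'å' => 'a', 'ö' => 'o'));
The manual's three-argument form, strtr($addr, "äåö", "aao"), maps byte for byte, so it is only safe with single-byte encodings; with UTF-8 text use the array form shown here. You'd still need to specify all the relevant characters, but it would be easier than using a regex to do it.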

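(pripraven[áa]) or (pripravena\p{M}*), the latter matching only when the text stores accents as separate combining marks (decomposed form), or, more likely, some combination of these approaches.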
I don't know of any other, more concise, way of specifying "all Latin-1 vowels that are similar to 'a' in my current locale".
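The character-class idea from the answers above can be generalized. The following is only a sketch: the helper name accent_insensitive_pattern and the list of Slovak letter groups are assumptions made for illustration, and the list may well be incomplete. It assumes the source file and the input are UTF-8 and that the result is used with the u (and optionally i) modifier.

```php
<?php
// Hypothetical helper: turns a search word into a regex fragment in which
// every letter that has accented variants becomes a character class.
function accent_insensitive_pattern(string $query): string
{
    // Illustrative groups of Slovak base letters and their accented forms.
    $groups = ['aáä', 'cč', 'dď', 'eé', 'ií', 'lĺľ', 'nň', 'oóô',
               'rŕ', 'sš', 'tť', 'uú', 'yý', 'zž'];

    $pattern = '';
    // Split into whole UTF-8 characters rather than bytes.
    foreach (preg_split('//u', $query, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $class = null;
        foreach ($groups as $group) {
            if (strpos($group, $char) !== false) {
                $class = '[' . $group . ']';
                break;
            }
        }
        // Characters without variants are escaped and used literally.
        $pattern .= $class ?? preg_quote($char, '/');
    }
    return $pattern;
}

$fragment = accent_insensitive_pattern('pripravená');
var_dump(preg_match('/' . $fragment . '/iu', 'Večera je pripravena na stole'));
```

Both 'pripravená' and 'pripravena' produce the same fragment, so either spelling of the query finds either spelling in the text.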

Related

PHP Replace all characters with a symbol

I am trying to make an account generator with censored passwords, and I don't want to replace every password with just 10 *'s. I want it to be like this:
if the password is 15 characters long, it will be replaced with 15 *'s. I tried to use this:
$censpass = preg_replace('/[a-zA-Z0-9\']/', '*', $accounts[$i]['password']);
but as you might know, that doesn't work for characters such as !. How can I use preg_replace to match every single character in PHP?
If someone doesn't understand:
I want this: "password123!"
to be replaced with this: "************" with the accurate length using preg_replace
If this has been answered somewhere else, please link it below. I tried to find it, but I could only find how to replace some characters, such as numbers only.
Thank you :)
For your goal I'd use a different approach, such as:
$encpass = str_pad('', strlen($accounts[$i]['password']), '*');
In fact, there is no need to use a regular expression (which is slow and resource consuming) just to generate a string the same length as another one.
Anyway, if you still want to use your solution, the correct regexp for your use case is simply a . such as:
$censpass = preg_replace('/./', '*', $accounts[$i]['password']);
Have a look here: http://php.net/manual/en/regexp.reference.dot.php
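As a variation on the length-based approach above, here is a minimal sketch using str_repeat. The function name mask_password is made up for this example, and it assumes the mbstring extension is available so that multi-byte characters count as one character each; with plain strlen() a UTF-8 password would yield one * per byte.

```php
<?php
// Hypothetical helper: returns one mask character per character of the password.
function mask_password(string $password, string $maskChar = '*'): string
{
    // mb_strlen counts characters, not bytes (assumes UTF-8 input).
    return str_repeat($maskChar, mb_strlen($password, 'UTF-8'));
}

echo mask_password('password123!'), "\n"; // prints 12 asterisks
```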

Using preg_match_all to filter out strings containing this but not this

I'm having an issue with preg_match_all. I have this string:
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";
I need to get the numbers preceded by "ACTIVE-" but not by "CATEGORY-ACTIVE-", so in this case the result should be 6,9. I used the statement below:
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
However, this returns all the numbers, because all of them are in fact preceded by "ACTIVE-". That's not what I meant: I need to leave out those preceded by "CATEGORY-ACTIVE-". How can I configure preg_match_all to do that? Or maybe there is some other function that can do the job?
EDIT:
I tried this:
preg_match_all("/CATEGORY-ACTIVE-(\d+)/", $product_req, $this_cat_act);
preg_match_all("/ACTIVE-(\d+)/", $product_req, $this_act);
$act_cat = str_replace($this_cat_act[1],"",$this_act[1]);
It kind of works, but I guess there is a better and cleaner way to do it. Besides, the output is rather odd too.
Thank you.
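One possible way to do this in a single call is a negative lookbehind, which rejects any "ACTIVE-" that directly follows "CATEGORY-". This is a sketch using the variable names from the question, not a tested drop-in for the real data:

```php
<?php
$product_req = "ACTIVE-6,CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-9";

// (?<!CATEGORY-) asserts that the text just before ACTIVE- is not "CATEGORY-".
preg_match_all('/(?<!CATEGORY-)ACTIVE-(\d+)/', $product_req, $this_act);

echo implode(',', $this_act[1]), "\n"; // 6,9
```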

How do I strip out in PHP everything but printing characters?

I am working with a daily data feed. To my surprise, one of the fields didn't look right after it was in MySQL. (I have no control over who provides the feed.)
So I did a mysqldump and discovered the zip code and the city for this record contained a non-printing char. It displayed it in 'vi' as this:
<200e>
I'm working in PHP and I parse this data and put it into the MySQL database. I have used the trim function on this, but that doesn't get rid of it. The problem is, if you do a query on a zipcode in the MySQL database, it doesn't find the record with the non-printing character.
I'd like to clean this up before it's put into the MySQL database.
What can I do in PHP? At first I thought regular expression to only allow a-z,A-Z, and 0-9, but that's not good for addresses. Addresses use periods, commas, hyphens and perhaps other things I'm not thinking of at the moment.
What's the best approach? I don't know the exact term for it, other than that only printing characters should be allowed. Is there another PHP function like trim that does this job? Or a regular expression? If so, I'd like an example. Thanks!
I have looked into using the PHP function filter_var, and saw this posted at PHP.NET:
<?php
$a = "\tcafé\n";
//This will remove the tab and the line break
echo filter_var($a, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);
//This will remove the é.
echo filter_var($a, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH);
?>
While using FILTER_FLAG_STRIP_HIGH does indeed strip out the <200e> I mentioned seen in 'vi', I'm concerned that it would strip out the letter's accent in a name such as André.
Maybe a regular expression is the solution?
You can use PHP filters: http://www.php.net/manual/en/function.filter-var.php
I would recommend using the FILTER_SANITIZE_STRING filter, or whichever filter fits what you need.
I think you could use this little regex replace:
preg_replace( '/[^[:print:]]+/', '', $your_value);
It basically strips out all non-printing characters from $your_value.
I tried this:
<?php
$string = "\tabcde éç ÉäÄéöÖüÜß.,!-\n";
$string = preg_replace('/[^a-z0-9\!\.\, \-éâëïüÿçêîôûéäöüß]/iu', '', $string);
print "[$string]";
It gave:
[abcde éç ÉäÄéöÖüÜß.,!-]
Add all the special characters you need to the regexp.
If you work in English and do not need to support Unicode characters, then allow just [\x20-\x7E]
...and remove all others:
$s = preg_replace('/[^\x20-\x7E]+/', '', $s);
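If accented names such as André have to survive, another option is to remove only the Unicode "Other" category instead of whitelisting letters. The following is a sketch assuming UTF-8 input; strip_invisible is a made-up name. \p{C} covers control and format characters, and U+200E (the <200e> shown in vi) is a format character. Note that this also removes tabs and line breaks.

```php
<?php
// Hypothetical helper: removes control, format and other invisible
// code points while leaving letters (including accented ones) alone.
function strip_invisible(string $value): string
{
    $clean = preg_replace('/\p{C}+/u', '', $value);

    // preg_replace returns null when the input is not valid UTF-8;
    // this sketch simply hands the original back in that case.
    return $clean ?? $value;
}

$zip = "90210\xE2\x80\x8E";            // zip code followed by U+200E
var_dump(strip_invisible($zip));        // string(5) "90210"
var_dump(strip_invisible("André"));     // unchanged
```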

There has to be a better regex

I'm writing a small CMS and I'm trying to turn a title into a URL slug with dashes. I know I need to do a couple of things and I've got the whole thing working, but I just don't like it. The problem seems to be that if there are any special characters at the end, I need to remove them before the slug goes into the database. The only way I could figure out to do this was to use two preg_replace calls in one statement. So it looks something like this:
preg_replace("/\-$/","",preg_replace('/[^a-z0-9]+/i', "-", strtolower($title)));
and it will turn this: (this is a title!!!)))**that is (strange))
into this: this-is-a-title-that-is-strange
But this expression just looks like ass. There has to be a better way of coding this, or something out there, I just don't know it. Any help would be greatly appreciated
You can make just one call to preg_replace with array inputs:
preg_replace( array('/[^a-z0-9]+/','/^-|-$/'), // from array
array('-',''), // to array
strtolower($title));
Note that your existing code retains leading - if any. The code above gets rid of that.
One option, which still requires two replacements but takes care of both the start and end dashes, is:
preg_replace('/^[^a-z0-9]+|[^a-z0-9]+$/', '',
preg_replace('/(?<=[a-z0-9])[^a-z0-9]+(?=[a-z0-9])/', '-',
strtolower($title)));
There is also the alternative of:
implode('-',
preg_split('/[^a-z0-9]+/',
strtolower($title),
-1, // no limit; the flags are the fourth argument
PREG_SPLIT_NO_EMPTY));
Use trim.
trim(preg_replace('/[^a-z0-9]+/i', "-", strtolower($title)), '-')
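Wrapped in a function, the trim variant might look like the sketch below. The name slugify is an assumption for this example, and it only handles ASCII letters and digits, as the snippets above do.

```php
<?php
// Hypothetical helper: lowercases the title, collapses every run of
// non-alphanumeric characters into one dash, then trims edge dashes.
function slugify(string $title): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($title));
    return trim($slug, '-');
}

echo slugify('(this is a title!!!)))**that is (strange))'), "\n";
// this-is-a-title-that-is-strange
```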

Whitelist in php

I have an input where users are supposed to enter their phone number. The problem is that some people write their phone number with hyphens and spaces in it. I want to put the input through a filter to remove such things and store only digits in my database.
I figured that I could do some str_replace() for the whitespaces and special chars.
However I think that a better approach would be to pick out just the digits instead of removing everything else. I think that I have heard the term "whitelisting" about this.
Could you please point me in the direction of solving this in PHP?
Example: I want the input "0333 452-123-4" to result in "03334521234"
Thanks!
This is a non-trivial problem because there are lots of colloquialisms and regional differences. Please refer to What is the best way for converting phone numbers into international format (E.164) using Java? It's Java but the same rules apply.
I would say that unless you need something more fully-featured, keep it simple. Create a list of valid regular expressions and check the input against each until you find a match.
If you want it really simple, simply remove non-digits:
$phone = preg_replace('![^\d]+!', '', $phone);
By the way, just picking out the digits is, by definition, the same as removing everything else. If you mean something different you may want to rephrase that.
$number = filter_var(str_replace(array("+","-"), '', $number), FILTER_SANITIZE_NUMBER_INT);
filter_var with FILTER_SANITIZE_NUMBER_INT removes everything except digits, pluses and minuses, and str_replace gets rid of those signs.
or you could use preg_replace
$number = preg_replace('/[^0-9]/', '', $number);
You could do it two ways. Iterate through each index in the string, and run is_numeric() on it, or you could use a regular expression on the string.
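The character-by-character variant could be sketched as follows. The function name digits_only_loop is made up, and ctype_digit is used in place of is_numeric because it tests a single character for being a digit more directly; this assumes the ctype extension, which is normally enabled.

```php
<?php
// Hypothetical helper: keeps only the characters that are ASCII digits.
function digits_only_loop(string $input): string
{
    $digits = '';
    foreach (str_split($input) as $char) {
        if (ctype_digit($char)) {
            $digits .= $char;
        }
    }
    return $digits;
}

echo digits_only_loop('0333 452-123-4'), "\n"; // 03334521234
```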
On the client side I recommend using some formatting that you design when creating the form. This is good for zip or telephone fields. Take a look at this jquery plugin for a reference. It will make things much easier later on the server side.
