preg_split - how to group together alphanumeric characters and "_"? - php

I have the following expression in preg_split:
$content = preg_split('/([\p{P}\p{S}])|\s/', $file, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
Now if the content of the input file was int somenumber;
It would split into:
int
somenumber
;
If it was int some_number; what I'd get is:
int
some
_
number
;
However, what I'd like is:
int
some_number
;
Is there a way to edit this expression to group together alphanumeric characters + the "_" ?

The _ is matched by \p{P} (punctuation property class). Restrict it with the (?!_) negative lookahead:
$content = preg_split('/((?!_)[\p{P}\p{S}])|\s/', 'int some_number;', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
With this (?!_)[\p{P}\p{S}], all punctuation and symbol characters with the exception of _ can be matched.
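As a minimal sketch of the fix above, the pattern can be wrapped in a small helper. The function name tokenize() is hypothetical and used only for illustration; the pattern and flags are the ones from the answer.

```php
<?php
// tokenize() is a hypothetical helper name; the pattern comes from the answer above.
function tokenize(string $source): array
{
    // Split on whitespace (dropped) or on any punctuation/symbol character
    // except "_" (kept, because it sits inside the capturing group).
    return preg_split(
        '/((?!_)[\p{P}\p{S}])|\s/',
        $source,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

print_r(tokenize('int some_number;'));
```

The lookahead (?!_) runs before the character class, so "_" is rejected as a delimiter and stays part of the surrounding token.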

Related

regular express issue with 1 character string

I am allowing only alphanumeric, _ and - characters in a string and removing all other characters. It works fine, but when the string is only 1 character long (no matter whether it is a letter, a digit, _ or -), I get an empty value instead of the single character.
Here is sample code
$str = 1;
$str = preg_replace('/^[a-zA-Z0-9_-]$/', '', $str);
var_dump($str);
or
$str = 'a';
$str = preg_replace('/^[a-zA-Z0-9_-]$/', '', $str);
var_dump($str);
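I have tested this on multiple versions of PHP as well.
Your pattern is anchored with ^ and $ and uses a positive character class, so it matches (and removes) only a string that consists of exactly one allowed character. To remove any chars other than ASCII letters, digits, _ and - anywhere inside the string, remove the anchors and convert the positive character class into a negated one: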
$str = preg_replace('/[^\w-]+/', '', $str);
Details
[^ - start of a negated character class
\w - a word char: letter, digit or _
- - a hyphen
] - end of the character class
+ - a quantifier: 1 or more repetitions.
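A short sketch of the difference between the two patterns, assuming plain ASCII input. The helper name keepWordCharsAndHyphens() is made up for this example.

```php
<?php
// keepWordCharsAndHyphens() is a hypothetical name used only in this sketch.
function keepWordCharsAndHyphens(string $value): string
{
    // [^\w-]+ matches runs of characters that are NOT letters, digits, "_" or "-".
    return preg_replace('/[^\w-]+/', '', $value);
}

var_dump(keepWordCharsAndHyphens('a'));
var_dump(keepWordCharsAndHyphens('ab c!d-e_f'));

// The anchored pattern from the question matches a one-character string
// made of an allowed character and removes it entirely.
var_dump(preg_replace('/^[a-zA-Z0-9_-]$/', '', 'a'));
```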

Count of exact characters in string [duplicate]

I'm trying to count the number of occurrences of a character in a string.
for example:
$string = "ab abc acd ab abd";
$chars = "ab";
How many times does $chars appear exactly in $string? The right answer is 2, but substr_count() returns 4, because it also counts the "ab" inside "abc" and "abd".
Is there any PHP function or regex that returns the right answer?
With regex you can do the following:
$count = preg_match_all('/\bab\b/', $string);
It will count occurrences of the word "ab". \b in the regular expression matches a position between a word character and a non-word character (or the start or end of the string).
A "word" character is any letter or digit or the underscore character.
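For comparison, here is a small sketch that runs both counts on the string from the question. countWholeWord() is a hypothetical helper; preg_quote() is used so that a search term containing regex metacharacters is treated literally.

```php
<?php
// countWholeWord() is a hypothetical helper name for this sketch.
function countWholeWord(string $haystack, string $word): int
{
    // \b on both sides restricts matches to whole words.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/';
    return (int) preg_match_all($pattern, $haystack);
}

$string = 'ab abc acd ab abd';

// substr_count() counts every occurrence, including those inside longer words.
echo substr_count($string, 'ab'), PHP_EOL;

// The word-boundary version counts only the standalone "ab".
echo countWholeWord($string, 'ab'), PHP_EOL;
```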
Going by what you said in the comments, you are not trying to find an exact word, since a word has specific boundaries. What you are trying to do is something like this:
/(?:\A|[^H])HH(?:[^H]|\z)/g
preg_match_all('/(?:\A|[^H])HH(?:[^H]|\z)/', $string, $matches);
or with question's example:
/(?:\A|[^a])ab(?:[^b]|\z)/g
preg_match_all('/(?:\A|[^a])ab(?:[^b]|\z)/', $string, $matches);
Explanation:
(?: \A | [^a] ) # very beginning of the input string OR a character except `a`
ab # match `ab`
(?: [^b] | \z ) # end of the input string OR a character except `b`
The above illustrates what needs to be done, but it is better to use the construct made for this purpose, lookarounds, which check the neighbouring characters without consuming them:
/(?<!a)ab(?!b)/g
preg_match_all('/(?<!a)ab(?!b)/', $string, $matches);
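The practical difference shows up when two candidates are separated by a single character. The sketch below uses an input string chosen for illustration: the consuming pattern swallows the separating space as part of the first match, so the second "HH" has no preceding character left to match against, while the lookaround version finds both.

```php
<?php
$input = 'HH HH';

// Consuming version: the trailing [^H] eats the space after the first "HH",
// so the second "HH" cannot satisfy (?:\A|[^H]) any more.
$consuming = preg_match_all('/(?:\A|[^H])HH(?:[^H]|\z)/', $input);

// Lookaround version: neighbours are only inspected, never consumed.
$lookaround = preg_match_all('/(?<!H)HH(?!H)/', $input);

echo $consuming, ' vs ', $lookaround, PHP_EOL;
```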
There are a few ways. Regex as above, or using plain PHP instead:
$string = 'ab abc acd ab abd';
$chars = 'ab';
$strings = explode(" ", $string);
echo array_count_values($strings)[$chars];
// Outputs 2
// If you don't have PHP 5.4+ (no array dereferencing of a function result):
$values = array_count_values($strings);
echo $values[$chars];
// Outputs 2

don't match string in brackets php regex

I've been trying to use preg_replace() in PHP to replace a string. I want to match and replace every 's' in this string, but the only solution I came up with matches just the 's' between 'b' and 'c' or the 's' between > and <. Is there any way to use a negative lookbehind not just for the character '>' but for a whole string? I don't want to replace anything in brackets.
<text size:3>s<text size:3>absc
<text size:3>xxetxx<text size:3>sometehing
Edit:
I just want to get the 's' in >s< and in bsc. Then, when I change the search string, for example from 's' to 'te', it should replace the 'te' in xtex and sometehing. So I am looking for a regular expression that avoids replacing anything in <....>
You can use this pattern:
$pattern = '/((<[^>]*>)*)([^s]*)s/';
$replace = '\1\3■'; # ■ = your replacement string
$result = preg_replace( $pattern, $replace, $str );
Pattern explanation:
( # group 1:
(<[^>]*>)* # group 2: zero-or-more <...>
)
([^s]*) # group 3: zero-or-more not “s”
s # literally “s”
If you want case-insensitive matching, add an “i” at the end of the pattern:
$pattern = '/((<[^>]*>)*)([^s]*)s/i';
Edit: Replacement explanation
In the search pattern we have 3 groups surrounded by round brackets. In the replacement string we can refer to a group with the syntax \1, where 1 is the group number.
So, the replacement string in the example means: replace group 1 with itself, replace group 3 with itself, replace “s” with the desired replacement. We don't need to use group 2 because it is included in group 1 (a repeated group only retains its last repetition, so group 2 alone could not restore every tag).
In the demo string:
abs<text size:3>ssss<text size:3><img src="img"><text size:3>absc
└┘╵└───────────┘╵╵╵╵└───────────────────────────────────────┘└┘╵╵
└─┘└────────────┘╵╵╵└──────────────────────────────────────────┘
1 2 345 6
Pattern matches:
match   group 1   group 3   s
-----   -------   -------   -----
1       0         1         1
2       1         0         1
3       0         0         1
4       0         0         1
5       0         0         1
6       3         1         1
The last “c” is not matched, so it is not replaced.
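Use preg_match_all to get all the s letters, and use it with the flag PREG_OFFSET_CAPTURE to get their offsets.
The regular expression $pat contains a negative lookbehind and a negative lookahead so that the s inside the bracketed expression is not matched.
In this example I replace s with the single character 5. Assigning to a string offset replaces exactly one byte, so for a longer replacement string use preg_replace with the same pattern instead:
<?php
$s = " <text size:3>s<text size:3>absc";
// Match "s" unless it is the "s" of "<text size:3>"
$pat = '/(?<!<text )s(?!ize:3>)/';
preg_match_all($pat, $s, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as $match) {
    // $match[1] is the byte offset of this "s"
    $s[$match[1]] = "5";
}
// For a replacement longer than one character: $s = preg_replace($pat, 'te', $s);
print_r(htmlspecialchars($s));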
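Both patterns above depend on the exact shape of the input. The first can run into a tag when text without an "s" precedes it (in "xy<a s>", [^s]* matches "xy<a " and the "s" inside the tag is replaced), and the second only knows the literal tag <text size:3>. A more general sketch, which is not taken from either answer, is to match a whole <...> tag or the search string and give tags back unchanged. The name replaceOutsideTags() is hypothetical, and the sketch assumes a non-empty search string that does not start with "<".

```php
<?php
// replaceOutsideTags() is a hypothetical helper, not part of either answer above.
// Assumption: $search is non-empty and does not start with "<".
function replaceOutsideTags(string $text, string $search, string $replacement): string
{
    // First alternative: a whole <...> tag. Second alternative: the search string.
    $pattern = '/<[^>]*>|' . preg_quote($search, '/') . '/';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($replacement) {
            // A match starting with "<" is a tag: return it untouched.
            return $m[0][0] === '<' ? $m[0] : $replacement;
        },
        $text
    );
}

echo replaceOutsideTags('<text size:3>s<text size:3>absc', 's', 'te'), PHP_EOL;
```

Because the tag alternative is tried first at every position, any occurrence of the search string inside a tag is consumed as part of the tag and never reaches the replacement branch.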

split string in numbers and text but accept text with a single digit inside

Let's say I want to split this string in two variables:
$string = "levis 501";
I will use
preg_match('/\d+/', $string, $num);
preg_match('/\D+/', $string, $text);
but then let's say I want to split this one in two
$string = "levis 5° 501";
as $text = "levis 5°"; and $num = "501";
So my guess is I should add a rule to the preg_match('/\d+/', $string, $num); that looks for numbers only at the END of the string and I want it to be between 2 and 3 digits.
But also the $text match now has one number inside...
How would you do it?
To split a string in two parts, use any of the following:
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
This regex matches:
^ - the start of the string
(.*?) - Group 1 capturing zero or more characters, as few as possible (as *? is a "lazy" quantifier) up to...
\s* - zero or more whitespace symbols
(\d+) - Group 2 capturing 1 or more digits
\D* - zero or more characters other than digit (it is the opposite shorthand character class to \d)
$ - end of string.
The s modifier (after the closing ~ delimiter) enables DOTALL mode, which makes . match any character, including a newline, which it does not match without this modifier.
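Or
preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2);
The limit of 2 stops the splitting after the first match; without it, the lookahead also succeeds between the digits of the trailing number. This \s*(?=\s*\d+\D*$) pattern: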
\s* - zero or more whitespaces, but only if followed by...
(?=\s*\d+\D*$) - zero or more whitespaces followed with 1+ digits followed with 0+ characters other than digits followed with end of string.
The (?=...) construct is a positive lookahead that does not consume characters and just checks if the pattern inside matches and if yes, returns "true", and if not, no match occurs.
See IDEONE demo:
$s = "levis 5° 501";
preg_match('~^(.*?)\s*(\d+)\D*$~s', $s, $matches);
print_r($matches[1] . ": ". $matches[2]. PHP_EOL);
print_r(preg_split('~\s*(?=\s*\d+\D*$)~', $s, 2));
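The question also asks for the number to sit at the very end of the string and to be 2 or 3 digits long. A hedged sketch of that variant follows; splitTrailingNumber() is a hypothetical helper, and the (?<!\d) lookbehind is an assumption added here so that a longer run of digits is rejected rather than cut in the middle.

```php
<?php
// splitTrailingNumber() is a hypothetical helper for this sketch.
// Returns [text, number] or null when the string does not end in 2-3 digits.
function splitTrailingNumber(string $s): ?array
{
    // (.*?)      lazily captured text
    // \s*        optional whitespace before the number
    // (?<!\d)    the number must not be preceded by another digit
    // (\d{2,3})$ 2 or 3 digits at the end of the string
    if (preg_match('~^(.*?)\s*(?<!\d)(\d{2,3})$~s', $s, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

print_r(splitTrailingNumber('levis 5° 501'));
```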

Warning: preg_split() [function.preg-split]: Compilation failed: range out of order in character class

I am trying to convert a string into an array with the preg_split function. I want to get an array in which each element is 1 letter with an optional number. For example, if I have "NH2O3", I want this output:
[0] => N,
[1] => H2,
[2] => O3
I have this code:
$formula = "NH2O3";
$pattern = '/[a-Z]{1}[0-9]?/';
$formula = preg_split($pattern, $formula);
But this returns an error:
Warning: preg_split() [function.preg-split]: Compilation failed: range
out of order in character class at offset 3 in
/home/masqueci/public_html/wp-content/themes/Flatnews/functions.php on
line 865 bool(false)
The error is due to a-Z (lowercase + uppercase). Change that to a-zA-Z or use the modifier i for case-insensitive matching, e.g.
/[a-z]{1}[0-9]?/i
You also need to use preg_split a bit differently in order to get that result:
$formula = "NH2O3";
$pattern = '/([a-z][0-9]?)/i';
$formula = preg_split($pattern, $formula, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
Specifics from http://php.net/preg_split:
PREG_SPLIT_NO_EMPTY If this flag is set, only non-empty pieces will be
returned by preg_split().
PREG_SPLIT_DELIM_CAPTURE If this flag is set, parenthesized expression
in the delimiter pattern will be captured and returned as well.
[a-Z] doesn't mean anything. If you want uppercase and lowercase letters, there are two solutions:
$pattern = '/[a-z][0-9]?/i';
or
$pattern = '/[a-zA-Z][0-9]?/';
Inside a character class - is used to define a range of characters in the unicode table. Since Z is before a in the table, the range doesn't exist.
Note: using [A-z] is wrong too, because there are characters other than letters between Z and a.
A pattern to do that:
$formula = preg_split('/(?=[A-Z][a-z]?\d*)/', 'HgNO3', -1, PREG_SPLIT_NO_EMPTY);
where (?=..) is a lookahead and means "followed by".
-1 means no limit, and PREG_SPLIT_NO_EMPTY (whose value is 1) drops the empty piece before the first match.
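Since the goal is to collect the elements rather than what lies between them, preg_match_all is another option. This is only a sketch: splitFormula() is a hypothetical name, and it assumes element symbols consist of one uppercase letter, an optional lowercase letter, and an optional count.

```php
<?php
// splitFormula() is a hypothetical helper name for this sketch.
function splitFormula(string $formula): array
{
    // One uppercase letter, an optional lowercase letter, then optional digits.
    preg_match_all('/[A-Z][a-z]?\d*/', $formula, $matches);
    return $matches[0];
}

print_r(splitFormula('NH2O3'));
print_r(splitFormula('HgNO3'));
```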
