Checking pattern as much as possible


How can I make the preg functions find all possible matches for a regular expression pattern, including overlapping ones?
Here's the code:
<?php
$text = 'Amazing analyzing.';
$regexp = '/(^|\\b)([\\S]*)(a)([\\S]*)(\\b|$)/ui';
$matches = array();
if (preg_match_all($regexp, $text, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
echo "{$match[2]}[{$match[3]}]{$match[4]}\n";
}
}
?>
Output:
Am[a]zing
an[a]lyzing.
Output that I need:
[A]mazing
Am[a]zing
[A]nalyzing.
an[a]lyzing.

You have to use lookbehind/lookahead zero-length assertions (instead of a normal pattern, which consumes the characters around what you are looking for): http://www.regular-expressions.info/lookaround.html

Lookaround assertions won't help, for two reasons:
Since they are zero-length, they won't return characters that you need.
As Avinash Raj noted, PHP lookbehind doesn't allow *.
This yields the output that you need:
<?php
$text = 'Amazing analyzing.';
foreach (preg_split('/\s+/', $text) as $word)
{
$matches = preg_split('/(a)/i', $word, 0, PREG_SPLIT_DELIM_CAPTURE);
for ($match = 1; $match < count($matches); $match += 2)
{
$prefix = join(array_slice($matches, 0, $match));
$suffix = join(array_slice($matches, $match+1));
echo "{$prefix}[{$matches[$match]}]{$suffix}\n";
}
}
?>
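
A variant of the same idea, as a rough sketch rather than a definitive solution: instead of splitting each word on the letter, ask preg_match_all for the offset of every "a" with PREG_OFFSET_CAPTURE and cut the word around that offset. The $lines array is only there to collect the output; it is not part of the original code.

```php
<?php
$text = 'Amazing analyzing.';
$lines = array();
foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // Record the byte offset of every "a" (case-insensitive) in the word.
    preg_match_all('/a/i', $word, $hits, PREG_OFFSET_CAPTURE);
    foreach ($hits[0] as $hit) {
        list($letter, $offset) = $hit;
        // Prefix, bracketed letter, suffix.
        $lines[] = substr($word, 0, $offset) . '[' . $letter . ']' . substr($word, $offset + 1);
    }
}
echo implode("\n", $lines), "\n";
```

Because each occurrence is located independently, overlapping candidates within one word are all reported, which a single consuming pattern cannot do.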

Related

PHP Regex expression excluding <pre> tag

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their descriptions. It uses PHP's preg_replace function.
The issue is that it also replaces acronyms contained in a <pre> tag, which I use to present source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly inside, but at any depth)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);

You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>

It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
echo $text;
Taken from here.

PHP Regex get reverse number

I have this:
$pattern = 'dev/25{LASTNUMBER}/P/{YYYY}';
$var = 'dev/251/P/2014';
In this case {LASTNUMBER} is 1. How can I extract it from $var?
The pattern can contain more variables, always wrapped in {}.
The pattern can also be different, for example:
$pattern = '{LASTNUMBER}/aa/bb/P/{OtherVar}';
In this case $var would be 1/aa/bb/p/some and I want to get 1.
I need to get the value of {LASTNUMBER} given the pattern and the resulting string.
OK, maybe it is not possible :) or very, very hard.

Use a regex:
if (preg_match('~dev/25([0-9])/P/[0-9]{4}~', $var, $m)) {
$lastnum = $m[1];
}

$parts = explode("/", $var);
if (isset($parts[1])) {
$lastnum = substr($parts[1], -1);
}
This will be faster than a regex :)

You probably need this:
<?php
$pattern = 'dev/251/P/2014';
preg_match_all('%dev/25(.*?)/P/[\d]{4}%sim', $pattern, $match, PREG_PATTERN_ORDER);
$match = $match[1][0];
echo $match; // echoes 1
?>
If you need to loop through the results, you can use:
<?php
$pattern = <<< EOF
dev/251/P/2014
dev/252/P/2014
dev/253/P/2014
dev/254/P/2014
dev/255/P/2014
EOF;
preg_match_all('%dev/25(.*?)/P/[\d]{4}%sim', $pattern , $match, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($match[1]); $i++) {
echo $match[1][$i]; // echoes 12345
}
?>
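
Since the question asks for something that works with any pattern containing {NAME} placeholders, here is a rough sketch of a generic approach: convert the template into a regex with one named group per placeholder. The function name extractPlaceholders is made up for illustration, and the code assumes that placeholder names are valid PCRE group names (letters, digits, underscores, not starting with a digit) and that a placeholder value never contains a slash.

```php
<?php
// Hypothetical helper: turn a template such as 'dev/25{LASTNUMBER}/P/{YYYY}'
// into an anchored regex with one named group per {PLACEHOLDER}.
function extractPlaceholders(string $template, string $subject): ?array
{
    // Split into literal text and {NAME} tokens, keeping the tokens.
    $tokens = preg_split('/(\{\w+\})/', $template, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $regex = '';
    foreach ($tokens as $token) {
        if (preg_match('/^\{(\w+)\}$/', $token, $m)) {
            // Assumption: a placeholder value never contains a slash.
            $regex .= '(?P<' . $m[1] . '>[^/]+?)';
        } else {
            // Literal part of the template, escaped for use inside ~...~.
            $regex .= preg_quote($token, '~');
        }
    }
    if (!preg_match('~^' . $regex . '$~', $subject, $match)) {
        return null; // the subject does not fit the template
    }
    // Keep only the named groups, dropping the numeric duplicates.
    return array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
}

$values = extractPlaceholders('dev/25{LASTNUMBER}/P/{YYYY}', 'dev/251/P/2014');
echo $values['LASTNUMBER'], "\n";
```

The match is case-sensitive, so the subject has to use the same letter case as the literal parts of the template; adding the i modifier would relax that.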

Most efficient way of Extracting tags from multiple strings

I have an html page with multiple instances of the following tags:
<INCLUDEFILE-1-/var/somepath/file1.php>
<INCLUDEFILE-2-/var/somepath/file2.php>
<INCLUDEFILE-3-/var/somepath/file3.php>
<INCLUDEFILE-4-/var/somepath/file4.php>
<INCLUDEFILE-5-/var/somepath/file5.php>
What code can I use to extract all of the paths above? I have so far got the following code but cannot get it to work properly:
preg_match_all('/INCLUDEFILE[^"]+/m', $html, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++)
{
$includefile = $result[0][$i];
}
I need to extract:
/var/somepath/file1.php
/var/somepath/file2.php
/var/somepath/file3.php
/var/somepath/file4.php
/var/somepath/file5.php
Can anyone see the obvious mistake(s)?!

The shortest way to happiness:
$pattern = '`<INCLUDEFILE-\d+-\K/[^>\s]+`';
preg_match_all($pattern, $subject, $results);
$results=$results[0];
print_r($results);

I changed your regex slightly and added parentheses to capture the subpattern you need. I didn't see any quotes (") in the posted example, so I check for ">" to detect the end instead. I also added the ungreedy modifier; try it with and without to see how it behaves. The loop reads $result[1], which contains the matches of the first subpattern.
preg_match_all('/<INCLUDEFILE-[0-9]+-([^>]+)>/Um', $html, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[1]); $i++)
{
$includefile = $result[1][$i];
}

You could do it this way:
$html = '
<INCLUDEFILE-1-/var/somepath/file1.php>fadsf
asdfasf<INCLUDEFILE-2-/var/somepath/file2.php>adsfaf
<INCLUDEFILE-3-/var/somepath/file3.php>asdfadsf
<INCLUDEFILE-4-/var/somepath/file4.php>
<INCLUDEFILE-5-/var/somepath/file5.php>
';
$lines = explode(PHP_EOL, $html);
$files = array();
foreach($lines as $line)
{
preg_match('/<INCLUDEFILE-\d+-(.+?)>/', $line, $match);
if(!empty($match)) {
$files[] = $match[1];
}
}
var_dump($files);

preg_match to show first match in foreach loop

I have an array of patterns:
$patterns= array(
'#http://(www\.)?domain-1.com/.*#i',
'#http://(www\.)?domain-2.com/.*#i',
...
);
and I have a long string that contains text and multiple URLs. I want to match the first URL that occurs in the string. So far I have only tried this:
foreach ($patterns as $pattern) {
preg_match($pattern, $the_string, $match);
echo '<pre>'; print_r($match); echo '</pre>';
}
It returns empty arrays for the patterns that do not match and arrays containing a URL for those that do, but the result depends on the order of the $patterns array.
How can I find whichever match of these patterns occurs first in the string?

You basically have three options:
1. Match a general URL pattern, then run that URL against the patterns you've got. If none matches, continue with the next result from the general pattern.
2. Run all your patterns with the PREG_OFFSET_CAPTURE flag to get the offset at which each pattern matched, find the lowest offset, and return that result.
3. Combine your various patterns into a single pattern. Be aware that there are limits to the length of a pattern (64K in compiled form).
Option 2:
<?php
$text = "hello world http://www.domain-2.com/foo comes before http://www.domain-1.com/bar";
$patterns= array(
'#http://(www\.)?domain-1.com/[^\s]*#i',
'#http://(www\.)?domain-2.com/[^\s]*#i',
);
$match = null;
$offset = null;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
if ($matches[0][1] < $offset || $offset === null) {
$offset = $matches[0][1];
$match = $matches[0][0];
}
}
}
var_dump($match);
Beware that I changed your demo patterns: I replaced .* (anything) with [^\s]* (anything but whitespace) to prevent the pattern from matching more than it's supposed to.
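
Combining the patterns into a single one could look roughly like the sketch below. It assumes the patterns differ only in their host name, so the $hosts list is a hypothetical stand-in for the original array. Because the regex engine reports the leftmost match, one preg_match call is enough.

```php
<?php
$text = "hello world http://www.domain-2.com/foo comes before http://www.domain-1.com/bar";
// Hypothetical host list standing in for the original patterns.
$hosts = array('domain-1.com', 'domain-2.com');
// Escape each host so that "." and "-" are matched literally.
$alternatives = implode('|', array_map(function ($host) {
    return preg_quote($host, '#');
}, $hosts));
$combined = '#http://(?:www\.)?(?:' . $alternatives . ')/\S*#i';
// preg_match stops at the first (leftmost) match in the text.
$first = preg_match($combined, $text, $m) ? $m[0] : null;
var_dump($first);
```

With a very long host list the compiled pattern grows accordingly, which is where the length limit on compiled patterns comes into play.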

I guess you're looking for this:
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
echo '<pre>'; print_r($match); echo '</pre>';
break;
}
}
UPDATE:
Then I think you should work with offsets like this:
$matches = array();
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match, PREG_OFFSET_CAPTURE)) {
$matches[$match[0][1]] = $match[0][0];
}
}
ksort($matches); // sort by offset so the earliest match comes first
echo reset($matches);

I can't think of any way except evaluating all the patterns one at a time and grabbing the earliest match:
$earliestPos = strlen($the_string) + 1;
$earliestMatch = false;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
$myMatch = $match[0];
// strpos() takes the haystack first, then the needle
$myMatchPos = strpos($the_string, $myMatch);
if ($myMatchPos < $earliestPos) {
$earliestPos = $myMatchPos;
$earliestMatch = $myMatch;
}
}
}
if ($earliestMatch) {
echo $earliestMatch;
}

Check if any array values are present at the end of a string

I am trying to test whether a string made up of multiple words has any value from an array at the end of it. The following is what I have so far. I am stuck on how to check that the string is longer than the array value being tested and that the value is present at the end of the string.
$words = trim(preg_replace('/\s+/',' ', $string));
$words = explode(' ', $words);
$words = count($words);
if ($words > 2) {
// Check if $string ends with any of the following
$test_array = array();
$test_array[0] = 'Wizard';
$test_array[1] = 'Wizard?';
$test_array[2] = '/Wizard';
$test_array[3] = '/Wizard?';
// Stuck here
if ($string is longer than $test_array and $test_array is found at the end of the string) {
Do stuff;
}
}

By end of string do you mean the very last word? You could use preg_match
preg_match('~/?Wizard\??$~', $string, $matches);
echo "<pre>".print_r($matches, true)."</pre>";

I think you want something like this:
if (preg_match('/\/?Wizard\??$/', $string)) { // ...
If it has to be an arbitrary array (and not the one containing the 'wizard' strings you provided in your question), you could construct the regex dynamically:
$words = array('wizard', 'test');
foreach ($words as &$word) {
$word = preg_quote($word, '/');
}
$regex = '/(' . implode('|', $words) . ')$/';
if (preg_match($regex, $string)) { // ends with 'wizard' or 'test'

Is this what you want (no guarantee for correctness, couldn't test)?
foreach( $test_array as $testString ) {
$searchLength = strlen( $testString );
$sourceLength = strlen( $string );
if( $sourceLength > $searchLength && substr( $string, $sourceLength - $searchLength ) === $testString ) {
// ...
}
}
I wonder if some regular expression wouldn't make more sense here.
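
For completeness, a regex-free sketch of the same check using substr_compare with a negative offset, which compares only the tail of the string. The helper name endsWithAny is made up for illustration; the length test reflects the requirement in the question that the string be longer than the value being tested.

```php
<?php
// Hypothetical helper: true when $string is longer than one of the
// suffixes and ends with it.
function endsWithAny(string $string, array $suffixes): bool
{
    foreach ($suffixes as $suffix) {
        $length = strlen($suffix);
        // A negative offset makes substr_compare start $length bytes from the end.
        if (strlen($string) > $length
            && substr_compare($string, $suffix, -$length) === 0) {
            return true;
        }
    }
    return false;
}

$test_array = array('Wizard', 'Wizard?', '/Wizard', '/Wizard?');
var_dump(endsWithAny('Setup the Wizard?', $test_array)); // bool(true)
var_dump(endsWithAny('Wizard', $test_array));            // bool(false)
```

The comparison is case-sensitive; substr_compare accepts a fifth argument to make it case-insensitive if that is needed.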
