Two questions regarding regular expressions - php

I currently use this:
$text = preg_replace('/' . $line . '/', '[x]\\0[/x]', $text);
$line is a simple regular expression:
https?://(?:.+?\.)?dailymotion\.com/video/[A-Za-z0-9]+
This works fine so far, but there are two things I need that I can't figure out how to do:
First, I don't want to perform the replacement if the string is contained within a BBCode tag, e.g.
[bla]http://www.dailymotion.com/video/xuams9[/bla]
or
[bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]
or
[bla='http://www.dailymotion.com/video/xuams9']http://www.dailymotion.com/video/xuams9[/bla]
The second thing is that I only want to match up to the first space. This is what I currently use:
$text = preg_replace('/' . $line . '(?:[^ ]+)?/', '[x]\\0[/x]', $text);
I don't know whether I should do it like this or if there's a better way.
So, basically, I'm just trying to match
http://www.dailymotion.com/video/test4
from this:
[tagx='http://www.dailymotion.com/video/test1']http://www.dailymotion.com/video/test2[/tagx] | [tagy]Hello http://www.dailymotion.com/video/test3 World[/tagy] | [tagz]Hello World[/tagz] http://www.dailymotion.com/video/test4
EDIT:
This is what I have so far (it partly works):
(?:(?<!(\[\/url\]|\[\/url=))(\s|^))' . $line . '(?:[^ ]+)(?:(?<![[:punct:]])(\s|\.?$))?

You can use a lookbehind assertion to do this:
http://php.net/manual/en/regexp.reference.assertions.php
By putting the following lookbehind before $line
(?<!\[bla]|\[bla=|\[bla=')
it will only match $line when it is not immediately preceded by [bla], [bla= or [bla='.
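A minimal sketch of that lookbehind applied to the URL regex from the question. It assumes the tag is literally named bla, and it uses ~ as the delimiter so the slashes inside the URL need no escaping (with / as the delimiter they would have to be written as \/). Because a lookbehind only inspects the characters directly before the URL, the second URL in [bla='...']http://...[/bla] (preceded by ']) would still be wrapped.

```php
<?php
// URL regex from the question; "bla" is the example tag name from the question.
$line  = 'https?://(?:.+?\.)?dailymotion\.com/video/[A-Za-z0-9]+';
// Negative lookbehind: each top-level branch has a fixed length, which PCRE accepts
// even though the branches differ in length (5, 5 and 6 characters).
$guard = '(?<!\[bla\]|\[bla=|\[bla=\')';

$text = "[bla]http://www.dailymotion.com/video/xuams9[/bla] "
      . "[bla=http://www.dailymotion.com/video/xuams9]trololo[/bla] "
      . "http://www.dailymotion.com/video/abc";

// Only the bare URL at the end is wrapped in [x]...[/x].
$result = preg_replace('~' . $guard . $line . '~', '[x]$0[/x]', $text);
echo $result, "\n";
```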

Try this:
$text = array();
$text[ 0 ] = "[bla]http://www.dailymotion.com/video/xuams9[/bla]";
$text[ 1 ] = "[bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]";
$text[ 2 ] = "http://www.dailymotion.com/video/xuams9";
$text[ 3 ] = "A http://www.dailymotion.com/video/xuams9 B C";
$line = "/http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+/";
$tag = array();
$tag[ 0 ] = "/\[[A-Za-z]{1,12}\]http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+\[\/[A-Za-z]{1,12}\]/";
$tag[ 1 ] = "/\[[A-Za-z]{1,12}=http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+\][A-Za-z0-9]{0,}\[\/[A-Za-z]{1,12}\]/";
foreach( $text as $k=>$v ) {
if( preg_match( $tag[ 0 ], $v ) == false && preg_match( $tag[ 1 ], $v ) == false ) {
echo '!';
$output = preg_replace( $line, '[x]\\0[/x]', $v );
}
else { $output = $v; };
echo "Text #" . ( $k + 1 ) . ": {$output}<br />";
}
Result:
Text #1: [bla]http://www.dailymotion.com/video/xuams9[/bla]
Text #2: [bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]
!Text #3: [x]http://www.dailymotion.com/video/xuams9[/x]
!Text #4: A [x]http://www.dailymotion.com/video/xuams9[/x] B C
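The check above leaves a whole string untouched as soon as it contains a matching tag, so it can't handle the mixed sample from the question, where only test4 should be wrapped. One alternative, swapped in here and not taken from either answer, is PCRE's (*SKIP)(*FAIL) backtracking-control verbs: the first branch matches a complete [tag ...]...[/tag] block and then discards it, so the search resumes after the block and only URLs outside tags reach the second branch. A sketch, assuming tags are not nested inside tags of the same name and every opening tag has a matching closing tag; the URL part is a tightened version of the question's $line so it can't run across other text.

```php
<?php
// Sample string from the question: only test4 is outside every BBCode tag.
$text = "[tagx='http://www.dailymotion.com/video/test1']http://www.dailymotion.com/video/test2[/tagx] | "
      . "[tagy]Hello http://www.dailymotion.com/video/test3 World[/tagy] | "
      . "[tagz]Hello World[/tagz] http://www.dailymotion.com/video/test4";

// URL part (assumption: host labels are word characters or hyphens).
$url = 'https?://(?:[\w-]+\.)*dailymotion\.com/video/[A-Za-z0-9]+';

// Branch 1: a whole [name ...]...[/name] block; (*SKIP)(*FAIL) throws it away and
// makes the engine resume after the block. Branch 2: a URL outside any tag.
$pattern = '~\[(\w+)[^\]]*\].*?\[/\1\](*SKIP)(*FAIL)|' . $url . '~s';

$result = preg_replace($pattern, '[x]$0[/x]', $text);
echo $result, "\n";
```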

Related

Parse url with pattern in PHP?

How can I determine, using a regexp or something else in PHP, that the following URLs match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
I wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone check whether this code is good enough? And could you also explain why there are so many backslashes in $to_replace = ['/\\\\\*/']; (for the replacement part, I found exactly this solution on the Internet)?
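To see where all the backslashes in '/\\\\\*/' come from, it helps to follow the string through both layers. In a short sketch: PHP's single-quoted string turns each \\ into one backslash, so the regex engine receives /\\\*/, where \\ matches a literal backslash and \* a literal asterisk. That two-character text \* is exactly what preg_quote() produces from the * placeholder, which is why the replacement works.

```php
<?php
// Layer 1: the PHP single-quoted string. Each \\ collapses to a single \,
// while \* is left as-is, so $literal holds the 6 characters /\\\*/
$literal = '/\\\\\*/';

// Layer 2: preg_quote() escapes regex metacharacters and the "/" delimiter,
// so the "*" placeholder becomes the two characters \*
$quoted = preg_quote('node/*', '/');                // node\/\*

// The regex /\\\*/ matches a literal backslash followed by a literal asterisk,
// i.e. exactly the \* that preg_quote() produced.
$regex = preg_replace($literal, '[^\/]+', $quoted); // node\/[^\/]+

echo $literal, "\n", $quoted, "\n", $regex, "\n";
var_dump(preg_match('/^' . $regex . '$/', 'node/11111')); // int(1)
```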
If you know the format beforehand, you can use preg_match(). In the first example, for instance, you know %node can only be a number. To match several patterns, just store a regex for each one in an array:
$patterns = array(
    'node/%node' => '|^node/[0-9]+$|',
    'node/%node/news' => '|^node/[0-9]+/news$|',
    'album/%album/shadowbox/%photo' => '|^album/[0-9]+/shadowbox/[0-9]+$|',
    'media/photo' => '|^media/photo$|',
    'blogs' => '|^blogs$|',
    'news' => '|^news$|',
    'node/%node/players' => '|^node/[0-9]+/players$|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
    // preg_match() returns 1 on a match and 0 otherwise
    if (preg_match($regex, $url)) {
        echo "<pre>" . $pattern . "</pre>";
    }
}
Notice the ^ and $ anchors on every rule: without them, 'node/%node' would also match "node/11111/players", and 'news' would also match "node/38429/news".
Here is a more generic solution that works for any pattern written with :name placeholders:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/','/hello/']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can turn the code above into a function that takes the pattern and the URL as parameters and call it for each entry in the array. The first step uses a regex to replace every variable with *, then explodes the pattern on *. Finally, it loops over the resulting pieces and compares each one with the start of the remaining URL using plain string comparison.
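The same idea can also be expressed as one regex per pattern: quote the literal parts with preg_quote() and turn each %placeholder into a named capture group, so a match also returns the token values. A sketch under assumptions not in the original: placeholders are % followed by letters or underscores, each one covers exactly one path segment, and the helper names patternToRegex() and matchUrl() are hypothetical.

```php
<?php
// Hypothetical helper: builds an anchored regex from a pattern such as
// "album/%album/shadowbox/%photo" -> ~^album/(?P<album>[^/]+)/shadowbox/(?P<photo>[^/]+)$~
function patternToRegex(string $pattern): string
{
    // Split into literal parts and %placeholders, keeping the placeholders.
    $parts = preg_split('/(%[A-Za-z_]+)/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $part) {
        if ($part !== '' && $part[0] === '%') {
            // Placeholder: one path segment, captured under its own name.
            $regex .= '(?P<' . substr($part, 1) . '>[^/]+)';
        } else {
            // Literal text: escape regex metacharacters and the ~ delimiter.
            $regex .= preg_quote($part, '~');
        }
    }
    return '~^' . $regex . '$~';
}

// Hypothetical helper: returns the named tokens on a match, or null otherwise.
function matchUrl(string $pattern, string $url): ?array
{
    if (!preg_match(patternToRegex($pattern), $url, $m)) {
        return null;
    }
    // $m holds both numbered and named groups; keep only the named ones.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(matchUrl('album/%album/shadowbox/%photo', 'album/34234/shadowbox/321023'));
var_dump(matchUrl('node/%node', 'node/38429/news')); // NULL
```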
As long as the patterns are fixed, you can use the preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.

How to get a portion of a string with a regular expression

I have this situation where I can have strings like this:
"Project\\V1\\Rest\\Car\\Controller"
"Project\\V1\\Rest\\Boat\\Controller"
"Project\\Action\\Truck"
"Project\\V1\\Rest\\Helicopter\\Controller"
"Parental\\Boat\\Action"
Only when the string follows the pattern:
"Project\\V1\\Rest\\THE_DESIRED_WORD\\Controller"
I want to get THE_DESIRED_WORD.
That's why I'm thinking of a regular expression.
To use a regular expression, you need to escape the backslash twice: once for the PHP string and once for the regex.
Try this:
$tab = array(
"Project\\V1\\Rest\\Car\\Controller",
"Project\\V1\\Rest\\Boat\\Controller",
"Project\\V1\\Rest\\Helicopter\\Controller",
"Project\\V1\\Rest\\Water\\Controller",
);
foreach ($tab as $s) {
preg_match("!\\\\([^\\\\]*)\\\\Controller!U", $s, $result);
var_dump($result);
}
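The regex above picks the word before \Controller in any string, without checking the Project\V1\Rest prefix. If only strings of exactly that shape should match, as the question asks, the pattern can be anchored at both ends. A sketch; the helper name desiredWord() is hypothetical:

```php
<?php
// Hypothetical helper: returns THE_DESIRED_WORD for strings of the exact form
// Project\V1\Rest\<word>\Controller, and null for anything else.
function desiredWord(string $s): ?string
{
    // In a single-quoted PHP string, \\\\ is two backslashes, which the regex
    // engine reads as one literal backslash; [^\\\\]+ means "no backslash".
    $regex = '~^Project\\\\V1\\\\Rest\\\\([^\\\\]+)\\\\Controller$~';
    return preg_match($regex, $s, $m) ? $m[1] : null;
}

$inputs = array(
    "Project\\V1\\Rest\\Car\\Controller",
    "Project\\Action\\Truck",
    "Parental\\Boat\\Action",
);
foreach ($inputs as $s) {
    var_dump(desiredWord($s)); // string(3) "Car", then NULL, NULL
}
```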
explode() does the job with a backslash (\\) as the separator:
<?php
$str1 = "Project\\V1\\Rest\\Car\\Controller";
$str2 = "Project\\V1\\Rest\\Boat\\Controller";
$str3 = "Project\\V1\\Rest\\Helicopter\\Controller";
$str4 = "Project\\V1\\Rest\\Water\\Controller";
$arr1 = explode( "\\",$str1 );
$arr2 = explode( "\\",$str2 );
$arr3 = explode( "\\",$str3 );
$arr4 = explode( "\\",$str4 );
echo $arr1[ 3 ] . ", " .
$arr2[ 3 ] . ", " .
$arr3[ 3 ] . ", " .
$arr4[ 3 ];
?>
Will display:
Car, Boat, Helicopter, Water
<?php
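$str = '"Project\\V1\\Rest\\Car\\Controller"
"Project\\V1\\Rest\\Boat\\Controller"
"Project\\V1\\Rest\\Helicopter\\Controller"
"Project\\V1\\Rest\\Water\\Controller"';
// Split on any line ending (\r\n, \n or \r), not just PHP_EOL
$lines = preg_split('/\R/', $str);
foreach ($lines as $l) {
    // Strip the surrounding double quotes before splitting on backslashes,
    // otherwise $x[0] is '"Project' and $x[4] is 'Controller"'
    $x = explode("\\", trim($l, " \t\""));
    if (count($x) === 5 && $x[0] == 'Project' && $x[1] == 'V1' && $x[2] == 'Rest' && $x[4] == 'Controller') {
        echo $x[3] . "\n";
    }
}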

How do I preg_match_all strings that start with "http" and end with (") or (') or whitespace?

How do I write a regex for preg_match_all that matches text starting with "http" (without quotes) and ending with (") or (') or whitespace (tabs, spaces, line breaks)?
I want to preg_match_all all the parts that start with "http":
Wuploadhttp://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rarhttp://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rarhttp://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rarFileservehttp://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rarhttp://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rarhttp://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rarUploaded.Tohttp://ul.to/AAAA/NNIW-LiBRARY.part1.rarhttp://ul.to/BBBBB/NNIW-LiBRARY.part2.rarhttp://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar
The results must look like this:
http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar
http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar
http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar
http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar
http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar
http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar
http://ul.to/AAAA/NNIW-LiBRARY.part1.rar
http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar
http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar
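I suggest you use parse_url() to get the parts of the URLs.
Take a look at its documentation on php.net.
EDIT:
$file = file_get_contents($fileName); // $fileName: path to your input file
$lines = preg_split('/\R/', $file); // split on \r\n, \n or \r
foreach ($lines as $line) {
    $urlParts = parse_url($line);
    // parse_url() returns false for seriously malformed input and omits 'scheme' when there is none
    if (is_array($urlParts) && ($urlParts['scheme'] ?? null) === 'http') {
        // Do anything ...
    }
}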
CHANGE:
OK, I don't know what your code looks like. If you want to scrape HTML to find links, I suggest this; it returns the href values of <a> tags:
preg_match_all ( "/<[ ]{0,}a[ \n\r][^<>]{0,}(?<= |\n|\r)(?:href)[ \n\r]{0,}=[ \n\r]{0,}[\"|']{0,1}([^\"'>< ]{0,})[^<>]{0,}>((?:(?!<[ \n\r]*\/a[ \n\r]*>).)*)<[ \n\r]*\/a[ \n\r]*>/ is", $source, $regs );
$tmp_array = array();
for ( $x = 0; $x < count ( $regs [ 1 ] ); $x ++ ) {
// append one entry per link instead of overwriting the same key each time
$tmp_array [] = array( "link_raw" => trim ( $regs [ 1 ] [ $x ] ) );
}
Then use parse_url() to check those.
Do you mean you would like to remove the "Wupload", "Fileserve" and "Uploaded.To" titles and capture just the URLs in an array? If so, try the following:
preg_match_all('!^http://.*\n!m', $string, $matches);
echo "<pre>" . print_r($matches, 1) . "</pre>";
This should do what you need:
<?php
$matches = array();
preg_match_all('#https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?#', $string, $matches);
foreach ($matches[0] as $match) {
// Do your processing here.
}
?>
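In the sample from the question the links are glued together with no whitespace or quotes between them, so a pattern that stops only at whitespace or quotes would run on into the next link (the path part of the pattern above also matches letters, so it runs on into the http of the following link). A minimal sketch that assumes, as in the sample, that every link ends in .rar: match lazily up to the first .rar, never crossing whitespace or a quote.

```php
<?php
// Sample input from the question (links concatenated with site labels).
$string = 'Wupload'
    . 'http://www.wupload.com/file/CCCCCCC/NNIW-LiBRARY.part1.rar'
    . 'http://www.wupload.com/file/VVVVVVVV/NNIW-LiBRARY.part2.rar'
    . 'http://www.wupload.com/file/TTTTTTT/NNIW-LiBRARY.part3.rar'
    . 'Fileserve'
    . 'http://www.fileserve.com/file/WWWW/NNIW-LiBRARY.part1.rar'
    . 'http://www.fileserve.com/file/TTTTT/NNIW-LiBRARY.part2.rar'
    . 'http://www.fileserve.com/file/RRRRR/NNIW-LiBRARY.part3.rar'
    . 'Uploaded.To'
    . 'http://ul.to/AAAA/NNIW-LiBRARY.part1.rar'
    . 'http://ul.to/BBBBB/NNIW-LiBRARY.part2.rar'
    . 'http://ul.to/YYYYYY/NNIW-LiBRARY.part3.rar';

// Start at http:// or https://, then take as few non-whitespace, non-quote
// characters as possible until the first ".rar" (assumed file extension).
preg_match_all('~https?://[^\s"\']+?\.rar~', $string, $matches);

foreach ($matches[0] as $url) {
    echo $url, "\n";
}
```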

PHP Regex to match a list of words against a string

I have a list of words in an array. I need to look for matches on a string for any of those words.
Example word list
company
executive
files
resource
Example string
Executives are running the company
Here's the function I've written, but it's not working:
$matches = array();
$pattern = "/^(";
foreach( $word_list as $word )
{
$pattern .= preg_quote( $word ) . '|';
}
$pattern = substr( $pattern, 0, -1 ); // removes last |
$pattern .= ")/";
$num_found = preg_match_all( $pattern, $string, $matches );
echo $num_found;
Output
0
$regex = '(' . implode('|', $words) . ')';
<?php
$words_list = array('company', 'executive', 'files', 'resource');
$string = 'Executives are running the company';
foreach ($words_list as &$word) $word = preg_quote($word, '/');
unset($word); // break the reference left over from the foreach
$num_found = preg_match_all('/('.join('|', $words_list).')/i', $string, $matches);
echo $num_found; // 2
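With this pattern, executive also matches inside Executives, which is why the count is 2. If only whole words should count, the alternation can be wrapped in \b word boundaries. A sketch, reusing the word list and string from above:

```php
<?php
$words_list = array('company', 'executive', 'files', 'resource');
$string = 'Executives are running the company';

// Quote each word for use in the regex, then join them into one alternation.
$quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words_list);

// \b on both sides: a word only counts when it is not part of a longer word.
$pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

$num_found = preg_match_all($pattern, $string, $matches);
echo $num_found, "\n";                  // 1 ("Executives" is no longer counted)
echo implode(', ', $matches[0]), "\n";  // company
```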
Make sure you add the 'm' flag to make the ^ match the beginning of a line:
$expression = '/foo/m';
Or remove the ^ if you don't mean to match the beginning of a line...

preg_replace out CSS comments?

I'm writing a quick preg_replace to strip comments from CSS. CSS comments usually have this syntax:
/* Development Classes*/
/* Un-comment me for easy testing
(will make it simpler to see errors) */
So I'm trying to kill everything between /* and */, like so:
$pattern = "#/\*[^(\*/)]*\*/#";
$replace = "";
$v = preg_replace($pattern, $replace, $v);
No dice! It seems to be choking on the forward slashes, because I can get it to remove the text of comments if I take the /s out of the pattern. I tried some simpler patterns to see if I could just lose the slashes, but they return the original string unchanged:
$pattern = "#/#";
$pattern = "/\//";
Any ideas on why I can't seem to match those slashes? Thanks!
Here's a solution:
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
Taken from the Script/Stylesheet Pre-Processor in Samstyle PHP Framework
See: http://code.google.com/p/samstyle-php-framework/source/browse/trunk/sp.php
csstest.php:
<?php
$buffer = file_get_contents('test.css');
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
echo $buffer;
?>
test.css:
/* testing to remove this */
.test{}
Output of csstest.php:
.test{}
You can't use grouping inside a negated character class the way you have there: inside [...] the parentheses are literal, so [^(\*/)] simply means "any character except (, *, / or )", and your second example comment contains parentheses. What you want instead are assertions, of which there are two types: look-ahead and look-behind.
In plain English, the pattern you're looking for is: a slash, an asterisk, then zero or more of (any character not followed by a slash, or a non-asterisk followed by a slash, or a slash not preceded by an asterisk), and finally an asterisk and a slash.
<?php
$str = '/* one */ onemore
/*
* a
* b
**/
stuff // single line
/**/';
preg_match_all('#/\*(?:.(?!/)|[^\*](?=/)|(?<!\*)/)*\*/#s', $str, $matches);
print_r($matches);
?>
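To see the character-class problem from the question in isolation, here is a small sketch comparing the original pattern with a common alternative (a lazy .*? plus the s modifier, which is not the lookaround pattern above). The second sample comment from the question contains parentheses, so [^(\*/)]* can't get past them and that comment is left in place.

```php
<?php
$css = "/* Un-comment me for easy testing\n(will make it simpler to see errors) */\n.test{}";

// [^(\*/)] is one character class excluding "(", "*", "/" and ")", not a group,
// so it stops at the "(" and this comment is never matched.
$broken = preg_replace('#/\*[^(\*/)]*\*/#', '', $css);

// Lazy .*? stops at the first "*/"; the s modifier lets "." match newlines.
$fixed = preg_replace('#/\*.*?\*/#s', '', $css);

var_dump($broken === $css);        // bool(true): nothing was removed
echo json_encode($fixed), "\n";    // "\n.test{}"
```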
I had the same issue.
To solve it, I first simplified the code by replacing "/*" and "*/" with distinct placeholder strings, and then used those as the start and end markers:
$code = str_replace("/*","_COMSTART",$code);
$code = str_replace("*/","COMEND_",$code);
$code = preg_replace("/_COMSTART.*?COMEND_/s","",$code);
The s modifier makes . match newlines too, so comments spanning several lines are removed as well.
There are a number of suggestions out there, but this one seems to work for me:
$v=preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $v);
so
"/* abc */.test { color:white; } /* XYZ */.test2 { padding:1px; /* DEF */} /* QWERTY */"
gives
.test { color:white; } .test2 { padding:1px; }
See https://onlinephp.io/c/2ae1c for a working test.
Just for fun (and for a small project, of course) I made a non-regex version of such code (I hope it's faster):
function removeCommentFromCss( $textContent )
{
$clearText = "";
$charsInCss = strlen( $textContent );
$searchForStart = true;
for( $index = 0; $index < $charsInCss; $index++ )
{
if ( $searchForStart )
{
if ( $textContent[ $index ] == "/" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "*" )
{
$searchForStart = false;
$index++; // also skip the "*", so "/*/" is not mistaken for a complete comment
continue;
}
else
{
$clearText .= $textContent[ $index ];
}
}
else
{
if ( $textContent[ $index ] == "*" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "/" )
{
$searchForStart = true;
$index++;
continue;
}
}
}
return $clearText;
}
