How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone analyze whether this code is good enough? And also explain why there is so many slashes in this part $to_replace = ['/\\\\\*/']; (regarding the replacement, found exactly such solution on the Internet).
If you know the format beforehand you can use preg_match. For example in the first example, you know %node can only be numbers. Matching multiples should be as as easy as we did it earlier, just store the regex in the array:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice how I added the question mark $ to end of the first rule, this will insure that it doesn't break into the second rule.
Here is the generic solution to the solution above
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/','/hello/']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can pretty much convert the code above into a function and pass parameters to check against the array case. The first step is regex to convert all variables into * and the explode by *. Finally loop over this array and keep comparing to the url to see if the pattern matches using simple string comparison.
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
Related
I have an array of patterns:
$patterns= array(
'#http://(www\.)?domain-1.com/.*#i',
'#http://(www\.)?domain-2.com/.*#i',
...
);
and I have long string that contains multiple text and urls I want to match the first url occurred in the string, I only tried this :
foreach ($patterns as $pattern) {
preg_match($pattern, $the_string, $match);
echo '<pre>'; print_r($match); echo '</pre>';
}
It returns empty arrays where there are no match for some patterns and arrays that contains a url but depending on the order of the array $patterns,
how can I find any match of these patterns that occurred first.
You basically have three options:
match a general URL pattern and then run that URL against the patterns you've got. If none matched, continue with the second result from the general pattern.
run all your patterns with the PREG_OFFSET_CAPTURE flag to get the offset the pattern matched at. find the lowest offset, return your result
combine your various patterns to a single pattern. Be aware that there are limits to the length of a pattern (64K in compiled form)
Option 2:
<?php
$text = "hello world http://www.domain-2.com/foo comes before http://www.domain-1.com/bar";
$patterns= array(
'#http://(www\.)?domain-1.com/[^\s]*#i',
'#http://(www\.)?domain-2.com/[^\s]*#i',
);
$match = null;
$offset = null;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
if ($matches[0][1] < $offset || $offset === null) {
$offset = $matches[0][1];
$match = $matches[0][0];
}
}
}
var_dump($match);
beware that I changed your demo patterns. I replaced .* (anything) by [^\s]* (everything but space) to prevent the pattern from matching more than it's supposed to
I guess you're looking for this:
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
echo '<pre>'; print_r($match); echo '</pre>';
break;
}
}
UPDATE:
Then I think you should work with offsets linke this:
$matches = array();
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match, PREG_OFFSET_CAPTURE)) {
$matches[$match[0][1]] = $match[0][0];
}
}
echo reset($matches);
I can't think of any way except evaluting all the strings one at a time, and grabbing the earliest:
$easliestPos = strlen($the_string) + 1;
$earliestMatch = false;
foreach ($patterns as $pattern) {
if (preg_match($pattern, $the_string, $match)) {
$myMatch = $match[0];
$myMatchPos = strpos($myMatch, $the_string);
if ($myMatchPos < $easliestPos ) {
$easliestPos = $myMatchPos;
$earliestMatch = $myMatch ;
}
}
}
if ($earliestMatch ) {
echo $earliestMatch;
}
I need some help with twitter hashtag, I need to extract a certain hashtag as string variable in PHP.
Until now I have this
$hash = preg_replace ("/#(\\w+)/", "#$1", $tweet_text);
but this just transforms hashtag_string into link
Use preg_match() to identify the hash and capture it to a variable, like so:
$string = 'Tweet #hashtag';
preg_match("/#(\\w+)/", $string, $matches);
$hash = $matches[1];
var_dump( $hash); // Outputs 'hashtag'
Demo
I think this function will help you:
echo get_hashtags($string);
function get_hashtags($string, $str = 1) {
preg_match_all('/#(\w+)/',$string,$matches);
$i = 0;
if ($str) {
foreach ($matches[1] as $match) {
$count = count($matches[1]);
$keywords .= "$match";
$i++;
if ($count > $i) $keywords .= ", ";
}
} else {
foreach ($matches[1] as $match) {
$keyword[] = $match;
}
$keywords = $keyword;
}
return $keywords;
}
As i understand you are saying that
in text/pargraph/post you want to show tag with hash sign(#) like this:- #tag
and in url you want to remove # sign because the string after # is not sended to server in request so i have edited your code and try out this:-
$string="www.funnenjoy.com is best #SocialNetworking #website";
$text=preg_replace('/#(\\w+)/','<a href=/hash/$1>$0</a>',$string);
echo $text; // output will be www.funnenjoy.com is best <a href=search/SocialNetworking>#SocialNetworking</a> <a href=/search/website>#website</a>
Extract multiple hashtag to array
$body = 'My #name is #Eminem, I am rap #god, #Yoyoya check it #out';
$hashtag_set = [];
$array = explode('#', $body);
foreach ($array as $key => $row) {
$hashtag = [];
if (!empty($row)) {
$hashtag = explode(' ', $row);
$hashtag_set[] = '#' . $hashtag[0];
}
}
print_r($hashtag_set);
You can use preg_match_all() PHP function
preg_match_all('/(?<!\w)#\w+/', $description, $allMatches);
will give you only hastag array
preg_match_all('/#(\w+)/', $description, $allMatches);
will give you hastag and without hastag array
print_r($allMatches)
You can extract a value in a string with preg_match function
preg_match("/#(\w+)/", $tweet_text, $matches);
$hash = $matches[1];
preg_match will store matching results in an array. You should take a look at the doc to see how to play with it.
Here's a non Regex way to do it:
<?php
$tweet = "Foo bar #hashTag hello world";
$hashPos = strpos($tweet,'#');
$hashTag = '';
while ($tweet[$hashPos] !== ' ') {
$hashTag .= $tweet[$hashPos++];
}
echo $hashTag;
Demo
Note: This will only pickup the first hashtag in the tweet.
I am trying to test if a string made up of multiple words and has any values from an array at the end of it. The following is what I have so far. I am stuck on how to check if the string is longer than the array value being tested and that it is present at the end of the string.
$words = trim(preg_replace('/\s+/',' ', $string));
$words = explode(' ', $words);
$words = count($words);
if ($words > 2) {
// Check if $string ends with any of the following
$test_array = array();
$test_array[0] = 'Wizard';
$test_array[1] = 'Wizard?';
$test_array[2] = '/Wizard';
$test_array[4] = '/Wizard?';
// Stuck here
if ($string is longer than $test_array and $test_array is found at the end of the string) {
Do stuff;
}
}
By end of string do you mean the very last word? You could use preg_match
preg_match('~/?Wizard\??$~', $string, $matches);
echo "<pre>".print_r($matches, true)."</pre>";
I think you want something like this:
if (preg_match('/\/?Wizard\??$/', $string)) { // ...
If it has to be an arbitrary array (and not the one containing the 'wizard' strings you provided in your question), you could construct the regex dynamically:
$words = array('wizard', 'test');
foreach ($words as &$word) {
$word = preg_quote($word, '/');
}
$regex = '/(' . implode('|', $words) . ')$/';
if (preg_match($regex, $string)) { // ends with 'wizard' or 'test'
Is this what you want (no guarantee for correctness, couldn't test)?
foreach( $test_array as $testString ) {
$searchLength = strlen( $testString );
$sourceLength = strlen( $string );
if( $sourceLength <= $searchLength && substr( $string, $sourceLength - $searchLength ) == $testString ) {
// ...
}
}
I wonder if some regular expression wouldn't make more sense here.
I am new in PHP and can't figure out how to do this:
$link = 'http://www.domainname.com/folder1/folder2/folder3/folder4';
$domain_and_slash = http://www.domainname.com . '/';
$address_without_site_url = str_replace($domain_and_slash, '', $link);
foreach ($folder_adress) {
// function here for example
echo $folder_adress;
}
I can't figure out how to get the $folder_adress.
In the case above I want the function to echo these four:
folder1
folder1/folder2
folder1/folder2/folder3
folder1/folder2/folder3/folder4
The $link will have different amount of subfolders...
This gets you there. Some things you might explore more: explode, parse_url, trim. Taking a look at the docs of there functions gets you a better understanding how to handle url's and how the code below works.
$link = 'http://www.domainname.com/folder1/folder2/folder3/folder4';
$parts = parse_url($link);
$pathParts = explode('/', trim($parts['path'], '/'));
$buffer = "";
foreach ($pathParts as $part) {
$buffer .= $part.'/';
echo $buffer . PHP_EOL;
}
/*
Output:
folder1/
folder1/folder2/
folder1/folder2/folder3/
folder1/folder2/folder3/folder4/
*/
You should have a look on explode() function
array explode ( string $delimiter , string $string [, int $limit ] )
Returns an array of strings, each of
which is a substring of string formed
by splitting it on boundaries formed
by the string delimiter.
Use / as the delimiter.
This is what you are looking for:
$link = 'http://www.domainname.com/folder1/folder2/folder3/folder4';
$domain_and_slash = 'http://www.domainname.com' . '/';
$address_without_site_url = str_replace($domain_and_slash, '', $link);
// this splits the string into an array
$address_without_site_url_array = explode('/', $address_without_site_url);
$folder_adress = '';
// now we loop through the array we have and append each item to the string $folder_adress
foreach ($address_without_site_url_array as $item) {
// function here for example
$folder_adress .= $item.'/';
echo $folder_adress;
}
Hope that helps.
Try this:
$parts = explode("/", "folder1/folder2/folder3/folder4");
$base = "";
for($i=0;$i<count($parts);$i++){
$base .= ($base ? "/" : "") . $parts[$i];
echo $base . "<br/>";
}
I would use preg_match() for regular expression method:
$m = preg_match('%http://([.+?])/([.+?])/([.+?])/([.+?])/([.+?])/?%',$link)
// $m[1]: domain.ext
// $m[2]: folder1
// $m[3]: folder2
// $m[4]: folder3
// $m[5]: folder4
1) List approach: use split to get an array of folders, then concatenate them in a loop.
2) String approach: use strpos with an offset parameter which changes from 0 to 1 + last position where a slash was found, then use substr to extract the part of the folder string.
EDIT:
<?php
$folders = 'folder1/folder2/folder3/folder4';
function fn($folder) {
echo $folder, "\n";
}
echo "\narray approach\n";
$folder_array = split('/', $folders);
foreach ($folder_array as $folder) {
if ($result != '')
$result .= '/';
$result .= $folder;
fn($result);
}
echo "\nstring approach\n";
$pos = 0;
while ($pos = strpos($folders, '/', $pos)) {
fn(substr($folders, 0, $pos++));
}
fn($folders);
?>
If I had time, I could do a cleaner job. But this works and gets across come ideas: http://codepad.org/ITJVCccT
Use parse_url, trim, explode, array_pop, and implode
If I have a string that contains a url (for examples sake, we'll call it $url) such as;
$url = "Here is a funny site http://www.tunyurl.com/34934";
How do i remove the URL from the string?
Difficulty is, urls might also show up without the http://, such as ;
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would i start a search if http or www exists, then remove the text/numbers/symbols until the first space?
I re-read the question, here is a function that would work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
preg_match(
"/(AC($|\/)|\.AD($|\/)|\.AE($|\/)|\.AERO($|\/)|\.AF($|\/)|\.AG($|\/)|\.AI($|\/)|\.AL($|\/)|\.AM($|\/)|\.AN($|\/)|\.AO($|\/)|\.AQ($|\/)|\.AR($|\/)|\.ARPA($|\/)|\.AS($|\/)|\.ASIA($|\/)|\.AT($|\/)|\.AU($|\/)|\.AW($|\/)|\.AX($|\/)|\.AZ($|\/)|\.BA($|\/)|\.BB($|\/)|\.BD($|\/)|\.BE($|\/)|\.BF($|\/)|\.BG($|\/)|\.BH($|\/)|\.BI($|\/)|\.BIZ($|\/)|\.BJ($|\/)|\.BM($|\/)|\.BN($|\/)|\.BO($|\/)|\.BR($|\/)|\.BS($|\/)|\.BT($|\/)|\.BV($|\/)|\.BW($|\/)|\.BY($|\/)|\.BZ($|\/)|\.CA($|\/)|\.CAT($|\/)|\.CC($|\/)|\.CD($|\/)|\.CF($|\/)|\.CG($|\/)|\.CH($|\/)|\.CI($|\/)|\.CK($|\/)|\.CL($|\/)|\.CM($|\/)|\.CN($|\/)|\.CO($|\/)|\.COM($|\/)|\.COOP($|\/)|\.CR($|\/)|\.CU($|\/)|\.CV($|\/)|\.CX($|\/)|\.CY($|\/)|\.CZ($|\/)|\.DE($|\/)|\.DJ($|\/)|\.DK($|\/)|\.DM($|\/)|\.DO($|\/)|\.DZ($|\/)|\.EC($|\/)|\.EDU($|\/)|\.EE($|\/)|\.EG($|\/)|\.ER($|\/)|\.ES($|\/)|\.ET($|\/)|\.EU($|\/)|\.FI($|\/)|\.FJ($|\/)|\.FK($|\/)|\.FM($|\/)|\.FO($|\/)|\.FR($|\/)|\.GA($|\/)|\.GB($|\/)|\.GD($|\/)|\.GE($|\/)|\.GF($|\/)|\.GG($|\/)|\.GH($|\/)|\.GI($|\/)|\.GL($|\/)|\.GM($|\/)|\.GN($|\/)|\.GOV($|\/)|\.GP($|\/)|\.GQ($|\/)|\.GR($|\/)|\.GS($|\/)|\.GT($|\/)|\.GU($|\/)|\.GW($|\/)|\.GY($|\/)|\.HK($|\/)|\.HM($|\/)|\.HN($|\/)|\.HR($|\/)|\.HT($|\/)|\.HU($|\/)|\.ID($|\/)|\.IE($|\/)|\.IL($|\/)|\.IM($|\/)|\.IN($|\/)|\.INFO($|\/)|\.INT($|\/)|\.IO($|\/)|\.IQ($|\/)|\.IR($|\/)|\.IS($|\/)|\.IT($|\/)|\.JE($|\/)|\.JM($|\/)|\.JO($|\/)|\.JOBS($|\/)|\.JP($|\/)|\.KE($|\/)|\.KG($|\/)|\.KH($|\/)|\.KI($|\/)|\.KM($|\/)|\.KN($|\/)|\.KP($|\/)|\.KR($|\/)|\.KW($|\/)|\.KY($|\/)|\.KZ($|\/)|\.LA($|\/)|\.LB($|\/)|\.LC($|\/)|\.LI($|\/)|\.LK($|\/)|\.LR($|\/)|\.LS($|\/)|\.LT($|\/)|\.LU($|\/)|\.LV($|\/)|\.LY($|\/)|\.MA($|\/)|\.MC($|\/)|\.MD($|\/)|\.ME($|\/)|\.MG($|\/)|\.MH($|\/)|\.MIL($|\/)|\.MK($|\/)|\.ML($|\/)|\.MM($|\/)|\.MN($|\/)|\.MO($|\/)|\.MOBI($|\/)|\.MP($|\/)|\.MQ($|\/)|\.MR($|\/)|\.MS($|\/)|\.MT($|\/)|\.MU($|\/)|\.MUSEUM($|\/)|\.MV($|\/)|\.MW($|\/)|\.MX($|\/)|\.MY($|\/)|\.MZ($|\/)|\.NA($|\/)|\.NAME($|\/)|\.NC($|\/)|\.NE($|\/)|\.NET($|\/)|\.NF($|\/)|\.NG($|\/)|\.NI($|\/)|\.NL($|\/)|\.NO($|\/)|\.NP($|\/)|\.NR($|\/)|\.NU($|\/)|\.NZ($|\/)|\.OM($|\/)|\.ORG($|\/)|\.PA($|\/)|\.PE($|\/)|\.PF($|\/)|\.PG($|\/)|\.PH($|\/)|\.PK($|\/)|\.PL($|\/)|\.PM($|\/)|\.PN($|\/)|\.PR($|\/)|\.PRO($|\/)|\.PS($|\/)|\.PT($|\/)|\.PW($|\/)|\.PY($|\/)|\.QA($|\/)|\.RE($|\/)|\.RO($|\/)|\.RS($|\/)|\.RU($|\/)|\.RW($|\/)|\.SA($|\/)|\.SB($|\/)|\.SC($|\/)|\.SD($|\/)|\.SE($|\/)|\.SG($|\/)|\.SH($|\/)|\.SI($|\/)|\.SJ($|\/)|\.SK($|\/)|\.SL($|\/)|\.SM($|\/)|\.SN($|\/)|\.SO($|\/)|\.SR($|\/)|\.ST($|\/)|\.SU($|\/)|\.SV($|\/)|\.SY($|\/)|\.SZ($|\/)|\.TC($|\/)|\.TD($|\/)|\.TEL($|\/)|\.TF($|\/)|\.TG($|\/)|\.TH($|\/)|\.TJ($|\/)|\.TK($|\/)|\.TL($|\/)|\.TM($|\/)|\.TN($|\/)|\.TO($|\/)|\.TP($|\/)|\.TR($|\/)|\.TRAVEL($|\/)|\.TT($|\/)|\.TV($|\/)|\.TW($|\/)|\.TZ($|\/)|\.UA($|\/)|\.UG($|\/)|\.UK($|\/)|\.US($|\/)|\.UY($|\/)|\.UZ($|\/)|\.VA($|\/)|\.VC($|\/)|\.VE($|\/)|\.VG($|\/)|\.VI($|\/)|\.VN($|\/)|\.VU($|\/)|\.WF($|\/)|\.WS($|\/)|\.XN--0ZWM56D($|\/)|\.XN--11B5BS3A9AJ6G($|\/)|\.XN--80AKHBYKNJ4F($|\/)|\.XN--9T4B11YI5A($|\/)|\.XN--DEBA0AD($|\/)|\.XN--G6W251D($|\/)|\.XN--HGBK6AJ7F53BBA($|\/)|\.XN--HLCJ6AYA9ESC7A($|\/)|\.XN--JXALPDLP($|\/)|\.XN--KGBECHTV($|\/)|\.XN--ZCKZAH($|\/)|\.YE($|\/)|\.YT($|\/)|\.YU($|\/)|\.ZA($|\/)|\.ZM($|\/)|\.ZW)/i",
$string,
$M);
$has_tld = (count($M) > 0) ? true : false;
return $has_tld;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
thanks mike,
update a bit, it return notice error,
'/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i'
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url,$replace,$with) {
$replace = explode(" ",$replace);
$new_string = '';
$check = explode(" ",$url);
foreach($check AS $key => $value) {
foreach($replace AS $key2 => $value2 ) {
if (-1 < strpos( strtolower($value), strtolower($value2) ) ) {
$value = $with;
break;
}
}
$new_string .= " ".$value;
}
return $new_string;
}
You would need to write a regular expression to extract out the urls.