Regular expression to change lang in url path to be subdomain - php

I am learning regex. I have a very simple question:
I have a long string of content in PHP. I would like to convert every place where it says:
http://www.example.com/en/rest-of-url
to
http://en.example.com/rest-of-url
Could somebody help me with this? I think I use preg_replace for this?
Bonus: If you have a link to a good site which explains how to do the simplest things like this in regex, please post it. Every regex resource I look at gets very complicated very fast (even the Wikipedia article).

In PHP:
$search = '~http://www\.example\.com/([^/]+)/~';
$replace = 'http://$1.example.com/';
$new = preg_replace( $search, $replace, $original );
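To see a replacement like this on content that holds more than one URL, here is a minimal sketch. The sample text and the two-letter restriction [a-z]{2} are assumptions, not part of the question. The pattern stops right after the language segment: a greedy (.+) tail would run to the end of the line and leave a second URL on that line unconverted.

```php
<?php
// Hypothetical sample content; only the URL shape comes from the question.
$original = 'See http://www.example.com/en/rest-of-url and http://www.example.com/fr/autre-page for details.';

// [a-z]{2} limits the first path segment to a two-letter language code
// (an assumption; the question only shows "en").
$search  = '~http://www\.example\.com/([a-z]{2})/~';
$replace = 'http://$1.example.com/';

// preg_replace() replaces every match in the subject by default.
$new = preg_replace($search, $replace, $original);
echo $new, "\n";
// See http://en.example.com/rest-of-url and http://fr.example.com/autre-page for details.
```

Restricting the segment to two letters also keeps ordinary paths such as /about/ from being turned into subdomains.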

http://regexlib.com/
has a nice regular expression cheat sheet and tester

Assuming:
preg_replace($regex, $replaceWith, $subject);
$subject is the original text. $regex should be:
'#http://([^\.]*)\.example\.com/en/(.*)#'
$replaceWith should be:
'http://$1.example.com/$2'
EDITED: In my original answer, I had missed the fact that you wanted to capture part of the domain name.

This will work with any domain name:
$url = 'http://www.example.com/en/rest-of-url';
echo preg_replace('%www(\..*?/)(\w+)/%', '\2\1', $url);
gives:
http://en.example.com/rest-of-url
Reference: preg_replace

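You can learn about basic regex; however, for your simple question, there's no need for regex.
$str = "http://www.example.com/en/rest-of-url";
$s = explode("/", $str);                     // ["http:", "", "www.example.com", "en", "rest-of-url"]
$s[2] = substr_replace($s[2], $s[3], 0, 3);  // swap the leading "www" for the language code
unset($s[3]);                                // drop the language segment from the path
echo implode("/", $s);                       // http://en.example.com/rest-of-url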

This is a great site for Regex Tutorials
http://www.regular-expressions.info/
Regex Tutor

Related

How to use preg_replace correctly in the following case?

I hope somebody could help me out with some „preg_replace“ skills.
I have the following URL:
http://www.domain.com/goto/test-string/
Now I just want to get the „test-string“ part of the URL.
Any idea how I can solve this with preg_replace?
Thanks in advance already!!!
Best,
Florian
I'd suggest using parse_url() to get the pieces:
http://php.net/manual/en/function.parse-url.php
Then using explode( '/', $sUrl ); to get the string as needed.
http://php.net/manual/en/function.explode.php
For dynamic parsing you may need to tweak some more.
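A minimal sketch of the parse_url() plus explode() approach, assuming the wanted piece is always the last path segment (the variable names are illustrative):

```php
<?php
$url = 'http://www.domain.com/goto/test-string/';

// With PHP_URL_PATH, parse_url() returns only the path: "/goto/test-string/".
$path = parse_url($url, PHP_URL_PATH);

// Trim the surrounding slashes, then split the path into its segments.
$segments = explode('/', trim($path, '/'));

// The last segment is the part that follows "goto".
$slug = end($segments);
echo $slug, "\n"; // test-string
```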
It seems like preg_match() would be a better tool for you:
if (preg_match('|([^/]*)/$|', $url, $m)) {
    echo "The string you are looking for is $m[1]<br />\n";
}
but since you specifically asked for preg_replace() I suppose you could do something like this:
$foo = preg_replace('/^.*?/([^/]*)/$/', '$1', $url);
EDIT: I had forgotten to escape my slashes within the regex. It is easier to just replace the regex delimiters with pipe symbols so that slashes are no longer special.
$foo = preg_replace('|^.*?/([^/]*)/$|', '$1', $url);
Please try this one:
$url = 'http://www.domain.com/goto/test-string/hi/';
preg_match('/.*\/(.*?)\//',$url,$match);
echo $match[1];

PHP function to return only parts of string that contain certain characters?

I have a string as follows:
$product_req = "CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9";
and I need a function that returns only the numbers preceded by "CATEGORY-ACTIVE-" (without the quotes); in other words, it should return 8,4 and leave everything else out.
Is there any php function that can do this?
Thank you.
Use preg_match_all and take the first capture group:
$input_lines="CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9";
preg_match_all("/CATEGORY-ACTIVE-(\d+)/", $input_lines, $output_array);
print_r(join(',',$output_array[1]));
output
8,4
Is there any php function that can do this?
Yes, you can achieve it with native PHP functions by writing some logic of your own, but regular expressions keep it simple and short.
Using PHP functions:
<?php
$str = 'CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9';
$str = explode(',', $str);
$temparr = array();
foreach ($str as $v)
{
    if (strpos($v, 'CATEGORY-ACTIVE-') !== false)
    {
        $temparr[] = str_replace('CATEGORY-ACTIVE-', '', $v);
    }
}
echo implode(',', $temparr); //"prints" 8,4
Use regular expressions and implode the result at the end (preferred way):
<?php
$str = 'CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9';
// \d+ captures the number itself, so the last item needs no trailing comma
preg_match_all('/CATEGORY-ACTIVE-(\d+)/', $str, $matches);
echo implode(',', $matches[1]); //8,4
I'd use a lookaround assertion to accomplish this:
(?<=CATEGORY-ACTIVE-)(\d+)
Code:
$str = 'CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9';
preg_match_all('/(?<=CATEGORY-ACTIVE-)(\d+)/', $str, $matches);
print_r($matches[1]);
Output:
Array
(
    [0] => 8
    [1] => 4
)
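Because a lookbehind does not consume the prefix, the capture group is optional: the full matches already contain only the numbers. A small sketch wrapping this in a function, where the function name is hypothetical:

```php
<?php
// Hypothetical helper name; the input format is the one from the question.
function categoryActiveIds(string $productReq): array
{
    // The lookbehind checks for the prefix without including it in the match,
    // so $matches[0] (the full matches) holds just the digits.
    preg_match_all('/(?<=CATEGORY-ACTIVE-)\d+/', $productReq, $matches);
    return $matches[0];
}

$product_req = 'CATEGORY-ACTIVE-8,CATEGORY-ACTIVE-4,ACTIVE-6,ACTIVE-9';
echo implode(',', categoryActiveIds($product_req)), "\n"; // 8,4
```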
Yes, there is. Feel free to explore the wonderful world of regex!
http://il1.php.net/preg_match
I recommend you do a bit of reading on this yourself, as just "getting the answers" when it comes to regex is a sin; you learn nothing from it.
I'm not uber experienced with it myself, but it's one of those things that you must learn hands-on; theory won't cut it here.
In theory it would look like this:
$str = '84838493849384938';
preg_match_all('/[8.4]/', $str, $matches);
You can also play around with regex at this site: http://www.phpliveregex.com/

preg_replace_callback pattern issue

I'm using the following pattern to capture links, and turn them into HTML friendly links. I use the following pattern in a preg_replace_callback and for the most part it works.
"#(https?|ftp)://(\S+[^\s.,>)\];'\"!?])#"
But this pattern fails when the text reads like so:
http://mylink.com/page[/b]
At that point it captures the [/b, assuming it is part of the link, resulting in this:
woodmill.co.uk[/b]
I've looked over the pattern and used some cheat sheets to try to follow what is happening, but it has foxed me. Can any of you code ninjas help?
Try adding the open square bracket to your character class:
(\S+[^\s.,>)[\];'\"!?])
            ^
UPDATE
Try this more effective URL regex:
^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$
(From: http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/)
I have no experience directly with PHP regular expressions, but the above is simple and generic enough that I wouldn't expect any problems. You may want to modify it some to extract just the domain, like you seem to be with your current regex.
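A class at the end of the pattern only constrains the last character of the match, so \S+ can still run through [/ and stop on the b. One possible alternative, sketched here under the assumption that square and angle brackets never appear inside the URLs being linked, is to exclude them from the whole URL body. The sample text and this pattern variant are illustrations, not the asker's final code.

```php
<?php
$text = 'Visit [b]http://mylink.com/page[/b], or ftp://files.example.org/a.zip.';

// Brackets are excluded from the whole URL body, not only from its last
// character; the final class still rejects trailing punctuation.
$pattern = '#(?:https?|ftp)://[^\s\[\]<>]*[^\s.,>)\[\];\'"!?<]#';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
// The two matches are http://mylink.com/page and ftp://files.example.org/a.zip
```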
Ok I solved the problem. Thanks to @Cyborgx37 and @MikeBrant for your help. Here's the solution.
Firstly I replaced my regexp pattern with the one that João Castro used in this question: Making a url regex global
The problem with that pattern is it captured any trailing dots at the end, so in the final section of the pattern I added ^. making the final part look like so [^\s^.]. As I read it, do not match a trailing space or dot.
This still caused an issue matching bbcode as I mentioned above, so I used preg_replace_callback() and create_function() to filter it out. The final create_function() looks like this:
create_function('$match','
    $match[0] = preg_replace("/\[\/?(.*?)\]/", "", $match[0]);
    $match[0] = preg_replace("/\<\/?(.*?)\>/", "", $match[0]);
    $m = trim(strtolower($match[0]));
    $m = str_replace("http://", "", $m);
    $m = str_replace("https://", "", $m);
    $m = str_replace("ftp://", "", $m);
    $m = str_replace("www.", "", $m);
    if (strlen($m) > 25)
    {
        $m = substr($m, 0, 25) . "...";
    }
    return "$m";
'), $string);
Tests so far are looking good, so I'm happy it is now solved.
Thanks again, and I hope this helps someone else :)
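For readers on current PHP: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the same callback would now be written as a closure. A minimal sketch follows. The variable name $shorten and the sample text are hypothetical, and the URL pattern is a simplified stand-in because the pattern actually used is not shown in the post.

```php
<?php
// Hypothetical closure mirroring the steps of the create_function() body.
$shorten = function (array $match): string {
    // Strip BBCode tags such as [/b] and HTML tags such as </b>.
    $m = preg_replace('/\[\/?.*?\]/', '', $match[0]);
    $m = preg_replace('/<\/?.*?>/', '', $m);
    $m = trim(strtolower($m));
    // Remove the scheme and any "www." for display.
    $m = str_replace(['http://', 'https://', 'ftp://', 'www.'], '', $m);
    // Truncate long links to 25 characters plus an ellipsis.
    return strlen($m) > 25 ? substr($m, 0, 25) . '...' : $m;
};

// Simplified stand-in pattern (an assumption, not the asker's pattern).
$string = 'See http://www.mylink.com/page[/b] for details';
$result = preg_replace_callback('#(?:https?|ftp)://\S+#', $shorten, $string);
echo $result, "\n"; // See mylink.com/page for details
```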

Extracting URLs from a JSON-like string

I need to extract the first URL from some content. The content may be like this:
({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});
or may contain only a link
({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});
Currently I have:
$pattern = "/\:\[\{url\:\"(.*)\"\,name/";
preg_match_all($pattern, $htmlContent, $matches);
$URL = $matches[1][0];
However, it works only if there is a single link, so I need a regex that works for both cases.
You can use this REGEX:
$pattern = "/url\:\"([^\"]+)\"/";
Worked for me :)
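The original pattern fails on the multi-link content because (.*) is greedy and runs to the last ",name in the string, while [^"]+ stops at the first closing quote. A short sketch using preg_match(), which stops at the first match anyway; the function name firstUrl is hypothetical:

```php
<?php
$multi  = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';
$single = '({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});';

// Hypothetical helper: returns the first url value, or null if none is found.
function firstUrl(string $content): ?string
{
    // [^"]+ cannot cross the closing quote, so the capture is one URL only.
    return preg_match('/url:"([^"]+)"/', $content, $m) === 1 ? $m[1] : null;
}

echo firstUrl($multi), "\n";  // http://cincinnati.ebayclassifieds.com/
echo firstUrl($single), "\n"; // http://portlandor.ebayclassifieds.com/
```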
Hopefully this should work for you
<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});'; //The string you want to extract the 1st URL from
$match = ""; //Define the match variable
preg_match("%(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&\%\$#\=~_\-]+))*%",$str,$match); //I Googled for the best Regular expression for URLs and found the one included in the preg_match
echo $match[0]; //Return the first item in the array (the first URL returned)
?>
This is the website that I found the regular expression on: http://regexlib.com/Search.aspx?k=URL
Like the others have said, json_decode should work for you as well.
That smells like JSON to me. Try using http://php.net/json_decode
Looks like JSON to me, visit http://php.net/manual/en/book.json.php and use json_decode().
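One caveat with json_decode(): the sample content is not strict JSON as shown, since the keys are unquoted and the object is wrapped in ( ... );, so json_decode() returns null for it unchanged. A rough sketch of a workaround, assuming the content always has exactly this shape. The key-quoting regex is a fragile illustration and would break on values that contain text like ,word: themselves.

```php
<?php
$content = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';

// As-is this is a JavaScript object literal, not JSON.
var_dump(json_decode($content)); // NULL

// Strip the "(" ... ");" wrapper, then quote bare keys that follow "{" or ",".
$json = trim($content, '();');
$json = preg_replace('/([{,])(\w+):/', '$1"$2":', $json);

// Decode to an associative array and read the first item's URL.
$data = json_decode($json, true);
$firstUrl = $data['items'][0]['url'];
echo $firstUrl, "\n"; // http://cincinnati.ebayclassifieds.com/
```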

PHP replace string help

I am designing a site with a comment system and I would like a Twitter-like reply system.
If the user puts @a_registered_username, I would like it to become a link to that user's profile.
I think preg_replace is the function needed for this.
$ALL_USERS_ROW['USERNAME'] is the database query array for all the users, and ['USERNAME'] is the username row.
$content is the comment containing the @username.
I think this should not be very hard to solve for someone who is good at PHP.
Does anybody have any idea how to do it?
$content = preg_replace( "/\b@(\w+)\b/", "http://twitter.com/$1", $content );
should work, but I can't get the word boundary matches to work in my test ... maybe it depends on the regex library used in different versions of PHP
$content = preg_replace( "/(^|\W)@(\w+)(\W|$)/", "$1http://twitter.com/$2$3", $content );
is tested and does work
You want it to go through the text and get it, here is a good starting point:
$txt='this is some text @seanja';
$re1='.*?'; # Non-greedy match on filler
$re2='(@)'; # Any Single Character 1
$re3='((?:[a-z][a-z]+))'; # Word 1
if ($c=preg_match_all ("/".$re1.$re2.$re3."/is", $txt, $matches))
{
    $c1=$matches[1][0];
    $word1=$matches[2][0]; //this is the one you want to replace with a link
    print "($c1) ($word1) \n";
}
Generated with:
http://www.txt2re.com/index-php.php3?s=this%20is%20some%20text%20#seanja&-40&1
[edit]
Actually, if you go here ( http://www.gskinner.com/RegExr/ ), and search for twitter in the community tab on the right, you will find a couple of really good solutions for this exact problem:
$mystring = 'hello @seanja @bilbobaggins sean@test.com and @slartibartfast';
$regex = '/(?<=@)((\w+))(\s)/'; // no /g modifier in PHP: preg_replace already replaces every match
$replace = '$1$3';
$mystring = preg_replace($regex, $replace, $mystring);
$str = preg_replace('~(?<!\w)@(\w+)\b~', 'http://twitter.com/$1', $str);
Does not match emails. Does not match any spaces around it.
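None of the snippets above checks that the name belongs to a registered user, which the question asks for. A minimal sketch with preg_replace_callback(), assuming mentions are written with a leading @ as on Twitter. The $registered list and the /profile/ URL scheme are hypothetical placeholders for the site's own data and routes.

```php
<?php
// Hypothetical data: usernames as they might come from the users table.
$registered = ['seanja', 'bilbobaggins'];

$content = 'hello @seanja and @nobody, mail sean@test.com';

$linked = preg_replace_callback(
    // (?<!\w) rejects an @ that follows a word character, as in an email address.
    '~(?<!\w)@(\w+)~',
    function (array $m) use ($registered): string {
        // Leave the text untouched when the name is not a registered user.
        if (!in_array($m[1], $registered, true)) {
            return $m[0];
        }
        $name = htmlspecialchars($m[1], ENT_QUOTES);
        // "/profile/..." is a placeholder URL scheme, not from the question.
        return '<a href="/profile/' . $name . '">@' . $name . '</a>';
    },
    $content
);
echo $linked, "\n";
// hello <a href="/profile/seanja">@seanja</a> and @nobody, mail sean@test.com
```

The negative lookbehind also explains why \b in front of the symbol tends to fail: \b needs a word character on one side, and a mention is usually preceded by a space or the start of the string.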
