PHP Remove all after particular pattern - php

I want to remove all characters after a particular pattern from a string (url). Following are some example urls.
http://www.example.com/profile/aaa-bbb/Group
http://www.example.com/profile/ccc-ddd/Group?tab=23
http://www.example.com/profile/Group-sss-t/Group
http://www.example.com/profile/ppp-qqq/
I need the output as,
http://www.example.com/profile/aaa-bbb/
http://www.example.com/profile/ccc-ddd/
http://www.example.com/profile/Group-sss-t/
http://www.example.com/profile/ppp-qqq/
What I actually need is to remove all characters after Group, but in the third URL Group appears twice. I don't know how to handle this. Please help, thanks in advance.

Something like this should do the trick (it removes everything after the last /):
$newUrl = preg_replace('/(.*)\/.*$/', '$1/', $url);
See: http://phpfiddle.org/main/code/j7c-8gx and hit F9 to see the result of url: 'http://www.example.com/profile/ccc-ddd/Group?tab=23'

I would use strrpos, which finds the position of a substring but searches from the end:
if(strpos($url,"Group")!==false){
$url = substr($url,0,strrpos($url,"Group"));
}
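A minimal sketch of the strrpos approach applied to the four sample URLs from the question. The helper name cutAtLast is made up for illustration. It assumes the marker never appears later in the URL, for example inside the query string.

```php
<?php
// Hypothetical helper: cut the URL at the last occurrence of a marker.
function cutAtLast(string $url, string $marker): string
{
    $pos = strrpos($url, $marker);
    // No marker found: return the URL unchanged.
    return $pos === false ? $url : substr($url, 0, $pos);
}

$urls = [
    'http://www.example.com/profile/aaa-bbb/Group',
    'http://www.example.com/profile/ccc-ddd/Group?tab=23',
    'http://www.example.com/profile/Group-sss-t/Group',
    'http://www.example.com/profile/ppp-qqq/',
];

foreach ($urls as $url) {
    echo cutAtLast($url, 'Group'), "\n";
}
```

Because the search starts from the end, the first Group in Group-sss-t is left alone and only the trailing one is cut.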

%(http://www.example.com/profile/[^/]+/)%
Matches http://www.example.com/profile/ followed by exactly one path segment and its trailing slash (aaa-bbb/, ccc-ddd/, and so on).
So preg_match_all('%(http://www.example.com/profile/[^/]+/)%', $urls, $matches) saves the matched parts in $matches.
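For a single URL, the same pattern can be used with preg_match. This is only a sketch: the helper name profilePrefix is hypothetical, and the dots are escaped and the pattern anchored at the start, which the pattern above does not do.

```php
<?php
// Hypothetical helper: return the profile prefix, or null when the URL does not match.
function profilePrefix(string $url): ?string
{
    // '%' is the delimiter, so the slashes need no escaping.
    $pattern = '%^(http://www\.example\.com/profile/[^/]+/)%';
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

echo profilePrefix('http://www.example.com/profile/Group-sss-t/Group'), "\n";
echo profilePrefix('http://www.example.com/profile/ccc-ddd/Group?tab=23'), "\n";
```

Since [^/]+ stops at the next slash, a Group inside the profile name does not interfere.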

Related

Replace many code lines in PHP between tags

I have a PHP page with this line:
$url = file_get_contents('http://web.com/rss.php');
Now I want to replace this:
<link>http://web.com/download/45212/lorem-ipsum</link>
<link>http://web.com/download/34210/dolor-sit</link>
<link>http://web.com/download/78954/consectetur-adipiscing</link>
<link>http://web.com/download/77741/laboris-nisi</link>...
With this:
<link>http://otherweb.com/get-d/45212</link>
<link>http://otherweb.com/get-d/34210</link>
<link>http://otherweb.com/get-d/78954</link>
<link>http://otherweb.com/get-d/77741</link>...
I have replaced the first part with str_replace, but I don't know how to replace the other part.
This is what I have done so far:
$url = str_replace('<link>http://web.com/download/','<link>http://otherweb.com/get-d/', $url);
You can do this all with a single line of regex :)
Regex
The below regex will detect your middle numbered section....
<link>http:\/\/web\.com\/download\/(.*?)\/.*?<\/link>
PHP
To use this inside PHP you could use this line of code
$url = preg_replace("/<link>http:\/\/web\.com\/download\/(.*?)\/.*?<\/link>/m", "<link>http://otherweb.com/get-d/$1</link>", $url);
This should do exactly what you need!
Explanation
The way it works is that preg_replace looks for <link>http://web.com/download/ at the start and /{something}</link> at the end. It captures the middle part into $1.
So when we run preg_replace($pattern, $replacement, $subject) we tell PHP to find that middle part (the numbers in your URLs) and embed it into "<link>http://otherweb.com/get-d/$1</link>".
I tested it and it seems to be working :)
Edit: I would propose this answer as best for you as it does everything with a single line, and does not require any str_replace. My answer also will function even if the middle section is alphanumeric, and not only if it is numeric.
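A self-contained sketch of the same single-call idea, run against two of the sample links. It swaps in # as the delimiter and [^/<]+ for the ID, which are choices made here for readability and not part of the answer above. The $feed variable stands in for the downloaded RSS text.

```php
<?php
// Stand-in for the text returned by file_get_contents().
$feed = <<<XML
<link>http://web.com/download/45212/lorem-ipsum</link>
<link>http://web.com/download/34210/dolor-sit</link>
XML;

// '#' as delimiter avoids escaping the slashes; [^/<]+ captures the ID segment
// and [^<]* swallows the slug up to the closing tag.
$rewritten = preg_replace(
    '#<link>http://web\.com/download/([^/<]+)/[^<]*</link>#',
    '<link>http://otherweb.com/get-d/$1</link>',
    $feed
);

echo $rewritten, "\n";
```

Negated character classes keep each match inside one link element even if several links end up on the same line.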
All you want to do is:
extract the relevant data e.g. the five digit number
put the extracted part into a new context
$input = 'http://web.com/download/45212/lorem-ipsum';
echo preg_replace('/.*\/(\d+).*/', 'http://otherweb.com/get-d/$1', $input);
To extract the relevant part, you can use (\d+) which means: find one or more digits, the parentheses make this a matching group, so you can access this value via $1.
To match and replace the whole line, you have to augment the pattern with .* (which means, find any number of any character) before and after the (\d+) part.
With this set up, the whole string matches, so the whole string will be replaced.
You should replace the initial part of the link with a token, then use preg_replace to replace everything from the first / to the end of the line with </link>. Finally, replace the token with the initial part you want.
$url = str_replace('<link>http://web.com/download/','init', $url);
$url = preg_replace("/\/.+/", "</link>", $url);
$url = str_replace('init', '<link>http://otherweb.com/get-d/', $url);
You're just missing a simple regex to clean up the last part.
Here's how I did it:
$messed_up = '
<link>http://web.com/download/45212/lorem-ipsum</link>
<link>http://web.com/download/34210/dolor-sit</link>
<link>http://web.com/download/78954/consectetur-adipiscing</link>
<link>http://web.com/download/77741/laboris-nisi</link>';
// Firstly we can clean up the first part (like you did) with str_replace
$clean = str_replace('web.com/download/','otherweb.com/get-d/', $messed_up);
// After that we'll use preg_replace to get rid of the last part
$clean = preg_replace("/(.+\/\d+)\/.*(<.*)/", "$1$2", $clean);
echo $clean;
/* Prints:
<link>http://otherweb.com/get-d/45212</link>
<link>http://otherweb.com/get-d/34210</link>
<link>http://otherweb.com/get-d/78954</link>
<link>http://otherweb.com/get-d/77741</link>
*/
I made this quickly so there might be some room for improvement but it definitely works.
You can check out the code in practice HERE.
If you're interested in learning PHP RegEx This is a great place to practice.

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so what the regex should be is a guess. I'll explain why your code fails instead.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
This is closer, but preg_replace is global, so it searches your whole file (or it would, if not for the anchor) and replaces the found value with nothing. What you really want (I think) is preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made \d match exactly 4 digits. The () is a capture group and stores whatever is found inside it, in this case the 4 digits.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with the preg_match_all function to capture what you want into an array (you're using preg_replace, which isn't for retrieving data but for replacing it).
Your regexp isn't working because you didn't escape the curly brackets. You also didn't put a quantifier after the digit class (the plus sign in my example), so it would only capture the first digit anyway.
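A sketch of how such a pattern might be applied with preg_match. The $code sample is an assumption, since the real page source was not posted. The "followed_by" key is taken from the attempt quoted earlier in the thread.

```php
<?php
// Assumed sample input; the actual page source may look different.
$code = '"followed_by":{"count":1234},"follows":{"count":56}';

$followedBy = null;
// Braces are escaped; (\d+) captures a count of any length.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match) === 1) {
    $followedBy = (int) $match[1];
}

echo $followedBy, "\n";
```

Anchoring on the key name keeps the pattern from picking up another "count" elsewhere in the source.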

How to cut out everything from a string except certain part of it in php?

Let's say I have string like this:
Village_name(315|431 K64)
What I want to do is when I paste that into let's say text box, and click a button, all I will be left with is 315|431.
Is there a way of doing this?
Use the below regex and then replace the match with \1.
(\d+\|\d+)|.
It captures the number|number part and matches all the remaining chars. By replacing all the matched chars with \1 will give you the number|number part only.
In php, you may use this also.
(?:\d+\|\d+)(*SKIP)(*F)|.
The substring matched by \d+\|\d+ is matched first, and the following (*SKIP)(*F) then forces that match to fail and skips past it. The . after the pipe symbol then matches every character except the number|number part, because that part has already been skipped.
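Both patterns can be tried in PHP with preg_replace. This is a sketch using the sample string from the question. The variable names are illustrative only.

```php
<?php
$str = 'Village_name(315|431 K64)';

// Variant 1: keep the captured coordinates, drop every other character.
// For characters matched by the lone ".", group 1 is unset, so $1 is empty.
$keepCaptured = preg_replace('/(\d+\|\d+)|./', '$1', $str);

// Variant 2: skip over the coordinates and delete whatever else matches.
$skipAndDelete = preg_replace('/\d+\|\d+(*SKIP)(*F)|./', '', $str);

echo $keepCaptured, "\n";
echo $skipAndDelete, "\n";
```

Note that . does not match newlines by default, so multi-line input would need the s modifier.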
I know this question has been answered and an answer has been accepted, but I still want to suggest this one, as you really don't need PHP to do this. JavaScript is enough:
var str = 'Village_name(315|431 K64)';
var pattern = /\((\w+\|\w+) /;
var res = str.match(pattern);
document.write(res[1]);
Please try this:-
<?php
$str = 'Village_name(315|431 K64)';
preg_match_all('/(?:\d+\|\d+)/', $str, $matches);
echo "<pre>"; print_r($matches); echo "</pre>"; // dump the complete match array
// $matches[0] holds every full-pattern match
foreach ($matches[0] as $match) {
echo $match;
}
?>
Output:- http://prntscr.com/74ddg9
Note: explode can work with some adjustment, but only if the format is always exactly what you have given. So go for preg_match_all, which is the more robust choice.

preg_replace with Regex - find number-sequence in URL

I'm a regex-noobie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (aka Job-ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can have multiple numbers in it. I only need the sequence right before ".aspx". Also, there could be some php attributes behind the ".aspx" like "&mobile=true"
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
// $matches[0] => 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, using the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, since you don't want to change the original string) and capture the number before .aspx. Assuming .aspx is always at the end of the string, the simplest way I can think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element contains the text matched by the complete regex, which is 146370543.aspx in this example. The second element contains the group captured by the parentheses around [0-9]+.
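The question mentions that parameters such as "&mobile=true" may follow ".aspx", in which case the $ anchor would prevent a match. A possible variant, sketched under that assumption, replaces the anchor with a lookahead. The sample URL with the appended parameter is constructed here for illustration.

```php
<?php
// Sample URL from the question with an assumed trailing parameter appended.
$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

$jobId = null;
// (?=$|[?&#]) accepts either the end of the string or the start of extra parameters.
if (preg_match('/(\d+)\.aspx(?=$|[?&#])/i', $url, $m) === 1) {
    $jobId = $m[1];
}

echo $jobId, "\n";
```

Other numbers earlier in the URL are ignored because only digits directly in front of .aspx can satisfy the pattern.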
You can get the opposite by using this regex:
(\D*)\d+(.*)
Working demo
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
Even if you just want the number for that url you can use this regex:
(\d+)

how do I match a url in php using regex?

I'm trying to match the value of query v in the following regex:
http:\/\/www\.domain\.com\/videos\/video.php\?.*v=([a-z0-9-_]+)
A sample url:
http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1
The url is always www and I'm only trying to match the v value. Does anyone know what's wrong with my syntax?
Use the parse_url() function. It's way easier to use:
$url_components = parse_url("http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1");
echo $url_components['query'];
From there I think you can do the rest and slice off the first couple of letters. Once you do that you're left with only the stuff after v=.
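Instead of slicing the string by hand, the query part can be handed to parse_str, which splits it into an array. A minimal sketch using the sample URL from the question:

```php
<?php
$url = 'http://www.domain.com/videos/video.php?v=9Gu0sd2dmm91B9b1';

// PHP_URL_QUERY makes parse_url return only the query string: "v=9Gu0sd2dmm91B9b1".
$query = (string) parse_url($url, PHP_URL_QUERY);

// parse_str fills $params with the decoded key/value pairs.
parse_str($query, $params);
$videoId = $params['v'] ?? null;

echo $videoId, "\n";
```

This also works when v is not the first parameter, since parse_str handles every key in the query string.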
You forgot the capital letters:
http:\/\/www\.domain\.com\/videos\/video.php\?.*v=([a-zA-Z0-9-_]+)
You are not escaping the period '.' in video.php. I also use a different delimiter when matching paths or URLs, so the slashes don't need escaping, like this:
preg_match( "#http://www\.domain\.com/videos/video\.php\?.*v=([^&]*)#", $url, $matches );
If the v= is in the middle of the query string,
v=([^&]*)
.. will match everything up to another & symbol, just in case characters other than alphas and _,- end up in there for some reason.
