preg_replace Twitter URL not working with a question mark (PHP)

I have created the following function to replace a URL with a div containing its ID:
function twitterIzer($string){
    $pattern = '~https?://twitter\.com/.*?/status/(\d+)~';
    $string = preg_replace($pattern, "<div class='tweet' id='tweet$1' tweetid='$1'></div>", $string);
    return $string;
}
It works well with this type of URL:
https://twitter.com/Minsa_Peru/status/1260658846143401984
but it leaves an extra ?s=20 in the output when I use this URL:
https://twitter.com/Minsa_Peru/status/1262730246668922885?s=20
How can I get rid of this ?s=20 so that my function works? All I know is that I need to improve my regex pattern. Thank you.

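If you only want to change the regex:
$pattern = '/https?:\/\/twitter\.com\/.*?\/status\/(\d+)\S*/';
Because ? is not a digit, (\d+) stops right before it, and \S* then consumes the rest of the URL (in this case ?s=20) up to the next whitespace, so that part is replaced as well. Using \S* rather than .* keeps the match from swallowing any text that follows the URL on the same line.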
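A complete version of the question's function with that change might look like the sketch below. It keeps the question's twitterIzer() name and output markup; the one assumption is that a tweet URL ends at the next whitespace character, so punctuation written directly after a URL (for example a full stop) would be consumed too.

```php
<?php
// Sketch: replace each tweet URL, including any trailing query string such
// as ?s=20, with a placeholder div carrying the tweet ID.
// Assumption: a URL ends at the next whitespace character.
function twitterIzer(string $string): string
{
    // (\d+) captures the status ID; \S* consumes the rest of the URL
    // (e.g. "?s=20") so it is replaced along with the URL.
    $pattern = '~https?://twitter\.com/.*?/status/(\d+)\S*~';
    return preg_replace(
        $pattern,
        "<div class='tweet' id='tweet$1' tweetid='$1'></div>",
        $string
    );
}

echo twitterIzer('See https://twitter.com/Minsa_Peru/status/1262730246668922885?s=20 now'), "\n";
```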

Related

How to prevent the second regex from re-replacing?

I apply two regexes to my input:
// replace a URL written in this pattern with a link: [LinkName](LinkAddress)
$str = preg_replace("/\[([^][]*)]\(([^()]*)\)/", "<a href='$2' target='_blank'>$1</a>", $str);
// replace a regular URL with a link
$str = preg_replace("/(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/i", "<a href='$1' target='_blank'>untitled</a>", $str);
Now there is a problem (a kind of collision). Regular URLs are handled fine, but pattern-based URLs are not: the first regex turns [LinkName](LinkAddress) into a link, and then the second regex creates another link from the URL in its href attribute.
How can I fix it?
Edit: Following the comments, how can I replace those two regexes with a single one (using preg_replace_callback)? I tried, but my attempt doesn't work for either kind of URL.
Is combining them even possible? The outputs are not identical: the first one uses LinkName as the link text, while the second always uses the constant string untitled.
$str = preg_replace_callback('/\[([^][]*)]\(([^()]*)\)|(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/i',
    function ($matches) {
        if (isset($matches[3])) {
            // replace a regular URL with a link
            return "<a href='" . $matches[3] . "' target='_blank'>untitled</a>";
        } else {
            // replace a URL written as [LinkName](LinkAddress) with a link
            return "<a href='" . $matches[2] . "' target='_blank'>" . $matches[1] . "</a>";
        }
    }, $str);
echo $str;
This is one way to do it. Merge your two expressions into one using the alternation operator |. In the callback, check whether the third capture group is set (isset($matches[3])): if it is, the plain-URL alternative matched and you build an untitled link; otherwise the [LinkName](LinkAddress) alternative matched and you use its name and address. Because each part of the string is consumed by only one match, the URL inside a [LinkName](LinkAddress) link is never turned into a link a second time.
I hope this helps.
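A variant of the same single-pass idea, sketched with named capture groups so the callback doesn't depend on group numbers. The function name linkify() is hypothetical; the pattern is the combined one from the answer, with its groups named text, href and url.

```php
<?php
// Sketch: one pass over the text, so a URL inside [LinkName](LinkAddress)
// is never linked twice. linkify() is a hypothetical helper name.
function linkify(string $str): string
{
    $pattern = '/\[(?<text>[^][]*)]\((?<href>[^()]*)\)'
             . '|(?<url>\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // A non-empty "url" group means the plain-URL alternative matched.
        if (($m['url'] ?? '') !== '') {
            return "<a href='" . $m['url'] . "' target='_blank'>untitled</a>";
        }
        // Otherwise the [LinkName](LinkAddress) alternative matched.
        return "<a href='" . $m['href'] . "' target='_blank'>" . $m['text'] . "</a>";
    }, $str);
}

echo linkify('[PHP](https://php.net) and https://example.com/x'), "\n";
```

The ?? '' fallback covers both cases PHP can produce for an unmatched group: the key may be missing (when it is the last group) or present as an empty string.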

Find a match for the first and last word

I have a URL that looks something like this:
for-sale/stuff/state/used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html
I'm trying to validate its format in my function using this regex:
if (preg_match('/(?:^|(?:\-))(\w+)/g', $pathInfo, $matches)) {
echo $digit = $matches[0];
}
$pathInfo is the url given above.
Basically, I want to check that:
the directory is for-sale/stuff/,
the file name (used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html) starts with either used or new and ends with an integer followed by .html,
no spaces appear anywhere.
After validating, I want to get the ID, which in this case is 85934.
It seems like you want something like this:
'~^for-sale/stuff/\S+/(?:used|new)\S*?(\d+)\.html$~'
I'd suggest this sample piece of code and the following regex:
$re = "~\\bfor\\-sale\\/stuff\\/[^<> ]*?\\/(?:used|new)[^/ ]*?\\-(\\d+)\\.html\\b~";
$str = "\n";
preg_match_all($re, $str, $matches);
Regex: \bfor\-sale\/stuff\/[^<> ]*?\/(?:used|new)[^/ ]*?\-(\d+)\.html\b
I assume you have several URLs to validate inside a larger string of text, which is why I suggest using \b, and that each URL sits inside some tag, so I'd use [^<> ]*? to keep the match inside that tag.
The ID will be in the first capturing group (captured by \d+).
Spaces are also excluded by the character classes [^<> ]*? and [^/ ]*?.
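Wrapped into a function, the anchored pattern from the first answer could validate a single path and return its ID as sketched below. extractId() is a hypothetical name, and the sketch assumes $pathInfo holds exactly one path with no surrounding text (otherwise the ^ and $ anchors would reject it).

```php
<?php
// Sketch: validate one path and return its numeric ID, or null if the path
// does not have the expected format. extractId() is a hypothetical helper.
function extractId(string $pathInfo): ?string
{
    // for-sale/stuff/<something>/ then a file name that starts with used or
    // new, contains no whitespace, and ends in <digits>.html
    $pattern = '~^for-sale/stuff/\S+/(?:used|new)\S*?(\d+)\.html$~';
    if (preg_match($pattern, $pathInfo, $matches)) {
        return $matches[1]; // first capturing group: the trailing integer
    }
    return null;
}

var_dump(extractId('for-sale/stuff/state/used-bla-bla2-bla3-bla4-(bla5)---f10-85934.html'));
```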

url parameters regex

I've created my own newsletter module and have come across one (big) problem.
The system appends extra parameters to every URL to track clicks in Google Analytics.
For example,
a URL like this
http://www.domain.com
becomes like this
http://www.domain.com/&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test
and a URL like this
http://www.domain.com/?page=1
becomes like this
http://www.domain.com/?page=1&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test
The first example is wrong. I know the first ampersand has to be replaced by a question mark, and that's where the problem lies.
I'm using this pattern to extract URLs:
$pattern = array('#[a-zA-Z]+://([-]*[.]?[a-zA-Z0-9_/-?&%\{\}])*#');
$replace = array('\\0&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test');
$body = preg_replace($pattern,$replace,$body);
Can anybody help me with a correct, working regex so that the first URL parameter is always preceded by a question mark instead of an ampersand?
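Just use:
if (strpos($string, '?') !== false) {
    // the URL already has a query string: append with an ampersand
    $string .= '&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test';
} else {
    // no query string yet: start one with a question mark
    $string .= '?utm_source=newsletter&utm_medium=e-mail&utm_campaign=test';
}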
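Not a regex, but it works. It checks for a ?, and if none is found, changes the first & to a question mark. Because str_replace() has no replacement limit (its fourth argument only receives the number of replacements made), preg_replace() with a limit of 1 is used instead:
$url = (strpos($url, '?') !== false) ? $url : preg_replace('/&/', '?', $url, 1);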
A very simple approach is to look for a string like http://...& where the ... contains no ? or other delimiters:
$src = preg_replace('#(http://[^\s"\'<>?&]+)&#', '$1?', $src);
But it's probably better to use a restrictive character class instead of a negated one:
$src = preg_replace('#(http://[\w/.]+)&#', '$1?', $src);
This solution fixes all URLs whose query begins with & (and that are missing the ?):
$re = '%([a-zA-Z]+://[^?&\s]+)&(utm_source=newsletter)%';
$body = preg_replace($re, '$1?$2', $body);
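Combining the strpos() idea with a regex that finds the URLs, the tracking parameters can be appended with the right separator in a single pass, so no URL is ever malformed in the first place. This is a sketch: addTracking() is a hypothetical helper, and it assumes every URL ends at whitespace, a quote or an angle bracket; URLs with a #fragment would need the parameters inserted before the fragment, which this does not handle.

```php
<?php
// Sketch: append tracking parameters to every URL in $body, using "?" when
// the URL has no query string yet and "&" when it already has one.
// Assumption: URLs end at whitespace, quotes or angle brackets.
function addTracking(string $body, string $params): string
{
    return preg_replace_callback(
        '~[a-zA-Z][a-zA-Z0-9+.-]*://[^\s"\'<>]+~',
        function (array $m) use ($params): string {
            $url = $m[0];
            $separator = (strpos($url, '?') === false) ? '?' : '&';
            return $url . $separator . $params;
        },
        $body
    );
}

$params = 'utm_source=newsletter&utm_medium=e-mail&utm_campaign=test';
echo addTracking('<a href="http://www.domain.com/">A</a> <a href="http://www.domain.com/?page=1">B</a>', $params), "\n";
```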

preg_replace with URL problem

I use preg_replace to automatically insert HTML links into paragraphs.
Here's what I currently use:
$pattern = "~(?!(?:[^<\[]+[>\]]|[^>\]]+<\/a>))(".preg_quote($find_keyword, '/').")\b~msUi";
$replacement = "\$0";
$article_content = preg_replace($pattern, $replacement, stripslashes($article_content), 1, $added );
It works great, except 1 problem:
It doesn't match and replace if the keyword is a URL.
If $find_keyword = "http://www.mysite.com/", it doesn't find any matches, even though the URL is in the content.
I already tried escaping $find_keyword with preg_quote, which didn't make any difference.
Any regex experts know a solution? Thanks.
The forward slashes in your $find_keyword are not escaped, which breaks the pattern.
You can run your find_keyword through
$find_keyword=preg_quote("http://www.mysite.com/", '/');
http://www.php.net/manual/en/function.preg-quote.php
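Separately from escaping, the trailing \b in the question's pattern can also prevent a match: \b needs a transition between a word character and a non-word character, and a keyword ending in / that is followed by a space (or by the end of the text) has none. The sketch below isolates that behavior, assuming the keyword is followed by a space; (?!\w), which only rejects a following word character, matches where \b does not. It also quotes the keyword for the ~ delimiter that the pattern actually uses.

```php
<?php
// Sketch: a trailing \b fails when the keyword ends in a non-word character.
// Assumption: the keyword is followed by a space in the content.
$find_keyword = 'http://www.mysite.com/';
$content = 'Visit http://www.mysite.com/ today.';

// Quote the keyword for the delimiter the pattern actually uses (~).
$quoted = preg_quote($find_keyword, '~');

// "/" followed by " " is not a word boundary, so \b rejects the match.
var_dump(preg_match('~' . $quoted . '\b~i', $content)); // int(0)

// (?!\w) only rejects a following word character, so this matches.
var_dump(preg_match('~' . $quoted . '(?!\w)~i', $content)); // int(1)
```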

preg_replace on the matches of another preg_replace

I have a feeling I might be missing something very basic. Anyway, here's the scenario:
I'm using preg_replace to convert ===inputA===inputB=== to inputA
This is what I'm using
$new = preg_replace('/===(.*?)===(.*?)===/', '$1', $old);
It's working fine, but I also need to further restrict inputB, like this:
$inputB = preg_replace('/[^\w]/', '', $inputB); // applied to every link, i.e. inputB
So basically, where you see $2 in the first snippet, I need to process that $2 so that it only contains \w characters, as in the second snippet. The final result should look like this:
Convert ===The link===link's page=== to The link
I have no idea how to do this. What should I do?
Although there is already an accepted answer: this is what preg_replace_callback() is for (older code used the /e modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0):
echo preg_replace(
'/===(.*?)===(.*?)===/e',
'"$1"',
'===inputA===in^^putB===');
//Output: inputA
Or:
function _my_url_func($vars){
return ''.$vars[2].'';
}
echo preg_replace_callback(
'/===(.*?)===(.*?)===/',
'_my_url_func',
'===inputA===inputB===');
//Output: inputB
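For PHP 7.0 and later, where /e no longer exists, a complete preg_replace_callback() version could look like the sketch below. The exact HTML is an assumption, since the question's replacement markup isn't shown: the sketch builds <a href="target">text</a> and strips every non-\w character from the target, as the question asks. convertLinks() is a hypothetical name.

```php
<?php
// Sketch: turn ===text===target=== into a link, keeping only \w characters
// in the target. Assumption: the desired markup is <a href="target">text</a>.
function convertLinks(string $old): string
{
    return preg_replace_callback(
        '/===(.*?)===(.*?)===/',
        function (array $m): string {
            // Clean the second capture group before building the link.
            $href = preg_replace('/[^\w]/', '', $m[2]);
            return '<a href="' . $href . '">' . $m[1] . '</a>';
        },
        $old
    );
}

echo convertLinks("===The link===link's page==="), "\n";
// <a href="linkspage">The link</a>
```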
Try preg_match() with the first pattern to get the two matches into variables, and then use preg_replace() on the one that needs further checks.
Why not extract the matches from the first regex with preg_match(), process those results, and then put them back together as HTML?
