url parameters regex - php

I've created my own newsletter module and come across one (big) problem.
The system formats all urls with additional parameters to keep track of the clicks in google analytics.
e.g.
A url like this
http://www.domain.com
becomes like this
http://www.domain.com/&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test
and a url like this
http://www.domain.com/?page=1
becomes like this
http://www.domain.com/?page=1&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test
The first example is broken. I know the first ampersand has to be replaced by a question mark, and that's where the problem occurs.
I'm using this pattern to extract URLs:
$pattern = array('#[a-zA-Z]+://([-]*[.]?[a-zA-Z0-9_/-?&%\{\}])*#');
$replace = array('\\0&utm_source=newsletter&utm_medium=e-mail&utm_campaign=test');
$body = preg_replace($pattern,$replace,$body);
Can anybody help me with a correct, working regex, so that the first URL parameter is always preceded by a question mark instead of an ampersand?

just use
if(strpos($string,'?') !== false)
//add with ampersand
else
//add with question mark
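To make the check above concrete, here is a minimal sketch. The function names addTrackingParams() and addTrackingToBody() are made up for illustration, and the URL pattern is a simplified assumption (anything from the scheme up to whitespace, a quote or an angle bracket), not the asker's original pattern. It also ignores URLs that contain a #fragment.

```php
<?php
// Hypothetical helper: appends tracking parameters to a single URL.
function addTrackingParams(string $url, array $params): string
{
    // Use '&' when the URL already has a query string, '?' otherwise.
    $separator = (strpos($url, '?') !== false) ? '&' : '?';
    return $url . $separator . http_build_query($params, '', '&');
}

// Hypothetical helper: applies addTrackingParams() to every URL found in a body.
function addTrackingToBody(string $body, array $params): string
{
    return preg_replace_callback(
        '#[a-zA-Z]+://[^\s"\'<>]+#', // simplified URL pattern (assumption)
        function (array $m) use ($params) {
            return addTrackingParams($m[0], $params);
        },
        $body
    );
}

$params = [
    'utm_source'   => 'newsletter',
    'utm_medium'   => 'e-mail',
    'utm_campaign' => 'test',
];

echo addTrackingParams('http://www.domain.com/', $params), "\n";
echo addTrackingParams('http://www.domain.com/?page=1', $params), "\n";
```

Deciding the separator per URL inside a callback avoids having to repair a wrong '&' afterwards.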

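Not a full regex solution, but it would work. It checks for a ? and, if none is found, changes the first & to a question mark. Note that str_replace() cannot limit the number of replacements (its fourth argument is an output count passed by reference), so preg_replace() with a limit of 1 is used instead:
$url = (strpos($url, '?') !== false) ? $url : preg_replace('/&/', '?', $url, 1);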

A very simple approach would be to look for a string like http://...& where the ... contains no ? question mark or other delimiters:
$src = preg_replace('#(http://[^\s"\'<>?&]+)&#', '$1?', $src);
But it's probably best if you use a restricted instead of a negated character class:
$src = preg_replace('#(http://[\w/.]+)&#', '$1?', $src);

This solution fixes all urls which have a query beginning with a & (and are missing the ?):
$re = '%([a-zA-Z]+://[^?&\s]+)&(utm_source=newsletter)%';
$body = preg_replace($re, '$1?$2', $body);

Related

preg_replace twitter url not working with question mark php

I have created the following function to replace a URL with a div that carries its id.
function twitterIzer($string){
$pattern = '~https?://twitter\.com/.*?/status/(\d+)~';
$string = preg_replace($pattern, "<div class='tweet' id='tweet$1' tweetid='$1'></div>", $string);
return $string;
}
It works well when I use this type of url
https://twitter.com/Minsa_Peru/status/1260658846143401984
but it leaves an extra ?s=20 behind when I use this URL
https://twitter.com/Minsa_Peru/status/1262730246668922885?s=20
How can I remove this ?s=20 text so that my function works? All I know is that I need to improve my regex pattern. Thank you.
If you want just regex:
$pattern = '/https?:\/\/twitter\.com\/.*?\/status\/(\d+)(.*)?/';
Because ? is not a digit, (\d+) stops there and (.*) matches everything that remains, in this case ?s=20. The final question mark makes that group optional, so the pattern matches whether or not the extra text is present.
Learn regex
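As a sketch of a slightly tighter variant of the idea above: instead of swallowing the rest of the line with (.*), the pattern below only consumes an optional query string, so any text that follows the URL survives. The character classes are assumptions about what may appear in the user name and the query string.

```php
<?php
function twitterIzer(string $string): string
{
    // (?:\?[^\s<"']*)? optionally consumes a query string such as ?s=20,
    // stopping at whitespace, '<' or a quote.
    $pattern = '~https?://twitter\.com/[^/\s]+/status/(\d+)(?:\?[^\s<"\']*)?~';
    $replacement = "<div class='tweet' id='tweet$1' tweetid='$1'></div>";
    return preg_replace($pattern, $replacement, $string);
}

echo twitterIzer('See https://twitter.com/Minsa_Peru/status/1262730246668922885?s=20 now'), "\n";
echo twitterIzer('https://twitter.com/Minsa_Peru/status/1260658846143401984'), "\n";
```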

How to use a variable as a pattern along with the other patterns in preg_match() function?

Actually I'm writing a web crawler for my mini project.
I want to crawl only those web pages that belong to the input website only. I want my web crawler not to crawl to other websites other than the input given for now.
This is what I'm doing:
$url = $_POST["url"];
$web = @file_get_contents($url);
preg_match_all("/<a\s.*href=\"(.*)\"/U", $web, $matches);
What I want to do is:
$url = $_POST["url"];
$web = @file_get_contents($url);
preg_match_all("/<a\s.*href=\"(.*$url.*)\"/U", $web, $matches);
for example:
Input: https://www.google.com/
then the regular expression should be :
preg_match("/.*google.com.*/U", xyz, xyz);
Any other suggestions will be helpful, thanks in advance.
Change your delimiters to something that is not in any of your URLs?
preg_match_all("#<a\s.*href=\"(.*$url.*)\"#U", $web, $matches);
edit
Probably better to escape the $url with preg_quote
I found the solution. To use a variable along with the regular expression, concatenate it into the pattern string:
preg_match("/regular_expression".($my_variable)."regular_expression/U", $source, $matches);
The real solution is to use a preg_quote with the actual regex delimiter and append the part to the regex literal parts with the dot syntax:
preg_match_all("/<a\s.*href=\"(.*" . preg_quote($url, "/") . ".*)\"/U", $web, $matches);
The dots are like + in some other languages used for string concatenation, and preg_quote will make sure all special regex metacharacters in the variable string are properly escaped.
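Here is a small self-contained sketch of the preg_quote() approach. The $url and $web values are hard-coded stand-ins for $_POST["url"] and the downloaded page. The pattern also assumes that wanted links start with the input URL, which is stricter than the .*$url.* in the question.

```php
<?php
// Hypothetical stand-ins for $_POST["url"] and the fetched page.
$url = 'https://www.google.com/';
$web = '<a href="https://www.google.com/maps">Maps</a>'
     . '<a href="https://example.org/">Elsewhere</a>'
     . '<a class="x" href="https://www.google.com/?q=1">Search</a>';

// Escape regex metacharacters in $url, including the chosen delimiter "#".
$quoted = preg_quote($url, '#');

// [^>]* keeps the match inside a single <a ...> tag;
// [^"]* captures the rest of the href value.
preg_match_all('#<a\s[^>]*href="(' . $quoted . '[^"]*)"#i', $web, $matches);

print_r($matches[1]);
```

Only the two google.com links end up in $matches[1]; the example.org link is skipped.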

PHP- Parsing words from a string without spaces?

My webpage has a variable, $currentPage. This is a string of the php token name of the page I'm currently on.
Example: All categories under the user section have names such as:
uAdminNew, uAdminEdit, etc.
I would like a way to strip off the uAdmin part, determine the last word (New or Edit) and call functions based on it.
My navigation system works through these names, so I can't change them to make parsing easier, for example by adding delimiters.
Is this something only regex can solve, or is there a simpler solution I'm missing? If it is regex, could you explain or provide a link on how to use it to test against a specific list of strings? I'm very new to it.
For example:
$str = 'uAdminEdit';
$ar = preg_match('/([A-Z][^A-Z]+$)/', $str, $m);
echo $m[1]; // Edit
Does the pagename always start with uAdmin? If so, you could split the string by "uAdmin" with explode():
$page = 'uAdminEdit';
echo explode('uAdmin', $page)[1]; //Output: Edit
Or simply remove "uAdmin" with str_replace():
$page = 'uAdminEdit';
echo str_replace('uAdmin', '', $page); //Output: Edit
If you just want the section after uAdmin, use a regex capture group, where $sub holds the page name:
preg_match('/uAdmin(.*)/', $sub, $matches);
echo $matches[1];
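Since the question also asks about testing against a specific list of strings, here is a minimal sketch that combines the capture with a whitelist. The function name pageAction() and the contents of $allowedActions are made up for illustration.

```php
<?php
// Hypothetical list of allowed actions; the real names come from the navigation system.
$allowedActions = ['New', 'Edit'];

// Hypothetical helper: returns the trailing action word if it is whitelisted, else null.
function pageAction(string $currentPage, array $allowedActions): ?string
{
    // Capture the last run that starts with an uppercase letter,
    // e.g. "Edit" in "uAdminEdit".
    if (preg_match('/([A-Z][a-z0-9]*)$/', $currentPage, $m)
        && in_array($m[1], $allowedActions, true)) {
        return $m[1];
    }
    return null;
}

var_dump(pageAction('uAdminEdit', $allowedActions));
var_dump(pageAction('uAdminDelete', $allowedActions));
```

Returning null for anything outside the list means an unexpected page name never reaches the dispatch code.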

Extracting text from URL using PHP

I'm curious as to how I would get a certain value after a delimiter in a URL?
If I have a URL of http://www.testing.site.com/site/biz/i-want-this, how would I extract only the part that says "i-want-this", or more generally whatever comes after the last /?
Thank you!
You want basename($path); It should give you what you need:
http://www.ideone.com/8hFSN
$url = "http://www.testing.site.com/site/biz/i-want-this";
preg_match( "/[^\/]*$/", $url, $match);
echo $match[0]; // i-want-this
You can use basename() but if you are on Windows, it will break on not just slashes but also backslashes. This is unlikely to come up as backslashes are unusual in a URL. But I suspect you could find them in a query string in a valid URL.
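To sidestep both the backslash issue and slashes inside a query string, one option is to isolate the path with parse_url() first and split it on "/" only. This is a sketch; lastPathSegment() is a made-up name.

```php
<?php
// Hypothetical helper: returns the last segment of the URL's path component.
function lastPathSegment(string $url): string
{
    // parse_url() drops the query string and fragment, leaving only the path.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return ''; // no path, or a URL that parse_url() could not handle
    }
    // Split on forward slashes only, ignoring a trailing slash.
    $segments = explode('/', rtrim($path, '/'));
    return end($segments);
}

echo lastPathSegment('http://www.testing.site.com/site/biz/i-want-this'), "\n";
echo lastPathSegment('http://www.testing.site.com/site/biz/i-want-this?ref=a/b'), "\n";
```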

Extracting URLs from a JSON-like string

I need to extract the first URL from some content. The content may be like this:
({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});
or may contain only a link
({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});
currently I have :
$pattern = "/\:\[\{url\:\"(.*)\"\,name/";
preg_match_all($pattern, $htmlContent, $matches);
$URL = $matches[1][0];
However, it works only if there is a single link, so I need a regex that works for both cases.
You can use this regex:
$pattern = "/url\:\"([^\"]+)\"/";
Worked for me :)
Hopefully this should work for you
<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});'; //The string you want to extract the 1st URL from
$match = ""; //Define the match variable
preg_match("%(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&\%\$#\=~_\-]+))*%",$str,$match); //I Googled for the best Regular expression for URLs and found the one included in the preg_match
echo $match[0]; //Return the first item in the array (the first URL returned)
?>
This is the website that I found the regular expression on: http://regexlib.com/Search.aspx?k=URL
Like the others have said, json_decode should work for you as well.
That smells like JSON to me. Try using http://php.net/json_decode
Looks like JSON to me, visit http://php.net/manual/en/book.json.php and use json_decode().
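One caveat, sketched below: the samples in the question use unquoted keys and are wrapped in parentheses, so they are not strict JSON, and json_decode() should return null for them as-is. The function name firstUrl() is made up; it uses the simpler pattern from the earlier answer with preg_match(), which stops at the first match.

```php
<?php
$samples = [
    '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});',
    '({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});',
];

// Hypothetical helper: returns the first url:"..." value, or null if there is none.
function firstUrl(string $content): ?string
{
    // [^"]+ cannot run past the closing quote, so one link or many behaves the same.
    if (preg_match('/url:"([^"]+)"/', $content, $m)) {
        return $m[1];
    }
    return null;
}

foreach ($samples as $sample) {
    echo firstUrl($sample), "\n";
}

// The raw sample is not valid JSON (unquoted keys, surrounding parentheses).
var_dump(json_decode($samples[0]));
```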
