Regex not quite right - php

I have a site crawler that displays a list of URLs, but I cannot for the life of me get the last regex quite right.
All URLs end up listed as:
The URLs can all be different, and the only thing that seems static is the & symbol.
How would I go about getting rid of the & symbol and everything to the right of it?
Here is what I have tried, with the above results:
function getresults($sterm) {
    $html = file_get_html($sterm);
    $result = "";
    // find all h3 tags with class="r"
    foreach ($html->find('h3[class="r"]') as $ef) {
        $result .= $ef->outertext . '<br>';
    }
    return $result;
}
function geturl($url) {
    $var = $url;
    $result = "";
    preg_match_all ("/a[\s]+[^>]*?href[\s]?=[\s\"\/url?q=\']+".
    $var, $matches);
    $matches = $matches[1];
    foreach ($matches as $var) {
        $result .= $var . "<br>";
    }
    echo preg_replace('/sa=U.*?usg=.*?AFQjCN/', "--", $result);
}

If the URLs are ALWAYS in the same format, use explode:
$tmp = explode("&", "");
$tmp[0] should contain "" and
$tmp[1] should contain "--E5WRBxuTOQikDIyBczaVXveOdRFg"

A simple way to remove everything after the & character:
$result = substr($result, 0, strpos($result, '&'));
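One caveat with the one-liner above: when the string contains no &, strpos() returns false and substr() then ends up returning an empty string. A minimal sketch that guards against that case (the function name strip_after_amp and the example.com URLs are made up for illustration):

```php
<?php
// Hypothetical helper: keep everything before the first "&", or the
// whole string when there is no "&" at all.
function strip_after_amp(string $url): string
{
    $pos = strpos($url, '&');
    // strpos() returns false when "&" is absent, so compare strictly.
    return $pos === false ? $url : substr($url, 0, $pos);
}

echo strip_after_amp('http://example.com/page&sa=U&usg=abc'), "\n"; // http://example.com/page
echo strip_after_amp('http://example.com/page'), "\n";              // http://example.com/page
```

This only cuts a single URL; if $result holds several URLs joined with <br>, it would have to be applied to each URL before joining.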


Regex for add GET param to URL

I want to add a GET parameter to all URLs in a given string (like the HTML content of a website).
For example:
$content = '... register ... login ...';
$content = '... register ... login ...';
I think this can only be done with a regular expression, so I wrote this function:
function makeLinks($str)
{
    $str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '$1?wid=${wid}', $str);
    return $str;
}
But this pattern has problems! For example: =>${wid}?foo=bar
Please help me.
Try this:
function makeLinks($str)
{
    $str = preg_replace_callback('/\b((?:https?|ftp):\/\/(?:[-A-Z0-9.]+)(?:\/[-A-Z0-9+&@#\/%=~_|!:,.;]*)?)(?:\?([A-Z0-9+&@#\/%=~_|!:,.;]*))?/i', 'modify_url', $str);
    return $str;
}
function modify_url($matches) {
    // $matches[1] is the URL without its query string, $matches[2] the query string (if any)
    $query = isset($matches[2]) ? $matches[2]:'';
    $result = $matches[1].'?'.$query;
    if(!empty($query)) $result .= '&';
    return $result.'wid=${wid}';
}
Optionally, you can just add @ without affecting the end result. I hate to use them, but here it is in case you want to:
function modify_url($matches) {
    $result = $matches[1].'?'.@$matches[2];
    if(!@empty($matches[2])) $result .= '&';
    return $result.'wid=${wid}';
}
Ideally, you should extract the URLs and parse them, but this solution should work.
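For the "extract and parse" route, here is a rough sketch of what handling a single URL could look like once it has been extracted. The helper name add_query_param, the example.com URL, and the value 42 are assumptions for illustration; note that http_build_query() URL-encodes values, so a literal placeholder such as ${wid} would come out encoded.

```php
<?php
// Hypothetical helper: append (or overwrite) one query parameter on a URL,
// keeping any existing query string and fragment.
function add_query_param(string $url, string $name, string $value): string
{
    // Split off the fragment so the parameter lands before it.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    // Split off any existing query string.
    $query = '';
    $queryPos = strpos($url, '?');
    if ($queryPos !== false) {
        $query = substr($url, $queryPos + 1);
        $url = substr($url, 0, $queryPos);
    }

    // Decode the existing parameters, set the new one, and rebuild.
    parse_str($query, $params);
    $params[$name] = $value;

    return $url . '?' . http_build_query($params) . $fragment;
}

echo add_query_param('http://example.com/login?foo=bar#top', 'wid', '42'), "\n";
// http://example.com/login?foo=bar&wid=42#top
```

Because the existing parameters are decoded into an array first, a URL that already carries the parameter has it overwritten instead of duplicated.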
I think there may be a shorter way. My solution:
function makeLinks($str) {
    preg_match_all('|(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*))|', $str, $urls);
    if ($urls && isset($urls[1])) {
        foreach ($urls[1] as $url) {
            // strpos() returns false when there is no "?", so compare strictly
            $new_url = $url . (strpos($url, '?') !== false ? '&' : '?') . 'wid=${wid}';
            $str = str_replace($url, $new_url, $str);
        }
    }
    return $str;
}

Extracting Twitter hashtag from string in PHP

I need some help with Twitter hashtags: I need to extract a certain hashtag as a string variable in PHP.
So far I have this:
$hash = preg_replace ("/#(\\w+)/", "#$1", $tweet_text);
but this just transforms hashtag_string into a link.
Use preg_match() to identify the hash and capture it to a variable, like so:
$string = 'Tweet #hashtag';
preg_match("/#(\\w+)/", $string, $matches);
$hash = $matches[1];
var_dump( $hash); // Outputs 'hashtag'
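I think this function will help you:
echo get_hashtags($string);
function get_hashtags($string, $str = 1) {
    // collect every hashtag; $matches[1] holds the tags without the # sign
    // (this call is assumed; it uses the pattern from the answer above)
    preg_match_all('/#(\w+)/', $string, $matches);
    $i = 0;
    $keywords = '';
    if ($str) {
        // return the tags as one comma-separated string
        $count = count($matches[1]);
        foreach ($matches[1] as $match) {
            $i++;
            $keywords .= "$match";
            if ($count > $i) $keywords .= ", ";
        }
    } else {
        // return the tags as an array
        $keyword = array();
        foreach ($matches[1] as $match) {
            $keyword[] = $match;
        }
        $keywords = $keyword;
    }
    return $keywords;
}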
As I understand it, in the text/paragraph/post you want to show the tag with the hash sign (#), like this: #tag
In the URL you want to remove the # sign, because the string after # is not sent to the server in the request. So I have edited your code; try this:
$string=" is best #SocialNetworking #website";
$text=preg_replace('/#(\\w+)/','<a href=/hash/$1>$0</a>',$string);
echo $text; // output will be: is best <a href=/hash/SocialNetworking>#SocialNetworking</a> <a href=/hash/website>#website</a>
Extract multiple hashtags into an array:
$body = 'My #name is #Eminem, I am rap #god, #Yoyoya check it #out';
$hashtag_set = [];
$array = explode('#', $body);
foreach ($array as $key => $row) {
    $hashtag = [];
    // element 0 is the text before the first #, so it is never a hashtag
    if ($key > 0 && !empty($row)) {
        $hashtag = explode(' ', $row);
        $hashtag_set[] = '#' . $hashtag[0];
    }
}
You can use the preg_match_all() PHP function:
preg_match_all('/(?<!\w)#\w+/', $description, $allMatches);
will give you an array of the hashtags only, while
preg_match_all('/#(\w+)/', $description, $allMatches);
will give you both the hashtags and the tag names without the # sign.
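To make the difference between the two result arrays concrete, here is a small sketch that combines the lookbehind from the first pattern with the capture group from the second. The sample text is made up for illustration.

```php
<?php
// Hypothetical sample text, not taken from the posts above.
$description = 'My #name is #Eminem, I am rap #god';

// With one capture group, index 0 holds the full matches and
// index 1 holds what the group captured.
preg_match_all('/(?<!\w)#(\w+)/', $description, $allMatches);

$withHash = $allMatches[0];    // ['#name', '#Eminem', '#god']
$withoutHash = $allMatches[1]; // ['name', 'Eminem', 'god']

echo implode(', ', $withoutHash), "\n"; // name, Eminem, god
```

The (?<!\w) lookbehind keeps a # in the middle of a word (for example in foo#bar) from being treated as the start of a hashtag.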
You can extract a value in a string with preg_match function
preg_match("/#(\w+)/", $tweet_text, $matches);
$hash = $matches[1];
preg_match will store matching results in an array. You should take a look at the doc to see how to play with it.
Here's a non-regex way to do it:
$tweet = "Foo bar #hashTag hello world";
$hashPos = strpos($tweet,'#');
$hashTag = '';
// stop at the next space or at the end of the string
while (isset($tweet[$hashPos]) && $tweet[$hashPos] !== ' ') {
    $hashTag .= $tweet[$hashPos++];
}
echo $hashTag;
Note: This will only pick up the first hashtag in the tweet.

How to wrap user mentions in a HTML link on PHP?

I'm working on a commenting web application and I want to parse user mentions (#user) as links. Here is what I have so far:
$text = "#user is not #user1 but #user3 is #user4";
$pattern = "/\#(\w+)/";
preg_match_all($pattern, $text, $matches);
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','",$matches[1]). "')";
$users = $this->getQuery($sql);
foreach($users as $i=>$u){
    $text = str_replace("#{$u['username']}",
    "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
echo $text;
The problem is that user links are being overlapped:
<a rel="11327" class="ct-userLink" href="#">
<a rel="21327" class="ct-userLink" href="#">#user</a>1
How can I avoid links overlapping?
Answer Update
Thanks to the accepted answer, this is what my new foreach loop looks like:
foreach($users as $i=>$u){
    $text = preg_replace("/#".$u['username']."\b/",
    "<a href='#' title='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
The problem seems to be that some usernames can encompass other usernames. So you replace user1 properly with <a>user1</a>. Then user matches inside it, and the result becomes <a><a>user</a>1</a>. My suggestion is to change your string replace to a regex with a word boundary, \b, that is required after the username.
The Twitter widget has JavaScript code to do this. I ported it to PHP in my WordPress plugin. Here's the relevant part:
function format_tweet($tweet) {
    // add #reply links
    $tweet_text = preg_replace("/\B[#@]([a-zA-Z0-9_]{1,20})/",
        "#<a class='atreply' href='$1'>$1</a>",
        $tweet);
    // make other links clickable
    $matches = array();
    $link_info = preg_match_all("/\b(((https*\:\/\/)|www\.)[^\"\']+?)(([!?,.\)]+)?(\s|$))/",
        $tweet_text, $matches, PREG_SET_ORDER);
    if ($link_info) {
        foreach ($matches as $match) {
            // links that start with www. get an explicit scheme
            $http = preg_match("/w/", $match[2]) ? 'http://' : '';
            $tweet_text = str_replace($match[0],
                "<a href='" . $http . $match[1] . "'>" . $match[1] . "</a>" . $match[4],
                $tweet_text);
        }
    }
    return $tweet_text;
}
Instead of parsing for '#user', parse for '#user ' (with a trailing space) or ' #user ' to avoid wrongly parsing email addresses (maybe ' #user: ' should also be allowed). This will only work if usernames contain no whitespace.
You can go for a custom str_replace function that stops at the first replacement. Something like:
function str_replace_once($needle , $replace , $haystack){
    $pos = strpos($haystack, $needle);
    if ($pos === false) {
        // Nothing found
        return $haystack;
    }
    return substr_replace($haystack, $replace, $pos, strlen($needle));
}
And use it like:
foreach($users as $i=>$u){
    $text = str_replace_once("#{$u['username']}",
    "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
You shouldn’t replace one user mention at a time but all of them at once. You could use preg_split to do that:
// split text at mention while retaining user name
$parts = preg_split("/#(\w+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$n = count($parts);
// $n is always an odd number; 1 means no match found
if ($n > 1) {
    // collect user names
    $users = array();
    for ($i=1; $i<$n; $i+=2) {
        $users[$parts[$i]] = '';
    }
    // get corresponding user information
    $sql = "SELECT *
    FROM users
    WHERE username IN ('" .implode("','", array_keys($users)). "')";
    $users = array();
    foreach ($this->getQuery($sql) as $user) {
        $users[$user['username']] = $user;
    }
    // replace mentions
    for ($i=1; $i<$n; $i+=2) {
        $u = $users[$parts[$i]];
        $parts[$i] = "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a>";
    }
    // put everything back together
    $text = implode('', $parts);
}
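The same "all at once" idea can also be sketched with preg_replace_callback, which visits each mention exactly once, so one username can never be rewritten inside another. This is only an illustration: the function name link_mentions and the in-memory lookup array are assumptions standing in for the database query, and mentions of unknown users are left untouched.

```php
<?php
// Hypothetical helper: $userIds maps username => user_id and stands in
// for the result of the SQL lookup.
function link_mentions(string $text, array $userIds): string
{
    return preg_replace_callback('/#(\w+)/', function (array $m) use ($userIds) {
        $name = $m[1];
        if (!isset($userIds[$name])) {
            return $m[0]; // unknown user: keep the mention as plain text
        }
        return "<a href='#' class='ct-userLink' rel='{$userIds[$name]}'>#{$name}</a>";
    }, $text);
}

$userIds = ['user' => 11327, 'user1' => 21327];
echo link_mentions('#user is not #user1 but #user3', $userIds), "\n";
```

Since \w+ is greedy, #user1 is matched as a whole and looked up as user1, never as user followed by a stray 1.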
I like dnl's solution of parsing ' #user', but maybe it is not suitable for you.
Anyway, did you try using the strip_tags function to remove the anchor tags? That way you have the string without the links, and you can parse it and build the links again.

remove a part of a URL argument string in php

I have a string in PHP that is a URI with all arguments:
$string =
I want to completely remove an argument and return the remain string. For example I want to remove arg3 and end up with:
$string =
I will always want to remove the same argument (arg3), and it may or not be the last argument.
EDIT: there might be a bunch of weird characters in arg3, so my preferred way to do this (in essence) would be:
$newstring = remove $_GET["arg3"] from $string;
There's no real reason to use regexes here, you can use string and array functions instead.
You can explode the part after the ? (which you can get using substr to get a substring and strrpos to get the position of the last ?) into an array, skip arg3 while copying the parts, and then use join to put the string back together:
$string = "";
$pos = strrpos($string, "?"); // get the position of the last ? in the string
$query_string_parts = array();
foreach (explode("&", substr($string, $pos + 1)) as $q)
{
    list($key, $val) = explode("=", $q);
    if ($key != "arg3")
    {
        // keep track of the parts that don't have arg3 as the key
        $query_string_parts[] = "$key=$val";
    }
}
// rebuild the string, joining the remaining parts with & again
$result = substr($string, 0, $pos + 1) . join("&", $query_string_parts);
See it in action at
preg_replace('/arg3=[^&]*(&|$)/', '', $string)
I'm assuming the url itself won't contain arg3= here, which in a sane world should be a safe assumption.
$new = preg_replace('/&arg3=[^&]*/', '', $string);
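The two one-liners above each cover some positions of the argument but not others (first, middle, or last in the query string). Below is a hedged sketch that handles those positions in two passes. The function name remove_arg and the example.com URLs are made up for illustration, and repeated occurrences of the same argument are not specifically handled.

```php
<?php
// Hypothetical helper: remove "name=value" from a URL's query string.
function remove_arg(string $url, string $name): string
{
    $quoted = preg_quote($name, '/');
    // Pass 1: the argument is followed by another one, so drop "name=value&"
    // and keep the "?" or "&" that preceded it.
    $url = preg_replace('/([?&])' . $quoted . '=[^&#]*&/', '$1', $url);
    // Pass 2: the argument is the last one, so drop it together with the
    // "?" or "&" in front of it, stopping before any #fragment.
    return preg_replace('/[?&]' . $quoted . '=[^&#]*(?=#|$)/', '', $url);
}

echo remove_arg('http://example.com/p.php?arg1=a&arg3=c&arg4=d', 'arg3'), "\n"; // http://example.com/p.php?arg1=a&arg4=d
echo remove_arg('http://example.com/p.php?arg3=c&arg4=d', 'arg3'), "\n";        // http://example.com/p.php?arg4=d
echo remove_arg('http://example.com/p.php?arg1=a&arg3=c#top', 'arg3'), "\n";    // http://example.com/p.php?arg1=a#top
```

Requiring a ? or & directly before the name keeps a parameter such as xarg3 from being removed by accident.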
This should also work, taking into account, for example, page anchors (#) and at least some of those "weird characters" you mention but don't seem worried about:
function remove_query_part($url, $term)
{
    $query_str = parse_url($url, PHP_URL_QUERY);
    if ($frag = parse_url($url, PHP_URL_FRAGMENT)) {
        $frag = '#' . $frag;
    }
    parse_str($query_str, $query_arr);
    // drop the unwanted argument before rebuilding the query string
    unset($query_arr[$term]);
    $new = '?' . http_build_query($query_arr) . $frag;
    return str_replace(strstr($url, '?'), $new, $url);
}
$string[] = '';
$string[] = '';
$string[] = '';
$string[] = '';
$string[] = '';
$string[] = '';
$string[] = '';
foreach ($string as $str) {
    echo remove_query_part($str, 'arg3') . "\n";
}
Tested only as shown.

PHP Remove URL from string

If I have a string that contains a URL (for example's sake, we'll call it $url), such as:
$url = "Here is a funny site";
How do I remove the URL from the string?
The difficulty is that URLs might also show up without the http://, such as:
$url = "Here is another funny site";
There is no HTML present. How would I start a search if http or www exists, then remove the text/numbers/symbols up to the first space?
I re-read the question; here is a function that would work as intended:
function cleaner($url) {
    $U = explode(' ',$url);
    $W =array();
    foreach ($U as $k => $u) {
        if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
            // drop the word that looks like a URL, then re-check the rest
            unset($U[$k]);
            return cleaner( implode(' ',$U));
        }
    }
    return implode(' ',$U);
}
$url = "Here is another funny site and and";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
    // $M holds the matches of a preg_match() against a list of TLDs; that call is not shown here
    $has_tld = (count($M) > 0) ? true : false;
    return $has_tld;
}
function cleaner($url) {
    $U = explode(' ',$url);
    $W =array();
    foreach ($U as $k => $u) {
        if (stristr($u,".")) { //only preg_match if there is a dot
            if (containsTLD($u) === true) {
                // drop the word that contains a TLD, then re-check the rest
                unset($U[$k]);
                return cleaner( implode(' ',$U));
            }
        }
    }
    return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone but this and and";
echo "Cleaned: " . cleaner($url);
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
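The pattern above only matches URLs that carry a scheme, while the question also mentions URLs that start with www. A minimal sketch that follows the question's own rule (start at http/https or www., remove everything up to the next whitespace) could look like this. The function name strip_urls and the example.com URLs are assumptions for illustration, and the sketch makes no attempt to catch bare domains such as example.com.

```php
<?php
// Hypothetical helper: remove anything that starts with http://, https://
// or www. and runs up to the next whitespace.
function strip_urls(string $text): string
{
    $text = preg_replace('~\b(?:https?://|www\.)\S+~i', '', $text);
    // Collapse the whitespace left behind by the removed URLs.
    return trim(preg_replace('/\s{2,}/', ' ', $text));
}

echo strip_urls('Here is a funny site http://www.example.com/page?x=1 ok'), "\n"; // Here is a funny site ok
echo strip_urls('Here is another funny site www.example.com'), "\n";              // Here is another funny site
```

Because \S+ runs to the next whitespace, punctuation that directly follows a URL (such as a trailing comma) is removed along with it.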
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
Thanks Mike.
A small update, since it returned a notice error:
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
$url = "Here is a funny site";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url,$replace,$with) {
    $replace = explode(" ",$replace);
    $new_string = '';
    $check = explode(" ",$url);
    foreach($check AS $key => $value) {
        foreach($replace AS $key2 => $value2 ) {
            // strpos() returns false when nothing is found, so compare strictly
            if (strpos( strtolower($value), strtolower($value2) ) !== false) {
                $value = $with;
            }
        }
        $new_string .= " ".$value;
    }
    return $new_string;
}
You would need to write a regular expression to extract out the urls.
