I use the following code
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
echo"<li><h5>$author</h5></li>";
}
and it gets me http://twitter.com/USERNAME/statuses/167382363782206976
My question is: how do I get only the username?
The username may be anything.
$url = "http://twitter.com/USERNAME/statuses/167382363782206976";
preg_match("#http://twitter\.com/([^/]+)/statuses/.*#", $url, $matches);
var_dump($matches[1]);
You can use either this regexp (for preg_match): ~twitter\.com/([^/]+)/~:
$match = array();
preg_match( '~twitter\.com/([^/]+)/~', $url, $match);
echo $match[1]; // list(,$userName) = $match;
Or, more efficiently, with strpos and substr:
$start = strpos( $url, '/', 10); // first / after the host; offset 10 lies past "http://" and before ".com/"
$end = strpos( $url, '/', $start + 1); // the / that ends the username
// Check both indexes against false before using them
$username = substr( $url, $start + 1, $end - $start - 1);
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
list(,,,$username) = explode('/', $author);
echo"<li><h5>$username</h5></li>";
}
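A variation on the answers above, as a minimal sketch: it assumes the guid always looks like http://twitter.com/USERNAME/statuses/ID and lets parse_url() strip the scheme and host, so only the path has to be split. The function name username_from_status_url is hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: returns the first path segment of a status URL,
// or null when the URL has no usable path.
function username_from_status_url(string $url): ?string
{
    // For the example URL this is "/USERNAME/statuses/167382363782206976"
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // Drop the leading slash, then split into segments
    $segments = explode('/', trim($path, '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

echo username_from_status_url('http://twitter.com/USERNAME/statuses/167382363782206976'), "\n";
```

Inside the foreach loop this would be called as username_from_status_url((string) $key->guid), since guid is a SimpleXMLElement rather than a plain string.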
Related
How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone say whether this code is good enough? And could you also explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this replacement on the Internet in exactly this form.)
If you know the format beforehand you can use preg_match. For example, in the first pattern you know that %node can only be a number. Matching several patterns is just as easy: store each regex in the array next to its pattern:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice how I added the dollar sign $ (the end-of-string anchor) to the end of the first rule; this ensures that it does not also match URLs meant for the second rule.
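On the backslashes in '/\\\\\*/' from the question: two layers of escaping are stacked, PHP's single-quoted string rules and the regex's own rules. The sketch below prints each layer so it can be checked by hand; the variable names are only illustrative.

```php
<?php
// preg_quote() escapes "*" and the delimiter "/", so "node/*" becomes node\/\*
$quoted = preg_quote('node/*', '/');

// In a single-quoted PHP string, "\\" is one backslash and "\*" stays as is,
// so this literal is the 6-character string /\\\*/
// As a regex that reads: "\\" (a literal backslash) followed by "\*" (a literal asterisk).
$pattern = '/\\\\\*/';

echo $quoted, "\n";
echo $pattern, "\n";

// The pattern therefore finds the "\*" that preg_quote() produced
// and swaps it for a "one path segment" expression.
$result = preg_replace($pattern, '[^\/]+', $quoted);
echo $result, "\n";
```

So the five backslashes are not arbitrary: four of them collapse to the two that the regex needs to match one literal backslash, and the fifth escapes the asterisk.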
Here is a generic version of the solution above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/','/hello/']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can pretty much convert the code above into a function and pass in parameters to check against each pattern in the array. The first step uses a regex to convert all variables into *, and the second explodes the pattern on *. Finally, loop over the resulting array and keep comparing each piece to the URL, using simple string comparison, to see whether the pattern matches.
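As a sketch of what such a function could look like, here is a regex-based variant rather than the string-comparison loop above. It assumes the %token syntax from the question and that each token stands for exactly one path segment; the name match_pattern is hypothetical.

```php
<?php
// Hypothetical helper: turns a pattern such as "node/%node/news" into an
// anchored regex and returns the captured tokens, or null if $url does not match.
function match_pattern(string $pattern, string $url): ?array
{
    // Quote the whole pattern first; "%" is not a regex metacharacter,
    // so the %tokens pass through preg_quote() unchanged.
    $regex = preg_quote($pattern, '~');

    // Replace each %token with a named group that matches one path segment.
    $regex = preg_replace('~%([A-Za-z_][A-Za-z0-9_]*)~', '(?P<$1>[^/]+)', $regex);

    if (!preg_match('~^' . $regex . '$~', $url, $m)) {
        return null;
    }

    // preg_match() returns named and numbered captures; keep the named ones.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$patterns = ['node/%node', 'node/%node/news', 'album/%album/shadowbox/%photo', 'blogs'];
foreach ($patterns as $p) {
    $tokens = match_pattern($p, 'node/11111/news');
    if ($tokens !== null) {
        echo $p, ' => ', json_encode($tokens), "\n";
    }
}
```

A pattern without tokens returns an empty array on success, so callers should compare against null rather than test for truthiness.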
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
I have a string which contains a URL. I am trying to extract the URL from the additional text in the most efficient way. So far I have been using explode, but I have to explode twice and then rebuild the URL. Regex is not something I have mastered yet, so I ruled it out (unless it is the best solution). Is there a way to extract the URL in one step?
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$strip1 = explode( '&', $url );
$strip2 = explode('=', $strip1[0]);
$result = $strip2[1].'='.$strip2[2];
result:
http://www.somesite.com/sites/pages/page?id=1545778
Try it like this, using preg_split():
$date = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$t =preg_split("/[=&]/", $date);
echo $t[1]."=".$t[2]; //output: http://www.somesite.com/sites/pages/page?id=1545778
$strip1 = explode( '/url?q=', $url );
Then apply this regex to $strip1[1]:
^((http):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$
You will get an array with the sections of the URL.
Ugly one-step regex-free solution.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$result = substr( $url, strpos( $url, '=' ) + 1, strpos( $url, '&' ) - strpos( $url, '=' ) - 1 );
echo $result;
And cleaner two-step variation.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$start = strpos( $url, '=' ) + 1;
$result = substr( $url, $start, strpos( $url, '&' ) - $start );
echo $result;
Somewhat less-ugly regex solution.
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$result = preg_replace( '/[^=]*=([^&]*).*/', '${1}', $url );
echo $result;
All three produce the following output.
http://www.somesite.com/sites/pages/page?id=1545778
Technically, that second ? in the URL should be URL encoded, but we can get around that. Use parse_url to get the query, then replace ? with a URL encoded version using str_replace. After this, you will have a valid query that you can parse using parse_str.
$query = parse_url($url, PHP_URL_QUERY);
$query = str_replace("?", urlencode("?"), $query);
parse_str($query, $params);
echo $params['q'];
// displays http://www.somesite.com/sites/pages/page?id=1545778
$url = "/url?q=http://www.somesite.com/sites/pages/page?id=1545778&sa=U&ei=EhHLVL_yJcb-yQSZ7oDgAg&ved=0CBMQFjAA&usg";
$strip3 = current(explode('&', explode('=', $url, 2)[1])); // end() needs a variable passed by reference, so index the result directly
print_r ($strip3); //output http://www.somesite.com/sites/pages/page?id=1545778
How do I get the id from a URL with preg_replace?
This is the link:
http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id
How do I get the id? In this case it would be: 5b87f8eaa7c20f79c3257eb3ec0a35e0
In this case I recommend not using preg_match (and preg_replace is meant for replacing something, not extracting it).
Simply use
$array = explode('/',$_SERVER['REQUEST_URI']);
$id = $array[2]; // [0] is the empty string before the leading slash, [1] is "photo"
If you must use preg_match:
$array = array();
preg_match('#^/photo/([0-9a-f]{32})/id$#',$_SERVER['REQUEST_URI'],$array);
$id = $array[1];
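If the ID is the last path segment, you can do this easily using strripos to find the last / in the URL. (For a URL that ends in /id, as above, strip that trailing segment first.)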
$url = $_SERVER['REQUEST_URI'];
if (($pos = strripos($url, '/')) !== false) {
$id = substr($url, $pos + 1);
}
else {
trigger_error('You must supply a valid photo ID');
}
If you would like to just extract that id string, you can use:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
preg_match($pattern, $id_url, $output_array);
echo $output_array[1];
Or, to make the replacement:
$id_url = "http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id";
$pattern = "/photo\/([a-zA-Z0-9]*)/";
$replacement = "your replacement";
$replaced_url = preg_replace($pattern, $replacement, $id_url);
echo $replaced_url;
PHP Live Regex - a useful tool for testing your patterns
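A regex-free alternative, as a minimal sketch: split the path into segments and take the one that follows "photo". It assumes the URL shape from the question (.../photo/ID/id); the function name photo_id_from_url is hypothetical.

```php
<?php
// Hypothetical helper: returns the path segment that follows "photo",
// or null when there is no such segment.
function photo_id_from_url(string $url): ?string
{
    // Works for full URLs and for path-only strings such as REQUEST_URI
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $segments = explode('/', trim($path, '/'));

    // Strict search so that only the exact string "photo" matches
    $i = array_search('photo', $segments, true);
    if ($i === false || !isset($segments[$i + 1])) {
        return null;
    }
    return $segments[$i + 1];
}

echo photo_id_from_url('http://www.DDDD.com.br/photo/5b87f8eaa7c20f79c3257eb3ec0a35e0/id'), "\n";
```

Unlike a fixed array index, this does not depend on how many segments precede "photo".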
Please help me strip the following more efficiently.
a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"
The site I visit has a few of those. I only need everything in between the two periods:
vFIsdfuIHq4gpAnc
I would like to use my current format and coding that works around the regex environment. Please help me tune up my following preg match line:
preg_match_all("(./(.*?).html)", $sp, $content);
Any kind of help I get on this is greatly appreciated and thank you in advance!
Here is my complete code
$dp = "http://www.cnn.com";
$sp = @file_get_contents($dp);
if ($sp === FALSE) {
echo("<P>Error: unable to read the URL $dp. Process aborted.</P>");
exit();
}
preg_match_all("(./(.*?).html)", $sp, $content);
foreach($content[1] as $surl) {
$nctid = str_replace("mv/","",$surl);
$nctid = str_replace("/","",$nctid);
echo $nctid,'<br /><br /><br />';
}
The above is what I have been working on.
It's pretty okay, really. It's just that you don't want to match .*?, you want to match multiple characters that aren't a full stop, so you can use [^.]+ instead.
$sp = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
preg_match_all( '/\.([^.]+)\.html/', $sp, $content );
var_dump( $content[1] );
The result that is printed:
array(1) {
[0]=>
string(16) "vFIsdfuIHq4gpAnc"
}
Here's an example of how to loop through all links:
<?php
$url = 'http://www.cnn.com';
$dom = new DomDocument( );
@$dom->loadHTMLFile( $url );
$links = $dom->getElementsByTagName( 'a' );
foreach( $links as $link ) {
$href = $link->attributes->getNamedItem( 'href' );
if( $href !== null ) {
if( preg_match( '~mv/.*?([^.]+).html~', $href->nodeValue, $matches ) ) {
echo "Link-id found: " . $matches[1] . "\n";
}
}
}
You can use explode():
$string = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
if(stripos($string, '/mv/') !== false){ // stripos returns 0 (falsy) when the match is at the start
$dots = explode('.', $string);
echo $dots[(count($dots)-2)];
}
How about using explode?
$exploded = explode('.', $sp);
$content = $exploded[1]; // string: "vFIsdfuIHq4gpAnc"
Even simpler:
$sp="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html";
$regex = '/\.(?P<value>.*)\./';
preg_match_all($regex, $sp, $content);
echo nl2br(print_r($content["value"], 1));
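Another option without a regex, sketched under the assumption that the id is always the part between the last two dots of the file name: call pathinfo() twice, first to drop ".html" and then to read what is left after the remaining dot. The function name id_from_href is hypothetical.

```php
<?php
// Hypothetical helper: returns the text between the last two dots of the
// file name in $href, or null when the name has fewer than two dots.
function id_from_href(string $href): ?string
{
    // "/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html" -> "test-1-2-3-4.vFIsdfuIHq4gpAnc"
    $withoutExt = pathinfo($href, PATHINFO_FILENAME);

    // The "extension" of what remains is the id
    $id = pathinfo($withoutExt, PATHINFO_EXTENSION);

    return $id !== '' ? $id : null;
}

echo id_from_href('/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html'), "\n";
```

This takes the href value itself, so it fits best after the attribute has been read with DOMDocument as in the earlier answer.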
I have a string in PHP that is a URI with all arguments:
$string = http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0
I want to completely remove an argument and return the remain string. For example I want to remove arg3 and end up with:
$string = http://domain.com/php/doc.php?arg1=0&arg2=1
I will always want to remove the same argument (arg3), and it may or not be the last argument.
Thoughts?
EDIT: there might be a bunch of weird characters in arg3, so my preferred way to do this (in essence) would be:
$newstring = remove $_GET["arg3"] from $string;
There's no real reason to use regexes here, you can use string and array functions instead.
You can explode the part after the ? into an array (use strrpos to get the position of the last ? and substr to get the substring that follows it), skip arg3 while copying the remaining parts, and then use join to put the string back together:
$string = "http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0";
$pos = strrpos($string, "?"); // get the position of the last ? in the string
$query_string_parts = array();
foreach (explode("&", substr($string, $pos + 1)) as $q)
{
list($key, $val) = explode("=", $q);
if ($key != "arg3")
{
// keep track of the parts that don't have arg3 as the key
$query_string_parts[] = "$key=$val";
}
}
// rebuild the string
$result = substr($string, 0, $pos + 1) . join("&", $query_string_parts);
See it in action at http://www.ideone.com/PrO0a
preg_replace('/arg3=[^&]*(&|$)/', '', $string);
I'm assuming the url itself won't contain arg3= here, which in a sane world should be a safe assumption.
$new = preg_replace('/&arg3=[^&]*/', '', $string);
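The two one-line regexes above each miss a position: the first leaves a trailing & when arg3 is last, and the second does not match when arg3 is first. A two-step sketch that covers first, middle, and last position, assuming the argument appears at most once and that values contain no raw & or #; the function name remove_arg is hypothetical.

```php
<?php
// Hypothetical helper: removes "name=value" from the query string of $url.
function remove_arg(string $url, string $name): string
{
    $quoted = preg_quote($name, '/');

    // Step 1: drop "name=value" plus the separator after it, keeping the
    // separator in front ("?" or "&") so the rest of the query stays attached.
    $url = preg_replace('/([?&])' . $quoted . '=[^&#]*&?/', '$1', $url) ?? $url;

    // Step 2: a separator left dangling at the end, or just before a
    // #fragment, is removed. This also strips a bare trailing "?" or "&".
    return preg_replace('/[?&](?=#|$)/', '', $url) ?? $url;
}

echo remove_arg('http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0', 'arg3'), "\n";
```

Requiring ? or & in front of the name keeps parameters such as xarg3 from being matched by accident.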
This should also work, taking into account, for example, page anchors (#) and at least some of those "weird characters" you mention but don't seem worried about:
function remove_query_part($url, $term)
{
$query_str = parse_url($url, PHP_URL_QUERY);
if ($frag = parse_url($url, PHP_URL_FRAGMENT)) {
$frag = '#' . $frag;
}
parse_str($query_str, $query_arr);
unset($query_arr[$term]);
$new = '?' . http_build_query($query_arr) . $frag;
return str_replace(strstr($url, '?'), $new, $url);
}
Demo:
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0#frag';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0&arg4=4';
$string[] = 'http://domain.com/php/doc.php';
$string[] = 'http://domain.com/php/doc.php#frag';
$string[] = 'http://example.com?arg1=question?mark&arg2=equal=sign&arg3=hello';
foreach ($string as $str) {
echo remove_query_part($str, 'arg3') . "\n";
}
Output:
http://domain.com/php/doc.php?arg1=0&arg2=1
http://domain.com/php/doc.php?arg1=0&arg2=1
http://domain.com/php/doc.php?arg1=0&arg2=1#frag
http://domain.com/php/doc.php?arg1=0&arg2=1&arg4=4
http://domain.com/php/doc.php
http://domain.com/php/doc.php#frag
http://example.com?arg1=question%3Fmark&arg2=equal%3Dsign
Tested only as shown.