Please help me strip the following more efficiently:
a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"
The site I visit has a few of those, and I only need everything between the two periods:
vFIsdfuIHq4gpAnc
I would like to keep my current code, which is built around regex. Please help me tune up the following preg_match_all line:
preg_match_all("(./(.*?).html)", $sp, $content);
Any kind of help I get on this is greatly appreciated and thank you in advance!
Here is my complete code
$dp = "http://www.cnn.com";
$sp = @file_get_contents($dp);
if ($sp === FALSE) {
echo("<P>Error: unable to read the URL $dp. Process aborted.</P>");
exit();
}
preg_match_all("(./(.*?).html)", $sp, $content);
foreach($content[1] as $surl) {
$nctid = str_replace("mv/","",$surl);
$nctid = str_replace("/","",$nctid);
echo $nctid,'<br /><br /><br />';
}
The above is what I have been working on.
It's pretty okay, really. It's just that you don't want to match .*?, you want to match multiple characters that aren't a full stop, so you can use [^.]+ instead.
$sp = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
preg_match_all( '/\.([^.]+)\.html/', $sp, $content );
var_dump( $content[1] );
The result that is printed:
array(1) {
[0]=>
string(16) "vFIsdfuIHq4gpAnc"
}
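Since the page contains several such links, the same idea can be applied to a whole chunk of HTML at once. The following is a minimal sketch, assuming every wanted link lives under /mv/ and ends in .html; extractMvIds() is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: collect the segment between the last two dots
// of every /mv/ link found in a chunk of HTML.
function extractMvIds(string $html): array
{
    // [^"]*? stays inside the attribute value; [^."]+ cannot cross a dot or the closing quote.
    preg_match_all('~/mv/[^"]*?\.([^."]+)\.html~', $html, $matches);
    return $matches[1];
}

$html = '<a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html">one</a>'
      . '<a href="/mv/other.Ab12Cd34.html">two</a>';
print_r(extractMvIds($html));
```

Both dots are escaped here, so the pattern only accepts a literal "." before the ID and before "html".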
Here's an example of how to loop through all links:
<?php
$url = 'http://www.cnn.com';
$dom = new DomDocument( );
@$dom->loadHTMLFile( $url );
$links = $dom->getElementsByTagName( 'a' );
foreach( $links as $link ) {
$href = $link->attributes->getNamedItem( 'href' );
if( $href !== null ) {
if( preg_match( '~mv/.*?\.([^.]+)\.html~', $href->nodeValue, $matches ) ) {
echo "Link-id found: " . $matches[1] . "\n";
}
}
}
You can use explode():
$string = 'a href="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html"';
if(stripos($string, '/mv/') !== false){
$dots = explode('.', $string);
echo $dots[(count($dots)-2)];
}
How about using explode?
$exploded = explode('.', $sp);
$content = $exploded[1]; // string: "vFIsdfuIHq4gpAnc"
Even simpler:
$sp="/mv/test-1-2-3-4.vFIsdfuIHq4gpAnc.html";
$regex = '/\.(?P<value>.*)\./';
preg_match_all($regex, $sp, $content);
echo nl2br(print_r($content["value"], 1));
How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone analyze whether this code is good enough? And also explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this exact replacement on the Internet).
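On the backslash question: the literal passes through two layers of escaping. In a single-quoted PHP string each \\ collapses to one backslash, so the regex engine receives /\\\*/, which means "a literal backslash followed by a literal asterisk". That is exactly what preg_quote() produces from the * placeholder. The sketch below only prints the intermediate values; the variable names are illustrative.

```php
<?php
$phpLiteral = '/\\\\\*/';            // what is typed in the PHP source
echo $phpLiteral, "\n";              // what the regex engine receives: /\\\*/
echo strlen($phpLiteral), "\n";      // 6 characters: / \ \ \ * /

$quoted = preg_quote('node/*', '/'); // the "/" and the "*" each gain a backslash
echo $quoted, "\n";                  // node\/\*

// The regex finds the "\*" produced by preg_quote and swaps in a character class.
$result = preg_replace($phpLiteral, '[^\/]+', $quoted);
echo $result, "\n";                  // node\/[^\/]+
```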
If you know the format beforehand, you can use preg_match. In the first example, for instance, you know %node can only be numbers. Matching multiple patterns is just as easy: store each regex in the array next to its pattern:
$patterns = array(
'node/%node' => '|node/[0-9]+$|',
'node/%node/news' => '|node/[0-9]+/news|',
'album/%album/shadowbox/%photo' => '|album/[0-9]+/shadowbox/[0-9]+|',
'media/photo' => '|media/photo|',
'blogs' => '|blogs|',
'news' => '|news|',
'node/%node/players' => '|node/[0-9]+/players|',
);
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice the $ (end-of-string anchor) at the end of the first rule: it ensures that the first rule does not also match URLs meant for the second rule.
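The other rules in that array are not anchored, so a URL such as node/11111/news also satisfies the bare news rule. A minimal sketch, assuming each rule is meant to match the whole URL, anchors every regex with ^ and $; matchingPatterns() is a hypothetical helper name and the array is shortened for illustration.

```php
<?php
// Same idea as the array above, but every regex is anchored at both ends.
$patterns = [
    'node/%node'      => '|^node/[0-9]+$|',
    'node/%node/news' => '|^node/[0-9]+/news$|',
    'news'            => '|^news$|',
];

// Hypothetical helper: return every pattern whose regex matches the URL.
function matchingPatterns(array $patterns, string $url): array
{
    $hits = [];
    foreach ($patterns as $pattern => $regex) {
        if (preg_match($regex, $url) === 1) {
            $hits[] = $pattern;
        }
    }
    return $hits;
}

print_r(matchingPatterns($patterns, 'node/11111/news'));
```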
Here is a generic solution to the problem above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/', '/hello/', '']
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can convert the code above into a function and call it for each pattern in the array. The first step uses a regex to replace every variable with *, then explodes the pattern on *. Finally, loop over the resulting pieces and compare each one against the URL with simple string comparison to see whether the pattern matches.
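As an alternative to the string-comparison loop, the pattern itself can be compiled into a regex with named groups. This is a sketch under assumptions, not the approach of the answer above: it uses the %token syntax from the question, assumes a token never spans a slash, and the helper names patternToRegex() and matchUrl() are invented for illustration.

```php
<?php
// Hypothetical helper: turn "node/%node/news" into an anchored regex with named groups.
function patternToRegex(string $pattern): string
{
    // "%" is not special to preg_quote, so the tokens survive quoting untouched.
    $quoted = preg_quote($pattern, '~');
    $regex = preg_replace('~%([A-Za-z_][A-Za-z0-9_]*)~', '(?P<$1>[^/]+)', $quoted);
    return '~^' . $regex . '$~';
}

// Hypothetical helper: return the captured tokens, or null when the URL does not match.
function matchUrl(string $pattern, string $url): ?array
{
    if (preg_match(patternToRegex($pattern), $url, $m) !== 1) {
        return null;
    }
    // Keep only the named captures and drop the numeric duplicates.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(matchUrl('album/%album/shadowbox/%photo', 'album/34234/shadowbox/321023'));
```

Returning null for a non-match keeps an empty capture list (a pattern without tokens) distinguishable from a failed match.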
As long as the pattern is fixed, you can use preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
I have Markdown text that I have to convert without using library functions, so I used preg_replace for this. It works fine for some cases, but I am stuck on headings. For example,
Heading
=======
should be converted to <h1>Heading</h1>, and also
##Sub heading should be converted to <h2>Sub heading</h2>
###Sub heading should be converted to <h3>Sub heading</h3>
I have tried
$text = preg_replace('/##(.+?)\n/s', '<h2>$1</h2>', $text);
The above code works, but I need to count the hash symbols and choose the heading tag based on that count.
Can anyone help me, please?
Try using preg_replace_callback.
Something like this -
$regex = '/(#+)(.+?)\n/s';
$line = "##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
},
$line
);
echo $line;
The output would be something like this -
<h2>Sub heading</h2> <h3>sub-sub heading</h3>
EDIT
For the combined problem of using = for headings and # for sub-headings, the regex gets a bit more complicated, but the principle remains the same using preg_replace_callback.
Try this -
$regex = '/(?:(#+)(.+?)\n)|(?:(.+?)\n\s*=+\s*\n)/';
$line = "Heading\n=======\n##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
//var_dump($matches);
if($matches[1] == ""){
return "<h1>".$matches[3]."</h1>";
}else{
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
}
},
$line
);
echo $line;
Its output is -
<h1>Heading</h1><h2>Sub heading</h2> <h3>sub-sub heading</h3>
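Do a preg_match_all like this:
$string = "#####asdsadsad";
// \G anchors each match where the previous one ended, so only the leading run of # is counted
preg_match_all('/\G#/', $string, $matches);
var_dump(count($matches[0])); // int(5)
And based on the count of matches you can do whatever you want.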
Or, use the preg_replace_callback function.
$input = "#This is my text";
$pattern = '/^(#+)(.+)/';
$mytext = preg_replace_callback($pattern, 'parseHashes', $input);
var_dump($mytext);
function parseHashes($input) {
var_dump($input);
$matches = array();
preg_match_all('/(#)/', $input[1], $matches);
var_dump($matches[0]);
var_dump(count($matches[0]));
$cnt = count($matches[0]);
if ($cnt <= 6 && $cnt > 0) {
return '<h' . $cnt . ' class="if you want class here">' . $input[2] . '</h' . $cnt . '>';
} else {
//This is not a valid h tag. Do whatever you want.
return false;
}
}
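The heading conversions in this thread can also be written with line anchors, which keeps the line breaks between headings intact. The following is a minimal sketch, assuming one heading per line and at most six # characters per heading; convertHeadings() is a hypothetical name, and the rest of Markdown is not handled.

```php
<?php
// Hypothetical helper: convert "=" underlined and "#" prefixed headings to HTML.
function convertHeadings(string $text): string
{
    // A line followed by a line of "=" becomes <h1>.
    $text = preg_replace('/^(.+)\n=+[ \t]*$/m', '<h1>$1</h1>', $text);

    // One to six leading "#" characters pick the heading level.
    return preg_replace_callback(
        '/^(#{1,6})[ \t]*(.+?)[ \t]*$/m',
        function (array $m): string {
            $level = strlen($m[1]);
            return "<h$level>" . $m[2] . "</h$level>";
        },
        $text
    );
}

$converted = convertHeadings("Heading\n=======\n##Sub heading\n###Sub sub heading\n");
echo $converted;
```

The m modifier makes ^ and $ match at every line boundary, so each heading is handled on its own line without consuming the trailing newline.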
It might seem easy, but I have trouble extracting this string. I have a path that contains a # and I'm trying to pull out the part that follows it:
maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa
Here is what I want to extract: 33.536759,-7.613825,17z. This is what I tried:
$var = preg_match_all("/#(\w*)/",$path,$query);
Any way I can do this? Much appreciated.
Change your regex to this one: /#([\w\d\.\,-]*)/.
The full match in $query[0][0] begins with #; the captured group in $query[1][0] holds the text after it.
$string = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
$string = explode('/',$string);
//$coordinates = substr($string[3], 1);
//print_r($coordinates);
foreach ($string as $substring) {
if (substr( $substring, 0, 1 ) === "#") {
$coordinates = substr($substring, 1); // drop the leading #
}
}
echo $coordinates;
This is working for me:
$path = "maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa";
$var = preg_match_all("/#([^\/]+)/",$path,$query);
print $query[1][0];
A regex would do.
/#(-?\d+\.\d+),(-?\d+\.\d+,\d+z?)/
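If the individual values are needed as numbers, each part can get its own group. This is a sketch that assumes the fragment always has the "latitude,longitude,zoom followed by z" layout seen in the sample path; parseMapFragment() and the array keys are hypothetical names.

```php
<?php
// Hypothetical helper; the "lat,lng,zoom" layout is assumed from the sample path.
function parseMapFragment(string $path): ?array
{
    $re = '/#(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+)z/';
    if (preg_match($re, $path, $m) !== 1) {
        return null; // no "#lat,lng,zoomz" fragment found
    }
    return ['lat' => (float) $m[1], 'lng' => (float) $m[2], 'zoom' => (int) $m[3]];
}

$path = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1';
print_r(parseMapFragment($path));
```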
If there is only one # and the part you want ends at the next /, you can use the following code:
//String
$string = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
//Save string after the first #
$coordinates = strstr($string, '#');
//Remove #
$coordinates = str_replace('#', '', $coordinates);
//Separate string on every /
$coordinates = explode('/', $coordinates );
//Save first part
$coordinates = $coordinates[0];
//Do what you want
echo $coordinates;
Do it like this:
$re = '/#(([^,]+),([^,]+),)/';
$str = 'maps/place/Residences+Jardins+de+Majorelle/#33.536759,-7.613825,17z/data=!3m1!4b1!4m2!3m1!1s0xda62d6053931323:0x2f978f4d1aabb1aa';
preg_match_all($re, $str, $matches);
echo $matches[2][0].'<br>';
echo $matches[3][0];
Output:
33.536759
-7.613825
I have the following code:
$regex='|<a.*?href="(.*?)"|'; //PARSE FOR LINKS
preg_match_all($regex,$result,$parts);
$links=$parts[1];
foreach($links as $link){
echo $link."<br>";
}
Its output is the following:
/watch/b4se39an
/watch/b4se39an
/bscsystem
/watch/ifuyzwfw
/watch/ifuyzwfw
/?sort=v
/?sort=c
/?sort=l
/watch/xk4mvavj
/watch/2h7b53vx
/watch/d7bt47xb
/watch/yh953b17
/watch/tj3z6ki2
/watch/sd4vraxi
/watch/f2rnthuh
/watch/ey6z8hxa
/watch/ybgxgay1
/watch/3iaqyrm1
/help/feedback
How can I use a regular expression to extract only the /watch/..... strings?
Modify your regex to include the restriction on /watch/:
$regex = '|<a.*?href="(/watch/.*?)"|';
A simple test script can show that it's working:
$tests = array( "/watch/something", "/bscsystem");
$regex = '|<a.*?href="(/watch/.*?)"|';
foreach( $tests as $test) {
$link = '<a href="' . $test . '">';
if( preg_match( $regex, $link))
echo $test . ' matched.<br />';
}
This will produce:
/watch/something matched.
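The output in the question lists some /watch/ links twice. A minimal sketch, assuming the duplicates are unwanted, filters on /watch/ and removes repeats in one step; watchLinks() is a hypothetical helper name and the sample markup is invented for the demonstration.

```php
<?php
// Hypothetical helper: return each distinct /watch/ link once, in page order.
function watchLinks(string $html): array
{
    // [^>]*? keeps the search inside one tag; [^"]* stops at the closing quote.
    preg_match_all('|<a[^>]*?href="(/watch/[^"]*)"|', $html, $parts);
    // array_unique keeps first occurrences; array_values reindexes from 0.
    return array_values(array_unique($parts[1]));
}

$html = '<a href="/watch/b4se39an">a</a><a class="x" href="/watch/b4se39an">b</a>'
      . '<a href="/bscsystem">c</a><a href="/watch/ifuyzwfw">d</a>';
print_r(watchLinks($html));
```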
I use the following code
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
echo"<li><h5>$author</h5></li>";
}
and it gets me http://twitter.com/USERNAME/statuses/167382363782206976
My question is: how do I get only the username?
The username may be anything.
$url = "http://twitter.com/USERNAME/statuses/167382363782206976";
preg_match("#http://twitter.com/([^\/]+)/statuses/.*#", $url, $matches);
var_dump($matches[1]);
You can use either this regexp (for preg_match): ~twitter\.com/([^/]+)/~:
$match = array();
preg_match( '~twitter\.com/([^/]+)/~', $url, $match);
echo $match[1]; // list(,$userName) = $match;
Or, more efficiently, strpos and substr:
$start = strpos( $url, '/', 10); // offset 10 lies after http:// and before .com/, so this finds the slash after the host
$end = strpos( $url, '/', $start+1); // the slash that ends the username
// Check that both indexes are not false before using them
$username = substr( $url, $start+1, $end-$start-1); // skip the leading slash
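Another option that avoids counting characters is to let parse_url() isolate the path first. This is a sketch, assuming the username is always the first path segment; twitterUsername() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: the username is assumed to be the first path segment.
function twitterUsername(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH); // e.g. "/USERNAME/statuses/1673..."
    if (!is_string($path)) {
        return null; // no path component, or a malformed URL
    }
    $segments = explode('/', trim($path, '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

echo twitterUsername('http://twitter.com/USERNAME/statuses/167382363782206976'), "\n";
```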
foreach ($twitter_xml2->channel->item as $key) {
$author = $key->{"guid"};
list(,,,$username) = explode('/', $author);
echo"<li><h5>$username</h5></li>";
}