I'm trying to extract the ?v= part (the query string) of a youtube.com URL, in order to automatically embed the video when someone types in a youtube.com URI (e.g. someone types http://www.youtube.com/?v=xyz and the program should embed the video into the page automatically).
Anyway, when I run the following code, I get two QUERY lines for the first URI:
<?php
//REGEX CONTROLLER:
//embedding youtube:
function youtubeEmbedd($text)
{
//scan text and find:
// http://www.youtube.com/
// www.youtube.com/
$youtube_pattern = "(http\:\/\/www\.youtube\.com\/(watch)??\?v\=[a-zA-Z0-9]+(\&[a-z]\=[a-zA-Z0-9])*?)"; // the pattern
#"http://www.youtube.com/?v="
echo "<hr/>";
$links = preg_match_all($youtube_pattern, $text, $out, PREG_SET_ORDER); // use preg_replace here
if ($links)
{
for ($i = 0; $i != count($out); $i++)
{
echo "<b><u> URL </b><br/></u> ";
foreach ($out[$i] as $url)
{
// split url[QUERY] here and replaces it with embed code:
$youtube = parse_url($url);
echo "QUERY: " . $youtube["query"] . "<br/>";
#$pos = strpos($url, "?v=");
}
}
}
else
{
echo "no match";
}
}
youtubeEmbedd("tthe quick gorw fox http://www.youtube.com/watch?v=5qm8PH4xAss&x=4dD&k=58J8 and http://www.youtube.com/?v=Dd3df4e ");
?>
Output is:
URL
QUERY: v=5qm8PH4xAss
QUERY: << WHY DOES THIS APPEAR????????????
URL
QUERY: v=Dd3df4e
I would be grateful for any help.
Your regular expression stores a result in $out like this:
(
[0] => Array
(
[0] => http://www.youtube.com/watch?v=5qm8PH4xAss
[1] => watch
)
[1] => Array
(
[0] => http://www.youtube.com/?v=Dd3df4e
)
)
Your regular expression has a subgroup for matching the text watch, and so this ends up as a result in the array.
Since you iterate through all results $out[$i] you're trying to run parse_url on the second result of the first match; this leads to an empty output.
To fix your issue, simply change your iteration to something like:
if($links){
foreach($out as $result){
$youtube = parse_url($result[0]);
echo "<b><u> URL </b><br/></u> QUERY: " . $youtube["query"] . "<br/>";
}
}
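Another way to avoid the extra entry is to make the watch group non-capturing, so that each match holds only the full URL. The following is a minimal sketch rather than a drop-in replacement: the function name extractYoutubeIds is hypothetical, and the pattern assumes that IDs and parameter values consist of letters, digits, underscores, and hyphens.

```php
<?php
// Hypothetical helper: returns the v= value of every YouTube URL found in $text.
function extractYoutubeIds(string $text): array
{
    // (?:watch)? is non-capturing, so $matches[0] holds only the full URLs.
    $pattern = '~http://www\.youtube\.com/(?:watch)?\?v=[\w-]+(?:&[\w-]+=[\w-]*)*~';
    preg_match_all($pattern, $text, $matches);

    $ids = [];
    foreach ($matches[0] as $url) {
        // parse_url() isolates the query string; parse_str() splits it into key/value pairs.
        parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
        if (isset($query['v'])) {
            $ids[] = $query['v'];
        }
    }
    return $ids;
}

print_r(extractYoutubeIds(
    "the quick brown fox http://www.youtube.com/watch?v=5qm8PH4xAss&x=4dD&k=58J8 and http://www.youtube.com/?v=Dd3df4e "
));
```

Using parse_str() instead of string offsets means the extra parameters (x, k) are handled without any further pattern work.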
How to determine, using regexp or something else in PHP, that following urls match some patterns with tokens (url => pattern):
node/11221 => node/%node
node/38429/news => node/%node/news
album/34234/shadowbox/321023 => album/%album/shadowbox/%photo
Thanks in advance!
Update 1
Wrote the following script:
<?php
$patterns = [
"node/%node",
"node/%node/news",
"album/%album/shadowbox/%photo",
"media/photo",
"blogs",
"news",
"node/%node/players",
];
$url = "node/11111/news";
foreach ($patterns as $pattern) {
$result_pattern = preg_replace("/\/%[^\/]+/x", '/*', $pattern);
$to_replace = ['/\\\\\*/']; // asterisks
$replacements = ['[^\/]+'];
$result_pattern = preg_quote($result_pattern, '/');
$result_pattern = '/^(' . preg_replace($to_replace, $replacements, $result_pattern) . ')$/';
if (preg_match($result_pattern, $url)) {
echo "<pre>" . $pattern . "</pre>";
}
}
?>
Could anyone analyze whether this code is good enough? And also explain why there are so many backslashes in this part: $to_replace = ['/\\\\\*/']; (I found this replacement solution on the Internet).
If you know the format beforehand you can use preg_match. For example, in the first pattern you know %node can only be a number. Matching against several patterns is just as easy: store the regex for each one in the array:
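// Each regex is anchored with ^ and $ so that a short rule such as 'news'
// cannot match inside a longer URL such as node/11111/news.
$patterns = array(
'node/%node' => '|^node/[0-9]+$|',
'node/%node/news' => '|^node/[0-9]+/news$|',
'album/%album/shadowbox/%photo' => '|^album/[0-9]+/shadowbox/[0-9]+$|',
'media/photo' => '|^media/photo$|',
'blogs' => '|^blogs$|',
'news' => '|^news$|',
'node/%node/players' => '|^node/[0-9]+/players$|',
);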
$url = "node/11111/players";
foreach ($patterns as $pattern => $regex) {
preg_match($regex, $url, $results);
if (!empty($results)) {
echo "<pre>" . $pattern . "</pre>";
}
}
Notice the dollar sign $ at the end of the first rule: it anchors the match to the end of the string, which ensures that a URL meant for the second rule (node/11111/news) does not also match the first one.
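Regarding the backslashes in $to_replace from the question: there are two layers of escaping, one for the PHP string literal and one for the regex engine. The sketch below only illustrates that; the variable names are chosen for the example.

```php
<?php
// Inside single quotes, PHP turns each \\ into one backslash and leaves \* alone,
// so this literal holds six characters: / \ \ \ * /
$regex = '/\\\\\*/';
echo strlen($regex), "\n"; // 6

// Between the delimiters the regex engine then reads \\ as "a literal backslash"
// and \* as "a literal asterisk", i.e. it searches for the two characters \*
// that preg_quote() produces when it escapes an asterisk.
$quoted = preg_quote('node/*/news', '/');
echo $quoted, "\n"; // node\/\*\/news

echo preg_replace($regex, '[^\/]+', $quoted), "\n"; // node\/[^\/]+\/news
```

Because the replacement runs after preg_quote(), the pattern has to look for the escaped form of the asterisk, which is where the extra backslashes come from.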
Here is a generic solution to the problem above:
<?php
// The url part
$url = "/node/123/hello/strText";
// The pattern part
$pattern = "/node/:id/hello/:test";
// Replace all variables with * using regex
$buffer = preg_replace("(:[a-z]+)", "*", $pattern);
// Explode to get strings at *
// In this case ['/node/', '/hello/', ''] (the trailing empty string comes from the final *)
$buffer = explode("*", $buffer);
// Control variables for loop execution
$IS_MATCH = True;
$CAPTURE = [];
for ($i=0; $i < sizeof($buffer); $i++) {
$slug = $buffer[$i];
$real_slug = substr($url, 0 , strlen($slug));
if (!strcmp($slug, $real_slug)) {
$url = substr($url, strlen($slug));
$temp = explode("/", $url)[0];
$CAPTURE[sizeof($CAPTURE)+1] = $temp;
$url = substr($url,strlen($temp));
}else {
$IS_MATCH = False;
}
}
unset($CAPTURE[sizeof($CAPTURE)]);
if($IS_MATCH)
print_r($CAPTURE);
else
print "Not a match";
?>
You can convert the code above into a function and pass it each pattern from your array in turn. The first step uses a regex to replace every variable with *, and the pattern is then exploded on *. Finally, the loop walks over the resulting pieces and compares each one against the URL with a simple string comparison to see whether the pattern matches.
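As a rough sketch of what such a function could look like, the version below swaps the explode-and-compare loop for a regex built from the pattern. It uses the %token syntax from the question; the name matchPattern is hypothetical, and it assumes that token names are valid regex group names (letters, digits, underscores, not starting with a digit).

```php
<?php
// Hypothetical helper: match $url against a pattern such as "node/%node/news".
// Returns the captured tokens as an array, or null when the URL does not match.
function matchPattern(string $pattern, string $url): ?array
{
    // Escape the literal parts first; "%" is not special, so tokens survive preg_quote().
    $quoted = preg_quote($pattern, '~');
    // Turn each %token into a named group that matches exactly one path segment.
    $regex = '~^' . preg_replace('~%(\w+)~', '(?<$1>[^/]+)', $quoted) . '$~';

    if (!preg_match($regex, $url, $matches)) {
        return null;
    }
    // Keep only the named captures (string keys), dropping the numeric duplicates.
    return array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(matchPattern('album/%album/shadowbox/%photo', 'album/34234/shadowbox/321023'));
```

Anchoring with ^ and $ means node/%node does not match node/11111/news, and a pattern without tokens returns an empty array on success.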
As long as the patterns are fixed, you can use the preg_match() function:
$urls = array (
"node/11221",
"node/38429/news",
"album/34234/shadowbox/321023",
);
foreach ($urls as $url)
{
if (preg_match ("|node/([\d]+$)|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|node/([\d]+)/news|", $url, $matches))
{
print "Node is {$matches[1]}\n";
}
elseif (preg_match ("|album/([\d]+)/shadowbox/([\d]+)$|", $url, $matches))
{
print "Album is {$matches[1]} and photo is {$matches[2]}\n";
}
}
For other patterns to match, adjust as necessary.
I want to create a PHP array like below, which contains two values, one for the matching word and one to check if the match is a link or not.
Input
$string = "test test"
What can I do here to find all matches of the word 'test' and check whether each match is a link or not?
Output
Array
(
[0] =>
Array
(
[0] test
[1] false
)
[1] =>
Array
(
[0] test
[1] true
)
)
You could use a regular expression for this:
$string = 'test <a href="http://test">link text</a>';
if (preg_match("#^\s*(.*?)\s*<a\s.*?href\s*=\s*['\"](.*?)['\"].*?>(.*?)</a\s*>#si",
$string, $match)) {
$textBefore = $match[1]; // test
$href = $match[2]; // http://test
$anchorText = $match[3]; // link text
// deal with these elements as you wish...
}
This solution is not case-sensitive, so it works with <A ...>...</A> just as well. It also works if the href value is delimited with single quotes instead of double quotes. Spaces surrounding each value are ignored (trimmed).
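To get the array of [word, isLink] pairs the question asks for, one possible sketch uses preg_match_all with three alternatives. The function name findWordOccurrences and the sample input are assumptions for illustration, and the sketch only treats the word as a link when it is the entire text of an <a> element.

```php
<?php
// Hypothetical helper: list every occurrence of $word with a flag telling whether it is link text.
function findWordOccurrences(string $html, string $word): array
{
    $w = preg_quote($word, '~');
    // Alternative 1: the word as the whole text of an <a> element (group 1).
    // Alternative 2: any other tag, matched only so that attribute values are skipped.
    // Alternative 3: the word in plain text (group 2).
    $pattern = '~<a\b[^>]*>\s*(' . $w . ')\s*</a\s*>|<[^>]*>|\b(' . $w . ')\b~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        if (isset($m[1]) && $m[1] !== '') {
            $result[] = [$m[1], true];   // found as link text
        } elseif (isset($m[2]) && $m[2] !== '') {
            $result[] = [$m[2], false];  // found in plain text
        }
    }
    return $result;
}

// Assumed sample input: one plain occurrence followed by one linked occurrence.
var_dump(findWordOccurrences('test <a href="http://test">test</a>', 'test'));
```

For anything beyond simple, well-formed markup, a DOM parser is likely to be more reliable than a regex.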
Try this code :
<?php
$string ="test test";
$link ='';
$word = '';
$flag = true;
for($i=0;$i<strlen($string);$i++){
if($string[$i] == '<' && $string[$i+1] == 'a'){
$flag=false;
while($string[$i++] != '>')
{
}
while($i < strlen($string) && substr($string, $i, 4) != '</a>'){ // read the link text up to the closing </a>
$link .= $string[$i++];
}
}
else{
if($flag)
$word.=$string[$i];
}
}
echo 'Link :'.$link . "<br/>";
echo 'Word:'.$word;
// You can now manipulate Link and word as you wish
?>
Hello,
I'm using the following code to retrieve the DOM from a URL,
find all "A" tags and print their HREFs.
My output currently contains the full links, which I don't want. The output is here:
http://trend.remal.com/parsing.php
I need to clean up the output so that it contains only the name after http://twitter.com/namehere,
so that the output is a list of "namehere" values.
include('simple_html_dom.php');
// Retrieve the DOM from a given URL
$html = file_get_html('http://tweepar.com/sa/1/');
$urls = array();
foreach ( $html->find('a') as $e )
{
// If it's a twitter link
if ( strpos($e->href, '://twitter.com/') !== false )
{
// and we don't have it in the array yet
if ( ! in_array($e->href, $urls) )
{
// add it to our array
$urls[] = $e->href;
}
}
}
echo implode('<br>', $urls);
echo $e->href . '<br>';
Instead of simply using $urls[] = $e->href, use a regex to capture the username, and only add it when the pattern matches:
if (preg_match('~twitter\.com/(.+)~', $e->href, $matches)) {
$urls[] = $matches[1];
}
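If you would rather not rely on a regex for the whole URL, parse_url() can isolate the host and path first. This is only a sketch: the helper name twitterName and the sample links are made up for the example, and it assumes the user name is the first path segment.

```php
<?php
// Hypothetical helper: return the first path segment of a twitter.com URL, or null.
function twitterName(string $href): ?string
{
    $host = parse_url($href, PHP_URL_HOST);
    $path = parse_url($href, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return null;
    }
    // Accept twitter.com and www.twitter.com only.
    if (!preg_match('~^(?:www\.)?twitter\.com$~i', $host)) {
        return null;
    }
    // The user name is assumed to be the first non-empty segment of the path.
    $segments = explode('/', trim($path, '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

// Assumed sample data standing in for the href values collected from the page.
$hrefs = ['http://twitter.com/alice', 'http://twitter.com/bob/', 'http://example.com/carol', 'http://twitter.com/alice'];
// Drop non-matches (null), then duplicates, then reindex.
$names = array_values(array_unique(array_filter(array_map('twitterName', $hrefs))));
echo implode('<br>', $names); // alice<br>bob
```

Deduplicating the extracted names rather than the raw links also catches the same account linked with and without a trailing slash.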
I have a private website where I share videos (and some other stuff).
What I have achieved so far is that preg_match_all() automatically finds each link and the video is pasted into my website with the HTML code.
Here is an example:
<?php
$matchwith = "http://videosite.com/id1 http://videosite.com/id2 http://videosite.com/id3";
preg_match_all('/videosite\.com\/(\w+)/i', $matchwith, $matches);
foreach($matches[1] as $value)
{
print 'Hyperlink';
}
?>
This works. I know this could probably be done more easily, but it has to be this way.
But I do not know how to do this with a two-part movie. Here is an example:
$matchWith = "http://videosite.com/id1_movie1 http://videosite.com/id2_movie1" .
"http://videosite.com/id3_movie2 http://videosite.com/id4_movie2";
Everything after http://videosite.com/(...) is unique.
What I want is if you write Part 1 and Part 2 (or whatever) before the link, that it automatically detects it as Part 1 and Part 2 of this video.
$matchwith could contain different movies.
So I believe this is what you need:
<?php
$matchWith = "Movie 1 http://videosite.com/id1" . PHP_EOL .
"Movie 1 http://videosite.com/id2" . PHP_EOL .
"Movie 2 http://videosite.com/id3";
$arrLinks = array();
preg_match_all('%(.*)\shttp://videosite\.com/(\w+)\r{0,1}%', $matchWith, $result, PREG_SET_ORDER);
for ($matchi = 0; $matchi < count($result); $matchi++) {
$arrLinks[$result[$matchi][1]][] = $result[$matchi][2];
}
foreach ($arrLinks as $movieName => $arrMovieIds) {
print '<div>' . $movieName . '</div>';
foreach ($arrMovieIds as $movieId) {
print 'Hyperlink<br/>';
}
}
?>
$matchwith = "Part 1 http://videosite.com/id1-1 Part2 http://videosite.com/id1-2";
preg_match_all('/videosite\.com\/(\w+-\d+)/i', $matchwith, $matches);
foreach($matches[1] as $value)
{
print 'Hyperlink';
}
I want to replace
{youtube}Video_ID_Here{/youtube}
with the embed code for a youtube video.
So far I have
preg_replace('/{youtube}(.*){\/youtube}/iU',...)
and it works just fine.
But now I'd like to be able to interpret parameters like height, width, etc. So could I have one regex that works whether or not the tag has parameters? It should be able to interpret all of the following:
{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}
{youtube height="200px"}Video_ID_Here{/youtube}
{youtube}Video_ID_Here{/youtube}
{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}
Try this:
function createEmbed($videoID, $params)
{
// $videoID contains the videoID between {youtube}...{/youtube}
// $params is an array of key value pairs such as height => 200px
return 'HTML...'; // embed code
}
if (preg_match_all('/\{youtube(.*?)\}(.+?)\{\/youtube\}/', $string, $matches)) {
foreach ($matches[0] as $index => $youtubeTag) {
$params = array();
// break out the attributes
if (preg_match_all('/\s([a-z0-9]+)="([^\s]+?)"/', $matches[1][$index], $rawParams)) {
for ($x = 0; $x < count($rawParams[0]); $x++) {
$params[$rawParams[1][$x]] = $rawParams[2][$x];
}
}
// replace {youtube}...{/youtube} with embed code
$string = str_replace($youtubeTag, createEmbed($matches[2][$index], $params), $string);
}
}
This code matches the {youtube}...{/youtube} tags first and then splits the attributes out into an array, passing both the attributes (as key/value pairs) and the video ID to a function. Just fill in the function definition so that it validates the params you want to support and builds the appropriate HTML code.
You probably want to use preg_replace_callback, as the replacing can get quite convoluted otherwise.
preg_replace_callback('/{youtube(.*)}(.*){\/youtube}/iU',...)
And in your callback, check $match[1] for something like the /(width|showborder|height|color1)="([^"]+)"/i pattern. A simple preg_match_all inside a preg_replace_callback keeps all portions nice & tidy and above all legible.
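A minimal sketch of that approach might look like the following. The function name embedYoutubeTags, the default sizes, and the iframe markup are assumptions for illustration, and only width and height given as a number followed by px are accepted.

```php
<?php
// Hypothetical helper: replace {youtube ...}ID{/youtube} tags with embed markup.
function embedYoutubeTags(string $text): string
{
    $result = preg_replace_callback(
        '/\{youtube(.*)\}(.*)\{\/youtube\}/iU',
        function (array $match): string {
            $params = [];
            // Pick out only the whitelisted attributes; anything else is ignored.
            if (preg_match_all('/\b(width|height)="(\d+)px"/i', $match[1], $attrs, PREG_SET_ORDER)) {
                foreach ($attrs as $attr) {
                    $params[strtolower($attr[1])] = $attr[2];
                }
            }
            $width  = $params['width']  ?? '560'; // assumed default
            $height = $params['height'] ?? '315'; // assumed default
            // Escape the ID so that user input cannot break out of the attribute.
            $id = htmlspecialchars(trim($match[2]), ENT_QUOTES);
            return '<iframe width="' . $width . '" height="' . $height
                . '" src="https://www.youtube.com/embed/' . $id . '"></iframe>';
        },
        $text
    );
    // preg_replace_callback() returns null on error; fall back to the input in that case.
    return $result ?? $text;
}

echo embedYoutubeTags('{youtube height="200px" width="150px"}Video_ID_Here{/youtube}');
```

Because the callback only copies values that matched \d+, nothing from the attribute list reaches the output unvalidated.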
I would do it something like this:
preg_match_all("/{youtube(.*?)}(.*?){\/youtube}/is", $content, $matches);
for($i=0;$i<count($matches[0]);$i++)
{
$params = $matches[1][$i];
$youtubeurl = $matches[2][$i];
$paramsout = array();
if(preg_match("/height\s*=\s*('|\")([0-9]+px)('|\")/i", $params, $match))
{
$paramsout[] = "height=\"{$match[2]}\"";
}
//process others
//setup new code
$tagcode = "<object ..." . implode(" ", $paramsout) ."... >"; //I don't know what the code is to display a youtube video
//replace original tag
$content = str_replace($matches[0][$i], $tagcode, $content);
}
You could just take whatever appears after "{youtube" and before "}", but that opens you up to XSS problems. The best way is to look for a specific set of parameters and validate each one. Don't allow characters like < and > inside your tags, as someone could inject a script that calls do_something_nasty(); or similar.
I would not use regex at all, since regular expressions are notoriously bad at parsing markup.
Since your input format is so close to HTML/XML in the first place, I'd rely on that:
$tests = array(
'{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}'
, '{youtube height="200px"}Video_ID_Here{/youtube}'
, '{youtube}Video_ID_Here{/youtube}'
, '{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}'
, '{YOUTUBE width="150px" showborder="1"}Video_ID_Here{/youtube}' // deliberately invalid
);
echo '<pre>';
foreach ( $tests as $test )
{
try {
$youtube = SimpleXMLYoutubeElement::fromUserInput( $test );
print_r( $youtube );
}
catch ( Exception $e )
{
echo $e->getMessage() . PHP_EOL;
}
}
echo '</pre>';
class SimpleXMLYoutubeElement extends SimpleXMLElement
{
public static function fromUserInput( $code )
{
$xml = @simplexml_load_string( // @ suppresses parser warnings for invalid input
str_replace( array( '{', '}' ), array( '<', '>' ), strip_tags( $code ) ), __CLASS__
);
if ( !$xml || 'youtube' != $xml->getName() )
{
throw new Exception( 'Invalid youtube element' );
}
return $xml;
}
public function toEmbedCode()
{
// write code to convert this to proper embed code
}
}