Get two results without repeating preg_match and file_get_contents

I'm new to PHP and I need to get two results from the same page: og:image and og:video.
This is my current code:
preg_match('/property="og:video" content="(.*?)"/', file_get_contents($url), $matchesVideo);
preg_match('/property="og:image" content="(.*?)"/', file_get_contents($url), $matchesThumb);
$videoID = ($matchesVideo[1]) ? $matchesVideo[1] : false;
$videoThumb = ($matchesThumb[1]) ? $matchesThumb[1] : false;
Is there a way to perform the same operation without duplicating my code?

Save the file contents to a variable, and if you want to run a single regular expression, you can opt for:
$file = file_get_contents($url);
preg_match_all('/property="og:(?P<type>video|image)" content="(?P<content>.*?)"/', $file, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$match['type'] ...
$match['content'] ...
}
As @hakre points out, an extra non-capturing parenthesis pair (?: ... ) around the alternation is not needed.
The capture groups use named subpatterns, (?P<name> ... ), so each match can be read by name instead of by index.
The first capture group, type, accepts either of the two words video|image; the second, content, captures the value of the content attribute.
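To make the loop above concrete, here is a minimal sketch that collects both values into an array keyed by type. The sample markup, the example.com URLs and the $og array are assumptions for illustration; in real use $file would come from file_get_contents($url).

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$file = '<meta property="og:video" content="https://example.com/v.mp4">'
      . '<meta property="og:image" content="https://example.com/t.jpg">';

$pattern = '/property="og:(?P<type>video|image)" content="(?P<content>.*?)"/';

// Default both keys to false so a missing tag is easy to detect.
$og = array('video' => false, 'image' => false);

if (preg_match_all($pattern, $file, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // The named groups are available as array keys.
        $og[$match['type']] = $match['content'];
    }
}

$videoID    = $og['video'];
$videoThumb = $og['image'];
```

The page is read once, the pattern runs once, and a missing tag simply leaves its entry at false.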

There is no problem with having those two lines. What I would change though is the double call to file_get_contents($url).
Just change it to:
$html = file_get_contents($url);
preg_match('/property="og:video" content="(.*?)"/', $html, $matchesVideo);
preg_match('/property="og:image" content="(.*?)"/', $html, $matchesThumb);

Is there a way to execute the same operation without duplicating my code
There are always two ways to do that:
Buffer an execution result - instead of executing multiple times.
Encode the repetition - extract parameters from code.
In programming you normally make use of both. For example the buffering of the file I/O operation:
$buffer = file_get_contents($url);
And for the matching, you encode the repetition:
$match = function ($what) use ($buffer) {
    $pattern = sprintf('/property="og:%s" content="(.*?)"/', $what);
    $result = preg_match($pattern, $buffer, $matches);
    return $result ? $matches[1] : NULL;
};
$videoID = $match('video');
$videoThumb = $match('image');
This is only an example to show what I mean. How far you take it depends on your needs: the latter approach lets you swap the matching for a different implementation, such as an HTML parser, but you might find it too much code for what you need right now and go with the buffering only.
E.g. the following could be applicable as well:
$buffer = file_get_contents($url);
$mask = '/property="og:%s" content="(.*?)"/';
preg_match(sprintf($mask, 'video'), $buffer, $matchesVideo);
preg_match(sprintf($mask, 'image'), $buffer, $matchesThumb);
Hope this helps.
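For what the HTML-parser variant mentioned above might look like, here is a rough sketch using PHP's bundled DOM extension. The sample markup and URLs are made up, and it assumes the page stores the values in meta tags with property and content attributes.

```php
<?php
// Hypothetical sample markup standing in for file_get_contents($url).
$buffer = '<html><head>'
        . '<meta property="og:video" content="https://example.com/v.mp4">'
        . '<meta property="og:image" content="https://example.com/t.jpg">'
        . '</head><body></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($buffer);
libxml_clear_errors();

// Same shape as the regex closure: ask for one og: property, get its content or null.
$match = function ($what) use ($doc) {
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === 'og:' . $what) {
            return $meta->getAttribute('content');
        }
    }
    return null;
};

$videoID    = $match('video');
$videoThumb = $match('image');
```

Because the calling code only sees $match('video') and $match('image'), nothing else has to change when the implementation behind the closure does.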

Related

PHP:preg_replace function

$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting the output as htmlspecialchars of the regex pattern. I want to replace it with htmlspecialchars of the data matching with the regex pattern.
what should i do?

You're replacing the match with the pattern itself rather than with the matched text. You could use back-references and the e modifier, but that modifier was removed in PHP 7; preg_replace_callback is the way to go:
$code = preg_replace_callback($regex, 'replaceCallback', $text);
This passes the matched groups to the callback and uses its return value as the replacement. The callback receives an array, so htmlspecialchars cannot be used as the callback directly; wrap it instead:
function replaceCallback($matches)
{
    if (is_array($matches))
    {
        $matches = implode('', array_slice($matches, 1)); // first element is the full match
    }
    return htmlspecialchars($matches);
}
Or, if your PHP version permits it:
$code = preg_replace_callback($regex, function ($matches)
{
    $return = '';
    for ($i = 1, $j = count($matches); $i < $j; $i++)
    { // loop like this: skips the first index and allows for any number of groups
        $return .= htmlspecialchars($matches[$i]);
    }
    return $return;
}, $text);
Try any of the above until you find something that works. Incidentally, if all you want to remove is <tag> and </tag>, why not go for the much faster:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.

If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This gives you an array of hits: the bits that were between the parentheses in the pattern start at $hits[1], and $hits[0] contains the whole matched string. You can then manipulate these matches, possibly using htmlspecialchars, and combine them again into $code.
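A minimal sketch of that preg_match approach follows. The sample $text is made up, and since preg_match stops at the first match, this only handles a single <tag> block.

```php
<?php
// Hypothetical sample input with a single <tag> block.
$text  = "before <tag><html>HTML</html></tag> after";
$regex = '/<tag>(.*?)<\/tag>/s';

$code = $text;
if (preg_match($regex, $text, $hits)) {
    // $hits[0] is the whole match including <tag>...</tag>; $hits[1] is the inner text.
    $escaped = htmlspecialchars($hits[1]);
    // Swap the matched block for its escaped inner text.
    $code = str_replace($hits[0], $escaped, $text);
}
```

For several blocks in one string, preg_replace_callback is the better fit because it visits every match.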

How to handle a miss with regex / PHP / preg_match_all

I'm using the code at the bottom to grab parameters from a wordpress shortcode. The shortcode itself looks like this:
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280]
Or
[FLOWPLAYER=http://www.tvovermind.com/wp-content/uploads/2013/01/pll-316-21.jpg|http://www.tvovermind.com/wp-content/uploads/2013/01/PLL316_fv2.h264HD-Clip2.flv,440,280,false]
What I would like to have happen is that if the extra parameter (false/true) is missing then that match becomes "false", however with the current code if the parameter is missing a match is never made. Any ideas?
function legacy_hook($content){
$regex = '/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)\,([a-z0-9\:\.\-\&\_\/\|]+)\]/i';
$matches = array();
preg_match_all($regex, $content, $matches);
if($matches[0][0] != '') {
foreach($matches[0] as $key => $data) {
$content = str_replace($matches[0][$key], flowplayer::build_player($matches[2][$key], $matches[3][$key], $matches[1][$key],$matches[4][$key]),$content);
}
}
return $content;
}

Your regex requires the last comma to be present, followed by one or more of the characters in the last set of brackets. Making that part optional, with something like
/\[FLOWPLAYER=([a-z0-9\:\.\-\&\_\/\|]+)\,([0-9]+)\,([0-9]+)(\,[a-z]+)?\]/i
might be what you're after. The only issue is that you'll get the comma in the match too.
You then have to test whether the last group matched. With preg_match_all, an optional group that did not take part in the match comes back as an empty string, so you could do an inline if:
(!empty($matches[4][$key]) ? $matches[4][$key] : false)
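One way to keep the comma out of the captured value is to wrap the optional part in a non-capturing group. The sketch below is an illustration only: parse_flowplayer, the array keys and the example.com URLs are hypothetical, and it assumes the extra parameter is always the literal true or false.

```php
<?php
// Hypothetical helper: returns one array of parameters per shortcode found.
function parse_flowplayer($content)
{
    // (?:,(true|false))? makes the ",flag" part optional and keeps the comma out of group 4.
    $regex = '/\[FLOWPLAYER=([a-z0-9:.\-&_\/|]+),([0-9]+),([0-9]+)(?:,(true|false))?\]/i';
    $result = array();
    if (preg_match_all($regex, $content, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $result[] = array(
                'src'    => $m[1],
                'width'  => $m[2],
                'height' => $m[3],
                // Fall back to "false" when the optional group did not match.
                'flag'   => (isset($m[4]) && $m[4] !== '') ? $m[4] : 'false',
            );
        }
    }
    return $result;
}

$found = parse_flowplayer(
    '[FLOWPLAYER=http://example.com/a.jpg|http://example.com/b.flv,440,280]'
    . ' [FLOWPLAYER=http://example.com/a.jpg|http://example.com/b.flv,440,280,true]'
);
```

Both shortcode forms now match, and the fallback value lives in one place.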

You can add an alternation at the end of your expression:
(,true|,false|$)
I haven't checked whether it works, but you get the idea.

Find links in string with PHP. Differ from normal and youtube links

I have a string that contains links. I want my PHP code to do different things with the links, depending on the URL.
Answer:
function fixLinks($text)
{
$links = array();
$text = strip_tags($text);
$pattern = '!(https?://[^\s]+)!';
if (preg_match_all($pattern, $text, $matches)) {
list(, $links) = ($matches);
}
$i = 0;
$links2 = array();
foreach($links AS $link) {
if(strpos($link,'youtube.com') !== false) {
$search = "!(http://.*youtube\.com.*v=)?([a-zA-Z0-9_-]{11})(&.*)?!";
$youtube = 'http://www.youtube.com/watch?v=\\2';
$link2 = preg_replace($search, $youtube, $link);
} else {
$link2 = preg_replace('#(https?://([-\w\.]+)+(:\d+)?(/([\-\w/_\.]*(\?\S+)?)?)?)#', '<u>$1</u>', $link);
}
$links2[$i] = $link2;
$i++;
}
$text = str_replace($links, $links2, $text);
$text = nl2br($text);
return $text;
}

First of all, ditch eregi. It's deprecated (and was removed in PHP 7).
Then, doing this in just one pass is maybe a stretch too far. I think you'll be better off splitting this into three phases.
Phase 1 runs a regex search over your input, finding everything that looks like a link, and storing it in a list.
Phase 2 iterates over the list, checking whether a link goes to youtube (parse_url is tremendously useful for this), and putting a suitable replacement into a second list.
Phase 3: you now have two lists, one containing the original matches, one containing the desired replacements. Run str_replace over your original text, providing the match list for the search parameter and the replacement list for the replacements.
There are several advantages to this approach:
The regular expression for extracting links can be kept relatively simple, since it doesn't have to take special hostnames into account
It is easier to debug; you can dump the search and replace arrays prior to phase 3, and see if they contain what you expect
Because you perform all replacements in one go, you avoid problems with overlapping matches or replacing a piece of already-replaced text (after all, the replaced text still contains a URL, and you don't want to replace that again)
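The three phases could be sketched roughly as follows. The function name replace_links and the replacement markup are made up for illustration. Note one deliberate swap: phase 3 uses strtr with an array instead of str_replace, because str_replace applies array replacements one after another and can touch already-replaced text, whereas strtr does not rescan what it has replaced.

```php
<?php
// Hypothetical helper illustrating the three phases; the replacement markup is made up.
function replace_links($text)
{
    // Phase 1: collect everything that looks like a link.
    if (!preg_match_all('!https?://\S+!', $text, $matches)) {
        return $text;
    }
    $links = array_unique($matches[0]);

    // Phase 2: build a replacement for each link.
    $map = array();
    foreach ($links as $link) {
        $host = (string) parse_url($link, PHP_URL_HOST);
        parse_str((string) parse_url($link, PHP_URL_QUERY), $query);
        $isYoutube = preg_match('/(^|\.)youtube\.com$/i', $host) === 1;
        if ($isYoutube && isset($query['v']) && is_string($query['v'])) {
            $map[$link] = '[video ' . $query['v'] . ']';
        } else {
            $map[$link] = '<u>' . $link . '</u>';
        }
    }

    // Phase 3: replace everything in a single pass.
    return strtr($text, $map);
}

$out = replace_links('see http://www.youtube.com/watch?v=abcdefghijk and https://example.com/page');
```

Dumping $map before phase 3 is an easy way to check that the search and replacement lists contain what you expect.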

tdammers' answer is good, but another option is to use preg_replace_callback. If you go with that, then the process changes a little:
Create a regular expression to match all links, same as his Phase 1
In the callback, search for the YouTube video id. This will require running a second preg_match, which is (in my opinion) the biggest problem with this technique.
Return the replacement string, based on whether or not it's YouTube.
The code would look something like this:
function replaceem($matches) {
$url = $matches[0];
preg_match('~youtube\.com.*v=([\w\-]{11})~', $url, $matches);
return isset($matches[0]) ?
'<a href="youtube.php?id='.$matches[1].'" class="fancy">'.
'http://www.youtube.com/watch?v='.$matches[1].'</a>' :
'<a href="'.$url.'" title="Åben link" alt="Åben link" '.
'target="_blank">'.$url.'</a>';
}
$text = preg_replace_callback('~(?:f|ht)tps?://[^\s]+~', 'replaceem', $text);

preg_replace inside of preg_match_all problems

I'm trying to find some certain blocks in my data file and replace something inside of them. After that put the whole thing (with replaced data) into a new file. My code at the moment looks like this:
$content = file_get_contents('file.ext', true);
//find certain pattern blocks first
preg_match_all('/regexp/su', $content, $matches);
foreach ($matches[0] as $match) {
//replace data inside of those blocks
preg_replace('/regexp2/su', 'replacement', $match);
}
file_put_contents('new_file.ext', return_whole_thing?);
Now the problem is I don't know how to return_whole_thing. Basically, file.ext and new_file.ext are almost the same except of the replaced data.
Any suggestion what should be on place of return_whole_thing?
Thank you!

You don't even need the preg_replace; because you've already got the matches you can just use a normal str_replace like so:
$content = file_get_contents('file.ext', true);
//find certain pattern blocks first
preg_match_all('/regexp/su', $content, $matches);
foreach ($matches[0] as $match) {
//replace data inside of those blocks
$content = str_replace($match, 'replacement', $content);
}
file_put_contents('new_file.ext', $content);
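If only part of each block should change, rather than the whole block, one option is preg_replace_callback with the second replacement done inside the callback. The sketch below uses made-up BEGIN/END delimiters and a made-up inner pattern in place of the question's regexp and regexp2.

```php
<?php
// Hypothetical input: blocks are delimited by BEGIN/END, and "foo" is replaced inside them only.
$content = "foo BEGIN foo bar END foo BEGIN foo END";

$new = preg_replace_callback(
    '/BEGIN.*?END/su',
    function ($block) {
        // $block[0] is one matched block; change only the text inside it.
        return preg_replace('/foo/su', 'replacement', $block[0]);
    },
    $content
);

// In the original scenario the result would then be written out:
// file_put_contents('new_file.ext', $new);
```

Text outside the blocks is left alone, and $new is the whole file content with the edited blocks in place.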

It's probably best to strengthen your regular expression to find a subpattern within the original pattern. That way you can just call preg_replace() and be done with it.
$new_file_contents = preg_replace('/regular(Exp)/', 'replacement', $content);
This can be done with "( )" within the regular expression. A quick search for "regular expression subpatterns" will turn up the details.

I'm not sure I understand your problem. Could you perhaps post an example of:
file.ext, the original file
the regex you want to use and what you want to replace matches with
new_file.ext, your desired output
If you just want to read file.ext, replace a regex match, and store the result in new_file.ext, all you need is:
$content = file_get_contents('file.ext');
$content = preg_replace('/match/', 'replacement', $content);
file_put_contents('new_file.ext', $content);

PHP Split a string with start and stop value

I have fooled around with regex but can't seem to get it to work. I have a file called includes/header.php, which I am reading into one big string so that I can pull out a certain portion of the code to paste into the HTML of my document.
$str = file_get_contents('includes/header.php');
From here I am trying to return only the string that starts with <ul class="home"> and ends with </ul>.
Try as I may to figure out an expression, I am still confused.
Once I trim down the string I can just print it on my page, but I can't figure out the trimming part.

If you need something really hardcore, http://www.php.net/manual/en/book.xmlreader.php.
If you just want to rip out the text that fits that pattern try something like this.
$string = "stuff<ul class=\"home\">alsdkjflaskdvlsakmdf<another></another></ul>stuff";
if( preg_match( '/<ul class="home">(.*)<\/ul>/', $string, $match ) ) {
//do stuff with $match[0]
}

I'm assuming that the difficulty you're having has to do with escaping the regex special characters in the string(s) you're using as a delimiter. If so, try using the preg_quote() function:
$start = preg_quote('<ul class="home">', '/');
$end = preg_quote('</ul>', '/');
// the s modifier lets . match newlines; .*? stops at the first closing tag
preg_match('/' . $start . '.*?' . $end . '/s', $str, $matching_html_snippets);
The html you want should be in $matching_html_snippets[0]

You probably want an XML parser such as the built in one. Here is an example you might want to take a look at.
http://www.php.net/manual/en/function.xml-parse.php#90733
If you want to use regex, then something along the lines of
$str = file_get_contents('includes/header.php');
$found = preg_match("<place your pattern here>", $str, $matches);
You probably want the pattern
'/<ul class="home">.*?<\/ul>/s'
preg_match returns 1 if the pattern matched and 0 if it did not; $matches will contain an array of what it found, so you can grab the whole matched string with
$matches[0];
and then output that.
But I'd be a bit wary, regular expressions do tend to match to surprising edge cases and you need to feed them actual data to get reliable results as to when they are failing. However if you are just passing templates it should be ok, just do some tests and see if it all works. If not I'd still recommend using the PHP XML Parser.
Hope that helps.

If you feel like not using regexes you could use string finding, which I think the PHP manual implies is quicker:
function substrstr($orig, $startText, $endText) {
    //get first occurrence of the start string
    $start = strpos($orig, $startText);
    //get last occurrence of the end string
    $end = strrpos($orig, $endText);
    if ($start === FALSE || $end === FALSE)
        return $orig;
    //skip past the start string itself
    $start += strlen($startText);
    $length = $end - $start;
    return substr($orig, $start, $length);
}
$substr = substrstr($string, '<ul class="home">', '</ul>');
You'll need to make some adjustments if you want to include the terminating strings in the output, but that should get you started!
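As one possible version of those adjustments, the sketch below keeps both delimiters in the result and stops at the first end marker after the start marker instead of the last one in the string. The name between_inclusive and the null return value are choices made for this example.

```php
<?php
// Hypothetical variant of substrstr(): returns the delimiters too, or null when either is missing.
function between_inclusive($orig, $startText, $endText)
{
    $start = strpos($orig, $startText);
    if ($start === false) {
        return null;
    }
    // Search for the end marker only after the start marker.
    $end = strpos($orig, $endText, $start + strlen($startText));
    if ($end === false) {
        return null;
    }
    return substr($orig, $start, $end + strlen($endText) - $start);
}

$string = 'stuff<ul class="home"><li>a</li></ul>more<ul><li>b</li></ul>';
$substr = between_inclusive($string, '<ul class="home">', '</ul>');
```

Like the regex versions, this cuts the result short if another <ul> is nested inside the list; a parser is the safer choice for that case.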

Here's a novel way to do it; I make no guarantees about this technique's robustness or performance, other than it does work for the example given:
$prefix = '<ul class="home">';
$suffix = '</ul>';
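// array_pop and array_shift take their argument by reference, so use intermediate variables
$afterPrefix = explode($prefix, $str);
$beforeSuffix = explode($suffix, array_pop($afterPrefix));
$result = $prefix . array_shift($beforeSuffix) . $suffix;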
