regex match span or a? - php

I have this function:
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
preg_match($pattern, $string, $matches);
if(count($matches) > 0){
return $matches[1];
}
}
Passing, for example, span as the $tagname parameter lets me match any span tag. I would expect passing a|span to let me match any a or span tag. But it doesn't match anything. Why?

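Try grouping the tag name in parentheses (as we discussed in the comments). Alternation has the lowest precedence in a regex, so without a group a|span splits the whole pattern into three alternatives: <a, span ?.*>(.*)<\/a, and span>. A non-capturing group keeps $matches[1] pointing at the text between the tags:
function getTextBetweenTags($string, $tagname) {
$pattern = "/<(?:$tagname) ?.*>(.*)<\/(?:$tagname)>/";
preg_match($pattern, $string, $matches);
if(count($matches) > 1){
return $matches[1];
}
}
If you pass $tagname = "a|span" you'll obtain $pattern = "/<(?:a|span) ?.*>(.*)<\/(?:a|span)>/";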
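To make the precedence issue concrete, here is a minimal sketch. The function name getTextBetweenTagsGrouped and the sample markup are made up for illustration, and the pattern is slightly stricter than the one above (it uses \b and [^>]* instead of " ?.*"), so treat it as one possible variant rather than the definitive fix.

```php
<?php
// Sketch only: getTextBetweenTagsGrouped is a hypothetical name, and the sample markup is made up.
function getTextBetweenTagsGrouped($string, $tagname) {
    // (?:...) limits the alternation to the tag name; \b stops "a" from matching "<abbr".
    $pattern = "/<(?:$tagname)\b[^>]*>(.*?)<\/(?:$tagname)>/s";
    if (preg_match($pattern, $string, $matches)) {
        return $matches[1];
    }
    return null;
}

// Ungrouped, the pattern means: "<a" OR "span ?.*>(.*)<\/a" OR "span>".
preg_match("/<a|span ?.*>(.*)<\/a|span>/", '<span>text</span>', $ungrouped);
var_dump($ungrouped); // only index 0 is set, so $matches[1] does not exist

var_dump(getTextBetweenTagsGrouped('<span>text</span>', 'a|span'));
var_dump(getTextBetweenTagsGrouped('<a href="#">link</a>', 'a|span'));
```

The ungrouped pattern does match, but only through its last alternative, which has no capture group. That is why the original function appears to return nothing.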

Related

Grab URL within a string which contains HTML code

I have a string, for example:
$html = '<p>helloworld</p><p>helloworld</p>';
And I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in variable $first_found_youtube_url.
How can I do this efficiently?
I can do a preg_match or strpos looking for the urls but not sure which approach is more appropriate.
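I wrote this function a while back; it uses a regex and returns an array of unique URLs. Note that it sorts the results longest first, so the first item in the array is the longest URL. If you want the first URL in document order, remove the usort call (and reindex with array_values) before taking the first item.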
function getUrlsFromString($string) {
$regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
preg_match_all($regex, $string, $matches);
$matches = array_unique($matches[0]);
usort($matches, function($a, $b) {
return strlen($b) - strlen($a);
});
return $matches;
}
Example:
$html = '<p>helloworld</p><p>helloworld</p>';
$urls = getUrlsFromString($html);
$first_found_youtube = $urls[0];
With YouTube specific regex:
function getYoutubeUrlsFromString($string) {
$regex = '#(https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([\w-]+))#i';
preg_match_all($regex, $string, $matches);
$matches = array_unique($matches[0]);
usort($matches, function($a, $b) {
return strlen($b) - strlen($a);
});
return $matches;
}
Example:
$html = '<p>helloworld</p><p>helloworld</p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube = $urls[0];
You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:
$html = '<p>helloworld</p><p>helloworld</p>';
$DOMD = new DOMDocument();
@$DOMD->loadHTML($html);
foreach($DOMD->getElementsByTagName("a") as $url)
{
if (0 === stripos($url->getAttribute("href") , "https://www.youtube.com/") || 0 === stripos($url->getAttribute("href") , "https://youtu.be/"))
{
$first_found_youtube_url = $url->getAttribute("href");
break;
}
}
Personally, I would probably compare the host instead:
in_array(parse_url($url->getAttribute("href"), PHP_URL_HOST), array("youtube.com", "www.youtube.com", "youtu.be"), true)
That matches both http and https links, which is probably what you want, though strictly speaking it is not what the question currently asks for.
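Putting the DOMDocument loop and the host comparison together might look like the sketch below. The function name firstYoutubeUrl, the host list, and the sample links are assumptions for illustration; the sample HTML in the question does not show its links.

```php
<?php
// Sketch only: firstYoutubeUrl is a hypothetical helper, and the sample HTML is made up.
function firstYoutubeUrl($html) {
    // Hosts treated as YouTube; extend the list if other subdomains matter.
    $hosts = array('youtube.com', 'www.youtube.com', 'youtu.be');

    $dom = new DOMDocument();
    // @ silences the warnings libxml raises on imperfect markup.
    @$dom->loadHTML($html);

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url gives null or false when there is no usable host.
        if (is_string($host) && in_array(strtolower($host), $hosts, true)) {
            return $href;
        }
    }
    return null;
}

$html = '<p><a href="https://example.com/">other</a></p>'
      . '<p><a href="http://youtu.be/abc123">video</a></p>';
echo firstYoutubeUrl($html), "\n";
```

Because the loop follows document order and returns on the first hit, no sorting or deduplication is needed.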
I think this will do what you are looking for. I used preg_match_all simply because I find it easier to debug the regexes.
<?php
$html = '<p>helloworld</p><p>helloworld</p>';
$pattern = '/https?:\/\/(www\.)?youtu(\.be|be\.com)\/[a-zA-Z0-9\?=_-]*/i';
preg_match_all($pattern, $html, $matches);
// print_r($matches);
$first_found_youtube = $matches[0][0];
echo $first_found_youtube;
demo - https://3v4l.org/lFjmK

Checking pattern as much as possible

How to make preg find all possible solutions for regular expression pattern?
Here's the code:
<?php
$text = 'Amazing analyzing.';
$regexp = '/(^|\\b)([\\S]*)(a)([\\S]*)(\\b|$)/ui';
$matches = array();
if (preg_match_all($regexp, $text, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
echo "{$match[2]}[{$match[3]}]{$match[4]}\n";
}
}
?>
Output:
Am[a]zing
an[a]lyzing.
Output that i need:
[A]mazing
Am[a]zing
[A]nalyzing.
an[a]lyzing.
You have to use lookbehind/lookahead zero-length assertions (instead of a normal pattern, which consumes the characters around what you are looking for): http://www.regular-expressions.info/lookaround.html
Lookaround assertions won't help, for two reasons:
Since they are zero-length, they won't return characters that you need.
As Avinash Raj noted, PHP lookbehind doesn't allow *.
This yields the output that you need:
<?php
$text = 'Amazing analyzing.';
foreach (preg_split('/\s+/', $text) as $word)
{
$matches = preg_split('/(a)/i', $word, 0, PREG_SPLIT_DELIM_CAPTURE);
for ($match = 1; $match < count($matches); $match += 2)
{
$prefix = join(array_slice($matches, 0, $match));
$suffix = join(array_slice($matches, $match+1));
echo "{$prefix}[{$matches[$match]}]{$suffix}\n";
}
}
?>

Ucfirst all strings within <strong> tag

I am trying to Ucfirst all strings within <strong> in a sentence. Tried this without any luck:
function getTextBetweenTags($string, $tagname)
{
$pattern = "/<$tagname>(.*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return ucfirst($matches[1]);
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
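What is the correct way to do that?
There is a simpler way. Instead of PHP you could use CSS alone. Because first-letter only applies to block-like containers, the inline strong element also needs display: inline-block, for instance:
strong{
display: inline-block
}
strong:first-letter{
text-transform: capitalize
}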
You need to include matching for the text before and after the tags.
function getTextBetweenTags($string, $tagname)
{
$pattern = "/(.*<$tagname>)(.*?)(<\/$tagname>.*)/";
preg_match($pattern, $string, $matches);
return $matches[1] . ucfirst($matches[2]) . $matches[3];
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
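The function above rewrites one strong element per call. Since the question asks about all of them, a preg_replace_callback variant is one option. This is a sketch: the name ucfirstInsideTag is made up, and it assumes the tags carry no attributes, as in the question.

```php
<?php
// Sketch only: ucfirstInsideTag is a hypothetical helper name.
function ucfirstInsideTag($string, $tagname) {
    // Quote the tag name so it cannot inject regex syntax.
    $tag = preg_quote($tagname, '/');
    $pattern = "/(<$tag>)(.*?)(<\/$tag>)/s";

    // The callback runs once per match, so every element is handled.
    return preg_replace_callback($pattern, function ($m) {
        return $m[1] . ucfirst($m[2]) . $m[3];
    }, $string);
}

$sentence = "Yellow pitty lies <strong>about</strong> the <strong>life</strong>.";
echo ucfirstInsideTag($sentence, "strong"), "\n";
```

Text outside the tags is left untouched because only the matched spans are replaced.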

How do I use a part of preg_replace pattern as a variable?

function anchor($text)
{
return preg_replace('#\>\>([0-9]+)#','<span class=anchor>>>$1</span>', $text);
}
This piece of code is used to render page anchor.
I need to use the
([0-9]+)
part as a variable to do some maths to define the exact url for the href tag.
Thanks.
Use preg_replace_callback instead. The callback receives the match array, so the captured number is available as $match[1] ($1 is not expanded inside the callback's return value).
In PHP 5.3+:
$matches = array();
$text = preg_replace_callback(
$pattern,
function($match) use (&$matches){
$matches[] = $match[1];
return '<span class=anchor><a href="#'.$match[1].'">'.$match[1].'</a></span>';
},
$text
);
In PHP < 5.3 (create_function was removed in PHP 8.0):
global $matches;
$matches = array();
$text = preg_replace_callback(
$pattern,
create_function('$match','global $matches; $matches[] = $match[1]; return \'<span class=anchor><a href="#\'.$match[1].\'">\'.$match[1].\'</a></span>\';'),
$text
);
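As an illustration of doing some maths on the captured number, the sketch below computes a page number from the post id. The name anchorWithHref, the ten-posts-per-page rule, and the ?page=N#id URL shape are all assumptions; the question does not say how the URL is built. It also relies on intdiv, which needs PHP 7 or later.

```php
<?php
// Sketch only: anchorWithHref, $postsPerPage and the URL format are hypothetical.
function anchorWithHref($text, $postsPerPage = 10) {
    return preg_replace_callback('#>>([0-9]+)#', function ($m) use ($postsPerPage) {
        $id = (int) $m[1];
        // Posts 1..10 live on page 1, 11..20 on page 2, and so on.
        $page = intdiv($id - 1, $postsPerPage) + 1;
        return '<span class=anchor><a href="?page=' . $page . '#' . $id . '">>>' . $id . '</a></span>';
    }, $text);
}

echo anchorWithHref('see >>25'), "\n";
```

Any arithmetic can go inside the callback, since $m[1] is an ordinary PHP string by the time the closure runs.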

PHP/regex: How to get the string value of HTML tag?

I need help with regex or preg_match, as I am not very experienced with them yet. Here is my problem.
I need to get the value "get me", but I think my function has an error.
The number of HTML tags is dynamic: the string can contain many nested HTML tags, such as a bold tag. The "get me" value is dynamic too.
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname>(.*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
That should do the trick
Try this
$str = '<option value="123">abc</option>
<option value="123">aabbcc</option>';
preg_match_all("#<option.*?>([^<]+)</option>#", $str, $foo);
print_r($foo[1]);
In your pattern, you simply want to match all text between the two tags. For that you could use, for example, [\w\W] to match any character, including newlines. The opening tag also has to allow attributes such as size="10":
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname\b[^>]*>([\w\W]*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
Since attribute values may contain a plain > character, try this regular expression:
$pattern = '/<'.preg_quote($tagname, '/').'(?:[^"\'>]*|"[^"]*"|\'[^\']*\')*>(.*?)<\/'.preg_quote($tagname, '/').'>/s';
But regular expressions are not suitable for parsing non-regular languages like HTML. You would be better off using a parser like SimpleXML or DOMDocument.
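A parser-based version could look like the following sketch. The name getTextOfFirstTag is made up, and it assumes the DOM extension is available; the sample string is the one from the question.

```php
<?php
// Sketch only: getTextOfFirstTag is a hypothetical helper name.
function getTextOfFirstTag($html, $tagname) {
    $dom = new DOMDocument();
    // @ silences libxml warnings about unknown tags such as <textformat>.
    @$dom->loadHTML($html);

    $nodes = $dom->getElementsByTagName($tagname);
    // textContent concatenates the text of the element and all its descendants.
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo getTextOfFirstTag($str, "font"), "\n";
```

Attributes, nested tags, and a > inside an attribute value are handled by the parser, so no pattern needs to anticipate them.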
This question might be old, but my answer might help someone.
If you only need the text and not a specific tag, you can simply use strip_tags:
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo strip_tags($str);
https://www.php.net/manual/en/function.strip-tags.php
$userinput = "http://www.example.vn/";
//$url = urlencode($userinput);
$input = @file_get_contents($userinput) or die("Could not access file: $userinput");
$regexp = "<tagname\s[^>]*>(.*)<\/tagname>";
//==Example:
//$regexp = "<div\s[^>]*>(.*)<\/div>";
if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
// $match[0] = the whole element, including its tags
// $match[1] = the text between the tags
}
}
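Try $pattern = "#<($tagname)\b.*?>(.*?)</\\1>#s" and return $matches[2]. The pattern needs delimiters, and inside a double-quoted string the backreference must be written \\1, because "\1" is read as an octal character escape.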
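The following PHP snippet returns the text between two tags.
A regex of the form "/tagname(.*)endtag/" captures the text between the tags. Square brackets must be escaped, and a delimiter other than / avoids having to escape the slash in the end tag.
For example:
$regex = "#\[start_tag_name\](.*)\[/end_tag_name\]#";
$content = "[start_tag_name]SOME TEXT[/end_tag_name]";
preg_match($regex, $content, $matches);
echo $matches[1];
It will print "SOME TEXT".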
Your HTML
$html='<ul id="main">
<li>
<h1>My Title</h1>
<span class="date">Date</span>
<div class="section">
[content]
</div>
</li>
</ul>';
//function call you can change the tag name
echo contentBetweenTags($html,"span");
// this function will help you to fetch the data from a specific tag
function contentBetweenTags($content, $tagname){
$pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
preg_match($pattern, $content, $matches);
if(empty($matches))
return;
$str = "<$tagname>".html_entity_decode($matches[1])."</$tagname>";
return $str;
}
