I have this function:
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    if (count($matches) > 0) {
        return $matches[1];
    }
}
Passing, for example, span as the $tagname parameter lets me match any span tag. I would expect that passing a|span would let me match any a or span tag, but it doesn't match anything. Why?
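Try grouping the alternation in parentheses (as discussed in the comments). A non-capturing group (?:...) keeps the inner text in $matches[1]:
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<(?:$tagname) ?.*>(.*)<\/(?:$tagname)>/";
    preg_match($pattern, $string, $matches);
    if (count($matches) > 0) {
        return $matches[1];
    }
}
If you pass $tagname = "a|span", the pattern becomes "/<(?:a|span) ?.*>(.*)<\/(?:a|span)>/". Without the parentheses, | splits the whole expression into three alternatives (<a, then span ?.*>(.*)<\/a, then span>), so the match typically ends without ever filling $matches[1].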
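To make the effect of the grouping concrete, here is a minimal sketch. The helper name textBetweenAnyTag and the sample strings are made up for illustration, and it swaps ` ?.*` for `\b[^>]*` so the opening tag stops at its own `>`. It does not check that the closing tag is the same as the opening one.

```php
<?php
// Hypothetical helper: $tagnames is an alternation such as "a|span".
function textBetweenAnyTag(string $string, string $tagnames): ?string
{
    // (?:...) groups the alternation without adding a capture group,
    // [^>]* skips any attributes, (.*?) lazily captures the inner text.
    $pattern = "/<(?:$tagnames)\b[^>]*>(.*?)<\/(?:$tagnames)>/s";
    return preg_match($pattern, $string, $m) ? $m[1] : null;
}

var_dump(textBetweenAnyTag('<p>x</p><span class="c">hello</span>', 'a|span'));
```

Because the group is non-capturing, the inner text stays at index 1 no matter how many tag names are listed.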
I have a string, for example:
$html = '<p>helloworld</p><p>helloworld</p>';
And I want to search the string for the first URL that starts with youtube.com or youtu.be and store it in variable $first_found_youtube_url.
How can I do this efficiently?
I can do a preg_match or strpos looking for the urls but not sure which approach is more appropriate.
I wrote this function a while back; it uses a regex and returns an array of unique URLs, sorted longest first. If you want the first URL in order of appearance, drop the usort call and take the first item of the array.
function getUrlsFromString($string) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#i';
    preg_match_all($regex, $string, $matches);
    $matches = array_unique($matches[0]);
    // Longest URLs first; remove this sort to keep the order of appearance.
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}
$html = '<p>helloworld</p><p>helloworld</p>';
$urls = getUrlsFromString($html);
$first_found_youtube = $urls[0];
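With a YouTube-specific regex:
function getYoutubeUrlsFromString($string) {
    // Dots are escaped so they match literally; [\w-]+ covers the _ and - that video IDs may contain.
    $regex = '#(https?:\/\/(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([\w-]+))#i';
    preg_match_all($regex, $string, $matches);
    $matches = array_unique($matches[0]);
    // Longest URLs first; remove this sort to keep the order of appearance.
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}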
$html = '<p>helloworld</p><p>helloworld</p>';
$urls = getYoutubeUrlsFromString($html);
$first_found_youtube = $urls[0];
You can parse the HTML with DOMDocument and look for YouTube URLs with stripos, something like this:
$html = '<p>helloworld</p><p>helloworld</p>';
$DOMD = new DOMDocument();
@$DOMD->loadHTML($html); // @ suppresses warnings about imperfect markup
foreach ($DOMD->getElementsByTagName("a") as $url) {
    if (0 === stripos($url->getAttribute("href"), "https://www.youtube.com/") || 0 === stripos($url->getAttribute("href"), "https://www.youtu.be")) {
        $first_found_youtube_url = $url->getAttribute("href");
        break; // stop at the first match
    }
}
Personally, I would probably use a check that accepts both http and https links. That is probably what you want, though strictly speaking it is not what the question asks for right now.
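One way to accept both schemes is to compare the parsed host instead of the string prefix. This is only a sketch under assumptions: the helper name firstYoutubeHref, the host list, and the sample markup are mine, not from the post above.

```php
<?php
// Hypothetical helper: first <a href> whose host is a YouTube domain, or null.
function firstYoutubeHref(string $html): ?string
{
    $dom = new DOMDocument();
    // @ silences the warnings libxml raises for imperfect markup.
    @$dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // parse_url returns null for a missing component, hence the casts.
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));
        if (in_array($scheme, ['http', 'https'], true)
            && in_array($host, ['youtube.com', 'www.youtube.com', 'youtu.be'], true)) {
            return $href;
        }
    }
    return null;
}

// Sample markup invented for this example.
$sample = '<p><a href="/local">a</a><a href="http://youtu.be/xyz">b</a></p>';
var_dump(firstYoutubeHref($sample));
```

Comparing hosts also avoids false positives such as https://www.youtube.com.example.org/.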
I think this will do what you are looking for, I have used preg_match_all simply because I find it easier to debug the regexes.
$html = '<p>helloworld</p><p>helloworld</p>';
$pattern = '/https?:\/\/(www\.)?youtu(\.be|be\.com)\/[\w?=-]*/i';
preg_match_all($pattern, $html, $matches);
// print_r($matches);
$first_found_youtube = $matches[0][0];
echo $first_found_youtube;
demo - https://3v4l.org/lFjmK
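Since only the first URL is needed, preg_match is enough: it stops at the first hit instead of collecting all of them. A minimal sketch, where the helper name firstYoutubeUrl and the sample markup are assumptions of mine:

```php
<?php
// Hypothetical helper: returns the first YouTube URL in $html, or null.
function firstYoutubeUrl(string $html): ?string
{
    // Host is youtube.com or youtu.be (optional www.); the rest of the URL
    // runs until whitespace, a quote, or an angle bracket.
    $pattern = '~https?://(?:www\.)?(?:youtube\.com|youtu\.be)/[^\s"\'<>]*~i';
    return preg_match($pattern, $html, $m) ? $m[0] : null;
}

// Sample markup invented for this example.
$html = '<p><a href="https://example.com/">x</a></p>'
      . '<p><a href="https://youtu.be/abc123">y</a> '
      . '<a href="https://www.youtube.com/watch?v=def456">z</a></p>';
echo firstYoutubeUrl($html), "\n";
```

Using ~ as the delimiter avoids having to escape every slash in the URL.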
How can I make the preg functions find all possible matches for a regular expression pattern, including overlapping ones?
Here's the code:
$text = 'Amazing analyzing.';
$regexp = '/(^|\\b)([\\S]*)(a)([\\S]*)(\\b|$)/ui';
$matches = array();
if (preg_match_all($regexp, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        echo "{$match[2]}[{$match[3]}]{$match[4]}\n";
    }
}
Output that I need:
[A]mazing
Am[a]zing
[a]nalyzing.
an[a]lyzing.
You have to use lookbehind/lookahead zero-length assertions (instead of a normal pattern, which consumes the characters around what you are looking for): http://www.regular-expressions.info/lookaround.html
Lookaround assertions won't help, for two reasons:
Since they are zero-length, they won't return characters that you need.
As Avinash Raj noted, PHP lookbehind doesn't allow *.
This yields the output that you need:
$text = 'Amazing analyzing.';
foreach (preg_split('/\s+/', $text) as $word) {
    // Odd indexes hold the captured delimiters, i.e. each "a" or "A".
    $matches = preg_split('/(a)/i', $word, 0, PREG_SPLIT_DELIM_CAPTURE);
    for ($match = 1; $match < count($matches); $match += 2) {
        $prefix = join(array_slice($matches, 0, $match));
        $suffix = join(array_slice($matches, $match+1));
        echo "{$prefix}[{$matches[$match]}]{$suffix}\n";
    }
}
I am trying to apply ucfirst to every string inside <strong> tags in a sentence. I tried this without any luck:
function getTextBetweenTags($string, $tagname)
{
    $pattern = "/<$tagname>(.*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return ucfirst($matches[1]);
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
What is the correct way to do that?
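There is a simpler way. Instead of using PHP you could use only CSS, for instance:
strong { text-transform: capitalize; }
Note that capitalize uppercases the first letter of every word inside the element, not just the first one.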
You need to include matching for the text before and after the tags.
function getTextBetweenTags($string, $tagname)
{
    $pattern = "/(.*<$tagname>)(.*?)(<\/$tagname>.*)/";
    preg_match($pattern, $string, $matches);
    return $matches[1] . ucfirst($matches[2]) . $matches[3];
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
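If the sentence can contain more than one <strong> element, preg_replace_callback handles all of them in one pass. A minimal sketch; the helper name ucfirstInsideTags and the second <strong> in the sample are my own additions:

```php
<?php
// Hypothetical helper: applies ucfirst to the text inside every <$tagname>.
function ucfirstInsideTags(string $string, string $tagname): string
{
    $tag = preg_quote($tagname, '/');
    $result = preg_replace_callback(
        "/(<$tag>)(.*?)(<\/$tag>)/s",
        function (array $m): string {
            // $m[1] = opening tag, $m[2] = inner text, $m[3] = closing tag
            return $m[1] . ucfirst($m[2]) . $m[3];
        },
        $string
    );
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $string;
}

$sentence = 'Yellow pitty lies <strong>about</strong> the <strong>life</strong>.';
echo ucfirstInsideTags($sentence, 'strong'), "\n";
```

The text outside the tags is left alone, so there is no need to capture it.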
function anchor($text)
{
    return preg_replace('#\>\>([0-9]+)#', '<span class=anchor>>>$1</span>', $text);
}
This piece of code is used to render a page anchor.
I need to use the $1 part as a variable to do some maths to define the exact URL for the href attribute.
Use preg_replace_callback instead.
In PHP 5.3+:
$matches = array();
$text = preg_replace_callback(
    '#\>\>([0-9]+)#',
    function($match) use (&$matches) {
        $matches[] = $match[1];
        // $1 is not expanded inside a callback, so use $match[1] for the href too.
        return '<span class=anchor><a href="#'.$match[1].'">'.$match[1].'</a></span>';
    },
    $text
);
In PHP < 5.3 (note that create_function was removed in PHP 8.0):
global $matches;
$matches = array();
$text = preg_replace_callback(
    '#\>\>([0-9]+)#',
    create_function('$match', 'global $matches; $matches[] = $match[1]; return \'<span class=anchor><a href="#\'.$match[1].\'">\'.$match[1].\'</a></span>\';'),
    $text
);
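The callback is also the place to do the maths on the captured number. The sketch below is purely illustrative: the 20-posts-per-page rule, the ?page=N URL scheme, and the name anchorWithPage are assumptions, since the question does not say how the URL is computed.

```php
<?php
// Assumption for illustration only: 20 posts per page, URLs like "?page=N#ID".
const POSTS_PER_PAGE = 20;

// Hypothetical variant of anchor() that computes the href from the post number.
function anchorWithPage(string $text): string
{
    $result = preg_replace_callback(
        '#>>([0-9]+)#',
        function (array $m): string {
            $id = (int) $m[1];                         // the part that was $1
            $page = (int) ceil($id / POSTS_PER_PAGE);  // the "maths"
            return '<span class="anchor"><a href="?page=' . $page . '#' . $id . '">>>'
                . $id . '</a></span>';
        },
        $text
    );
    return $result ?? $text; // null only on a regex error
}

echo anchorWithPage('see >>45'), "\n";
```

Swap the body of the callback for whatever calculation your URLs actually need.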
I need help with regex or preg_match because I am not that experienced with them yet, so here is my problem.
I need to get the value "get me" but I think my function has an error.
The number of HTML tags is dynamic. The string can contain many nested HTML tags, such as a bold tag. Also, the "get me" value is dynamic.
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>(.*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
That should do the trick
Try this
$str = '<option value="123">abc</option>
<option value="123">aabbcc</option>';
preg_match_all("#<option.*?>([^<]+)</option>#", $str, $foo);
In your pattern, you simply want to match all text between the two tags. Thus, you could use for example a [\w\W] to match all characters.
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname>([\w\W]*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
Since attribute values may contain a plain > character, try this regular expression:
$pattern = '/<'.preg_quote($tagname, '/').'(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*>(.*?)<\/'.preg_quote($tagname, '/').'>/s';
But regular expressions are not suitable for parsing non-regular languages like HTML. You would be better off using a parser like SimpleXML or DOMDocument.
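For comparison, a parser-based version could look like the sketch below. The helper name textOfFirstTag is made up, and the sample string adds a nested <b> to show that nested tags are handled. It assumes the DOM extension is available.

```php
<?php
// Hypothetical helper: text content of the first <$tagname> element, or null.
function textOfFirstTag(string $html, string $tagname): ?string
{
    $dom = new DOMDocument();
    // @ hides the warnings libxml raises for unknown tags such as <textformat>.
    @$dom->loadHTML($html);
    $node = $dom->getElementsByTagName($tagname)->item(0);
    // textContent concatenates the text of the element and all its descendants.
    return $node === null ? null : $node->textContent;
}

$str = '<textformat leading="2"><p align="left"><font size="10">get <b>me</b></font></p></textformat>';
echo textOfFirstTag($str, 'font'), "\n";
```

Attributes, nesting, and > characters inside attribute values are all left to the parser.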
This question might be old, but my answer might help someone.
If you only need the text and not the markup, you can simply use strip_tags:
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
echo strip_tags($str);
$userinput = "http://www.example.vn/";
//$url = urlencode($userinput);
$input = @file_get_contents($userinput) or die("Could not access file: $userinput");
$regexp = "<tagname\s[^>]*>(.*)<\/tagname>";
//$regexp = "<div\s[^>]*>(.*)<\/div>";
if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] = the whole element
        // $match[1] = text between the tags
    }
}
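Try $pattern = "#<($tagname)\\b.*?>(.*?)</\\1>#s" and return $matches[2]. The backreference \1 makes the closing tag match whichever tag name opened the element; the backslashes are doubled because "\1" in a double-quoted PHP string is an octal escape.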
The following PHP snippets return the text between HTML tags/elements.
A regex of the form "/tagname(.*)endtag/" returns the text between the tags.
$content="[start_tag_name]SOME TEXT[/end_tag_name]";
It will return "SOME TEXT".
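Spelled out for the bracketed markers above, that could look like the sketch below. The helper name textBetweenMarkers is hypothetical; preg_quote is used so the brackets and the slash are matched literally.

```php
<?php
// Hypothetical helper: text between two literal markers, or null if absent.
function textBetweenMarkers(string $content, string $start, string $end): ?string
{
    // preg_quote escapes [, ] and the / delimiter so they match literally.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    return preg_match($pattern, $content, $m) ? $m[1] : null;
}

$content = "[start_tag_name]SOME TEXT[/end_tag_name]";
echo textBetweenMarkers($content, '[start_tag_name]', '[/end_tag_name]'), "\n";
```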
$html='<ul id="main">
<h1>My Title</h1>
<span class="date">Date</span>
<div class="section">
// Function call; you can change the tag name.
echo contentBetweenTags($html, "span");
// This function fetches the content of a specific tag.
function contentBetweenTags($content, $tagname) {
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $content, $matches);
    $str = "<$tagname>".html_entity_decode($matches[1])."</$tagname>";
    return $str;
}