Need a regular expression for li tags - PHP

How can I get the strings between my li tags in PHP? I have tried several pieces of PHP code, but none of them work.
<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>

You can try this:
$myPattern = "/<li class=\"release\">(.*?)<\/li>/s";
$myText = '<li class="release">*</li>';
preg_match($myPattern,$myText,$match);
echo $match[1];

You don't need a regular expression. Using regular expressions to parse HTML is a common mistake (I took the URL from T.J. Crowder's comment).
Use a tool built for parsing HTML instead, for instance the DOM library.
This is a solution to get all strings (I'm assuming those are the values of the text nodes):
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//li//text()');
$strings = array();
foreach($nodes as $node) {
$string = trim($node->nodeValue);
if( $string !== '' ) {
$strings[] = $string;
}
}
print_r($strings); outputs:
Array
(
[0] => Release info:
[1] => How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
[2] => How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
[3] => How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
)
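If only the release names are wanted, without the "Release info:" label, the XPath query can be narrowed to the div elements inside the li. This is a minimal sketch that assumes the markup looks exactly like the sample in the question; the variable name $releases is only illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$releases = [];
// Select only the <div> children of <li class="release">.
foreach ($xpath->query('//li[@class="release"]/div') as $div) {
    $releases[] = trim($div->textContent);
}
print_r($releases);
```

The exact-match test on @class is enough here because the sample li carries a single class; an element with several classes would need a token-based test instead.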

Related

Match multiple results single line php regex

I would like to match multiple results in a single-line string, but I am only able to get the last iteration instead of the result I expected.
For example, I have this string: <ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>
I would like to get :
test1
test2
test3
as the result, but I only get "test3".
I tested this regex on https://regex101.com/, but I don't know what I did wrong: <ul>(<li><a.*>(.*)<\/a><\/li>)*<\/ul>
Use a parser instead:
<?php
$html = <<<DATA
<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
DATA;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$links = $xpath->query("//li/a");
foreach ($links as $link) {
echo $link->textContent, PHP_EOL; // one result per line
}
?>
This sets up the DOM and uses an xpath expression to get the element(s).
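As for why the original regex returns only "test3": a capture group placed inside a repetition keeps only its last iteration. The sketch below illustrates this, assuming the list items wrap <a href="#"> links as the regex in the question implies; the variable names $single and $all are illustrative.

```php
<?php
// Assumption: each list item contains an <a href="#"> link.
$html = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';

// The capture group sits inside a repeated group, so each iteration
// overwrites the previous capture and only the last one survives.
preg_match('~<ul>(?:<li><a[^>]*>(.*?)</a></li>)*</ul>~', $html, $single);
echo $single[1], PHP_EOL;

// Matching one list item at a time lets preg_match_all collect every capture.
preg_match_all('~<li><a[^>]*>(.*?)</a></li>~', $html, $all);
print_r($all[1]);
```

This only demonstrates the regex behaviour on a small, well-formed string; for real pages the parser approach above remains the safer choice.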
Try like this:
(?<=(<a href="#">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))
or
(?<=(">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))
or
(?<=(<a href="#">))(.)+?(?=(<\/a>))
link with example:
https://regex101.com/r/MHnxxh/1
or
https://regex101.com/r/MHnxxh/2
<?php
$str = '
<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
';
preg_match_all('/(?<=(#">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))/', $str, $matches);
// display array if need
echo "<pre>";
print_r($matches);
// display list
foreach ($matches[0] as $key => $value) {
echo $value ."\r\n";
}
?>
Try this regex pattern with preg_match_all:
preg_match_all('/#">([a-z]\w+)<\/a>/', $str, $out, PREG_PATTERN_ORDER);
This extracts only the text strings; they end up in $out[1].
You can also use preg_replace to strip the tags:
$test = '<ul><li>test1</li><li>test2</li>test3</li></ul>';
echo preg_replace('/<[^>]*>/', ' ', $test);

Regular expression match using preg match all in php

I have a string like this
<b>12345</b> - John George<br><span>some_text1</span>
<b>67890</b> - George Jerry<br><span>some_text2</span>
Using preg_match_all (PHP), I want to extract the url, id and name, but I have not figured out a working $sPattern (see below):
$sPattern = "/<a href=\"(.*?)\"><b>(.*?)<\/b>\" - (.*?)\"<br>(.*?)/";
preg_match_all($sPattern, $content, $aMatch);
I humbly suggest using an HTML parser like DOMDocument instead:
$html = '<b>12345</b> - John George<br><span>some_text1</span>
<b>67890</b> - George Jerry<br><span>some_text2</span>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
$data = array();
foreach($anchors as $anchor) {
$href = $anchor->getAttribute('href'); // get the anchor href
$b = $anchor->firstChild->nodeValue; // get the b tag value
$data[] = array('href' => $href, 'id' => $b);
}
echo '<pre>';
print_r($data);
It is probably better to write a more specific pattern. Try this one:
$sPattern = "/<a href=\"([^\"]+)\"><b>(\d+)<\/b> - ((\w+ )*\w+)<br><span>([^<]+)<\/span><\/a>/";
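To show how such a pattern could be applied, here is a minimal sketch using preg_match_all with PREG_SET_ORDER. The example.com URLs are hypothetical placeholders, since the real links are not shown in the question, and the inner name group is made non-capturing so the span text lands in group 4.

```php
<?php
// Hypothetical input: the example.com URLs are placeholders, not from the question.
$content = '<a href="http://example.com/1"><b>12345</b> - John George<br><span>some_text1</span></a>' . "\n"
         . '<a href="http://example.com/2"><b>67890</b> - George Jerry<br><span>some_text2</span></a>';

// Single-quoted so the double quotes in the pattern need no escaping.
$sPattern = '/<a href="([^"]+)"><b>(\d+)<\/b> - ((?:\w+ )*\w+)<br><span>([^<]+)<\/span><\/a>/';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($sPattern, $content, $aMatch, PREG_SET_ORDER);

$rows = [];
foreach ($aMatch as $m) {
    $rows[] = ['url' => $m[1], 'id' => $m[2], 'name' => $m[3], 'text' => $m[4]];
}
print_r($rows);
```

A pattern this specific stops matching as soon as the markup changes, for example when an attribute is added to the link.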

Extract Text from within tags using RegExp PHP

I am trying to extract some strings from the source code of a web page which looks like this :
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
I'm pretty sure those strings are the only things that end with a single line break (<br />). Everything else ends with two or more line breaks. I tried using this:
preg_match_all('~(.*?)<br />{1}~', $source, $matches);
But it doesn't work like it's supposed to. It returns some other text too along with those strings.
DOMDocument and XPath to the rescue.
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
echo $node->textContent;
}
I wouldn't recommend using a regular expression to get the values. Instead, use PHP's built in HTML parser like this:
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//p[@class="someclass"]');
$text = array(); // to hold the strings
if (!is_null($elements)) {
foreach ($elements as $element) {
$text[] = strip_tags($element->nodeValue);
}
}
print_r($text); // print out all the strings
This is tested and working. You can read more about PHP's DOMDocument class here: http://www.php.net/manual/en/book.dom.php
Here's a demonstration: http://phpfiddle.org/lite/code/0nv-hd6 (click 'Run')
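Both DOM answers return the paragraph text as one block. If each string is needed separately, one option is to select the text nodes directly under the paragraph. This is a rough sketch that assumes the markup from the question; $strings is an illustrative name.

```php
<?php
// Sample markup copied from the question.
$source = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$strings = [];
// Every string is its own text node, separated from the next by a <br /> element.
foreach ($xpath->query('//p[@class="someclass"]/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip whitespace-only nodes
        $strings[] = $value;
    }
}
print_r($strings);
```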
Try this:
preg_match_all('~^(.*?)<br />$~m', $source, $matches);
This should work:
preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);
or if your strings may contain some HTML code, use this one:
preg_match_all("/(.*?)<br\s*\/?>\\n/", $source, $matches);

get several links inside specific div with one regex

Take a look at this html:
<div class="foo">link1link2</div>
<div class="bar">barlink</div>
I would like to know if I can loop over all the links inside foo with a regular expression in PHP.
I tried this, but it isn't working:
preg_match_all(
'#<div.*?class="foo".*?<a.*?>(?P<text>.*?)</a>#xi',
$text,
$matches,
PREG_SET_ORDER
);
Sadly, in this case it must be a regex, not XML or other parsers.
DON'T USE REGEX TO PARSE HTML.
<?php
$content =
'<div class="foo">
link1
link2
</div>
<div class="bar">
barlink
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content);
$divs = $dom->getElementsByTagName('div');
foreach($divs as $div)
{
$classes = explode(' ', $div->getAttribute('class'));
if(in_array('foo', $classes) || trim($div->getAttribute('class')) === 'foo')
{
foreach($div->getElementsByTagName('a') as $link)
{
echo $dom->saveXML($link);
}
}
}
?>
This will output all matching links under any div with class 'foo'.
Regular Expressions should NOT be used to parse HTML, since HTML itself is not a regular language. It can get very sloppy and you can end up with more problems than what you started with, especially when you could potentially be dealing with malformed HTML.
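The same lookup can also be written as a single XPath query instead of nested loops. This is only a sketch: the <a> tags and their "#..." href values are hypothetical, since the real links are not visible in the question.

```php
<?php
// Hypothetical markup: the anchors and "#..." hrefs are placeholders.
$content = '<div class="foo"><a href="#1">link1</a><a href="#2">link2</a></div>'
         . '<div class="bar"><a href="#3">barlink</a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "foo" as a whole class token, so class="foo other" matches
// but class="foobar" does not.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " foo ")]//a';

$links = [];
foreach ($xpath->query($query) as $a) {
    $links[] = $a->textContent;
}
print_r($links);
```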

Extract all images from a Joomla article

I have this code that extracts the first image from an article in joomla:
<?php preg_match('/<img (.*?)>/', $this->article->text, $match); ?>
<?php echo $match[0]; ?>
Is there a way to extract all the images that are available in the article and not only one?
First, I would suggest not using regular expressions to parse HTML. You should use an appropriate parser such as DOMDocument::loadHTML, which uses libxml.
Then you may query for the desired tags you want. Something like this may work (untested):
$doc = new DOMDocument;
$doc->loadHTML($htmlSource);
$xpath = new DOMXPath($doc);
$query = '//img';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
// $entry->getAttribute('src')
}
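To make the loop above concrete, here is a small sketch that collects every src attribute into an array. The sample string stands in for $this->article->text and is an assumption; $sources is an illustrative name.

```php
<?php
// Stand-in for the article text; in Joomla this would be $this->article->text.
$htmlSource = '<img src="asdf" />stuff more stuff <img src="qwerty" />';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($htmlSource);
libxml_clear_errors();

$sources = [];
// getElementsByTagName returns every <img> in document order.
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
```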
Use preg_match_all. And you'll want to modify the pattern like so to take into account the trailing '/' inside the img tag.
$str = '<img src="asdf" />stuff more stuff <img src="qwerty" />';
preg_match_all('/<img (.*?)\/>/', $str, $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => <img src="asdf" />
[1] => <img src="qwerty" />
)
[1] => Array
(
[0] => src="asdf"
[1] => src="qwerty"
)
)
