How can I get the strings between my li tags in PHP? I have tried a lot of PHP code, but none of it works.
<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>
You can try this:
$myPattern = "/<li class=\"release\">(.*?)<\/li>/s";
$myText = '<li class="release">*</li>';
preg_match($myPattern,$myText,$match);
echo $match[1];
You don't need a regular expression. Using regular expressions to parse HTML code seems to be a common mistake (I took the URL from T.J. Crowder's comment).
Use a tool built to parse HTML instead, for instance the DOM library.
This is a solution to get all strings (I'm assuming those are the values of the text nodes):
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//li//text()');
$strings = array();
foreach($nodes as $node) {
$string = trim($node->nodeValue);
if( $string !== '' ) {
$strings[] = $string;
}
}
print_r($strings); outputs:
Array
(
[0] => Release info:
[1] => How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
[2] => How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
[3] => How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
)
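If only the release names are wanted, without the "Release info:" label, the XPath query can be narrowed to the div children of the li. The following is a minimal sketch rather than part of the original answer; it assumes the markup looks exactly like the snippet in the question, and the variable name $releases is made up for illustration.

```php
<?php
// Sample markup copied from the question.
$html = '<li class="release">
<strong>Release info:</strong>
<div>
How.to.Train.Your.Dragon.2.2014.All.BluRay.Persian
</div>
<div>
How.to.Train.Your.Dragon.2.2014.1080p.BRRip.x264.DTS-JYK
</div>
<div>
How.to.Train.Your.Dragon.2.2014.720p.BluRay.x264-SPARKS
</div>
</li>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$releases = array(); // hypothetical name, not from the original answer

// Select only the <div> children of <li class="release">.
foreach ($xpath->query('//li[@class="release"]/div') as $div) {
    $releases[] = trim($div->textContent);
}

print_r($releases);
```

The exact-match test @class="release" only works while the li carries that single class; markup with several classes would need the contains(concat(...)) form instead.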
Related
I would like to match multiple results in a single-line string, but I only get the last iteration instead of the result I expected.
For example, I have this string: <ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>
I would like to get:
test1
test2
test3
as the result, but I only get "test3".
I used the regex <ul>(<li><a.*>(.*)<\/a><\/li>)*<\/ul> on https://regex101.com/, but I don't know what I did wrong.
Use a parser instead:
<?php
$html = <<<DATA
<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
DATA;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DomXPath($dom);
$links = $xpath->query("//li/a");
foreach ($links as $link) {
echo $link->textContent, PHP_EOL;
}
?>
This sets up the DOM and uses an xpath expression to get the element(s).
Try it like this:
(?<=(<a href="#">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))
or
(?<=(">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))
or
(?<=(<a href="#">))(.)+?(?=(<\/a>))
Links with examples:
https://regex101.com/r/MHnxxh/1
or
https://regex101.com/r/MHnxxh/2
<?php
$str = '
<ul>
<li><a href="#">test1</a></li>
<li><a href="#">test2</a></li>
<li><a href="#">test3</a></li>
</ul>
';
preg_match_all('/(?<=(#">))([\s\S]| |w\[0-9]| )+?(?=(<\/a>))/', $str, $matches);
// display the array if needed
echo "<pre>";
print_r($matches);
// display list
foreach ($matches[0] as $key => $value) {
echo $value ."\r\n";
}
?>
preg_match_all('/#">([a-z]\w+)<\/a>/', $str, $out, PREG_PATTERN_ORDER);
This is the regex pattern; try it:
#">([a-z]\w+)<\/a>
It extracts only the text strings, which are collected in $out[1].
You can also use preg_replace to strip the tags:
$test = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';
echo preg_replace('/<[^>]*>/', ' ', $test);
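As for why the pattern in the question returns only "test3": when a capturing group is repeated with *, each iteration overwrites the previous capture, so only the last one survives. Matching the repeating unit itself with preg_match_all collects every item. The sketch below assumes the list items wrap links of the form <a href="#">...</a>, as the patterns in the answers suggest; the variable names are illustrative.

```php
<?php
// Assumed input: list items that wrap simple anchors.
$str = '<ul><li><a href="#">test1</a></li><li><a href="#">test2</a></li><li><a href="#">test3</a></li></ul>';

// One match over the whole list: the repeated group runs three times,
// but its inner capture keeps only the final iteration.
preg_match('~<ul>(?:<li><a[^>]*>(.*?)</a></li>)*</ul>~', $str, $single);
echo $single[1], PHP_EOL; // prints "test3"

// Matching the repeating unit on its own yields one match per item.
preg_match_all('~<li><a[^>]*>(.*?)</a></li>~', $str, $all);
print_r($all[1]); // test1, test2, test3
```

This only illustrates the capture behaviour on well-formed, predictable markup; for arbitrary HTML the parser-based answer is still the more robust route.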
I have a string like this
<b>12345</b> - John George<br><span>some_text1</span>
<b>67890</b> - George Jerry<br><span>some_text2</span>
Using preg_match_all (PHP), I want to extract the URL, ID and name, but I haven't figured out the right $sPattern (see below):
$sPattern = "/<a href=\"(.*?)\"><b>(.*?)<\/b>\" - (.*?)\"<br>(.*?)/";
preg_match_all($sPattern, $content, $aMatch);
I humbly suggest using an HTML parser like DOMDocument instead:
$html = '<b>12345</b> - John George<br><span>some_text1</span>
<b>67890</b> - George Jerry<br><span>some_text2</span>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');
$data = array();
foreach($anchors as $anchor) {
$href = $anchor->getAttribute('href'); // get the anchor href
$b = $anchor->firstChild->nodeValue; // get the b tag value
$data[] = array('href' => $href, 'id' => $b);
}
echo '<pre>';
print_r($data);
It is probably better to write a more specific pattern. Try this one:
$sPattern = '/<a href="([^"]+)"><b>(\d+)<\/b> - ((\w+ )*\w+)<br><span>([^<]+)<\/span><\/a>/';
I am trying to extract some strings from the source code of a web page which looks like this :
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
I'm pretty sure those strings are the only things that end with a single line break (<br />). Everything else ends with two or more line breaks. I tried using this:
preg_match_all('~(.*?)<br />{1}~', $source, $matches);
But it doesn't work as intended. It returns some other text along with those strings.
DOMDocument and XPath to the rescue.
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
echo $node->textContent;
}
I wouldn't recommend using a regular expression to get the values. Instead, use PHP's built-in HTML parser like this:
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//p[@class="someclass"]');
$text = array(); // to hold the strings
if ($elements !== false) { // query() returns false when the expression is invalid
foreach ($elements as $element) {
$text[] = strip_tags($element->nodeValue);
}
}
print_r($text); // print out all the strings
This is tested and working. You can read more about the PHP's DOMDocument class here: http://www.php.net/manual/en/book.dom.php
Here's a demonstration: http://phpfiddle.org/lite/code/0nv-hd6 (click 'Run')
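Both DOM answers above return the whole paragraph as one block of text. If each string is needed separately, one option is to query the paragraph's text nodes directly, since each br splits the text into its own node. This is a sketch under the assumption that the markup matches the snippet in the question; $source here is a stand-in for the real page source.

```php
<?php
// Stand-in for the page source, copied from the question.
$source = '<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$strings = array();

// text() selects the text nodes that sit between the <br /> elements.
foreach ($xpath->query('//p[@class="someclass"]/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip whitespace-only nodes
        $strings[] = $value;
    }
}

print_r($strings);
```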
Try this:
preg_match_all('~^(.*?)<br />$~m', $source, $matches);
Should work. Please try it
preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);
or if your strings may contain some HTML code, use this one:
preg_match_all("/(.*?)<br\s*\/?>\\n/", $source, $matches);
Take a look at this html:
<div class="foo">link1link2</div>
<div class="bar">barlink</div>
I would like to know whether I can loop over all the links inside foo with a regular expression in PHP.
I tried this, but it isn't working:
preg_match_all(
'#<div.*?class="foo".*?<a.*?>(?P<text>.*?)</a>#xi',
$text,
$matches,
PREG_SET_ORDER
);
Sadly, in this case it must be a regex, not an XML or other parser.
DON'T USE REGEX TO PARSE HTML.
<?php
$content =
'<div class="foo">
link1
link2
</div>
<div class="bar">
barlink
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content);
$divs = $dom->getElementsByTagName('div');
foreach($divs as $div)
{
$classes = explode(' ', $div->getAttribute('class'));
if(in_array('foo', $classes) || trim($div->getAttribute('class')) === 'foo')
{
foreach($div->getElementsByTagName('a') as $link)
{
echo $dom->saveXML($link);
}
}
}
?>
This will output all matching links under any div with class 'foo'.
Regular Expressions should NOT be used to parse HTML, since HTML itself is not a regular language. It can get very sloppy and you can end up with more problems than what you started with, especially when you could potentially be dealing with malformed HTML.
I have this code that extracts the first image from an article in Joomla:
<?php preg_match('/<img (.*?)>/', $this->article->text, $match); ?>
<?php echo $match[0]; ?>
Is there a way to extract all the images that are available in the article and not only one?
First, I would suggest not using regular expressions to parse HTML. Use an appropriate parser such as DOMDocument::loadHTML, which uses libxml.
Then you may query for the desired tags you want. Something like this may work (untested):
$doc = new DOMDocument;
$doc->loadHTML($htmlSource);
$xpath = new DOMXPath($doc);
$query = '//img';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
echo $entry->getAttribute('src'), PHP_EOL; // print each image URL
}
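A fuller version of the same idea, collecting every src value into an array, might look like the sketch below. The sample markup and the variable names ($htmlSource, $sources) are assumptions for illustration; in the Joomla template the input would presumably be $this->article->text.

```php
<?php
// Hypothetical article body; in Joomla this would come from the article text.
$htmlSource = '<p>Intro <img src="first.jpg" alt="one"> more text <img src="second.png" /></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($htmlSource);
libxml_clear_errors();

$sources = array();

// getElementsByTagName finds every <img>, with or without a trailing slash.
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $sources[] = $img->getAttribute('src');
    }
}

print_r($sources);
```

Unlike the regex approach, this does not depend on whether the tag is written as <img ...> or <img ... />.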
Use preg_match_all. And you'll want to modify the pattern like so to take into account the trailing '/' inside the img tag.
$str = '<img src="asdf" />stuff more stuff <img src="qwerty" />';
preg_match_all('/<img (.*?)\/>/', $str, $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => <img src="asdf" />
[1] => <img src="qwerty" />
)
[1] => Array
(
[0] => src="asdf"
[1] => src="qwerty"
)
)