My code is given below:
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";
$words = array('Title');
$words = join("|", $words);
$matches = array();
if ( preg_match('/' . $words . '/i', $text, $matches) ){
echo "Words matched: <br/>";
print_r($matches);
}
else{
echo "Not match";
}
The problem is that the code above finds "Title", but I don't want to print the title itself; I want to print "This is title", and I don't understand how to get it by searching for the title.
The title works like a keyword that never changes, but the value I want to print is dynamic and changes every time, so I cannot search for it directly. How can I do this?
Don't use regex for parsing HTML. Use a DOM Parser instead. In this case, you can use an XPath expression to get the element by class name:
$text = "<div class='title'>Title</div>
<div class='content'>This is title</div>";
$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//*[@class="content"]')->item(0)->nodeValue;
echo $title;
Output:
This is title
This should get you started. If the title is in a different position, you can modify the expression accordingly to retrieve it.
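If the markup changes, one option is to anchor on the title div and take the content div that follows it. This is a minimal sketch, assuming the two divs are siblings as in the question; the variable names are only illustrative:

```php
<?php
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";

$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);

// Take the first div.content that follows a div.title at the same level.
$nodes = $xpath->query('//div[@class="title"]/following-sibling::div[@class="content"][1]');

// query() returns a DOMNodeList; item(0) is null when nothing matched.
$node = $nodes->item(0);
$content = $node !== null ? $node->nodeValue : null;

echo $content === null ? "Not matched" : $content;
```

Checking for null before reading nodeValue avoids an error when the expected div is missing.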
Related
I have a text:
$body = 'I lorem ipsum. And I have more text, I want explode this and search link: <a href="site.com/text/some">I link</a>.';
How can I find the "I link" anchor and extract its href?
My code is:
if (strpos($body, 'site.com/') !== false) {
$itemL = explode('/', $body);
}
But this is not working.
Regex is a good choice here. Try:
$body = 'I lorem ipsum. And I have more text, I want explode this
and search link: <a href="site.com/text/some">I link</a>.';
$reg_str='/<a href="(.*?)">(.*?)<\/a>/';
preg_match_all($reg_str, $body, $matches);
echo $matches[1][0]."<br>"; //site.com/text/some
echo $matches[2][0]; //I link
Update
If you have a long text containing many hrefs and link texts like "I link", you could use a for loop to output them all:
for($i=0; $i<count($matches[1]);$i++){
echo $matches[1][$i]; // echo all the hrefs
}
for($i=0; $i<count($matches[2]);$i++){
echo $matches[2][$i]; // echo all the link texts.
}
If you want to replace the old href (e.g. site.com/text/some) with the new one (e.g. site.com/some?id=32324), you could try preg_replace like:
echo preg_replace('/<a([^>]*)href="(site\.com\/text\/some)"([^>]*)>/','<a$1href="site.com/some?id=32324"$3>',$body);
You could also use DOM to work through your HTML:
$dom = new DOMDocument;
$dom->loadHTML($body);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $link) {
var_dump($link->textContent);
var_dump($link->getAttribute('href'));
}
I am trying to extract urls that contain www.domain.com from a database column that contains HTML. The regex has to filter out www2.domain.com instances and external urls like www.domainxyz.com. It should only search for properly coded anchor links.
Here is what I have so far:
<?php
$content = '<html>
<title>Random Website</title>
<body>
Click here for foobar
Another site is http://www.domain.com
Test 1
Test 2
<Strong>NOT A LINK</strong>
</body>
</html>';
$regex = "((https?)\:\/\/)?";
$regex .= "([a-z0-9-.]*)\.([a-z]{2,4})";
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?";
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?";
$regex .= "([www\.domain\.com])";
$matches = array(); //create array
$pattern = "/$regex/";
preg_match_all($pattern, $content, $matches);
print_r(array_values(array_unique($matches[0])));
echo "<br><br>";
echo implode("<br>", array_values(array_unique($matches[0])));
?>
I am looking for this to find and output only http://www.domain.com/test.
How can I modify my Regex to accomplish this?
Here is a much safer way to extract the href attribute values of <a> elements containing www.domain.com; the key is the XPath '//a[contains(@href, "www.domain.com")]':
$html = "YOUR_HTML_STRING"; // Your HTML string
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = array();
$links = $xpath->query('//a[contains(@href, "www.domain.com")]');
foreach($links as $link) {
array_push($arr, $link->getAttribute("href"));
}
print_r($arr);
Result:
Array
(
[0] => http://www.domain.com/test
)
As you can see, DOMDocument and DOMXPath work with a string, too.
The code is largely self-explanatory: the XPath expression means "find all <a> tags that have an href attribute containing www.domain.com".
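Note that contains() is a substring test, so it would also accept a lookalike URL such as http://www.domain.com.example.org/. If that matters, one option is to compare the host of each href exactly. The following is a sketch, assuming absolute URLs with a scheme; the sample markup and the helper name hrefsForHost are made up for illustration:

```php
<?php
// Hypothetical sample markup; these anchors are assumptions for this sketch.
$html = '<a href="http://www.domain.com/test">Test 1</a>'
      . '<a href="http://www2.domain.com/test">Test 2</a>'
      . '<a href="http://www.domainxyz.com/">External</a>'
      . '<a href="http://www.domain.com.example.org/">Lookalike</a>';

// Hypothetical helper: return the hrefs whose host is exactly $host.
function hrefsForHost(string $html, string $host): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $result = array();
    foreach ($xpath->query('//a[@href]') as $link) {
        $href = $link->getAttribute('href');
        // parse_url() gives no string here when the URL has no host (e.g. relative links).
        $linkHost = parse_url($href, PHP_URL_HOST);
        if (is_string($linkHost) && strcasecmp($linkHost, $host) === 0) {
            $result[] = $href;
        }
    }
    return $result;
}

$links = hrefsForHost($html, 'www.domain.com');
print_r($links);
```

The exact host comparison rejects www2.domain.com, www.domainxyz.com, and the lookalike host, at the cost of skipping relative links.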
$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';
What I need to match is the classes and the "rating" part (8/10).
Something like this, except I don't know how to write (ANYTHING EXCEPT <br> here) in a regexp:
preg_match_all('#class="([0-9]{3})"><br>(ANYTHING EXCEPT <br> here)*?([0-9]/10)#',
$example_string, $matches);
So a preg_match_all should give these results:
$matches[1][1] = '190';
$matches[1][2] = '8/10';
$matches[2][1] = '154';
$matches[2][2] = '9/10';
To work off of your pattern, and to answer your question:
class="([0-9]{3})"><br>(?:(?!<br>).)*?([0-9]\/10)
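In PHP, the pattern above could be used roughly as follows. This is a sketch that uses # as the delimiter (so the slash needs no escaping) and PREG_SET_ORDER, which is a choice made here rather than something the question asked for:

```php
<?php
$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';

// (?:(?!<br>).)*? consumes one character at a time, but never steps onto a "<br>".
$pattern = '#class="([0-9]{3})"><br>(?:(?!<br>).)*?([0-9]/10)#';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all($pattern, $example_string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

With PREG_SET_ORDER, each entry of $matches holds the class in index 1 and the rating in index 2.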
I don't know PHP, but the same approach should work as it does in Python.
Get the text between the "class" occurrences, then iterate over the returned strings to pull out your data:
import re # the regex module
example_string = '"<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>"'
for match in re.findall(r'(?:class[^\d]")([^\/]+)(?!class)', example_string):
    print(list(re.findall(r'(\d+)', match)))
yields the following lists:
['190', '8']
['154', '9']
A simple DOM parser would be able to give you that information:
$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';
$dom = new DOMDocument;
$dom->loadHTML($example_string);
$xpath = new DOMXPath($dom);
// get all text nodes that have an anchor parent with a class attribute
$query = '//text()[parent::a[@class]]';
foreach ($xpath->query($query) as $node) {
echo $node->textContent, "\n";
echo "parent node: ", $node->parentNode->getAttribute('class'), "\n";
}
Output
hello.. 8/10
parent node: 190
9/10
parent node: 154
(?<=class=")(\d+)|(\d+\/\d+)
Try this. See the demo:
https://regex101.com/r/yR3mM3/58
$re = "/(?<=class=\")(\\d+)|(\\d+\\/\\d+)/";
$str = "<a class=\"190\"><br>hello.. 8/10<br><a class=\"154\"><br>9/10<br>";
preg_match_all($re, $str, $matches);
Say I have this.
$string = "<div class=\"name\">anyting</div>1234<div class=\"name\">anyting</div>abcd";
$regex = "#([<]div)(.*)([<]/div[>])#";
echo preg_replace($regex,'',$string);
The output is
abcd
But I want
1234abcd
How do I do it?
Like this:
preg_replace('/(<div[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
If you want to remove the divs too:
preg_replace('/<div[^>]*>.*?<\/div>/i', '', $string);
To replace only the content in the divs with class name and not other classes:
preg_replace('/(<div.*?class="name"[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
echo preg_replace('%<div.*?</div>%i', '', $string); // outputs 1234abcd
Live example:
http://codepad.org/1XEC33sc
Add ? to make the quantifier lazy, so the match stops at the first closing </div> instead of the last one:
preg_replace('~<div .*?>(.*?)</div>~','', $string);
http://sandbox.phpcode.eu/g/c201b/3
This might be a simple example, but if you have a more complex one, use an HTML/XML parser. For example with DOMDocument:
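// loadHTML() is an instance method; calling it statically fails on current PHP versions.
$doc = new DOMDocument;
$doc->loadHTML($string);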
$xpath = new DOMXPath($doc);
$query = "//body/text()";
$nodes = $xpath->query($query);
$text = "";
foreach($nodes as $node) {
$text .= $node->wholeText;
}
Which query you need, or whether you have to process the DOM tree in some other way, depends on the particular content you have.
I am interested in removing all the text within the following tags:
<p class="wp-caption-text">Remove this text</p>
Can anybody give me an idea of how this can be done in php?
Thank you very much
Get rid of the tag and content inside of it:
$content = preg_replace('/<p\sclass=\"wp\-caption\-text\">[^<]+<\/p>/i', '', $content);
or if you want to preserve the tags:
$content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '$1$2', $content);
As a somewhat higher-level alternative to regular expressions, you can process the markup with DOM.
You can match all the nodes you're looking for with the XPath //p[@class="wp-caption-text"].
For example:
$doc = new DOMDocument();
$doc->loadHTML($yourHTMLasString);
$xpath = new DOMXPath($doc);
$query = '//p[@class="wp-caption-text"]';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$entry->textContent = '';
}
echo $doc->saveHTML();
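If the whole paragraph should go rather than just its text, the matched nodes can be detached from their parents instead. This is a sketch along the same lines; the sample markup assigned to $yourHTMLasString is made up for illustration:

```php
<?php
// Sample markup, assumed for illustration.
$yourHTMLasString = '<div><p class="wp-caption-text">Remove this text</p><p>Keep this text</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($yourHTMLasString);
$xpath = new DOMXPath($doc);

// Copy the matches into a plain array first (a defensive habit),
// then detach each matched <p> from its parent.
$entries = iterator_to_array($xpath->query('//p[@class="wp-caption-text"]'));
foreach ($entries as $entry) {
    $entry->parentNode->removeChild($entry);
}

$result = $doc->saveHTML();
echo $result;
```

Because loadHTML() wraps a fragment in html and body elements, the saved output includes that wrapper as well.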
Try this:
$string = '<p class="wp-caption-text">Remove this text</p>';
$pattern = '/(.*<p .*>).*(<\/p>.*)/';
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
If it's always the same tag, you could simply search for the tag as a string, then use the resulting position to take the substring from there up to the closing tag.
Or you could use a regular expression; there are good ones posted here that can help you.