How do I find the title but print its value? - php

My code is given below:
$text = "<div class='title'>Title</div><div class='content'>This is title</div>";
$words = array('Title');
$words = join("|", $words);
$matches = array();
if ( preg_match('/' . $words . '/i', $text, $matches) ){
echo "Words matched: <br/>";
print_r($matches);
}
else{
echo "Not match";
}
The problem is that the code above finds "Title", but I don't want to print the title itself; I want to print "This is title", and I don't understand how to get it by finding the title.
The title is a keyword that will not change, but the value I want to print is dynamic and changes every time, so I cannot match it directly. How can I do this?

Don't use regex for parsing HTML. Use a DOM Parser instead. In this case, you can use an XPath expression to get the element by class name:
$text = "<div class='title'>Title</div>
<div class='content'>This is title</div>";
$dom = new DOMDocument;
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//*[@class="content"]')->item(0)->nodeValue;
echo $title;
Output:
This is title
This should get you started. If the title is in a different position, you can modify the expression accordingly to retrieve it.
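As a minimal sketch of "modifying the expression accordingly", the variant below looks up the title div first and then takes the content div that follows it, so the lookup is anchored on the title rather than on the first content element in the document. The helper name contentAfterTitle is hypothetical, and the sketch assumes the two divs are siblings as in the question.

```php
<?php
// Hypothetical helper: returns the text of the first "content" div that
// follows a "title" div as a sibling, or null when no such div exists.
function contentAfterTitle(string $html): ?string
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        '//div[@class="title"]/following-sibling::div[@class="content"][1]'
    );

    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

$text = "<div class='title'>Title</div><div class='content'>This is title</div>";
echo contentAfterTitle($text);
```

Returning null when nothing matches avoids calling ->nodeValue on a missing node, which the one-line version would do if the markup changed.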

Related

Find string in text and explode

I have a text:
$body = 'I lorem ipsum. And I have more text, I want explode this and search link: <a href="site.com/text/some">I link</a>.';
How can I find the "I link" anchor and explode its href?
My code is:
if (strpos($body, 'site.com/') !== false) {
$itemL = explode('/', $body);
}
But this is not working.
A regex is a good choice here. Try:
$body = 'I lorem ipsum. And I have more text, I want explode this
and search link: <a href="site.com/text/some">I link</a>.';
$reg_str='/<a href="(.*?)">(.*?)<\/a>/';
preg_match_all($reg_str, $body, $matches);
echo $matches[1][0]."<br>"; //site.com/text/some
echo $matches[2][0]; //I link
Update
If you have a long text containing many hrefs and link texts like "I link", you could use a for loop to output them all, with code like:
for($i=0; $i<count($matches[1]);$i++){
echo $matches[1][$i]; // echo all the hrefs
}
for($i=0; $i<count($matches[2]);$i++){
echo $matches[2][$i]; // echo all the link texts.
}
If you want to replace the old href (e.g. site.com/text/some) with the new one (e.g. site.com/some?id=32324), you could try preg_replace like:
echo preg_replace('/<a([^>]*)href="(site\.com\/text\/some)"([^>]*)>/','<a$1href="site.com/some?id=32324"$3>',$body);
You could also use DOM to work through your HTML:
$dom = new DOMDocument;
$dom->loadHTML($body);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a') as $link) {
var_dump($link->textContent);
var_dump($link->getAttribute('href'));
}

How do I extract links with a specific domain name using PHP and Regex?

I am trying to extract urls that contain www.domain.com from a database column that contains HTML. The regex has to filter out www2.domain.com instances and external urls like www.domainxyz.com. It should only search for properly coded anchor links.
Here is what I have so far:
<?php
$content = '<html>
<title>Random Website</title>
<body>
Click here for foobar
Another site is http://www.domain.com
Test 1
Test 2
<Strong>NOT A LINK</strong>
</body>
</html>';
$regex = "((https?)\:\/\/)?";
$regex .= "([a-z0-9-.]*)\.([a-z]{2,4})";
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?";
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?";
$regex .= "([www\.domain\.com])";
$matches = array(); //create array
$pattern = "/$regex/";
preg_match_all($pattern, $content, $matches);
print_r(array_values(array_unique($matches[0])));
echo "<br><br>";
echo implode("<br>", array_values(array_unique($matches[0])));
?>
I am looking for this to find and output only http://www.domain.com/test.
How can I modify my Regex to accomplish this?
Here is a much safer way to extract the a href attribute values containing www.domain.com, where the key is the XPath '//a[contains(@href, "www.domain.com")]':
$html = "YOUR_HTML_STRING"; // Your HTML string
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = array();
$links = $xpath->query('//a[contains(@href, "www.domain.com")]');
foreach($links as $link) {
array_push($arr, $link->getAttribute("href"));
}
print_r($arr);
Result:
Array
(
[0] => http://www.domain.com/test
)
As you see, you can use the DOMDocument and DOMXPath with a string, too.
The code is self-explanatory: the XPath expression just means "find all <a> tags that have an href attribute containing www.domain.com".
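Note that contains() is a plain substring test, so it would also accept a link whose path or query string happens to include www.domain.com. If that matters, one option is to collect every href with XPath and compare the host in PHP. The sketch below is an assumption-laden illustration: the helper name linksForHost and the sample HTML are made up for the example (the anchors in the question's HTML did not survive), and it only handles absolute URLs, since parse_url() finds no host in a relative link.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose host is exactly $host.
function linksForHost(string $html, string $host): array
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//a[@href]') as $link) {
        $href = $link->getAttribute('href');
        // parse_url() returns null when there is no host, false on malformed URLs.
        $linkHost = parse_url($href, PHP_URL_HOST);
        if (is_string($linkHost) && strcasecmp($linkHost, $host) === 0) {
            $result[] = $href;
        }
    }
    return $result;
}

// Sample markup invented for this sketch.
$html = '<a href="http://www.domain.com/test">Test 1</a>'
      . '<a href="http://www2.domain.com/test">Test 2</a>'
      . '<a href="http://www.domainxyz.com/">Other</a>';
print_r(linksForHost($html, 'www.domain.com'));
```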

Matching string without specific pattern between specific places

$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';
What I need to match is the classes and the "rating" part (8/10).
Something like this, except I don't know how to write (ANYTHING EXCEPT <br> here) in a regexp:
preg_match_all('#class="([0-9]{3})"><br>(ANYTHING EXCEPT <br> here)*?([0-9]/10)#',
$example_string, $matches);
So a preg_match_all should give these results:
$matches[1][1] = '190';
$matches[1][2] = '8/10';
$matches[2][1] = '154';
$matches[2][2] = '9/10';
To work off of your pattern, and to answer your question:
class="([0-9]{3})"><br>(?:(?!<br>).)*?([0-9]\/10)
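For reference, here is a minimal sketch of that pattern dropped into preg_match_all. The (?:(?!<br>).)*? part consumes one character at a time as long as a <br> does not start at that position, which is the "anything except <br>" the question asks for. The # delimiter and the PREG_SET_ORDER flag are choices made for this example, not part of the answer above.

```php
<?php
$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';

// With "#" as the delimiter, the "/" in "8/10" needs no escaping.
$pattern = '#class="([0-9]{3})"><br>(?:(?!<br>).)*?([0-9]/10)#';

// PREG_SET_ORDER groups the captures per match: $m[1] is the class, $m[2] the rating.
preg_match_all($pattern, $example_string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

With the sample string this prints 190 => 8/10 and 154 => 9/10, one pair per line.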
I don't know PHP, but it should work as it does in Python.
Get the matches between the "class" attributes, then iterate over the returned strings to get your data:
import re # the regex module
example_string = '"<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>"'
for match in re.findall(r'(?:class[^\d]")([^\/]+)(?!class)', example_string):
print(list(re.findall(r'(\d+)', match)))
yields the following lists:
['190', '8']
['154', '9']
A simple DOM parser would be able to give you that information:
$example_string = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';
$dom = new DOMDocument;
$dom->loadHTML($example_string);
$xpath = new DOMXPath($dom);
// get all text nodes that have an anchor parent with a class attribute
$query = '//text()[parent::a[@class]]';
foreach ($xpath->query($query) as $node) {
echo $node->textContent, "\n";
echo "parent node: ", $node->parentNode->getAttribute('class'), "\n";
}
Output
hello.. 8/10
parent node: 190
9/10
parent node: 154
(?<=class=")(\d+)|(\d+\/\d+)
Try this. See the demo:
https://regex101.com/r/yR3mM3/58
$re = "/(?<=class=\")(\\d+)|(\\d+\\/\\d+)/";
$str = "<a class=\"190\"><br>hello.. 8/10<br><a class=\"154\"><br>9/10<br>";
preg_match_all($re, $str, $matches);
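Because the pattern is an alternation, every match fills only one of the two groups and leaves the other as an empty string. A possible way to pair the results up afterwards is sketched below. It assumes each class is followed by exactly one rating, which holds for the sample string but is not guaranteed in general.

```php
<?php
$re  = '/(?<=class=")(\d+)|(\d+\/\d+)/';
$str = '<a class="190"><br>hello.. 8/10<br><a class="154"><br>9/10<br>';
preg_match_all($re, $str, $matches);

// Drop the empty placeholders left by the branch that did not match.
$classes = array_values(array_filter($matches[1], 'strlen'));
$ratings = array_values(array_filter($matches[2], 'strlen'));

// Assumption: the two lists have equal length, one rating per class.
$pairs = array_combine($classes, $ratings);
print_r($pairs);
```

Note that PHP converts the numeric string keys to integers, so the result is keyed by 190 and 154.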

preg_replace - How to remove contents inside a tag?

Say I have this.
$string = "<div class=\"name\">anyting</div>1234<div class=\"name\">anyting</div>abcd";
$regex = "#([<]div)(.*)([<]/div[>])#";
echo preg_replace($regex,'',$string);
The output is
abcd
But I want
1234abcd
How do I do it?
Like this:
preg_replace('/(<div[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
If you want to remove the divs too:
preg_replace('/<div[^>]*>.*?<\/div>/i', '', $string);
To replace only the content in the divs with class name and not other classes:
preg_replace('/(<div.*?class="name"[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
echo preg_replace('%<div.*?</div>%i', '', $string); // echo's 1234abcd
Live example:
http://codepad.org/1XEC33sc
Add ? after the quantifier to make it lazy, so that it stops at the FIRST closing tag instead of the last one:
preg_replace('~<div .*?>(.*?)</div>~','', $string);
http://sandbox.phpcode.eu/g/c201b/3
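The difference between the greedy pattern from the question and the lazy versions in the answers can be seen side by side in this small sketch. The variable names $greedy and $lazy are only for illustration.

```php
<?php
$string = '<div class="name">anything</div>1234<div class="name">anything</div>abcd';

// Greedy: ".*" runs to the LAST "</div>", so "1234" is swallowed as well.
$greedy = preg_replace('#<div.*</div>#', '', $string);

// Lazy: ".*?" stops at the FIRST "</div>", so each div is removed on its own.
$lazy = preg_replace('#<div.*?</div>#', '', $string);

echo $greedy, "\n"; // abcd
echo $lazy, "\n";   // 1234abcd
```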
This might be a simple example, but if you have a more complex one, use an HTML/XML parser. For example with DOMDocument:
$doc = new DOMDocument;
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$query = "//body/text()";
$nodes = $xpath->query($query);
$text = "";
foreach($nodes as $node) {
$text .= $node->wholeText;
}
Which query you have to use or whether you have to process the DOM tree in some other way, depends on the particular content you have.

Remove all text within specific tags

I am interested in removing all the text within the following tags:
<p class="wp-caption-text">Remove this text</p>
Can anybody give me an idea of how this can be done in php?
Thank you very much
Get rid of the tag and content inside of it:
$content = preg_replace('/<p\sclass=\"wp\-caption\-text\">[^<]+<\/p>/i', '', $content);
or if you want to preserve the tags:
$content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '$1$2', $content);
As a slightly higher-level alternative to regular expressions, you can process the HTML with DOM.
You can match all the nodes you're looking for with the XPath //p[@class="wp-caption-text"].
For example:
$doc = new DOMDocument();
$doc->loadHTML($yourHTMLasString);
$xpath = new DOMXPath($doc);
$query = '//p[@class="wp-caption-text"]';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$entry->textContent = '';
}
echo $doc->saveHTML();
Try this:
$string = '<p class="wp-caption-text">Remove this text</p>';
$pattern = '/(.*<p .*>).*(<\/p>.*)/';
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
If it is always the same tag, you could simply search for the string, then use the resulting position to take the substring from it to the closing tag.
Or you could use a regular expression; there are good ones posted here that can help you.
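The plain string-search idea could look roughly like the sketch below. The function name stripCaptionText is hypothetical, and the sketch assumes the opening tag is always written exactly as in the question (same attribute order, quoting and spacing) and that the paragraph contains no nested </p>.

```php
<?php
// Hypothetical helper: empties every <p class="wp-caption-text">...</p>
// while keeping the tags themselves.
function stripCaptionText(string $content): string
{
    $open   = '<p class="wp-caption-text">';
    $close  = '</p>';
    $offset = 0;

    while (($start = strpos($content, $open, $offset)) !== false) {
        $textStart = $start + strlen($open);
        $end = strpos($content, $close, $textStart);
        if ($end === false) {
            break; // unclosed tag: leave the rest untouched
        }
        // Keep everything up to the end of the opening tag, then resume at the closing tag.
        $content = substr($content, 0, $textStart) . substr($content, $end);
        $offset  = $textStart + strlen($close);
    }

    return $content;
}

echo stripCaptionText('Before <p class="wp-caption-text">Remove this text</p> after');
```

To remove the tags as well, the two substr() calls would cut at $start and at $end + strlen($close) instead.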
