If I had a string of text e.g:
<p>Hello world here is the latest news ##news?123## or click here to read more</p>
I want to look through the string and find anything starting with ##news? Then I want to save that and the id 123 and trailing hashes as a variable, so the final output would be:
$myvar == "##news?123##"
How can I use PHP to read that input string and save that specific part as a variable?
preg_match is your friend for this sort of problem.
$str='<p>Hello world here is the latest news ##news?123## or click here to read more';
$pttn='#(##news\?(\d+)##)#';
preg_match( $pttn, $str, $matches );
$myvar=$matches[0];
$id=$matches[2];
echo $myvar.' '.$id;
Assuming you're not trying to parse HTML but as you say, just find any occurrence starting with "##news?" then you can use regex happily here:
$string = '<p>Hello world here is the latest news ##news?123## or click here to read more';
$matches = array();
preg_match_all("/(##news\?.*?)\s/",$string,$matches);
print_r($matches);
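If the string can contain several such tokens, or a token can be followed by punctuation instead of whitespace, a variant along these lines might be more robust. It is only a sketch: the helper name find_news_tokens is made up for illustration, and it assumes the ID is always numeric.

```php
<?php
// Hypothetical helper (not from the answers above): collect every ##news?ID## token.
function find_news_tokens(string $text): array
{
    $found = [];
    // PREG_SET_ORDER groups the results per match: [0] = whole token, [1] = numeric ID.
    if (preg_match_all('/##news\?(\d+)##/', $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $found[] = ['token' => $m[0], 'id' => (int) $m[1]];
        }
    }
    return $found;
}

$string = '<p>Latest news ##news?123##, older news ##news?45##.</p>';
foreach (find_news_tokens($string) as $hit) {
    echo $hit['token'], ' => ', $hit['id'], "\n";
}
```

Anchoring on the closing ## rather than on whitespace means a token directly followed by a comma or a closing tag is still captured cleanly.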
[PHP] I have a variable that stores a very large page source as a string. I want to echo only the strings I am interested in (dozens of them, which I need to extract for a project). They sit inside the quotation marks of the link tag's href,
but I only want to capture the values that start with the letter n (news):
[<a href="/news7044449/exclusive_news_sunday_"]
<a href="/n[ews7044449/exclusive_news_sunday_]"
That is, I think the match will have to be anchored on: [a href="/n]
How can I make the echo discard all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as 'p' (href="/profiles...); those do not interest me.
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
expected output: /news7044449/exclusive_news_sunday_
NOTE: the source does not have to be a variable; the values could just as well be extracted from a .txt file.
Thanks.
I believe this will help you.
<?php
$source = file_get_contents("code.html");
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
var_export( end($results) );
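To get just the links out of the $results array from Valdeir's answer, loop over $results[1], which holds the captured href values ($results[0] holds the full <a ...> matches):
foreach ($results[1] as $r) {
    echo $r."\n";
    // alt: to display them with an HTML break tag after each one
    // echo $r."<br>\n";
}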
I'm trying to use a regex to find and replace all URLs in a forum system. This works but it also selects anything that is within bbcode. This shouldn't be happening.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
All URLs that occur in BBCode directly follow a = character, so I don't want anything preceded by = to be selected.
I basically have that working, but the pattern selects one extra character in front of the string that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
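To see which substrings a (*SKIP) pattern of this kind actually matches, a small check like the following may help. It is a simplified sketch: the URL character class is reduced to "anything that is not whitespace or a bracket", and (*FAIL) is written out in place of the equivalent (?!).

```php
<?php
// The first alternative consumes a whole [url=...] tag and then forces a failure;
// (*SKIP) tells the engine not to retry from any position inside the consumed tag.
$pattern = '~\[url=[^\]]*\](*SKIP)(*FAIL)|(?:f|ht)tps?://[^\s\[\]]+~i';

$text = '[url=https://example.com/a]here[/url] and https://example.com/b';

preg_match_all($pattern, $text, $m);
print_r($m[0]); // only the URL outside the BBCode tag is listed
```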
You can use a negative lookbehind (?<!=) instead of your negated class. It asserts that the text about to be matched is not preceded by a = character, without consuming any character itself.
Example
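A minimal sketch of the lookbehind approach, reusing the function name from the question. The character class is cut down to ASCII for brevity, and the <a href> replacement markup is an assumption about what the generated links should look like.

```php
<?php
function make_links_clickable(string $text): string
{
    // (?<!=) rejects a URL that directly follows "=", so [url=...] is left alone,
    // and no character in front of the URL becomes part of the match.
    $pattern = '~(?<!=)\b(?:f|ht)tps?://[-a-zA-Z0-9()#:%_+.\~?&;/=]+~i';
    return preg_replace($pattern, '<a href="$0">$0</a>', $text);
}

$text = 'See [url=https://example.com/a]here[/url] or https://example.com/b now';
echo make_links_clickable($text);
```

Because the lookbehind is zero-width, $0 contains only the URL, which avoids the extra leading character that the [^=] class captured.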
I am trying to extract a string from another string using php.
At the moment I'm using:
<?php
$testVal = $node->field_link[0]['view'];
$testVal = preg_replace("#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#ie", "'$3$4'", $testVal);
print "testVal = ";
print $testVal;
?>
This seems to be printing my entire string at the moment.
What I want to do is extract a web address, if there is one, and save it in a variable called testVal.
I am a novice, so please explain what I am doing wrong. I have also looked at other questions and have used the regex from one of them.
For #bos
Input:
<iframe width="560" height="315" src="http://www.youtube.com/embed/CLXt3yh2g0s" frameborder="0" allowfullscreen></iframe>
Desired Output
http://www.youtube.com/embed/CLXt3yh2g0s
Well, you say you want to populate $testVal with the extracted web address, but you're using preg_replace instead of preg_match. You use preg_replace when you wish to replace occurrences, and you use preg_match (or preg_match_all) when you want to find occurrences.
If you want to replace URLs with links (<a> tags) like in your example, use something like this:
<?php
$testVal = preg_replace(
'/((?:https?:\/\/|ftp:\/\/|irc:\/\/)[^\s<>()"]+?(?:\([^\s<>()"]*?\)[^\s<>()"]*?)*)((?:\s|<|>|"|\.||\]|!|\?|,|,|")*(?:[\s<>()"]|$))/',
'<a target="_blank" rel="nofollow" href="$1">$1</a>$2',
$testVal
);
If you want to instead simply locate a URL from a string, try (using your regex now instead of mine above):
<?php
$testVal = $node->field_link[0]['view'];
// The "e" modifier only ever applied to preg_replace, so it is dropped here
if (!preg_match("#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i", $testVal, $matches)) {
    echo "Not found!";
} else {
    echo "URL: " . $matches[1];
}
When you use preg_match, the (optional) third parameter is filled with the results of the search. $matches[0] would contain the string that matched the entire pattern, $matches[1] would contain the first capture group, $matches[2] the second, and so on.
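Applied to the iframe input from the question, the idea could look roughly like this. The pattern is a deliberately simple stand-in (an assumption, not the regex from the question): it stops at whitespace, quotes, or angle brackets.

```php
<?php
$input = '<iframe width="560" height="315" src="http://www.youtube.com/embed/CLXt3yh2g0s" frameborder="0" allowfullscreen></iframe>';

// Group 1 captures the URL; $matches[0] would hold the same text here,
// because the group spans the whole pattern apart from the \b assertion.
if (preg_match('#\b((?:https?|ftp)://[^\s"\'<>]+)#i', $input, $matches)) {
    $testVal = $matches[1];
} else {
    $testVal = null; // no URL present in the input
}

echo $testVal, "\n";
```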
Let's say I have a page I want to scrape for words with "ice" in them, how can I do this easily? I see a lot of scrapers breaking things down into source code, but I don't need this. I just need something that searches through the plain text on the webpage.
Edit: I basically need something to search for .jpeg and find the entire file name. (it is in plain text on the website, not hidden in a tag)
Anything that matches the following is a word with ice in it:
/(\w*)ice(\w*)/i
(Note that \w also matches the digits 0-9 and the underscore. If you only want letters, a pattern such as /\b[a-z]*ice[a-z]*\b/i might give better results.)
UPDATE
To match file names (must not contain whitespace):
/\S+\.jpeg/i
Example:
<?php
$str = 'Picture of me: 238484534.jpeg and someone else img-of-someone.jpeg here';
$cnt = preg_match_all('/\S+\.jpeg/i', $str, $matches);
print_r($matches);
1. Do you want to search inside the HTML tags too (attributes, tag names)?
2. Or only the visible part of the web page?
For #1: the solutions are simple and already covered in the other answers.
For #2:
Use PHP's DOMDocument class to extract the text content, and search only in that.
Documentation here:
http://php.net/manual/en/class.domdocument.php
See this for an example:
PHP DOMDocument stripping HTML tags
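A rough sketch of option 2, assuming the DOM extension is available. The helper name jpeg_names_in_text is hypothetical. Note that textContent also includes the contents of script and style elements, so "visible" is only an approximation here.

```php
<?php
// Hypothetical helper: find .jpeg file names in the page text, ignoring attributes.
function jpeg_names_in_text(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body !== null ? $body->textContent : '';

    preg_match_all('/\S+\.jpeg/i', $text, $matches);
    return $matches[0];
}

$html = '<html><body><img src="hidden.jpeg"><p>Files: test1.jpeg and test2.jpeg</p></body></html>';
print_r(jpeg_names_in_text($html));
```

The src attribute of the img element is not part of the text content, so hidden.jpeg is not reported.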
Some regex use will be needed for this. Below I use PCRE (http://www.php.net/manual/en/ref.pcre.php) and the function preg_match_all (http://www.php.net/manual/en/function.preg-match-all.php):
<?php
$html = <<<EOF
<html>
<head>
<title>Test</title>
</head>
<body>List of files:
<ul>
<li>test1.jpeg</li>
<li>test2.jpeg</li>
</ul>
</body>
</html>
EOF;
$matches = array();
// preg_match_all returns the number of full matches; $matches[1] holds capture group 1
$count = preg_match_all('/([0-9a-zA-Z_-]+\.jpeg)/', $html, $matches);
if ($count > 0) {
    foreach ($matches[1] as $filename) {
        print "Filename: {$filename}\n";
    }
}
?>
try this:
preg_match_all('/\w*ice\w*/', 'abc icecream lice', $matches);
print_r($matches);
For example, I have a $text:
<p>this is a link a link hello this is a text then a <img src="www.link.com/image.png">. then i have something like %,$,<?php etc ?> but i dont wanna lose my numbers and ? sign how can we do that?</p>;
and somehow I need it to look just like:
this is a link a link hello this is a www.link.com/image.png.
text then a. then i have something
like etc but i dont wanna lose my
numbers and ? sign how can we do that?
(the first 100 words)
Summary:
somehow delete the tags, including the image (or replace it with its src or alt instead)
delete $, # etc., except the '?' sign
get the first 100 words
keep the numbers too
For the 100 words, I think the question "get a string before the first '.' with PHP" has a great answer. I'm not quite sure whether to use regex or something else for this one.
Thanks for looking in.
Adam Ramadhan
<?php
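$text = '<p>this is a link <a href="/link">a link</a> hello this is a text then a <img src="www.link.com/image.png">. then i have something like %,$,etc but i dont wanna lose my numbers and ? sign how can we do that?</p>';
// Replace each anchor with an absolute URL built from its href
$text = preg_replace('/<a.*? href="(.*?)".*?>.*?<\/a>/i', 'http://link.com$1.', $text);
// Drop the remaining tags, including the image
$text = strip_tags($text);
// Remove the unwanted symbols (each followed by a comma in this sample)
$text = str_replace(array('$,', '#,', '%,'), '', $text);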
preg_match('/^((?:\w*?\W){0,100})/', $text, $m);
$text = $m[1];
echo $text;
Output:
this is a link http://link.com/link. hello this is a text then a . then i have something like etc but i dont wanna lose my numbers and ? sign how can we do that?
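One caveat with the {0,100} pattern is that every non-word character ends a "word", so punctuation and repeated spaces use up slots as well. If a plain whitespace-based word count is what is wanted, something like the following sketch might serve. The helper name first_words is made up for illustration.

```php
<?php
// Hypothetical helper: keep at most $limit whitespace-separated words.
function first_words(string $text, int $limit): string
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so runs of spaces do not count as words.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $limit));
}

echo first_words('numbers like 42 and the ? sign are kept as they are', 5);
```

Numbers and the ? sign pass through untouched, because the text is only split on whitespace.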