I'm using the code below to highlight one word in text loaded with file_get_contents() and jump to it with an anchor.
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
$Seq = '0002110000001'; // quoted: an unquoted leading-zero literal is parsed as octal
$text = preg_replace("/\b($Seq)\b/i", '<span class="highlight"><a name="here">\1</a></span>', $file);
For now this highlights only: 0002110000001
I would like to highlight the whole block that shares the same index number.
Example:
Looking for 0002110000001,
highlight only the part of the text where the index number is 7:
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
Any help will be appreciated.
EDIT:
I'll try to be more specific.
The file contains many blocks, and each block always starts with TYx (x is an auto-incremented number).
I have the SEQ number for my search, in this example 0002110000001.
The preg_replace() call above finds 0002110000001 and highlights it.
What I need is to highlight everything between TY7 and TY8 instead of only 0002110000001.
I hope this is clear enough despite my bad English.
Thanks
You can make use of stripos() and explode() in PHP
<?php
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
//$Seq = "0002110000001";
$Seq = "7";
$new_arr=explode(PHP_EOL,$file);
foreach($new_arr as $k=>$v)
{
if(stripos($v,$Seq)!==false)
{
echo "$v\n";
}
}
OUTPUT :
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
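Note that searching every line for the substring "7" only works here because no other line happens to contain a 7. A more targeted sketch, assuming each block is a run of consecutive KEYx=value lines and that keys use only uppercase letters and underscores (the function name highlightBlock is made up for illustration): first read the index from the SEQx line, then wrap the whole run of lines that carry that index.

```php
<?php
// Hypothetical helper: wrap the whole "index block" that contains a given SEQ value.
function highlightBlock(string $text, string $seq): string
{
    // Step 1: find the index number x on the line "SEQx=<seq>".
    $find = '/^SEQ(\d+)=' . preg_quote($seq, '/') . '\s*$/m';
    if (!preg_match($find, $text, $m)) {
        return $text; // SEQ value not present: leave the text untouched
    }
    $idx = $m[1];

    // Step 2: match the run of consecutive lines whose key ends in that index,
    // e.g. TY7=..., DATE7=..., STAT_N7=... (assumed key alphabet: A-Z and _).
    $line  = '[A-Z_]+' . $idx . '=.*';
    $block = '/^' . $line . '(?:\R' . $line . ')*/m';

    // Step 3: wrap only the first such run in the highlight span and anchor.
    return preg_replace(
        $block,
        '<span class="highlight"><a name="here"></a>$0</span>',
        $text,
        1
    );
}

$file = "IAR6=1002\nSHF6=1\nREF6=0002\n"
      . "TY7=2\nDATE7=20130820182357\nSTAT_N7=1002\nSEQ7=0002110000001\nSTA7=000005\n"
      . "TY8=2\nDATE8=20130820182429\nSEQ8=0002120000081\n";

$highlighted = highlightBlock($file, '0002110000001');
echo $highlighted;
```

Because the index is taken from the SEQ line itself, the caller only needs the SEQ number, which is what the question starts from.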
[PHP] I have a variable that stores a very large page source as a string. I want to echo only the strings I'm interested in (dozens of them, which I need to extract for a project). They sit inside the quotation marks of the href attribute of <a> tags,
but I only want to capture the values that start with the letter n (news):
[<a href="/news7044449/exclusive_news_sunday_"]
<a href="/n[ews7044449/exclusive_news_sunday_]"
That is, I think the match will have to be anchored on: [a href="/n]
How can I make the echo discard all the other text in the variable and show only those values?
Note that there are other href attributes whose values start with other letters, such as 'p': href="/profiles... (these do not interest me).
$string = '</div><span class="news-hd-mark">HD</span></div><p>exclusive_news_sunday_</p><p class="metadata"><span class="bg">Czech AV<span class="mobile-hide"> - 5.4M Views</span>
- <span class="duration">7 min</span></span></p></div><script>xv.thumbs.preparenews(7044449);</script>
<div id="news_31720715" class="thumb-block "><div class="thumb-inside"><div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning"><img src="https://static-hw.xnewss.com/img/lightbox/lightbox-blank.gif"';
I imagine something like this:
$removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n = ('/something regex expresion I think /' or preg_match, substring?);
echo $string = str_replace($removes_everything_except_values_from_the_href_tag_starting_with_the_letter_n,'',$string);
expected output: /news7044449/exclusive_news_sunday_
NOTE: the source does not have to be a variable; the text to extract from could just as well come from a .txt file.
thanks.
I believe this will help.
<?php
$source = file_get_contents("code.html");
preg_match_all("/<a href=\"(\/n(?:.+?))\"[^>]*>/", $source, $results);
var_export( end($results) );
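To get just the links out of the $results array from Valdeir's answer (index 1 holds the captured href values; index 0 holds the full <a ...> matches):
foreach ($results[1] as $r) {
echo $r . "\n";
// alternative: display them with an HTML break tag after each one
// echo $r . "<br>\n";
}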
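Putting the two answers together, here is a minimal self-contained sketch. The function name extractNewsLinks and the shortened sample string (including the /profiles/someone link) are assumptions for illustration, and the pattern uses [^"]* rather than .+? so that a match can never run past the closing quote.

```php
<?php
// Hypothetical helper: return every href value that starts with "/n".
function extractNewsLinks(string $html): array
{
    // Group 1 captures the href value; [^"]* stops at the closing quote.
    preg_match_all('~<a\s+href="(/n[^"]*)"~i', $html, $results);

    // Index 0 holds the full matches, index 1 the captured paths.
    return $results[1];
}

// Shortened, made-up sample in the shape of the question's page source.
$string = '<p>exclusive_news_sunday_</p>'
    . '<a href="/profiles/someone">profile</a>'
    . '<a href="/news7044449/exclusive_news_sunday_">first</a>'
    . '<div class="thumb"><a href="/news31720715/my_sister_running_every_single_morning">'
    . '<img src="x.gif"></a></div>';

foreach (extractNewsLinks($string) as $link) {
    echo $link, "\n";
}
```

To read from a file instead, pass file_get_contents() of that file as the argument.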
So I'm trying to make a php function to get HTML tags from a BBCode-style form. The fact is, I was able to get tags pretty easily with preg_replace. But I have some troubles when I have a bbcode inside the same bbcode...
Like this :
[blue]My [black]house is [blue]very[/blue] beautiful[/black] today[/blue]
So, when I "parse" it, some BBCode is always left over for the blue tags. Something like:
My house is [blue]very[/blue] beautiful today
Everything is colored except for the blue-tag inside the black-tag inside the first blue-tag.
How the hell can I do that ?
For more information, here is what I tried:
Regex: "/\[blue\](.*)\[\/blue\]/si" or "/\[blue\](.*)\[\/blue\]/i"
Getting : "My house is [blue]very[/blue] beautiful today"
Regex : "/\[blue\](.*?)\[\/blue\]/si" or "/\[blue\](.*)\[\/blue\]/Ui"
Getting : "My house is [blue]very beautiful today[/blue]"
Do I have to loop the preg_replace ? Isn't there a way to do it, regex-style, without looping the thing ?
Thx for your concern. :)
It is right that you should not reinvent the wheel on products and rather choose well-tested plugins. However, if you are experimenting or working on pet projects, by all means, go ahead and experiment with things, have fun and obtain important knowledge in the process.
With that said, you may try the following regex. I'll break it down below.
(\[(.*?)\])(.*)(\[/\2\])
Philosophy
While parsing markup like this, what you are actually seeking is to match tags with their pairs.
So, a clean approach is to run a loop, capturing the outermost tag pair each time and replacing it.
For the regex above, the capture groups give you the following on the first pass:
Opening tag (complete) [blue]
Opening tag (tag name) blue
Content between opening and closing tag My [black]house is [blue]very[/blue] beautiful[/black] today
Closing tag [/blue]
So, you can use $2 to determine the tag you are processing, and replace it with
<tag>$3</tag>
// or even
<$2>$3</$2>
Which will give you;
// in first iteration
<tag>My [black]house is [blue]very[/blue] beautiful[/black] today</tag>
// in second iteration
<tag>My <tag2>house is [blue]very[/blue] beautiful</tag2> today</tag>
// in third iteration
<tag>My <tag2>house is <tag3>very</tag3> beautiful</tag2> today</tag>
Code
$text = "[blue]My [black]house is [blue]very[/blue] beautiful[/black] today[/blue]";
function convert($input)
{
$control = $input;
while (true) {
$input = preg_replace('~(\[(.*?)\])(.*)(\[/\2\])~s', '<$2>$3</$2>', $input);
if ($control == $input) {
break;
}
$control = $input;
}
return $input;
}
echo convert($text);
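One limitation worth noting: because the content group is greedy, two sibling tags of the same name (for example [b]a[/b] x [b]c[/b]) appear to be paired first-opening with last-closing, which leaves the inner tags unconverted. A variation that works innermost-first avoids this. This is my own sketch, not the answer's code; the name convertInnermostFirst is made up, and it assumes tag names consist of word characters only.

```php
<?php
// Hypothetical variation: convert innermost tag pairs first, repeat until none remain.
function convertInnermostFirst(string $input): string
{
    // A pair whose content holds no "[" cannot contain another tag,
    // so it is an innermost pair and is safe to convert.
    $pattern = '~\[(\w+)\]([^\[]*)\[/\1\]~';

    do {
        // $count receives the number of replacements made in this pass.
        $input = preg_replace($pattern, '<$1>$2</$1>', $input, -1, $count);
    } while ($count > 0);

    return $input;
}

$text = "[blue]My [black]house is [blue]very[/blue] beautiful[/black] today[/blue]";
echo convertInnermostFirst($text), "\n";
echo convertInnermostFirst("[b]a[/b] x [b]c[/b]"), "\n";
```

Like the original, this still loops; each pass peels one nesting level. Emitting the tag name straight into the output is only safe if the set of allowed tags is checked elsewhere.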
As others mentioned, don't try to reinvent the wheel.
However, you could use a recursive approach:
<?php
$text = "[blue]My [black]house is [blue]very[/blue] beautiful[/black] today[/blue]";
$regex = '~(\[ ( (?>[^\[\]]+) | (?R) )* \])~x';
$replacements = array( "blue" => "<bleu>",
"black" => "<noir>",
"/blue" => "</bleu>",
"/black" => "</noir>");
$text = preg_replace_callback($regex,
function($match) use ($replacements) {
return $replacements[$match[2]];
},
$text);
echo $text;
# <bleu>My <noir>house is <bleu>very</bleu> beautiful</noir> today</bleu>
?>
Here, every colour tag is replaced by its French (just made it up) counterpart, see a demo on ideone.com. To learn more about recursive patterns, have a look at the PHP documentation on the subject.
Let's say I have a page I want to scrape for words with "ice" in them, how can I do this easily? I see a lot of scrapers breaking things down into source code, but I don't need this. I just need something that searches through the plain text on the webpage.
Edit: I basically need something to search for .jpeg and find the entire file name. (it is in plain text on the website, not hidden in a tag)
Anything that matches the following is a word with ice in it:
/(\w*)ice(\w*)/i
(Do note that \w matches 0-9 and _ too. If you only want letters, the following might give better results: /\b[a-z]*ice[a-z]*\b/i)
UPDATE
To match file names (must not contain whitespace):
/\S+\.jpeg/i
Example:
<?php
$str = 'Picture of me: 238484534.jpeg and someone else img-of-someone.jpeg here';
$cnt = preg_match_all('/\S+\.jpeg/i', $str, $matches);
print_r($matches);
1. Do you want to search inside the HTML tags too (attributes, names, and so on)?
2. Or only the visible part of the web page?
For #1: the solutions are simple and already given in the other answers.
For #2:
Use PHP's DOMDocument class to extract the text content, and search only in that.
Documentation here:
http://php.net/manual/en/class.domdocument.php
For an example, see:
PHP DOMDocument stripping HTML tags
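For the second case, a minimal sketch of "search the visible text only". Note that it swaps in strip_tags() as a lightweight stand-in for DOMDocument, so it is an approximation: as far as I know, text inside script or style elements would still be kept. The helper name visibleText and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: reduce HTML to roughly its visible text.
function visibleText(string $html): string
{
    // Put a space before every tag so adjacent elements do not fuse into one word.
    $spaced = str_replace('<', ' <', $html);

    // Drop the tags (and their attributes), then decode entities such as &amp;.
    return html_entity_decode(strip_tags($spaced));
}

// Made-up sample: "ice-box" sits in an attribute and should NOT be found.
$html = '<p class="ice-box">Nice day for <b>icecream</b>,</p>'
      . '<ul><li>police.jpeg</li><li>cat.jpeg</li></ul>';

$text = visibleText($html);

// Words containing "ice" in the visible text.
preg_match_all('/\w*ice\w*/i', $text, $m);
$words = $m[0];

// File names ending in .jpeg in the visible text.
preg_match_all('/\S+\.jpeg/i', $text, $m);
$files = $m[0];

print_r($words);
print_r($files);
```

The same two preg_match_all() calls work unchanged on text obtained from DOMDocument instead.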
Some regex use will be needed for this. Below I use PCRE http://www.php.net/manual/en/ref.pcre.php and the function preg_match_all http://www.php.net/manual/en/function.preg-match-all.php
<?php
$html = <<<EOF
<html>
<head>
<title>Test</title>
</head>
<body>List of files:
<ul>
<li>test1.jpeg</li>
<li>test2.jpeg</li>
</ul>
</body>
</html>
EOF;
$matches = array();
// With preg_match_all, $matches[0] is the list of all full matches.
$count = preg_match_all('/[0-9a-zA-Z_-]+\.jpeg/', $html, $matches);
foreach ($matches[0] as $filename) {
print "Filename: {$filename}\n";
}
?>
try this:
preg_match_all('/\w*ice\w*/', 'abc icecream lice', $matches);
print_r($matches);
I'm writing a simple text editor that needs to handle creating headings.
The headings will be in WikiDot syntax. Long story short, this is what I need to convert:
+ paragraph 1
changes to
<h1>paragraph</h1>
++ subparagraph 1
changes to
<h2>subparagraph</h2>
How do I do this in PHP?
To expand on @CrayonViolent's answer (for cases where the first replace interferes with the second):
<?php
$content = "Hello, world
+ Big Heading
++ Smaller heading
Additional content";
function r($m){
$tag = "h".strlen($m[1]);
return "<{$tag}>{$m[2]}</{$tag}>";
}
$content = preg_replace_callback('/^(\+{1,6})\s?(.*)$/m','r', $content);
echo $content;
?>
I also added the m (multi-line) flag to the regex so that ^ and $ match at every line, and limited it to the headers <h1> through <h6>.
Working example can be located here
// The m flag lets ^ and $ match at every line; "++" is handled first so "+" cannot match it.
$content = preg_replace('~^\+\+\s*(.*)$~m', '<h2>$1</h2>', $content);
$content = preg_replace('~^\+\s*(.*)$~m', '<h1>$1</h1>', $content);
I am experimenting with finding similar text between a string and an online article. I am playing with similar_text() in php that shows the percentage a string matches. But I am trying to figure out how to echo out what similar_text() is finding that is similar. Is there any way to do this?
Here is a sample of what I am trying to do:
$similarText = similar_text($articleContent, $wordArr[$wordNum][1], $p);
//if(strpos($articleContent, $wordArr[$wordNum][1] ) !== false)
if($p > .25)
{
$test =($wordArr[$wordNum][1] - similar_text($articleContent, $wordArr[$wordNum][1]));
echo $test."<br/>";
echo "Percent: $p%"."<br/>";
echo "MATCH NAME<br/>";
print_r($wordArr[$wordNum]);
echo "<br/><br/>";
}
The similar text gives me a percentage of the words that I am matching, but I kind of want to see how it is working, and actually show the word it matches to the word it is matching. Like echo out:
echo $matcher." matches ".$matchee
Consider adding an example to get a better answer.
<?php
similar_text($string1, $string2, $p);
echo "Percent: $p%";
?>
If you need to see how many characters differ:
<?= strlen($string2) - similar_text($string1, $string2) ?>
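similar_text() itself only returns a count, so to see what was matched you have to redo the matching yourself. Below is a rough sketch that is intended to mirror the documented idea behind similar_text(): take the longest common substring, then repeat on the pieces to its left and right. The function name similarParts is made up, and I have not checked that it agrees with the built-in on every input.

```php
<?php
// Hypothetical helper: list the common chunks that a similar_text-style match would count.
function similarParts(string $a, string $b): array
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    $best = 0;
    $posA = 0;
    $posB = 0;

    // Find the first longest common substring of $a and $b.
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $best) {
                $best = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }

    if ($best === 0) {
        return []; // nothing in common (also covers empty strings)
    }

    // Recurse on what lies to the left and to the right of the common chunk.
    return array_merge(
        similarParts((string) substr($a, 0, $posA), (string) substr($b, 0, $posB)),
        [substr($a, $posA, $best)],
        similarParts((string) substr($a, $posA + $best), (string) substr($b, $posB + $best))
    );
}

$parts = similarParts('World', 'Word');
echo implode(' | ', $parts), "\n";
```

The lengths of the returned chunks add up to the number that similar_text() reports for this pair. The nested loops are slow on long inputs such as whole articles, so this is meant for inspecting short strings.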