If I had the following HTML:
<li> Thisislink1</li>
<li> Thisisanotherlink</li>
<li> Onemorelink</li>
Where each link will be different in length and value.
How can I search the values inside the links (i.e. Thisislink1, Thisisanotherlink and Onemorelink) for a search phrase, say 'another'? In this example, only 'Thisisanotherlink' would be returned, but if I changed the search phrase to 'link', then all 3 values would be returned.
Don't use regex. Use DOMDocument.
/\w*another\w*/
This needs to be done in two passes:
Extract the text from all links in the document. XSL or XPath should be workable for this purpose. As you extract text, keep a copy of the DOM around so you can attach information to it and to the text, telling you where the text was extracted from (you may or may not need this later). As an alternative, just attach the contents of the href attribute to the text.
Be sure to extract all the text you need (e.g. title attributes, or the alt text of <a href><img alt></a> type constructs).
Search the extracted text for the phrase you are looking for.
(Optional) use the information you set earlier to map back to the DOM to figure out what element you gathered the text from, and highlight it. If you extracted the href attribute, you could just make a new link using this and the matching text.
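The two passes above can be sketched roughly as follows. This is only a minimal sketch: it assumes each <li> wraps an <a href> element, and the function name findLinkTexts is made up for illustration. It uses a plain case-insensitive substring match rather than a regex.

```php
<?php
// Hypothetical helper: returns the text and href of every <a> whose text contains $phrase.
function findLinkTexts(string $html, string $phrase): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for imperfect markup.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $matches = [];
    // Pass 1: extract the text (and the href, so a match can be mapped back to its link).
    foreach ($doc->getElementsByTagName('a') as $link) {
        $text = trim($link->textContent);
        // Pass 2: case-insensitive substring search on the extracted text.
        if ($phrase !== '' && stripos($text, $phrase) !== false) {
            $matches[] = ['text' => $text, 'href' => $link->getAttribute('href')];
        }
    }
    return $matches;
}
```

Calling findLinkTexts($html, 'another') on the three example links should return only the entry for Thisisanotherlink, while 'link' should return all three.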
Scenario:
I need to apply a php function to the plain text contained inside HTML tags, and show the result, maintaining the original tags (with their original attributes).
Visualize:
Take this:
<p>Some text here pointing to the moon and that's it</p>
Return this:
<p>
phpFunction('Some text here pointing to the ')
phpFunction('moon')
phpFunction(' and that\'s it')
</p>
What I should do:
Use a PHP html parser (instead of using regexp) and iterate over every tag, applying the callback to the node text content.
Problem:
If I have, for example, an <a> tag inside a <p> tag, the text content of the parent <p> tag consists of two separate plain text parts, which the PHP callback should treat as separate.
Question:
How should I approach this in a clean and smooth way?
Thanks for your time, all the best.
In the end, I decided to use regex instead of including an external library.
For the sake of simplicity:
$expectedOutput = preg_replace_callback(
    '/>(.*)</U',
    function ($matches) {
        // $matches[1] holds the text captured between '>' and '<'
        return '>' . doStuff($matches[1]) . '<';
    },
    $fromInput
);
This will look for everything between > and <, which is, indeed, what I was looking for.
Of course any suggestion/comment is still welcome.
Peace.
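For comparison, the parser-based approach described in the question can be sketched with PHP's bundled DOM extension, so no external library is needed. This is a rough sketch, not a drop-in replacement: the function name applyToTextNodes is made up, the input is assumed to be an HTML fragment, and non-ASCII input may need an explicit encoding hint for loadHTML.

```php
<?php
// Hypothetical helper: applies $callback to every non-blank text node and keeps all tags intact.
function applyToTextNodes(string $html, callable $callback): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html); // the parser wraps a fragment in <html><body>
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Each text node is visited on its own, so "text <a>link</a> text" yields three calls.
    foreach ($xpath->query('//body//text()') as $textNode) {
        if (trim($textNode->data) !== '') {
            $textNode->data = $callback($textNode->data);
        }
    }

    // Serialise only the children of <body>, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

With strtoupper as the callback, the text before, inside and after a nested <a> is passed to the callback as three separate strings, and the <a> tag keeps its attributes.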
I'm working on an e-book that will be published on my website. I want to mimic the OS X Spotlight feature, where someone can use my fixed search bar to enter text that is then highlighted on the page for them. I tried Sphider, but had no luck getting this result.
I found this similar thread, but it's not exactly what I'm looking for.
You could use a string replace to surround all text that needs to be highlighted with a span tag. Then create a CSS class for that span tag.
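<?php
// Escape the user's input so it cannot inject markup into the page
$searchString = htmlspecialchars($_POST['search'] ?? '');
if ($searchString !== '') {
    $EBOOK = str_replace($searchString, "<span class='highlighted'>$searchString</span>", $EBOOK);
}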
Then some CSS
.highlighted {
background-color:yellow;
}
To take it a step further, you could use JavaScript to scroll the user's browser to the first span.highlighted on the page.
Note that I wouldn't use a regular expression (i.e. preg_replace) to replace the search string, because the user's input could contain characters that are special in regex and would need to be escaped first (for example with preg_quote).
This is all theoretical of course... based on your question.
Edit: I just thought of something. The e-book content will contain HTML tags, so if you use a string replace function like I suggested, take care that the tags themselves are not searched and replaced. A regular expression replace may be needed in that case.
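One way to keep the tags out of the search is to highlight inside DOM text nodes only. The following is a minimal sketch under a few assumptions: highlightPhrase is a made-up name, the e-book content is an HTML fragment, and the match is case-insensitive. The phrase is passed through preg_quote, so special characters in the user's input are escaped.

```php
<?php
// Hypothetical helper: wraps each occurrence of $phrase in text nodes with a highlight span.
function highlightPhrase(string $html, string $phrase): string
{
    if ($phrase === '') {
        return $html;
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $pattern = '/(' . preg_quote($phrase, '/') . ')/i';
    // Copy the node list first, because the loop replaces nodes in the tree.
    foreach (iterator_to_array($xpath->query('//body//text()')) as $textNode) {
        // With DELIM_CAPTURE, odd indexes are matches and even indexes are the text between them.
        $parts = preg_split($pattern, $textNode->data, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'highlighted');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because only text nodes are touched, searching for "a" leaves tag names and attribute values such as href="#a" alone.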
I have a situation here. I'm more of a Java guy and I'm having a hard time with PHP.
I am creating an XML file from a database. So far I have created more than 90 dynamic elements, some of which include attributes, children, etc., without any problem.
But things got messy here:
text1:
here is a list of pencils[1]. here is a list of another type of pencils[2].
I do want to have
<text1>
here is a list of pencils <id>1</id>. here is a list of another type of pencils <id>2</id>.
</text1>
I can replace the substrings ([1], [2]) and insert some other text, but how do I replace these substrings with DOM elements?
Any help is deeply appreciated.
You cannot, because the string in which you want to do the replacement is the node value of the text1 node. A variant would be to structure it like this:
<text1>
<partial>here is a list of pencils</partial>
<id>1</id>
<partial>.here is a list of another type of pencils</partial>
<id>2</id>
</text1>
But honestly that is suboptimal.
I assume what got you confused (and me for a second there) is the way we write HTML:
<p>some text here <a href="...">a link</a> more <strong>variation</strong></p>
This might give us the impression that it should be valid XML as well. There is one more thing to know, though: browsers actually transform that HTML into roughly the following form:
<p>
<textnode>some text here </textnode>
<a href="...">
<textnode>a link</textnode>
</a>
<textnode> more </textnode>
<strong>variation</strong>
</p>
Not the answer, but I'd recommend you rethink your XML format.
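If the mixed format from the question is kept anyway, the text-node picture above suggests one possible route: split the string at the [n] markers and append text nodes and <id> elements alternately. This is only a sketch; buildMixedContent is a made-up helper name, and it assumes the markers are always digits in square brackets.

```php
<?php
// Hypothetical helper: builds <$name> with text nodes and <id> children in place of [n] markers.
function buildMixedContent(DOMDocument $doc, string $name, string $text): DOMElement
{
    $element = $doc->createElement($name);
    // With DELIM_CAPTURE, odd indexes hold the captured digits, even indexes the surrounding text.
    $parts = preg_split('/\[(\d+)\]/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $id = $doc->createElement('id');
            $id->appendChild($doc->createTextNode($part));
            $element->appendChild($id);
        } elseif ($part !== '') {
            // createTextNode escapes special characters such as & and < on output.
            $element->appendChild($doc->createTextNode($part));
        }
    }
    return $element;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$text1 = buildMixedContent(
    $doc,
    'text1',
    'here is a list of pencils[1]. here is a list of another type of pencils[2].'
);
$doc->appendChild($text1);
$xml = $doc->saveXML($text1);
```

Here $xml should hold the text1 element with <id>1</id> and <id>2</id> sitting between the text parts.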
Here is my code: http://jsfiddle.net/zEXrq/8/
Also I need to check <li> tag and <h3> tags.
Is it possible to check for words inside tags such as <a onclick="word_desc(23); classChange(1, 1);" id="txtid1" class="sel25">user key</a>? Here I want to replace only "user key", but when I enter "a" as the word to replace, every "a" is replaced, including the one in the <a> tag itself and those in other words inside the tags. How can I solve this?
Your problem is that to do this with regexes you'll need a ridiculously complex regex to ignore occurrences that appear within tags. Instead, you can convert the string to an HTML DOM tree and only perform the highlighting on text nodes.
Now, you can't just replace a text node's nodeValue with HTML, it won't work. You'll need to modify the nodeValue, and insert new nodes where required. This may sound confusing, so I've jsFiddled it for you:
http://jsfiddle.net/zEXrq/29/
I hope this is something like what you need; I couldn't fully understand your problem. Check it and add a comment, and then I can fix it.
http://jsfiddle.net/zEXrq/28/
If someone posts a multi-line post that contained text and links, I want to be able to find and wrap the links with <p> tags, but I can only do it with one link at a time (source code comes from phpBB2 - clickable links function), which causes every link to be like this:
<p>http://www.bbc.co.uk/</p>
<p>http://www.bbc.co.uk/</p>
<p>http://www.bbc.co.uk/</p>
Where I want it to happen to be like this:
<p>http://www.bbc.co.uk/
http://www.bbc.co.uk/
http://www.bbc.co.uk/</p>
Cheers.
Feed it to DOMDocument's loadHTML method and call getElementsByTagName('p'). Loop up to ->length, take each paragraph with ->item($i), and collect its nodeValue. Then make a new paragraph with createElement and set its nodeValue to the collected values, concatenated with \n<br> or something similar.
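Those steps might look roughly like this. It is a sketch only: mergeParagraphs is a made-up name, it joins the values with a plain newline to match the wanted output, and it assumes every <p> in the input should end up in the single merged paragraph.

```php
<?php
// Hypothetical helper: collects the text of every <p> and returns one merged <p>.
function mergeParagraphs(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $paragraphs = $doc->getElementsByTagName('p');
    $lines = [];
    for ($i = 0; $i < $paragraphs->length; $i++) {
        $lines[] = trim($paragraphs->item($i)->nodeValue);
    }

    // Build the new paragraph from a text node so the values are escaped safely.
    $merged = $doc->createElement('p');
    $merged->appendChild($doc->createTextNode(implode("\n", $lines)));
    return $doc->saveHTML($merged);
}
```

Passing in the three separate <p>http://www.bbc.co.uk/</p> paragraphs should give back a single <p> with the three URLs on separate lines.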
You shouldn't use regex for this.