If someone writes a multi-line post containing text and links, I want to find the links and wrap them in <p> tags, but I can only do it one link at a time (the source code comes from phpBB2's clickable-links function), so every link ends up wrapped on its own, like this:
<p>http://www.bbc.co.uk/</p>
<p>http://www.bbc.co.uk/</p>
<p>http://www.bbc.co.uk/</p>
What I want instead is this:
<p>http://www.bbc.co.uk/
http://www.bbc.co.uk/
http://www.bbc.co.uk/</p>
Cheers.
Feed it to DOMDocument's loadHTML() and call getElementsByTagName('p'). Loop over the result with ->item($i) up to ->length, collect each paragraph's nodeValue, then create one new paragraph with createElement('p') and fill it with the collected values joined by "\n" or by <br> elements (a "<br>" string set as a nodeValue is escaped and shows up as literal text).
You shouldn't use regex for this.
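A minimal PHP sketch of that approach, assuming the input is a fragment like the phpBB output above. mergeParagraphs() is a hypothetical helper name; it separates the collected texts with <br> elements created through the DOM:

```php
<?php
// Hypothetical helper: collapse all <p> elements in an HTML fragment into a
// single <p>, separating the original texts with <br> elements.
function mergeParagraphs(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $root = $dom->documentElement;

    // getElementsByTagName() returns a live list; copy it before changing the tree.
    $paragraphs = iterator_to_array($dom->getElementsByTagName('p'));
    if (count($paragraphs) < 2) {
        return $html;
    }

    $merged = $dom->createElement('p');
    foreach ($paragraphs as $i => $p) {
        if ($i > 0) {
            $merged->appendChild($dom->createElement('br'));
        }
        // createTextNode() escapes the text, so no markup is injected.
        $merged->appendChild($dom->createTextNode($p->nodeValue));
    }

    // Put the merged paragraph where the first one was, then drop the originals.
    $paragraphs[0]->parentNode->insertBefore($merged, $paragraphs[0]);
    foreach ($paragraphs as $p) {
        $p->parentNode->removeChild($p);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo mergeParagraphs('<p>http://www.bbc.co.uk/</p><p>http://www.bbc.co.uk/</p><p>http://www.bbc.co.uk/</p>');
```

Each paragraph is flattened to its nodeValue, so any markup inside the original paragraphs (such as the <a> tags phpBB adds) is lost; keeping it would mean moving the child nodes instead of copying text.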
Related
Preface: I cannot rename the source tags or edit their IDs. Any changes to the tags must happen after they have been fetched.
What I'm doing: using file_get_contents in PHP, I am requesting data from a remote site. This data is just two <p> tags. I need to hide or rename the second of the two <p> tags.
Is this possible with PHP or jQuery?
What I'm working with:
<p>Hello my name is test</p><p>I like studying geology.</p>
If you need to hide the second paragraph, you can do this with jQuery:
$('p:eq(1)').hide();
You could try a PHP string replace, which removes the boundary between the two paragraphs so that they merge into one:
$new_string = str_replace('</p><p>','',file_get_contents('somecontent'));
If you need to do it before the HTML is rendered, you need to parse the content, remove or replace the second p tag, and build new content from the result.
Here is a DOM parser: Simple HTML DOM Parser.
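A sketch of the parse-and-modify route using PHP's built-in DOM extension instead of Simple HTML DOM Parser. hideSecondParagraph() is a hypothetical helper, and the hard-coded input stands in for the file_get_contents() result:

```php
<?php
// Hypothetical helper: hide the second <p> of a fetched fragment with an inline style.
function hideSecondParagraph(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment so it has a single root element we can serialize from.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $second = $dom->getElementsByTagName('p')->item(1);
    if ($second instanceof DOMElement) {
        // Hide it; to drop it instead, call $second->parentNode->removeChild($second).
        $second->setAttribute('style', 'display:none');
    }

    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// In the real script the input would come from file_get_contents().
echo hideSecondParagraph('<p>Hello my name is test</p><p>I like studying geology.</p>');
```

Since the source tags cannot be changed, the modification happens on the fetched copy, which matches the constraint in the question.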
Or you can do it after rendering HTML as rNix suggested.
Problem:
I need to confirm that an iframe contains only one type of link, in the following format:
http://www.example.com/embed/*****11 CHARACTERS MAX.****?rel=0
Starts with: http://www.example.com/embed/
Ends with: ?rel=0
11 CHARACTERS MAX. means that this spot can contain any characters, up to 11 of them, but never more than 11.
NOTE: none of the specified tags is guaranteed to appear in every post; it depends on how the user uses the editor.
I'm using PHP.
I used the line below to strip all tags except the ones specified:
$rtxt_offer = preg_replace('#<(?!/?(u|br|iframe)\b)[^>]+>#', '', $rtxt_offer);
You wrote you only want to validate the link value with a regular expression:
$doesMatch = preg_match('~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~', $link);
This does specifically what you're asking for.
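A quick check of that pattern, wrapped in a hypothetical isAllowedEmbed() helper. The dots in the host name are escaped, because an unescaped . matches any character; the sample IDs are made up:

```php
<?php
// Hypothetical helper around the answer's pattern.
function isAllowedEmbed(string $link): bool
{
    // [^?]{0,11}: up to 11 characters that are not "?", then a literal "?rel=0".
    return preg_match('~^http://www\.example\.com/embed/[^?]{0,11}\?rel=0$~', $link) === 1;
}

var_dump(isAllowedEmbed('http://www.example.com/embed/abcdefghijk?rel=0'));  // bool(true), 11 characters
var_dump(isAllowedEmbed('http://www.example.com/embed/abcdefghijkl?rel=0')); // bool(false), 12 characters
```

Note that [^?] also accepts "/", so a value like a/b passes; narrow the character class (for example to [\w-]) if only ID characters are allowed.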
For removing tags, see strip_tags, or use an HTML parser, which will also let you extract the link value more reliably.
In a similar Q&A I posted example code showing how to use strip_tags and SimpleXMLElement together: Extract all the text and img tags from HTML in PHP.
First of all, PHP has a built-in function that strips tags for you (http://php.net/manual/en/function.strip-tags.php), so there is no need for a slow regex here.
Steps you'll need to solve your problem:
Parse the text with DOMDocument
Get the iframe node from it
Get the src attribute from the iframe and parse it with parse_url
Now you can perform easy checks on all the components returned by parse_url
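A sketch of those steps, assuming the host, /embed/ path prefix and rel=0 query from the question. validateIframes() is a hypothetical helper; it uses strip_tags() with an allowed-tags list in place of the question's preg_replace, and accepts posts without any iframe, since the tags are optional:

```php
<?php
// Hypothetical helper: true if every iframe src matches the expected format.
function validateIframes(string $post): bool
{
    // Keep only the tags the question allows, without a regex.
    $clean = strip_tags($post, '<u><br><iframe>');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy markup
    $dom->loadHTML('<div>' . $clean . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('iframe') as $iframe) {
        $parts = parse_url($iframe->getAttribute('src'));
        if ($parts === false) {
            return false; // seriously malformed URL
        }
        $path = $parts['path'] ?? '';
        if (($parts['scheme'] ?? '') !== 'http'
            || ($parts['host'] ?? '') !== 'www.example.com'
            || strpos($path, '/embed/') !== 0
            || strlen($path) - strlen('/embed/') > 11   // at most 11 characters after /embed/
            || ($parts['query'] ?? '') !== 'rel=0'
            || isset($parts['fragment'])) {
            return false;
        }
    }
    // No iframe at all is accepted, since the tag is optional in a post.
    return true;
}
```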
Happy coding
Here is my code: http://jsfiddle.net/zEXrq/8/
I also need to check <li> and <h3> tags.
Is it possible to check words inside tags like <a onclick="word_desc(23); classChange(1, 1);" id="txtid1" class="sel25">user key</a>? Here I want to replace only "user key", but when I search for "a", every "a" is replaced, including the <a> tag itself and other text inside the tags. How can I solve this?
Your problem is that to do this with regexes you'll need a ridiculously complex regex to ignore occurrences that appear within tags. Instead, you can convert the string to an HTML DOM tree and only perform the highlighting on text nodes.
Now, you can't just replace a text node's nodeValue with HTML; it won't work, because the markup is treated as literal text. You'll need to modify the nodeValue and insert new nodes where required. This may sound confusing, so I've jsFiddled it for you:
http://jsfiddle.net/zEXrq/29/
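For reference, the same text-node technique in PHP's DOM extension (the jsFiddle uses the browser DOM in JavaScript). highlight() and the "highlight" CSS class are hypothetical, and the search term is assumed to be ASCII, because stripos() and strlen() count bytes while splitText() counts characters:

```php
<?php
// Hypothetical helper: wrap every occurrence of $word in text nodes with a <span>.
function highlight(string $html, string $word): string
{
    if ($word === '') {
        return $html;
    }
    $dom = new DOMDocument();
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);

    // Only text nodes are searched, so tag names and attributes are never touched.
    // Copy the list first because the loop modifies the tree.
    $textNodes = iterator_to_array($xpath->query('//text()'));
    foreach ($textNodes as $node) {
        while (($pos = stripos($node->nodeValue, $word)) !== false) {
            $match = $node->splitText($pos);          // $node keeps the text before the match
            $rest = $match->splitText(strlen($word)); // $match now holds just the matched word
            $span = $dom->createElement('span');
            $span->setAttribute('class', 'highlight');
            $match->parentNode->replaceChild($span, $match);
            $span->appendChild($match);
            $node = $rest;                            // continue after the match
        }
    }

    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Searching for "a" in <a class="sel25">a cat</a> wraps the two letters in the text and leaves the <a> tag and its attributes alone.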
I hope this is something like what you need, because I couldn't fully understand your problem. Check this and add a comment, and then I can fix it.
http://jsfiddle.net/zEXrq/28/
I am trying to index some content from a series of .html files that share the same format.
So I get a lot of lines like this: <a href="meh">[18] blah blah blah <a...
The idea is to extract the number (18) and the text next to it (blah...). Furthermore, I know that every qualifying line will start with "> and end with either <a or </p. The problem is that I need to keep all other HTML tags as part of the text (<i>, <u>, etc.).
So then I have something like this:
$docString = file_get_contents("http://whatever.com/some.htm");
$regex="/\">\[(.*?)\](<\/a>)(.)*?(<)/";
preg_match_all($regex,$docString,$match);
Let's look at $regex for a sec. I specify that it will start with ">. Then I match the number inside the []. Then I single out the </a>. So far so good.
At the end, I use (.)*?(<). This is the turning point. If I leave the last bit as (<), the text is cut off whenever an underline or italics tag is found. However, if I use (<a|</p), the resulting array ends up empty. I've tried changing that to just (<a), but it seems that two characters mess up the whole thing.
What can I do? I've been struggling with this all day.
PHP Tidy is your friend. Don't use regexes.
Something like /">\[(.*)\](.*)(?:<(?:a|\/p))/ seems to work fine given your example and description. Perhaps adding non-capturing subpatterns is what does it? Please provide a counterexample where this doesn't work for you.
Though I agree that RegEx isn't a parser, it sounds like what you're looking for is part of a regularly behaved string - which is exactly what RegEx is strong at.
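About the empty array in the question: in a /-delimited pattern, the unescaped / in (<a|</p) ends the regex early, so PHP treats the rest as modifiers and preg_match_all() fails instead of matching (the answer above escapes it as \/p). A sketch of the same idea with a ~ delimiter, so no escaping is needed, a lazy (.*?) and a lookahead for the end marker. extractEntries() and the sample markup are hypothetical, and \d+ assumes the brackets always hold a number:

```php
<?php
// Hypothetical helper: map each "[number]" to the text that follows it, up to
// the next "<a" or "</p", keeping inline tags such as <i> and <u>.
function extractEntries(string $html): array
{
    // "~" as delimiter, so the "/" in "</a>" and "</p" needs no escaping.
    // (?:</a>)? allows the closing </a> that the question's regex expects.
    // The lookahead stops the lazy (.*?) without consuming the end marker.
    $regex = '~">\[(\d+)\](?:</a>)?(.*?)(?=<a\b|</p)~s';
    preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

    $entries = [];
    foreach ($matches as $m) {
        $entries[(int) $m[1]] = trim($m[2]);
    }
    return $entries;
}

print_r(extractEntries('<p><a href="meh">[18]</a> blah <i>blah</i> blah <a href="next">[19]</a> more <u>text</u></p>'));
```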
As you've found, using a regex to parse HTML is not very easy. This is because HTML is not particularly regular.
I suggest using an XML/HTML parser such as PHP's DOMDocument.
Create an object, then use the loadHTMLFile method to open the file. Extract your a tags with getElementsByTagName, and then read their content from the nodeValue property.
It might look like this:
// Create a DomDocument object
$html = new DOMDocument();
// Load the url's contents into the DOM
$html->loadHTMLFile("http://whatever.com/some.htm");
// make an array to hold the text
$anchors = array();
//Loop through the a tags and store them in an array
foreach($html->getElementsByTagName('a') as $link) {
$anchors[] = $link->nodeValue;
}
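The snippet above collects only the anchor text (e.g. [18]). To also keep the text that follows each anchor, including inline tags such as <i> and <u>, one option is to walk the following sibling nodes until the next <a>. A sketch, assuming the structure described in the question; collectEntries() and the sample markup are hypothetical:

```php
<?php
// Hypothetical helper: for every <a> whose text is "[number]", collect the HTML
// of its following siblings up to the next <a> (or the end of the parent <p>).
function collectEntries(DOMDocument $dom): array
{
    $entries = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if (!preg_match('/^\[(\d+)\]$/', trim($a->textContent), $m)) {
            continue; // not an index anchor
        }
        $html = '';
        for ($node = $a->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node instanceof DOMElement && $node->tagName === 'a') {
                break; // the next entry starts here
            }
            $html .= $dom->saveHTML($node); // serialize text and inline elements alike
        }
        $entries[(int) $m[1]] = trim($html);
    }
    return $entries;
}

$dom = new DOMDocument();
$dom->loadHTML('<p><a href="meh">[18]</a> blah <i>blah</i> blah <a href="next">[19]</a> more</p>');
print_r(collectEntries($dom));
```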
One alternative to this style of XML/HTML parser is phpQuery. The documentation on their page should do a good job of explaining how to extract the tags. If you know jQuery, its interface may feel more natural.
If I had the following HTML:
<li> Thisislink1</li>
<li> Thisisanotherlink</li>
<li> Onemorelink</li>
Each link will differ in length and value.
How can I search the values inside the links (i.e. Thisislink1, Thisisanotherlink and Onemorelink) for a search phrase, say 'another'? In this example only 'Thisisanotherlink' would be returned, but if I changed the search phrase to 'link', all three values would be returned.
Don't use regex. Use DOMDocument.
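A minimal sketch of the DOMDocument approach for the markup in the question. findItems() is hypothetical, and stripos() makes the match case-insensitive (swap in strpos() for a case-sensitive search):

```php
<?php
// Hypothetical helper: return the text of every <li> that contains $phrase.
function findItems(string $html, string $phrase): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $found = [];
    foreach ($dom->getElementsByTagName('li') as $li) {
        $text = trim($li->textContent); // drop the leading space in "<li> Thisislink1"
        if (stripos($text, $phrase) !== false) {
            $found[] = $text;
        }
    }
    return $found;
}

$html = "<li> Thisislink1</li>\n<li> Thisisanotherlink</li>\n<li> Onemorelink</li>";
print_r(findItems($html, 'another'));
print_r(findItems($html, 'link'));
```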
/\w*another\w*/
This needs to be done in two passes:
Extract the text from all links in the document. XSL or XPath should be workable for this purpose. As you extract the text, keep a copy of the DOM around so you can attach information to it and to the text, telling you where each piece of text was extracted from (if you are going to need this information later; you might not). As an alternative, just attach the contents of the href attribute to the text.
Be sure to extract all the text you need (e.g. title attributes, or the alt text of <a href><img alt></a> type constructs).
Search the extracted text for the phrase you are looking for.
(Optional) use the information you set earlier to map back to the DOM to figure out what element you gathered the text from, and highlight it. If you extracted the href attribute, you could just make a new link using this and the matching text.
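A sketch of those passes with DOMXPath, taking the "attach the href" alternative rather than keeping a copy of the DOM. searchLinks() and the sample markup (which wraps the items in <a href> elements) are hypothetical:

```php
<?php
// Hypothetical helper: pass 1 extracts link text with its href, pass 2 searches it.
function searchLinks(string $html, string $phrase): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Pass 1: text of every link, plus its title attribute and any <img alt>.
    $items = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $parts = [trim($a->textContent), $a->getAttribute('title')];
        foreach ($xpath->query('.//img/@alt', $a) as $alt) {
            $parts[] = $alt->value;
        }
        $items[] = [
            'href' => $a->getAttribute('href'),
            'text' => implode(' ', array_filter($parts, 'strlen')), // skip empty parts
        ];
    }

    // Pass 2: search the extracted text (case-insensitive).
    $hits = [];
    foreach ($items as $item) {
        if (stripos($item['text'], $phrase) !== false) {
            $hits[] = $item;
        }
    }
    return $hits;
}
```

Each hit keeps its href, so the optional last step can build a new link from the href and the matching text.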