I am parsing a big file with a lot of data in PHP, and I got stuck on one case:
If my string contains a number in curly brackets, I need to add some text around it.
Example:
{4316} Test
Should be:
{=ATTRVAL("4316")} Test
Or in the middle of the string:
Some random text {2323} and {3232} I got here.
Should be:
Some random text {=ATTRVAL("2323")} and {=ATTRVAL("3232")} I got here.
I have tried a lot of string functions so far, but with no luck.
public static function parseStringWithAttributeValue($attributeValue)
{
preg_match_all('!\d+!', $attributeValue, $matches);
$string = '';
foreach ($matches as $match)
{
// string .= $match
}
return $string;
}
I first tried to extract only the numbers and then build the new text, but that is the wrong logic. Ideally it would be something with preg_replace, if that is possible, but I have had no luck so far.
I also tried str_replace, but I guess my knowledge only goes this far.
If anyone has an idea what approach to take, I would be happy to get any suggestion.
Try this:
public static function parseStringWithAttributeValue($attributeValue){
    // Match digits inside literal braces and wrap the captured digits ($1).
    return preg_replace('/\{(\d+)\}/', '{=ATTRVAL("$1")}', $attributeValue);
}
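As a quick check of that replacement, here is a minimal sketch run against the two sample strings from the question. The free function name addAttrVal is only illustrative and not part of the original code; the braces are escaped so they are read literally.

```php
<?php
// Hypothetical standalone helper; the pattern mirrors the preg_replace answer.
function addAttrVal(string $text): string
{
    // \{ and \} match literal braces; (\d+) captures the digits as $1.
    return preg_replace('/\{(\d+)\}/', '{=ATTRVAL("$1")}', $text);
}

echo addAttrVal('{4316} Test'), PHP_EOL;
echo addAttrVal('Some random text {2323} and {3232} I got here.'), PHP_EOL;
```

Text without a braced number passes through unchanged, so the function can be applied to every line of the file.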
Related
I want to list all the problems a user has solved on a website, using a regular expression to extract only the problem code. For example, if PROBLEM is the HTML code, I want to get only PROBLEM from it. This is the PHP function I wrote to accomplish this.
public function filter($s, $u) {
//$u -> username
//$s -> string containing html code
$reg= "/[^<a href=\"\/status\/(?:[A-Z]|[0-9]|\_|[a-z]|\.)*\"$u]/";
$solved = preg_split($reg, $s, -1, PREG_SPLIT_NO_EMPTY)
return $solved[0];
}
The regular expression doesn't seem to be correct, and I only get /[^ when I print $reg. Also, I am not sure whether preg_split() is the right function to use for this. Please help.
The following works for me. Note that instead of searching for the PROBLEM text in the link as you were attempting, I search for the PROBLEM text that is being highlighted. The $u parameter no longer seems necessary.
public function filter($s) {
    // $s set to 'PROBLEM';
    $reg = '#(?P<problem>.+)#i';
    preg_match($reg, $s, $matches);
    // The named group is available under its name in $matches.
    return $matches['problem'];
}
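The outer brackets in your regular expression are out of place: the leading [^ opens a negated character class, so the pattern matches single characters instead of the link text.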
I am trying to grab the text inside an h4 element.
$regex = '/<h4>([A-Za-z0-9\,\.])/';
I am just getting the first letter back, and I cannot figure out how to use * to keep grabbing everything up to the first < character.
I have made countless attempts and know I am overlooking something simple.
So I was making that much harder than I needed to, the following works:
$regex = '/<h4>.*?<\/h4>/';
If you can trust that grabbing all characters up to the first < is a good enough rule then use this:
$regex = '/<h4>([^<]*?)</';
Of course, that definition will only grab 'The ' from <h4>The <b>Best</b> Book</h4>. You can fix that by changing it to:
$regex = '/<h4>(.*?)<\/h4>/';
This will grab everything between a <h4> and a </h4>, but it still isn't perfect, because anything like <h4 > or <h4 style="..."> will break it, along with a million other valid HTML examples. If you know that the contents won't have any <, though, and you know your tag will always be exactly <h4>, the first one works well enough for your situation.
If your situation is more complex, you will want to use something like PHP's DOM extension (DOMDocument), which is meant for parsing HTML and XML. Neither HTML nor XML is a regular language, so neither can be parsed reliably with regex.
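For the DOMDocument route, a minimal sketch might look like the following. The function name h4Texts and the sample markup are assumptions made for illustration; only standard DOM extension calls are used.

```php
<?php
// Hypothetical helper: collect the text of every <h4>, whatever its attributes.
function h4Texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('h4') as $node) {
        // textContent flattens nested tags such as <b> into plain text.
        $texts[] = $node->textContent;
    }
    return $texts;
}

print_r(h4Texts('<h4 style="color:red">The <b>Best</b> Book</h4><p>Other text</p>'));
```

Because the parser handles attributes and nested tags, the <h4 style="..."> and <b> cases that trip up the regex versions are covered without extra pattern work.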
You can use the below function to accomplish this task.
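function getTextBetweenTags($string, $tagname) {
    // Allow attributes on the opening tag and match lazily up to the first closing tag.
    $tag = preg_quote($tagname, '/');
    $pattern = "/<$tag\b[^>]*>(.*?)<\/$tag>/s";
    preg_match($pattern, $string, $matches);
    return $matches;
}
In the first parameter you have to pass the complete string, and in the second parameter the tag name ("h4"). The function returns the match array, so the text between the tags is in element 1.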
Calling all the PHP helpers out there.
So basically I would like to give the function preg_match a variable (that can contain a couple thousand lines of code) and have it search using a wildcard plus strings on either side of the wildcard.
For example I would like to search for strings that look like this <a href="*.pdf">
I would then like the function to return every match (along with the html shiz around the wildcard, this is to catch any directory structures too) in an array that I can loop through using a foreach(){} loop.
I'm guessing this is possible, would anyone have the time to help me with this?
I've checked through all the preg_match lit' and through the answers on here, but I can't seem to get the patterns correct. Thanks in advance.
Peace out.
unset($matches);
preg_match_all('/<a href="[^"]+\.pdf">/',$text,$matches);
// $matches[0] holds every full match, so loop over that list.
foreach ($matches[0] as $shiz)
{
    // Your code here ...
}
I reposted this question because I didn't find a good answer.
I have a string which can contain text with URLs.
I want a function that strips all URLs from this string and leaves just the text.
For example, the string can look like this:
1) hey take a look here : http://xxx.xxx/545df5 this is nice!
2) hey take a look here : http://www.xxx.xxx/545df5 this is nice!
3) hey take a look here : xxx.xxx/545df5 this is nice!
4) hey take a look here : www.xxx.xxx/545df5 this is nice!
Thanks
Regular expression for URL and how to use regular expression with php should help you.
What you really need is a solid regex to find URLs in a string, and then you can preg_replace that pattern with nothing. I can tell you, though, that tracking down a regex like that is not easy. Depending on the variations in the URLs you're looking for (e.g. http:// vs https:// vs ftp://), you could run into real trouble trying to account for all of them.
Here is a page that I found to be a good start though.
Regex is the way to go, as was discussed earlier. Finding one isn't that terribly hard (google: url regex pattern). One example returned is here:
http://www.geekzilla.co.uk/View2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm
I would also recommend you test your regex using one of the many fine online regex testers. My favorite (for non-java) is
http://www.regextester.com/
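This function should do it (assuming your strings are separated by a space " " and the URLs start with http:// or https://):
function isValidURL($url) {
    // The dot between host labels is escaped so it matches a literal ".".
    return preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}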
function cleanUpUrls($urls) {
$urlArray = explode(' ',$urls);
$resultArray = array();
foreach ($urlArray as $url) {
if(!isValidURL($url)) {
$resultArray[] = $url;
}
}
return implode(' ',$resultArray);
}
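To cover all four sample forms from the question with preg_replace, including the ones without a scheme, a rough sketch could look like this. The function name stripUrls and the pattern are assumptions: the pattern is a heuristic that treats anything starting with http://, https:// or www., or any dotted host followed by a slash, as a URL. It is not a complete URL grammar.

```php
<?php
// Hypothetical helper; the pattern is a rough heuristic, not a full URL grammar.
function stripUrls(string $text): string
{
    // Alternative 1: scheme or "www." prefix followed by non-space characters.
    // Alternative 2: bare dotted host followed by a slash and a path.
    $pattern = '~(?:https?://|www\.)\S+|\b[a-z0-9-]+(?:\.[a-z0-9-]+)+/\S*~i';
    $withoutUrls = preg_replace($pattern, '', $text);

    // Collapse the double space left behind by the removed URL.
    return trim(preg_replace('/\s{2,}/', ' ', $withoutUrls));
}

echo stripUrls('hey take a look here : www.xxx.xxx/545df5 this is nice!'), PHP_EOL;
```

A bare domain with no path (for example xxx.xxx on its own) is deliberately left alone here, since it is hard to tell apart from a sentence with a missing space after the period.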
Consider this string
hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff
What regex could I use to match any occurrence of awesome which doesn't appear within the title attribute of the anchor?
So far, this is what I've come up with (sadly, it doesn't work):
/[^."]*(awesome)[^."]*/i
Edit
I took Alan M's advice and used a regex to capture every word and send it to a callback. Thanks Alan M for your advice. Here is my final code.
$plantDetails = end($this->_model->getPlantById($plantId));
$botany = new Botany_Model();
$this->_botanyWords = $botany->getArray();
foreach($plantDetails as $key=>$detail) {
$detail = preg_replace_callback('/\b[a-z]+\b/iU', array($this, '_processBotanyWords'), $detail);
$plantDetails[$key] = $detail;
}
And the _processBotanyWords()...
private function _processBotanyWords($match) {
$botanyWords = $this->_botanyWords;
$word = $match[0];
if (array_key_exists($word, $botanyWords)) {
return '' . $word . '';
} else {
return $word;
}
}
Hope this will help someone else some day! Thanks again for all your answers.
This subject comes up pretty much every day here and basically the issue is this: you shouldn't be using regular expressions to parse or alter HTML (or XML). That's what HTML/XML parsers are for. The above problem is just one of the issues you'll face. You may get something that mostly works but there'll still be corner cases where it doesn't.
Just use an HTML parser.
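As one possible illustration of the parser approach, the sketch below replaces a word only inside text nodes, so attribute values such as title are never touched. The helper name replaceInTextNodes and the sample markup are assumptions; the calls themselves are standard DOMDocument and DOMXPath methods.

```php
<?php
// Hypothetical helper: replace a word in text nodes only, leaving attributes alone.
function replaceInTextNodes(string $html, string $search, string $replacement): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $pattern = '/\b' . preg_quote($search, '/') . '\b/i';

    // //text() selects text nodes only; attribute values are not text nodes.
    foreach ($xpath->query('//text()') as $textNode) {
        $textNode->nodeValue = preg_replace($pattern, $replacement, $textNode->nodeValue);
    }
    return $doc->saveHTML();
}

$html = '<p>hello awesome <a href="" rel="external" title="so awesome is cool">stuff</a> stuff</p>';
echo replaceInTextNodes($html, 'awesome', 'wonderful');
```

Note that saveHTML() returns a whole document, so the fragment comes back wrapped in html and body elements.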
Assuming this is related to the question you posted and deleted a little while ago (that was you, wasn't it?), it's your fundamental approach that's wrong. You said you were generating these HTML links yourself by replacing words from a list of keywords. The trouble is that keywords farther down the list sometimes appear in the generated title attributes and get replaced by mistake--and now you're trying to fix the mistakes.
The underlying problem is that you're replacing each keyword using a separate call to preg_replace, effectively processing the entire text over and over again. What you should do is process the text once, matching every single word and looking it up in your list of keywords; if it's on the list, replace it. I'm not set up to write/test PHP code, but you probably want to use preg_replace_callback:
$text = preg_replace_callback('/\b[A-Za-z]+\b/', "the_callback", $text);
"the_callback" is the name of a function that looks up the word and, if it's in the list, generates the appropriate link; otherwise it returns the matched word. It may sound inefficient, processing every word like this, but in fact it's a great deal more efficient than your original approach.
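A minimal sketch of that single-pass idea follows. The keyword list, the link format, and the function name linkKeywords are all hypothetical; the point is only that each word is looked up once and generated markup is never scanned again.

```php
<?php
// Hypothetical helper: link every keyword in one pass over the text.
// $keywords maps a lowercase word to the URL it should link to (assumed format).
function linkKeywords(string $text, array $keywords): string
{
    return preg_replace_callback(
        '/\b[A-Za-z]+\b/',
        function (array $match) use ($keywords): string {
            $word = $match[0];
            $key = strtolower($word);
            if (!isset($keywords[$key])) {
                return $word;               // not a keyword: leave it as-is
            }
            $url = htmlspecialchars($keywords[$key]);
            // The generated title is part of the replacement, so it is not re-processed.
            return '<a href="' . $url . '" title="' . $word . '">' . $word . '</a>';
        },
        $text
    );
}

echo linkKeywords('hello awesome stuff', ['awesome' => '/glossary/awesome']), PHP_EOL;
```

Using an array keyed by the word makes each lookup a constant-time isset() check, which is where the efficiency gain over one preg_replace call per keyword comes from.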
Sure, using a parsing library is the industrial-strength solution, but we all have times where we just want to write something in 10 seconds and be done. Next time you want to process the meaty text of a page, ignoring tags, try running your input through strip_tags first. This way you will get only the plain, visible text and your regex powers will again reign supreme.
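A small sketch of the strip_tags idea, using the sample string from the question. The function name countVisibleMatches is made up for illustration.

```php
<?php
// Hypothetical helper: count occurrences of a word in the visible text only.
function countVisibleMatches(string $html, string $word): int
{
    // strip_tags removes the tags together with their attributes (including title).
    $visible = strip_tags($html);
    return preg_match_all('/\b' . preg_quote($word, '/') . '\b/i', $visible);
}

$html = 'hello awesome <a href="" rel="external" title="so awesome is cool"> stuff stuff';
echo countVisibleMatches($html, 'awesome'), PHP_EOL;
```

This only helps when you need to read or search the text; the tags are gone afterwards, so it does not fit cases where the markup has to be written back out.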
This is so horrible I hesitate to post it, but if you want a quick hack, reverse the problem--instead of finding the stuff that isn't X, find the stuff that IS, change it, do the thing and change it back.
This is assuming you're trying to change awesome (to "wonderful"). If you're doing something else, adjust accordingly.
$string = 'Awesome is the man who <b>awesome</b> does and awesome is.';
$string = preg_replace('#(title\s*=\s*\"[^"]*?)awesome#is', "$1PIGDOG", $string);
$string = preg_replace('#awesome#is', 'wonderful', $string);
$string = preg_replace('#pigdog#is', 'awesome', $string);
Don't vote me down. I know it's a hack.