I have these two lines of code written in Python and I want to convert them to PHP. Please help, I don't know how to do this in PHP.
#code in python:
vowels= re.compile(ur'[\u064B-\u0652]')
newstr = vowels.sub('', str)
thank you
I think this is equivalent:
<?php
$vowels ='/[\x{064B}-\x{0652}]/u';
$newstr = preg_replace($vowels,"",$str);
The string should be UTF-8 encoded.
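To make the UTF-8 requirement concrete, here is a minimal sketch. The helper name strip_harakat is made up for illustration, and the sample word is an assumption, not something from the question. With the u modifier, preg_replace() returns null when the input is not valid UTF-8, so it is worth checking for that.

```php
<?php
// Hypothetical helper: removes the characters in the range U+064B to U+0652.
// Returns null when $text is not valid UTF-8, because the u modifier
// makes preg_replace() reject malformed input.
function strip_harakat(string $text): ?string
{
    return preg_replace('/[\x{064B}-\x{0652}]/u', '', $text);
}

// Sample word (an assumption): three consonants, each followed by U+064E.
$word = "\u{0643}\u{064E}\u{062A}\u{064E}\u{0628}\u{064E}";
$plain = strip_harakat($word);

var_dump($plain === "\u{0643}\u{062A}\u{0628}"); // bool(true)
var_dump(strip_harakat("\xFF"));                  // NULL (invalid UTF-8)
```

If you get null back, preg_last_error() should tell you whether bad UTF-8 was the cause.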
I believe what you want is PHP's preg_replace(). PCRE does not accept Python's \uXXXX escapes, so you can't take the Python pattern as-is: write the code points as \x{XXXX}, keep the range inside a character class, and add the u modifier. The syntax for using preg_replace() would be something like this:
<?php
$mystring = "jask;lfjalksdf";
$pattern = '/[\x{064B}-\x{0652}]/u';
$replacement = "";
echo preg_replace($pattern, $replacement, $mystring);
?>
re.compile() compiles a regular expression into a reusable pattern object.
The sub method replaces every match of the pattern in str with its first argument, here the empty string, so the matched characters are removed.
More info on Python regular expressions can be found in the documentation for the re module.
Related
I am trying to parse a badly formed html table:
A couple of lines of this are:
Food:</b> Yes<b><br>
Pool: </b>Beach<b></b><b><br>
Centre:</b> Yes<b><br>
After spending a lot of time on this with XPath, I think it is probably better to split the above text into lines using preg_split and parse from there.
The pattern I think would work uses:
<\b><\br>*: <\b>
my code is as follows:
$pattern='</b></br>*:</b>';
$pattern=preg_quote($pattern,'#');
$chars = preg_split($pattern, $output);
print_r($chars);
I am getting the following error:
Delimiter must not be alphanumeric or backslash
What am I doing wrong?
Try this:
$pattern='</b></br>*:</b>';
$pattern=preg_quote($pattern,'#');
$chars = preg_split('#'.$pattern.'#', $output);
print_r($chars);
The preg_quote function just makes it safely escaped, it doesn't actually add the delimiters for you.
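A small sketch of that point, using a made-up literal and input string (both are assumptions for illustration): preg_quote() escapes the literal, and you still have to wrap the result in delimiters yourself.

```php
<?php
// A literal containing regex metacharacters (* and :) that should be
// matched as plain text. Chosen for illustration only.
$literal = '</b>*:<br>';

// preg_quote() escapes the metacharacters (and '#', passed as the delimiter),
// but the delimiters themselves are added by hand.
$pattern = '#' . preg_quote($literal, '#') . '#';

$parts = preg_split($pattern, 'Food</b>*:<br>Yes');
print_r($parts); // two elements: "Food" and "Yes"
```

Note that this treats the * as a literal asterisk, which is what preg_quote() is for. If you want * to act as a quantifier, don't quote that part of the pattern.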
As other people will surely point out, using regular expressions is not a good way to parse HTML :)
Your regular expression is also not going to match what you hope. Here's a version that will probably work for your input:
$in = " Pool: </b>Beach<b></b><b><br>";
$out = explode(':', strip_tags($in));
$key = trim($out[0]);
$value = trim($out[1]);
echo "$key = $value\n";
This removes all the HTML, then splits on the colon, and then removes any surrounding whitespace.
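The same idea can be applied to every line at once. This is only a sketch: the function name parse_pairs is hypothetical, and it assumes each record sits on its own line and contains a colon, as in the three sample lines from the question.

```php
<?php
// Hypothetical helper: turns lines like "Food:</b> Yes<b><br>" into
// an array of key => value pairs.
function parse_pairs(string $html): array
{
    $pairs = [];
    foreach (preg_split('/\R/', $html) as $line) {
        $text = strip_tags($line);           // drop all the HTML tags
        if (strpos($text, ':') === false) {
            continue;                        // skip lines without a key
        }
        // Limit of 2 so a colon inside the value is kept.
        [$key, $value] = explode(':', $text, 2);
        $pairs[trim($key)] = trim($value);
    }
    return $pairs;
}

$html = "Food:</b> Yes<b><br>\nPool: </b>Beach<b></b><b><br>\nCentre:</b> Yes<b><br>";
print_r(parse_pairs($html));
```

For the sample above this gives Food => Yes, Pool => Beach, Centre => Yes.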
Your pattern needs to start and end with a delimiter; looks like you're using # if I'm reading this correctly, so you should have $pattern = '#</b></br>.*:</b>#';.
Also, you're mixing things up; * is not a simple wildcard in regex. If you mean "any number of any characters," the pattern you need is .*. I've included this above.
I wrote a regex in RegExr to tackle the following string:
<?php _on*/4353452f43f43f46 xx46 _off*/ ?>
This is the Regex code:
(.*<?php.*)(.*_on.*)(.*_off.*)(.*?>)
Which is working fine here:
http://regexr.com?31ptt
But it doesn't work with PHP, I get weird errors like: "Unknown modifier '<'", etc.
What do I need to do to convert this to work with PHP?
This is my php code:
$virusstring = '(.*/<?php.*)(.*_on.*)(.*_off.*)(.*?>)';
if (preg_match($virusstring, $myfile)) {
    $fixed = preg_replace($virusstring, '', $myfile);
    $blah = file_put_contents($item, $fixed);
}
$myfile is just taken from the infected file that is being scanned.
Your regular expression is missing delimiters. You need to add delimiters or PHP will assume your opening ( is a delimiter:
/(.*<\?php.*)(.*_on.*)(.*_off.*)(.*\?>)/
Also, ? is a quantifier, matching 0 or 1 of the previous character. You need to escape it:
(.*<\?php.*)(.*_on.*)(.*_off.*)(.*\?>)
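A quick way to see why the escape matters (the test strings are made up for illustration): without the backslash, the ? makes the preceding < optional, so the pattern matches text that has no opening tag at all.

```php
<?php
// Unescaped: "<?" means "an optional <", so plain "php" matches.
$unescaped = preg_match('/<?php/', 'php');

// Escaped: the question mark is literal, so plain "php" no longer matches...
$escaped_miss = preg_match('/<\?php/', 'php');

// ...but a real opening tag does.
$escaped_hit = preg_match('/<\?php/', '<?php echo 1;');

var_dump($unescaped);    // int(1)
var_dump($escaped_miss); // int(0)
var_dump($escaped_hit);  // int(1)
```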
This seems to work fine
preg_match("/(.*<\\?php.*)(.*_on.*)(.*_off.*)(.*\\?>)/us", $searchText)
I want to match matching tags like <tag>...</tag>. I tried the regex
~<([^>]+)>.*?</\1>~
but this fails. The expression worked when I used the exact text inside the angle brackets, i.e,
~<(tag)>.*?</tag>~
works, but even
~<(tag)>.*?</\1>~
fails.
I'm assuming that the back reference is not working here.
Can someone help me out please. Thanks
PS: I'm not using this to parse HTML. I know I shouldn't.
You didn't show your PHP code, but I surmise you have your regex in double quotes. If so, the backreference \1 is converted into the control character with code 1 (shown as ☺ in some code pages) before it reaches PCRE. (In double-quoted strings, all \123 sequences are interpreted as C-string octal escapes.)
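Here is a short sketch of the difference, assuming the regex from the question and a trivial test subject. The variable names are just for illustration.

```php
<?php
$subject = '<a></a>';

// Double quotes: PHP turns \1 into a single byte (octal escape) first.
$double  = "~<([^>]+)>.*?</\1>~";
// Single quotes: the backslash and the 1 both reach PCRE.
$single  = '~<([^>]+)>.*?</\1>~';
// Double quotes with the backslash doubled also works.
$escaped = "~<([^>]+)>.*?</\\1>~";

var_dump(strlen("\1")); // int(1), one control character
var_dump(strlen('\1')); // int(2), backslash plus digit

var_dump(preg_match($double, $subject));  // int(0)
var_dump(preg_match($single, $subject));  // int(1)
var_dump(preg_match($escaped, $subject)); // int(1)
```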
It worked for me...
$str = '<a></a>';
var_dump(preg_match('~<([^>]+)>.*?</\1>~', $str)); // int(1)
Also, have you considered an XML parser? Otherwise it won't like a piece of HTML like this...
<a title="Is 4 > 6?"></a>
Apart from the fact that it's not always a good idea to try and match markup languages using regex, your regex looks OK. Maybe you're using it wrong?
if (preg_match('~<([^>]+)>.*?</\1>~', $subject, $regs)) {
    $result = $regs[0];
} else {
    $result = "";
}
should work.
Use single quotes in the pattern:
preg_match_all('/(sens|respons)e and \1ibility/', "sense and sensibility", $matches);
print_r($matches);
I decided, for fun, to make something similar to Markdown. From my limited experience with regular expressions, I know how powerful they are, so they should be what I need.
So, if I have this string:
Hello **bold** world
How can I use preg_replace to convert that to:
Hello <b>bold</b> world
I assume something like this?
$input = "Hello **bold** world";
$output = preg_replace("/(\*\*).*?(\*\*/)", "<b></b>", $input);
Close:
$input = "Hello **bold** world";
$output = preg_replace("/\*\*(.*?)\*\*/", "<b>$1</b>", $input);
I believe there is a PHP package for rendering Markdown. Rather than rolling your own, try using an existing set of code that's been written and tested.
Mmm I guess this could work
$output = preg_replace('/\*\*(.*?)\*\*/', '<b>$1</b>', $input);
You find all sequences **something** and then you substitute the entire sequence found with the bold tag and inside it (the $1) the first captured group (the brackets in the expression).
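To show why the lazy .*? matters, here is a small sketch with two bold spans in one string. The input text and the function name bold are made up for illustration.

```php
<?php
// Hypothetical helper wrapping the replacement described above.
function bold(string $input): string
{
    return preg_replace('/\*\*(.*?)\*\*/', '<b>$1</b>', $input);
}

$input = 'Hello **bold** and **brave** world';

// Lazy quantifier: each **...** pair is replaced separately.
$lazy = bold($input);

// Greedy quantifier, for comparison: one match swallows both pairs.
$greedy = preg_replace('/\*\*(.*)\*\*/', '<b>$1</b>', $input);

echo $lazy, "\n";   // Hello <b>bold</b> and <b>brave</b> world
echo $greedy, "\n"; // Hello <b>bold** and **brave</b> world
```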
$output = preg_replace("/\*\*(.*?)\*\*/", "<b>$1</b>", $input);
OK, I know that I should use a DOM parser, but this is to stub out some code that's a proof of concept for a later feature, so I want to quickly get some functionality on a limited set of test code.
I'm trying to strip the width and height attributes from chunks of HTML, in other words, replace
width="number" height="number"
with a blank string.
The function I'm trying to write looks like this at the moment:
function remove_img_dimensions($string,$iphone) {
    $pattern = "width=\"[0-9]*\"";
    $string = preg_replace($pattern, "", $string);
    $pattern = "height=\"[0-9]*\"";
    $string = preg_replace($pattern, "", $string);
    return $string;
}
But that doesn't work.
How do I make that work?
PHP is unique among the major languages in that, although regexes are specified in the form of string literals like in Python, Java and C#, you also have to use regex delimiters like in Perl, JavaScript and Ruby.
Be aware, too, that you can use single-quotes instead of double-quotes to reduce the need to escape characters like double-quotes and backslashes. It's a good habit to get into, because the escaping rules for double-quoted strings can be surprising.
Finally, you can combine your two replacements into one by means of a simple alternation:
$pattern = '/(width|height)="[0-9]*"/i';
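Put together, the function could look something like this sketch. Two things here are my own assumptions rather than part of the question: I dropped the unused $iphone parameter, and I added \s* and \b so the space in front of each attribute is removed too and names like "data-width" with a different prefix letter run together are less likely to be hit.

```php
<?php
// Sketch of the question's function using one alternation.
// Single quotes avoid having to escape the double quotes in the pattern.
function remove_img_dimensions(string $html): string
{
    // \s*  : also eat the whitespace before the attribute (assumption)
    // (?:) : non-capturing group, since the name is not reused
    // i    : match WIDTH/Height as well
    return preg_replace('/\s*\b(?:width|height)="[0-9]*"/i', '', $html);
}

$img = '<img src="a.png" width="100" HEIGHT="50">';
echo remove_img_dimensions($img), "\n"; // <img src="a.png">
```

This is still a stopgap for test code; it does not handle single-quoted or unquoted attribute values.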
Your pattern needs the start/end pattern character. Like this:
$pattern = "/height=\"[0-9]*\"/";
$string = preg_replace($pattern, "", $string);
"/" is the usual character, but most characters would work ("|pattern|","#pattern#",whatever).
I think you're missing the delimiters (which can be //, || or various other pairs of characters) that need to surround a regular expression in the string. Try changing your $pattern assignments to this form:
$pattern = "/width=\"[0-9]*\"/";
...if you want to be able to do a case-insensitive comparison, add an 'i' at the end of the string, thus:
$pattern = "/width=\"[0-9]*\"/i";
Hope this helps!
David