UTF Regex with preg_match in PHP


I need a regular expression for German words with characters such as ä, ü etc.
When I test this regex on https://regex101.com/
/^\p{L}+$/u
everything works, but on my server I upload a CSV file and want to parse the words.
When I call it with the word "Benedikt"
preg_match("/^[\p{L}]+$/u", $attributes[0])
I get false. The CSV file is encoded in UTF-8. When I convert it to ANSI, the match works, but ä, ü etc. are not displayed correctly, so I think the file has to stay UTF-8.
But why does it return false?

The problem occurs because your CSV file starts with a UTF-8 BOM (byte order mark). If you remove it, the regex works perfectly. I have confirmed it with this code:
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<?php
// Remove a leading UTF-8 byte order mark (bytes EF BB BF), if present
function remove_utf8_bom($text)
{
    $bom = pack('H*', 'EFBBBF');
    $text = preg_replace("/^$bom/", '', $text);
    return $text;
}
$csvContents = remove_utf8_bom(file_get_contents('udfser_new.csv'));
$lines = str_getcsv($csvContents, "\n"); // split into rows
foreach ($lines as $line) {
    $row = str_getcsv($line, ";"); // split each row into fields
    $firstName = $row[0];
    $lastName = $row[1] ?? ''; // guard against rows without a second column
    echo 'First name: ' . $firstName . ' - Matches regex: ' . (preg_match("/^[\p{L}]+$/u", $firstName) ? 'yes' : 'no') . '<br>';
    echo 'Last name: ' . $lastName . ' - Matches regex: ' . (preg_match("/^[\p{L}]+$/u", $lastName) ? 'yes' : 'no') . '<br>';
}
?>
</body>
</html>
The regex matches the text successfully, and the ü in Glückmann is displayed correctly on the page.
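To see why the BOM breaks the match, here is a minimal sketch, assuming the first CSV field still carries the three BOM bytes `EF BB BF`. Those bytes decode to U+FEFF, a format character (category Cf) rather than a letter, so `^\p{L}+$` cannot match. The helper name `strip_utf8_bom()` is hypothetical; it does the same job as `remove_utf8_bom()` above without a regex.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark (EF BB BF), if present.
function strip_utf8_bom(string $text): string
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? substr($text, 3) : $text;
}

// Simulates the first field of a CSV saved as "UTF-8 with BOM".
$field = "\xEF\xBB\xBFBenedikt";

var_dump(preg_match('/^\p{L}+$/u', $field));                 // int(0): U+FEFF is not a letter
var_dump(preg_match('/^\p{L}+$/u', strip_utf8_bom($field))); // int(1)
```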

preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
http://php.net/manual/en/function.preg-match.php
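Because a loose `if (!preg_match(...))` treats "no match" (0) and "error" (false) the same way, it can help to compare strictly. A minimal sketch, assuming PHP 8.0+ for `preg_last_error_msg()`; the helper name `classify_match()` is hypothetical. The error case is provoked by passing a Latin-1 byte (not valid UTF-8) to a `/u` pattern.

```php
<?php
// Hypothetical helper: distinguish match, no match, and regex error.
function classify_match(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // e.g. PREG_BAD_UTF8_ERROR when the subject is not valid UTF-8 under /u
        return 'error: ' . preg_last_error_msg();
    }
    return $result === 1 ? 'match' : 'no match';
}

echo classify_match('/^\p{L}+$/u', 'Glückmann'), "\n";    // match
echo classify_match('/^\p{L}+$/u', 'Benedikt 2'), "\n";   // no match (space and digit)
echo classify_match('/^\p{L}+$/u', "Gl\xFCckmann"), "\n"; // error: Latin-1 ü is invalid UTF-8
```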

Related

Remove space from php url $url

A user submits a form, and it also generates a link. When the user puts a space in their name or phone number, the space also appears in the URL. I want to remove these spaces from the URL.
Here is my code:
elseif($company == 5) {
$ip = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR'])[0];
$sub = explode('|',$s3);
$str = ltrim($newphonecode, '+');
$num = $str.''.$newphnnum;
$url = 'http://go.247traffic.com/api/forextb/?api_username=allconverts&
api_password=MegaStart21&module=Customer&command=add&
firstname='.$firstname.'&lastname='.$lastname.'&email='.urlencode($emaillead).'&
phone='.$num.'&password='.$password.'&country='.$country.'&language='.$language.'&
campaignid='.urlencode($s).'&
subCampaign='.urlencode($sub[1]).''.htmlspecialchars('&currency').'='.
$currency.'&ip='.$ip;

You can use str_replace() for this purpose:
$str = " Hello World";
echo "With space: " . $str;
echo "<br>";
echo "Without space: " . str_replace(' ', '', $str);
The output will look like:
With space: Hello World
Without space: HelloWorld
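Stripping spaces changes the data (a name like "Anna Maria" becomes "AnnaMaria"). A different technique, not taken from the answer above, is to let PHP percent-encode every query parameter with `http_build_query()`. A minimal sketch; the endpoint `api.example.com` and the parameter values are placeholders, not the real API from the question:

```php
<?php
// Placeholder values; in the question these come from the submitted form.
$params = [
    'firstname' => 'Anna Maria',
    'lastname'  => 'Müller',
    'phone'     => '49 170 1234567',
];

// PHP_QUERY_RFC3986 encodes spaces as %20; the default (RFC 1738) uses '+'.
$query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);
$url   = 'https://api.example.com/customer/add?' . $query;

echo $url, "\n";
// https://api.example.com/customer/add?firstname=Anna%20Maria&lastname=M%C3%BCller&phone=49%20170%201234567
```

This keeps the original values intact on the receiving side, which decodes `%20` back to a space.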

How can I fix CURL quotes strpos() problem?

First, sorry for my bad English.
I'm fetching text from a site with cURL. After that, I search the text for some words with strpos().
But when the text contains quotes, my function doesn't work. For example, when the text comes from cURL without quotes:
$text = "This is my text.";
$intext = "This is";
if (strpos($text, $intext) !== false) {
echo "OK";
}
Okay, the page gives me "OK", so my code works.
But when the text contains quotes, like this:
$text = "This's my text.";
$intext = "This's my";
if (strpos($text, $intext) !== false) {
echo "OK";
} else {
echo "NO";
}
The page gives me "NO"!
Why? I think the quotation mark in the data from the website is different. How can I fix this problem? I need to compare without removing punctuation.

I fixed the problem with this code:
$str = str_replace(' ', ' ', $text);
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8'); // ENT_QUOTES already includes ENT_COMPAT
$str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
$str = html_entity_decode($str);
$str = htmlspecialchars_decode($str);
$text = strip_tags($str);
Thanks.
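A likely explanation, assuming the page sends the apostrophe as an HTML entity such as `&#039;`, is that the fetched text contains `This&#039;s` while the search string contains a literal `'`. A single `html_entity_decode()` call with `ENT_QUOTES | ENT_HTML5` handles numeric entities as well as `&apos;`. This minimal sketch uses a hypothetical helper `contains_phrase()`; replacing the typographic apostrophe (U+2019) is an extra assumption about what some sites send.

```php
<?php
// Hypothetical helper: search for $needle in HTML-ish text fetched with cURL.
function contains_phrase(string $html, string $needle): bool
{
    // Decode &#039;, &apos;, &quot;, &amp; etc. into real UTF-8 characters.
    $text = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Assumption: some sites use the typographic apostrophe U+2019 instead of '.
    $text = str_replace("\u{2019}", "'", $text);
    return strpos($text, $needle) !== false;
}

var_dump(strpos("This&#039;s my text.", "This's my"));            // bool(false)
var_dump(contains_phrase("This&#039;s my text.", "This's my"));    // bool(true)
var_dump(contains_phrase("This\u{2019}s my text.", "This's my")); // bool(true)
```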

PHP trim unexpected behaviour

I am using the following function in PHP to trim some unwanted characters.
$inputString = "आनन्द मठ";
trim(html_entity_decode($inputString), " \t\n\r\0\x0B\xC2\xA0");
The above code works fine for all inputs except one: the string आनन्द मठ is converted to आनन्द म� with an unwanted �. The same happens for परेटो- श्रेष्ठ, which becomes परेटो- श्रेष्�.

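trim() is not multibyte-aware: it treats its character list as single bytes (ISO-8859-style), so "\xC2\xA0" removes every trailing 0xC2 or 0xA0 byte rather than the two-byte no-break space. In UTF-8, ठ is encoded as E0 A4 A0, so its last byte is stripped and the broken character is shown as �.
You must use a UTF-8 (Unicode) aware function instead. Try this function: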
// Note: PHP 8.4+ ships a built-in mb_trim(); rename this function there to avoid a redeclaration error.
function mb_trim($string, $charlist='\\\\s', $ltrim=true, $rtrim=true)
{
    $both_ends = $ltrim && $rtrim;
    // Escape characters that are special inside a regex character class (^ - ] \)
    $char_class_inner = preg_replace(
        array( '/[\^\-\]\\\]/S', '/\\\{4}/S' ),
        array( '\\\\\\0', '\\' ),
        $charlist
    );
    $work_horse = '[' . $char_class_inner . ']+';
    $ltrim && $left_pattern = '^' . $work_horse;
    $rtrim && $right_pattern = $work_horse . '$';
    if($both_ends)
    {
        $pattern_middle = $left_pattern . '|' . $right_pattern;
    }
    elseif($ltrim)
    {
        $pattern_middle = $left_pattern;
    }
    else
    {
        $pattern_middle = $right_pattern;
    }
    // The u modifier makes the pattern operate on UTF-8 code points, not bytes
    return preg_replace("/$pattern_middle/usSD", '', $string);
}
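If the goal is only to trim whitespace and the no-break space U+00A0, a shorter alternative is a single Unicode-aware regex. A minimal sketch; the name `utf8_trim()` is hypothetical, and the byte-wise `trim()` call is included only to demonstrate the failure described above.

```php
<?php
// Hypothetical helper: trim whitespace and U+00A0 (no-break space) from both ends, UTF-8 safe.
function utf8_trim(string $s): string
{
    // The /u modifier makes the character class match code points instead of bytes.
    return preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s);
}

$input = "आनन्द मठ\u{00A0} ";

// Byte-wise trim() removes the final 0xA0 byte of ठ (E0 A4 A0) and breaks the character.
var_dump(trim("आनन्द मठ", " \xC2\xA0") === "आनन्द मठ"); // bool(false)
var_dump(utf8_trim($input) === "आनन्द मठ");            // bool(true)
```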

Add an HTTP header in your PHP, like:
header("Content-Type: text/html; charset=ISO-8859-1");
or put the encoding in a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

PHP using UTF8 characters in URL, url encoding fails

In my PHP script I try to send UTF-8 characters to the Google Translate website so that it sends me a translation of the text, but this doesn't work for UTF-8 characters such as Chinese, Arabic and Russian, and I can't figure out why. If I want to translate 'как дела' into English, I can use this link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=как дела
And it would return this: [[["how are you","как дела",,,1]],,"ru"]
That is a fine translation, exactly what I wanted. But when I try to recreate it in PHP, I do this (I start from bytes because my future script will use bytes as its starting point):
<?php
$bytes = array(1082,1072,1082,32,1076,1077,1083,1072); // bytes of: как дела
$str = "";
for($i = 0; $i < count($bytes); ++$i) {
$str .= json_decode('"\u' . '0' . strtoupper(dechex($bytes[$i])) . '"'); // returns string: как дела
}
$from = 'ru';
$to = 'en';
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $str;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
And it outputs: [[["RєR RєRґRμR ° \"° F","РєР°РєРґРµР»Р°",,,0]],,"ru"]
The output doesn't make sense; it appears that my PHP script sent the string 'РєР°РєРґРµР»Р°' to be translated into English. I read something about making UTF-8 characters readable for Google in a URI (or URL): I should convert my bytes to UTF-8 code units and put them in the URL. I haven't yet figured out how to convert bytes to UTF-8 code units, but I first wanted to test whether this works. I started by converting my text 'как дела' to code units (percent-encoded for the URL) by hand. This resulted in the following link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0
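The conversion described here (code points to a UTF-8 string, then to percent-encoded code units) can be sketched as follows, assuming the input is a list of Unicode code points in the Basic Multilingual Plane (the numbers in `$bytes` above, such as 1082 for "к", are code points rather than bytes). The question's `'\u' . '0' . dechex(...)` produces exactly four hex digits only for values from 0x100 to 0xFFF; for the space (32 = 0x20) it yields `\u020`, which is invalid JSON, so `json_decode()` returns null and the space is lost, consistent with the missing space in the first output. `%04X` always pads to four digits. The helper name `codepoints_to_utf8()` is hypothetical.

```php
<?php
// Hypothetical helper: turn BMP code points (e.g. 1082 for "к") into a UTF-8 string.
function codepoints_to_utf8(array $codepoints): string
{
    $str = '';
    foreach ($codepoints as $cp) {
        // %04X pads to exactly four hex digits, so 32 becomes \u0020 rather than \u020.
        $str .= json_decode(sprintf('"\u%04X"', $cp));
    }
    return $str;
}

$text = codepoints_to_utf8([1082, 1072, 1082, 32, 1076, 1077, 1083, 1072]);
echo $text, "\n";               // как дела
echo rawurlencode($text), "\n"; // %D0%BA%D0%B0%D0%BA%20%D0%B4%D0%B5%D0%BB%D0%B0
```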
And when tested in browser it returns: [[["how are you","как дела",,,1]],,"ru"]
Again a fine translation. It appears to work, so I tried to implement it in my script with the following code:
<?php
$from = 'ru';
$to = 'en';
$text = "%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0"; // code units of: как дела
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
This script outputs: [[["RєR Rє RґRμR ° \"° F","РєР°Рє РґРµР»Р°",,,0]],,"ru"]
Again, my script doesn't output what I want, or what I get when I test these URLs in my own browser. I can't figure out what I'm doing wrong, or why Google responds with garbled characters when I use the link in my PHP file.
Does anyone know how to get the output I want? Thanks in advance!
Updated code that sets strings to UTF-8 (not working):
I added a lot of settings at the top of the PHP file to make sure everything is in UTF-8. I also added a mb_convert_encoding() call halfway through, but the output is still wrong. The fopen() function doesn't send the right UTF-8 string to Google.
Output I get:
URL: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA%20%D0%B4%D0%B5%D0%BB%D0%B0
Encoding: ASCII
File contents: [[["RєR Rє RґRμR ° \"° F","РєР°Рє РґРµР»Р°",,,0]],,"ru"]
Code I use:
<?php
header('Content-Type: text/html; charset=utf-8');
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
$from = 'ru';
$to = 'en';
$text = rawurlencode('как дела');
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$url = mb_convert_encoding($url, "UTF-8", "ASCII");
$call = fopen($url,"r");
$contents = fread($call,2048);
print 'URL: ' . $url . '<br>';
print 'Encoding: ' . mb_detect_encoding($url) . '<br>';;
print 'File contents: ' . $contents;
?>

Solved! I got a hint on another forum to look at this Stack Overflow post about setting a user agent. After some more research I found that this answer was the solution to my problem. Now everything works fine!
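The linked posts are not reproduced here. As a minimal sketch of the idea, assuming the fix is simply to send a `User-Agent` header with the request, PHP's HTTP stream wrapper accepts a `user_agent` context option for `fopen()` and `file_get_contents()`. The User-Agent string is a placeholder, the helper names are hypothetical, the `translate_a/single` endpoint is the unofficial one from the question (its behavior may change), and the union return type assumes PHP 8.0+.

```php
<?php
// Hypothetical helper: build the request URL; PHP_QUERY_RFC3986 percent-encodes the UTF-8 text.
function build_translate_url(string $from, string $to, string $text): string
{
    $query = http_build_query(
        ['client' => 'gtx', 'sl' => $from, 'tl' => $to, 'dt' => 't', 'q' => $text],
        '',
        '&',
        PHP_QUERY_RFC3986
    );
    return 'https://translate.googleapis.com/translate_a/single?' . $query;
}

// Hypothetical helper: fetch the URL with an explicit User-Agent header (placeholder value).
function fetch_with_user_agent(string $url): string|false
{
    $context = stream_context_create([
        'http' => ['user_agent' => 'Mozilla/5.0 (compatible; MyTranslateScript/1.0)'],
    ]);
    return file_get_contents($url, false, $context);
}

$url = build_translate_url('ru', 'en', 'как дела');
echo $url, "\n";
// echo fetch_with_user_agent($url); // network call, left commented out
```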

Syntax error: Unexpected '/' error

I'm using fopen(), but I get this error on execution:
Parse error: syntax error, unexpected '/' in /home/furtherpath.. on line 7
Line 7 is:
/home/a11*****/public_html/rishi/rishi_someone_php.php = fopen("rishi_someone_php.php","r");
I know I should use a file handle variable instead, but that doesn't seem to work either and gives the same error.
I can't figure out why.
What I'm doing is creating an HTML page using PHP:
$html="<html> \n <head> \n <title>".$fn."</title> \n </head> \n <body> \n <?php \n $file=fopen('".$full."','r'); \n while(!(feof($file))) \n { \n echo htmlspecialchars(fgets($file)).'<br/>'; \n } \n fclose($file); \n ?> \n </body> \n </html>";
Thanks in advance.

Here is a cleaned-up version of your script; try to understand it. I've built the string up one HTML element at a time just to be clear. There is no reason for PHP tags within PHP tags. In your code, at this line: $file=fopen('".$full."','r');
you are trying to quote a variable of type string. That is not necessary with variables: PHP already knows it is a string, so don't do that. Only use quotes when a string literal is passed to a function as an argument.
$html = "<html>\n<head><title>" . $fn . "</title>\n</head><body>\n";
$fp = fopen($full, "r");
if ($fp) {
    while (($line = fgets($fp)) !== false) {
        $html .= htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
        $html .= "<br />";
    }
    fclose($fp); // only close a handle that was actually opened
} else {
    $html .= "No data";
    $html .= "<br />";
}
$html .= "</body>\n</html>\n";
echo $html;
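The same approach can be wrapped in a reusable function. A minimal sketch: the name `render_file_as_html()` is hypothetical, and it reads the whole file with `file()` instead of an `fgets()` loop, escaping the title as well as the content.

```php
<?php
// Hypothetical helper: render a text file as an HTML page, one line per <br />.
function render_file_as_html(string $title, string $path): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

    $html = "<html>\n<head><title>" . $esc($title) . "</title>\n</head><body>\n";
    $lines = is_readable($path) ? file($path, FILE_IGNORE_NEW_LINES) : false;
    if ($lines === false) {
        $html .= "No data<br />\n";
    } else {
        foreach ($lines as $line) {
            $html .= $esc($line) . "<br />\n";
        }
    }
    return $html . "</body>\n</html>\n";
}

// Usage with a temporary file.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "a < b\nsecond line\n");
echo render_file_as_html('Demo', $tmp);
unlink($tmp);
```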
