Encode the url including hyphen(-) and dot(.) in php - php

I need to pass a URL to an API that requires it to be fully encoded. For example, the URL:
http://test.site-raj.co/999999?lpp=1&px2=IjN
has to become an encoded URL, like:
http%3a%2f%2ftest%2esite%2draj%2eco%2f999999%3flpp%3d1%26px2%3dIjN
I need every symbol to be encoded, even the dot(.) and hyphen(-) like above.

Try this. Inside a function maybe if you are using it more than once...
$str = 'http://test.site.co/999999?lpp=1&p---x2=IjN';
$str = urlencode($str);
$str = str_replace('.', '%2E', $str);
$str = str_replace('-', '%2D', $str);
echo $str;

This will encode all characters that are not plain letters or numbers. You can still decode this with the standard urldecode or rawurldecode:
function urlencodeall($x) {
$out = '';
for ($i = 0; isset($x[$i]); $i++) {
$c = $x[$i];
if (!ctype_alnum($c)) $c = '%' . sprintf('%02X', ord($c));
$out .= $c;
}
return $out;
}
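As a rough alternative sketch of the same idea, the loop above can also be written with preg_replace_callback. The function name encode_all_non_alnum is made up for this example; it treats the input as raw bytes and assumes that only ASCII letters and digits should stay unencoded. The last line shows that rawurlencode on its own leaves hyphens and dots untouched.

```php
<?php
// Hypothetical helper: percent-encode every byte that is not an ASCII letter or digit.
function encode_all_non_alnum(string $s): string
{
    return preg_replace_callback(
        '/[^A-Za-z0-9]/',                     // no /u flag, so this matches single bytes
        function (array $m): string {
            return sprintf('%%%02X', ord($m[0]));
        },
        $s
    );
}

$url = 'http://test.site-raj.co/999999?lpp=1&px2=IjN';
$encoded = encode_all_non_alnum($url);

echo $encoded, PHP_EOL;
// http%3A%2F%2Ftest%2Esite%2Draj%2Eco%2F999999%3Flpp%3D1%26px2%3DIjN

// The standard decoder restores the original string.
var_dump(rawurldecode($encoded) === $url);

// For comparison: rawurlencode keeps "-" and "." as they are.
echo rawurlencode('a-b.c'), PHP_EOL; // a-b.c
```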

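Why don't you use rawurlencode?
For example: rawurlencode("http://test.site-raj.co/999999?lpp=1&px2=IjN")
Note, however, that rawurlencode leaves the hyphen (-) and dot (.) unencoded, so on its own it does not meet the requirement above.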

Related

How can I split html value and normal string into different array in php?

Say I have string such as below:
"b<a=2<sup>2</sup>"
It's actually a formula. I need to display it on a web page, but everything after the b disappears because the browser treats the rest as a broken anchor tag. I tried htmlspecialchars, but it returns the complete string as plain text, including the <sup> tags. I have also tried some regexes, but I can only get the text between certain tags.
UPDATE:
This seems to work with this formula:
"(c<a) = (b<a) = 2<sup>2</sup>"
And even with this formula:
"b<a=2<sup>2</sup>"
HERE'S THE MAGIC:
<?php
$_string = "b<a=2<sup>2</sup>";
$string = "(c<a) = (b<a) = 2<sup>2</sup>";
$open_sup = strpos($string,"<sup>");
$close_sup = strpos($string,"</sup>");
$chars_array = str_split($string);
foreach($chars_array as $index => $char)
{
if($index != $open_sup && $index != $close_sup)
{
if($char == "<")
{
echo "&lt;";
}
else{
echo $char;
}
}
else{
echo $char;
}
}
OLD SOLUTION (DOESN'T WORK)
Maybe this can help:
I tried escaping the characters with backslashes, but that doesn't work as expected.
Then I tried this:
<?php
$string = "b&lta=2<sup>2</sup>";
echo $string;
?>
Using the &lt; HTML entity seems to work, if I understood your problem correctly.
Let me know.
You could probably add spaces, such as:
b < a = 2<sup>2</sup>
The text is then no longer swallowed as a tag, and the formula is easier to read.
You could try this regex approach, which should skip elements.
$regex = '/<(.*?)\h*.*>.+<\/\1>(*SKIP)(*FAIL)|(<|>)/';
$string = 'b<a=2<sup>2</sup>';
$string = preg_replace_callback($regex, function($match) {
return htmlentities($match[2]);
}, $string);
echo $string;
Output:
b&lt;a=2<sup>2</sup>
PHP Demo: https://eval.in/507605
Regex101: https://regex101.com/r/kD0iM0/1
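A different technique from the regex above, sketched here under stated assumptions: escape the whole string with htmlspecialchars first, then turn a small whitelist of tags back into real markup. The function name escape_except_tags and the default whitelist (sup, sub) are made up for this example, and it only handles tags without attributes.

```php
<?php
// Hypothetical helper: escape everything, then re-enable only whitelisted, attribute-free tags.
function escape_except_tags(string $s, array $allowed = ['sup', 'sub']): string
{
    $escaped = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $names = implode('|', array_map('preg_quote', $allowed));
    // Turn "&lt;sup&gt;" and "&lt;/sup&gt;" back into "<sup>" and "</sup>".
    return preg_replace('#&lt;(/?)(' . $names . ')&gt;#i', '<$1$2>', $escaped);
}

echo escape_except_tags('b<a=2<sup>2</sup>'), PHP_EOL;
// b&lt;a=2<sup>2</sup>
echo escape_except_tags('(c<a) = (b<a) = 2<sup>2</sup>'), PHP_EOL;
// (c&lt;a) = (b&lt;a) = 2<sup>2</sup>
```

Escaping first and whitelisting afterwards means any tag not on the list stays visible as text instead of being interpreted by the browser.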

i want text and numeric part from the string in php

I have a string:
$str ='california 94063';
Now I want california and 94063 in two different variables.
The string can be anything.
Thanks in advance.
How about
$strings = explode(' ', $str);
Assuming that your string has ' ' as a separator.
Then, if you want to find the numeric entries of the $strings array, you can use is_numeric function.
Do like this
list($str1,$str2)=explode(' ',$str);
echo $str2;
If your string layout is always the same (say: follows a given format) then I'd use sscanf (http://www.php.net/manual/en/function.sscanf.php).
list($str, $number) = sscanf('california 94063', '%s %d');
<?php
$str ='california 94063';
$x = preg_match('(([a-zA-Z]*) ([0-9]*))',$str, $r);
echo 'String Part='. $r[1];
echo "<br />";
echo 'Number Part='.$r[2];
?>
If the order of the text and digits can vary, I found this solution.
Source:
How to separate letters and digits from a string in php
<?php
$string="94063 california";
$chars = '';
$nums = '';
for ($index=0;$index<strlen($string);$index++) {
if(isNumber($string[$index]))
$nums .= $string[$index];
else
$chars .= $string[$index];
}
echo "Chars: -".trim($chars)."-<br>Nums: -".trim($nums)."-";
function isNumber($c) {
return preg_match('/[0-9]/', $c);
}
?>
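The approaches above can be combined into one anchored regular expression. This is only a sketch, assuming the input is always "some text, whitespace, then digits"; the function name split_text_and_number is made up for the example. The number is kept as a string so that leading zeros (common in postal codes) survive.

```php
<?php
// Hypothetical helper: split "text digits" into its two parts, or return null if the shape differs.
function split_text_and_number(string $s): ?array
{
    // \D+? lazily takes the text; \d+ takes the trailing run of digits.
    if (preg_match('/^\s*(\D+?)\s+(\d+)\s*$/', $s, $m)) {
        return ['text' => $m[1], 'number' => $m[2]];
    }
    return null;
}

print_r(split_text_and_number('california 94063'));
print_r(split_text_and_number('New York 10001'));
var_dump(split_text_and_number('no digits here')); // NULL
```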

Replace character's position in a string

In PHP, how can you replace the second and third character of a string with an X so string would become sXXing?
The string's length would be fixed at six characters.
Thanks
It depends on what you are doing.
In most cases, you will use :
$string = "string";
$string[1] = "X";
$string[2] = "X";
This sets $string to "sXXing", as does
substr_replace('string', 'XX', 1, 2);
But if you want a perfect way to do such a replacement, you should be aware of encodings.
If your $string is 我很喜欢重庆 (in UTF-8), your output will be "�XX很喜欢重庆" instead of "我XX欢重庆".
A "perfect" way to avoid encoding problems is to use the PHP Multibyte String extension,
together with a custom mb_substr_replace, because PHP does not provide one:
function mb_substr_replace($output, $replace, $posOpen, $posClose) {
return mb_substr($output, 0, $posOpen) . $replace . mb_substr($output, $posClose + 1);
}
Then, code :
echo mb_substr_replace('我很喜欢重庆', 'XX', 1, 2);
will show you 我XX欢重庆.
Simple:
<?php
$str = "string";
$str[1] = $str[2] = "X";
echo $str;
?>
To replace a single character, you can index into the string:
$str = 'bar';
$str[1] = 'A';
echo $str; // prints bAr
or you could use the library function substr_replace as:
$str = substr_replace($str,$char,$pos,1);
and similarly for the third position.
function mb_substr_replace($string, $replacement, $start, $length=0)
{
return mb_substr($string, 0, $start) . $replacement . mb_substr($string, $start+$length);
}
Same as above, but with a signature closer to substr_replace (the substr family of functions usually takes a length, not an end position).
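To make the byte-versus-character difference concrete, here is a small sketch that assumes the mbstring extension is loaded and the input is UTF-8. The name mb_replace_range is made up for this example so that it does not clash with the functions defined above.

```php
<?php
// Hypothetical helper: replace $length characters (not bytes) starting at character $start.
function mb_replace_range(string $s, string $replacement, int $start, int $length): string
{
    return mb_substr($s, 0, $start, 'UTF-8')
        . $replacement
        . mb_substr($s, $start + $length, null, 'UTF-8');
}

echo mb_replace_range('string', 'XX', 1, 2), PHP_EOL;        // sXXing
echo mb_replace_range('我很喜欢重庆', 'XX', 1, 2), PHP_EOL;   // 我XX欢重庆

// substr_replace counts bytes, so it is only safe for single-byte text.
echo substr_replace('string', 'XX', 1, 2), PHP_EOL;          // sXXing
```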

Get hexcode of html entities

I have a string such as "&euro;".
I want to convert it to hex to get the value as "\u20AC" so that I can send it to flash.
The same goes for all currency symbols:
&pound; -> \u00A3
&dollar; -> \u0024
etc
First, note that &dollar; is not a known entity in HTML 4.01. It is, however, in HTML 5, and, in PHP 5.4, you can call html_entity_decode with ENT_QUOTES | ENT_HTML5 to decode it.
You have to decode the entity and only then convert it:
//assumes $str is in UTF-8 (or ASCII)
function foo($str) {
$dec = html_entity_decode($str, ENT_QUOTES, "UTF-8");
//convert to UTF-16BE
$enc = mb_convert_encoding($dec, "UTF-16BE", "UTF-8");
$out = "";
foreach (str_split($enc, 2) as $f) {
$out .= "\\u" . sprintf("%04X", ord($f[0]) << 8 | ord($f[1]));
}
return $out;
}
If you want to replace only the entities, you can use preg_replace_callback to match the entities and then use foo as a callback.
function repl_only_ent($str) {
return preg_replace_callback('/&[^;]+;/',
function($m) { return foo($m[0]); },
$str);
}
echo repl_only_ent("&euro;foobar &acute;");
gives:
\u20ACfoobar \u00B4
You might try the following function for string to hex conversion:
function strToHex($string) {
$hex='';
for ($i=0; $i < strlen($string); $i++) {
$hex .= dechex(ord($string[$i]));
}
return $hex;
}
(From Greg Winiarski, which is the fourth hit on Google.)
Use it in combination with html_entity_decode(), something like this:
$currency_symbol = "&euro;";
$hex = strToHex(html_entity_decode($currency_symbol));
This code is untested and therefore may require further modification to return the exact result you require
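A compact sketch of the decode-then-convert approach described in the first answer, assuming UTF-8 input and the mbstring extension. The function name entity_to_u_escape is made up for this example. It goes through UTF-16BE because the \uXXXX notation is based on UTF-16 code units, whereas hex-dumping the UTF-8 bytes directly would give E282AC for the euro sign rather than 20AC.

```php
<?php
// Hypothetical helper: decode HTML entities, then emit one \uXXXX escape per UTF-16 code unit.
function entity_to_u_escape(string $s): string
{
    $decoded = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $utf16 = mb_convert_encoding($decoded, 'UTF-16BE', 'UTF-8');
    if ($utf16 === '') {
        return '';
    }
    $out = '';
    // Each UTF-16 code unit is 2 bytes, which is 4 hex digits.
    foreach (str_split(strtoupper(bin2hex($utf16)), 4) as $unit) {
        $out .= '\\u' . $unit;
    }
    return $out;
}

echo entity_to_u_escape('&euro;'), PHP_EOL;  // \u20AC
echo entity_to_u_escape('&pound;'), PHP_EOL; // \u00A3
```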

How to remove html special chars? [duplicate]

This question already has an answer here:
Convert HTML entities and special characters to UTF8 text in PHP
(1 answer)
Closed 9 months ago.
I am creating an RSS feed file for my application, and I want to remove HTML tags from it, which strip_tags does. But strip_tags does not remove HTML special character codes such as:
&nbsp; &amp; &copy;
etc.
Is there a function I can use to remove these special character codes from my string?
Either decode them using html_entity_decode or remove them using preg_replace:
$Content = preg_replace("/&#?[a-z0-9]+;/i","",$Content);
(From here)
EDIT: Alternative according to Jacco's comment:
It might be nice to replace the '+' with {2,8} or something. This will limit the chance of replacing entire sentences when an unencoded '&' is present.
$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);
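A small illustration of what the {2,8} bound changes, using a made-up input string in which an unencoded '&' happens to be followed by a letter and a semicolon:

```php
<?php
// Made-up sample: "Q&A;" is plain text, "&eacute;" is a real entity.
$text = 'Q&A; then caf&eacute;';

$unbounded = preg_replace('/&#?[a-z0-9]+;/i', '', $text);
$bounded   = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $text);

echo $unbounded, PHP_EOL; // Q then caf
echo $bounded, PHP_EOL;   // Q&A; then caf
```

The bound only reduces false positives; text such as "R&D;" with two or more characters after the ampersand would still be removed.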
Use html_entity_decode to convert HTML entities.
You'll need to set charset to make it work correctly.
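For instance, a minimal sketch of turning an HTML fragment into plain text for a feed, assuming UTF-8 content. The function name html_to_text is made up, and stripping tags before decoding is just one possible order (decoding first would turn an escaped "&lt;b&gt;" into a real tag that strip_tags then removes).

```php
<?php
// Hypothetical helper: strip tags, decode entities with an explicit charset, tidy whitespace.
function html_to_text(string $html): string
{
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace, including the no-break space produced by &nbsp;.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

echo html_to_text('<p>Tom &amp; Jerry&nbsp;&copy; 2020</p>'), PHP_EOL;
// Tom & Jerry © 2020
```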
In addition to the good answers above, PHP also has a built-in filter function that is quite useful: filter_var.
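To remove HTML characters, use the following (note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so this applies to older versions):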
$cleanString = filter_var($dirtyString, FILTER_SANITIZE_STRING);
More info:
function.filter-var
filter_sanitize_string
You may want to take a look at htmlentities() and html_entity_decode():
$orig = "I'll \"walk\" the <b>dog</b> now";
$a = htmlentities($orig);
$b = html_entity_decode($a);
echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now (PHP 8.1+ also encodes the apostrophe as &#039;)
echo $b; // I'll "walk" the <b>dog</b> now
This might work well to remove special characters.
$modifiedString = preg_replace("/[^a-zA-Z0-9_.\s-]/", "", $content);
If you want to convert the HTML special characters rather than just remove them, and also strip tags to prepare plain text, this is the solution that worked for me:
function htmlToPlainText($str){
$str = str_replace('&nbsp;', ' ', $str);
$str = html_entity_decode($str, ENT_QUOTES | ENT_COMPAT , 'UTF-8');
$str = html_entity_decode($str, ENT_HTML5, 'UTF-8');
$str = html_entity_decode($str);
$str = htmlspecialchars_decode($str);
$str = strip_tags($str);
return $str;
}
$string = '<p>this is (&nbsp;) a test</p>
<div>Yes this is! &amp; does it get &quot;processed&quot;? </div>';
htmlToPlainText($string);
// returns approximately: this is ( ) a test / Yes this is! & does it get "processed"?
html_entity_decode with ENT_QUOTES converts things like &#39;
htmlspecialchars_decode converts things like &amp;
html_entity_decode converts things like &lt;
and strip_tags removes any HTML tags left over.
EDIT - Added str_replace('&nbsp;', ' ', $str); and several other html_entity_decode() calls, as continued testing has shown a need for them.
A plain vanilla strings way to do it without engaging the preg regex engine:
function remEntities($str) {
if(substr_count($str, '&') && substr_count($str, ';')) {
// Find amper
$amp_pos = strpos($str, '&');
//Find the ;
$semi_pos = strpos($str, ';');
// Only if the ; is after the &
if($semi_pos > $amp_pos) {
//is a HTML entity, try to remove
$tmp = substr($str, 0, $amp_pos);
$tmp = $tmp. substr($str, $semi_pos + 1, strlen($str));
$str = $tmp;
//Has another entity in it?
if(substr_count($str, '&') && substr_count($str, ';'))
$str = remEntities($tmp);
}
}
return $str;
}
What I did was use html_entity_decode, and then strip_tags to remove the tags.
try this
<?php
$str = "\x8F!!!";
// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");
// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>
It looks like what you really want is:
function xmlEntities($string) {
$translationTable = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
foreach ($translationTable as $char => $entity) {
$from[] = $entity;
$to[] = '&#'.ord($char).';';
}
return str_replace($from, $to, $string);
}
It replaces the named-entities with their number-equivalent.
<?php
function strip_only($str, $tags, $stripContent = false) {
$content = '';
if(!is_array($tags)) {
$tags = (strpos($tags, '>') !== false
? explode('>', str_replace('<', '', $tags))
: array($tags));
if(end($tags) == '') array_pop($tags);
}
foreach($tags as $tag) {
if ($stripContent)
$content = '(.+</'.$tag.'[^>]*>|)';
$str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
}
return $str;
}
$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?>
The function I used to perform the task, incorporating the improvement suggested by schnaader, is:
mysql_real_escape_string(
preg_replace_callback("/&#?[a-z0-9]+;/i", function($m) {
return mb_convert_encoding($m[0], "UTF-8", "HTML-ENTITIES");
}, strip_tags($row['cuerpo'])))
This strips every HTML tag and converts every HTML entity to UTF-8, leaving the text ready to save in MySQL. Note that mysql_real_escape_string was removed in PHP 7; on current versions, use mysqli or PDO with prepared statements instead.
You can try htmlspecialchars_decode($string). It works for me.
http://www.w3schools.com/php/func_string_htmlspecialchars_decode.asp
If you are working in WordPress and are like me and simply need to check for an empty field (and there are a copious amount of random html entities in what seems like a blank string) then take a look at:
sanitize_title_with_dashes( string $title, string $raw_title = '', string $context = 'display' )
Link to wordpress function page
For people not working on WordPress, I found this function REALLY useful to create my own sanitizer, take a look at the full code and it's really in depth!
$string = "äáčé";
$convert = Array(
'ä'=>'a',
'Ä'=>'A',
'á'=>'a',
'Á'=>'A',
'à'=>'a',
'À'=>'A',
'ã'=>'a',
'Ã'=>'A',
'â'=>'a',
'Â'=>'A',
'č'=>'c',
'Č'=>'C',
'ć'=>'c',
'Ć'=>'C',
'ď'=>'d',
'Ď'=>'D',
'ě'=>'e',
'Ě'=>'E',
'é'=>'e',
'É'=>'E',
'ë'=>'e',
);
$string = strtr($string , $convert );
echo $string; //aace
What If By "Remove HTML Special Chars" You Meant "Replace Appropriately"?
After all, just look at your example...
&nbsp; &amp; &copy;
If you're stripping this for an RSS feed, shouldn't you want the equivalents?
" ", &, ©
Or maybe you don't exactly want the equivalents. Maybe you'd want &nbsp; to just be ignored (to prevent too much space), but have &copy; actually get replaced. Let's work out a solution that solves anyone's version of this problem...
How to SELECTIVELY-REPLACE HTML Special Chars
The logic is simple: preg_match_all('/(&#[0-9]+;)/' grabs all of the matches, and then we simply build a list of matchables and replaceables, such as str_replace([searchlist], [replacelist], $term). Before we do this, we also need to convert named entities to their numeric counterparts, i.e., "&nbsp;" is unacceptable, but "&#160;" is fine. (Thanks to it-alien's solution to this part of the problem.)
Working Demo
In this demo, I replace &#123; with "HTML Entity #123". Of course, you can fine-tune this to any kind of find-replace you want for your case.
Why did I make this? I use it with generating Rich Text Format from UTF8-character-encoded HTML.
function FixUTF8($args) {
$output = $args['input'];
$output = convertNamedHTMLEntitiesToNumeric(['input'=>$output]);
preg_match_all('/(&#[0-9]+;)/', $output, $matches, PREG_OFFSET_CAPTURE);
$full_matches = $matches[0];
$found = [];
$search = [];
$replace = [];
for($i = 0; $i < count($full_matches); $i++) {
$match = $full_matches[$i];
$word = $match[0];
if(empty($found[$word])) {
$found[$word] = TRUE;
$search[] = $word;
$replacement = str_replace(['&#', ';'], ['HTML Entity #', ''], $word);
$replace[] = $replacement;
}
}
$new_output = str_replace($search, $replace, $output);
return $new_output;
}
function convertNamedHTMLEntitiesToNumeric($args) {
$input = $args['input'];
return preg_replace_callback("/(&[a-zA-Z][a-zA-Z0-9]*;)/",function($m){
$c = html_entity_decode($m[0],ENT_HTML5,"UTF-8");
# return htmlentities($c,ENT_XML1,"UTF-8"); -- see update below
$convmap = array(0x80, 0xffff, 0, 0xffff);
return mb_encode_numericentity($c, $convmap, 'UTF-8');
}, $input);
}
print(FixUTF8(['input'=>"Oggi &egrave; un bel&nbsp;giorno"]));
Input:
"Oggi &egrave; un bel&nbsp;giorno"
Output:
Oggi HTML Entity #232 un belHTML Entity #160giorno
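The selective part can also be sketched with a lookup table and a single preg_replace_callback pass. This is a simplified variant rather than the demo above: the function name replace_numeric_entities and the map contents are made up, and it assumes named entities have already been converted to decimal numeric ones.

```php
<?php
// Hypothetical helper: replace only the numeric entities listed in $map (code point => replacement).
function replace_numeric_entities(string $s, array $map): string
{
    return preg_replace_callback(
        '/&#([0-9]+);/',
        function (array $m) use ($map): string {
            $code = (int) $m[1];
            // Entities that are not in the map are left exactly as they were.
            return array_key_exists($code, $map) ? $map[$code] : $m[0];
        },
        $s
    );
}

echo replace_numeric_entities('Oggi &#232; un bel&#160;giorno', [160 => ' ']), PHP_EOL;
// Oggi &#232; un bel giorno
```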
