Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
Closed 9 years ago.
I want to convert Kruti Dev text (a legacy font used for Indian languages) to Unicode. This site, http://rajbhasha.net/drupal514/UniKrutidev+Converter, converts Kruti Dev to Unicode, but it does so in JavaScript. I want to do it in PHP. Can someone help?
What you have to do is not an encoding conversion but a custom character mapping.
In an encoding a specific byte or byte sequence stands for a specific character. The font then visualizes this character. For example, in ASCII the byte x41 stands for the character "A", and different fonts have different shapes to display this "A" visibly on screen.
In the case of Kruti Dev, apparently at the time it came into being, there was no encoding for Indian languages; i.e. there was no particular byte specified that should represent "व" in any system in use at the time. What the creators of Kruti Dev did was simply redefine the shape of a letter. The bytes still said the letter was "A", the operating system still thought it was handling the letter "A", but the font contained the shape of "व" for visual display.
So there's no encoding conversion you can do here, since the underlying encoding is being abused in non-standard ways to begin with. What you need to do is map Latin letters to the properly encoded Indian letters: replace every "A" with "व" (just an example; I have no idea about the actual mapping).
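A character mapping like the one described above can be sketched with PHP's built-in strtr(). The mapping table below is purely hypothetical (the pairs are placeholders, not the real Kruti Dev layout), and the function name is made up for illustration. A real converter may also need to reorder some characters, which a plain lookup table does not handle.

```php
<?php
// Hypothetical mapping table: these pairs are placeholders, NOT the real
// Kruti Dev layout. Fill it in from a trusted reference for the font.
$krutiDevToUnicode = [
    'A' => 'व',
    'B' => 'क',
];

// Hypothetical helper: replaces every mapped Latin character with its
// Unicode counterpart and leaves unmapped characters untouched.
function convertLegacyFontText(string $text, array $map): string
{
    // With an array argument, strtr() tries the longest keys first and
    // never re-scans text it has already replaced.
    return strtr($text, $map);
}

echo convertLegacyFontText('ABA', $krutiDevToUnicode), "\n";
```

Because strtr() prefers the longest key, multi-character sequences can be added to the same table alongside single letters.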
Check out iconv
$str = iconv($srcencoding, $destencoding, $str);
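For reference, this is what a genuine encoding conversion with iconv() looks like. It is only a sketch of the call itself, assuming the iconv extension is available; the encodings are chosen for illustration. It converts between real encodings and, per the answer above, would not turn font-remapped Kruti Dev text into Devanagari.

```php
<?php
// "\xE9" is the single byte that represents "é" in ISO-8859-1 (Latin-1).
$latin1 = "\xE9";

// Convert from the source encoding to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// In UTF-8 the same character is the two-byte sequence C3 A9.
echo bin2hex($utf8), "\n";
```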
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 1 year ago.
Dealing with Unicode and UTF-8 has been a nightmare in PHP for years, but I've always hoped things would get better with PHP 8. Have they?
What considerations must a developer using PHP 8 make with regards to receiving, processing, storing, and returning content in UTF-8?
I know about UTF-8 all the way through, but how much of that advice still applies in PHP 8? Are there new and better ways to handle Unicode in PHP? Are standard string functions now UTF-8 safe?
A developer needs to know what UTF-8 really is, before it is used in a program.
Only then will you understand why things cannot get much better in any programming language without rewriting the language completely. So the difficulties of UTF-8 have nothing to do with PHP specifically.
UTF-8 has the advantage, compared to other Unicode encoding schemes, of being backward compatible with US-ASCII and of being self-synchronizing. Because of the backward compatibility, many of PHP's single-byte character set (SBCS) functions will also work with multibyte character set (MBCS) strings.
To answer the question: not much has changed with PHP 8. The default character encoding has been UTF-8 since PHP 5.6. The article still applies. You'll need an extension (such as mbstring) to work with multibyte encodings, and you'll need to take into account the encoding of each data source, such as the database you're using, which is true for all programming languages.
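The difference between byte-oriented and multibyte-aware functions can be illustrated as follows. This is a minimal sketch that assumes the mbstring extension is loaded; the sample string is arbitrary.

```php
<?php
// "héllo" written with a Unicode escape, so the bytes are UTF-8
// regardless of how this file is saved.
$text = "h\u{00E9}llo";

// strlen() counts bytes: "é" takes two bytes in UTF-8.
$byteLength = strlen($text);

// mb_strlen() counts characters (code points).
$charLength = mb_strlen($text, 'UTF-8');

// str_replace() is byte-based but still safe here, because UTF-8 is
// self-synchronizing: a complete sequence never matches mid-character.
$replaced = str_replace("\u{00E9}", 'e', $text);

// substr() may cut a character in half; mb_substr() does not.
$brokenPrefix = substr($text, 0, 2);
$safePrefix = mb_substr($text, 0, 2, 'UTF-8');

echo $byteLength, ' bytes, ', $charLength, " characters\n";
```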
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
I have a field where users can enter whatever they want, and I would like to let them decorate it with special characters. But now I am facing a big problem.
Special characters look like this: ♥♦☻NAME☻♦♥
My real problem is characters like Alt+255. It looks like a space, and there are many special characters that look like a space. As a result, my links are disabled and no one can select them.
Users are already required to enter more than one character.
I want to know how to prevent this problem. To be exact: how can I let users enter special characters while keeping my links clickable?
If you are including the text in URLs then you really have two options. The most common approach is to strip out everything except for letters, numbers, dashes, and underscores (i.e. don't allow any special characters at all). You could use a simple regular expression replacement to do that.
Alternatively, you could allow all special characters, but escape them for use in links. You will find PHP's urlencode() and urldecode() useful for that.
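Both options can be sketched as follows. The variable names are illustrative, and the empty-slug check is an assumption about how one might catch names made up entirely of invisible characters such as a non-breaking space.

```php
<?php
// ♥♦☻NAME☻♦♥ written with Unicode escapes so the bytes are UTF-8.
$name = "\u{2665}\u{2666}\u{263B}NAME\u{263B}\u{2666}\u{2665}";

// Option 1: strip everything except letters, numbers, dashes, underscores.
$slug = preg_replace('/[^A-Za-z0-9_-]/', '', $name);

// Option 2: keep all characters, but escape them for use in a link.
$encoded = urlencode($name);
$decoded = urldecode($encoded);

// A name made only of "invisible" characters leaves an empty slug,
// which can be detected and rejected before building a link.
$invisibleOnly = "\u{00A0}\u{00A0}";
$isUsable = preg_replace('/[^A-Za-z0-9_-]/', '', $invisibleOnly) !== '';

echo $slug, "\n", $encoded, "\n";
```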
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
Working with a fictitious string such as:
$string = 'Ford : LTD 1988 Ford Station Wagon with HP 351 H Engine and Performance Transmission';
How could I use regexp or preg_match (I don't know which would be better either) to extract a sequence of letters and numbers ("HP 351 H") from that string for use in another variable (e.g. $EngineSize)?
The above is a fictitious example; I'm just trying to make it clear that I want to extract letters and numbers from a UI.
NOTE: Because this comes from a UI, the engine size may be positioned anywhere in the string, and the format may be with or without spaces and may or may not have a letter at the end. The model could be 2 or 3 letters (e.g. LE, LT, LTD), and the engine size could be 2 to 3 digits, possibly followed by 2 or 3 letters.
If anyone wouldn't mind showing me how to write an expression to retrieve this data, and explaining which is better (regexp or preg_match), I'd be most appreciative. Thank you in advance.
The following regex matches exactly what you describe, but there is a good chance of false positives:
/(?<=\s|^)[a-zA-Z]{2,3} ?\d\d\d? ?[a-zA-Z]{1,3}?(?=\s|$)/
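In PHP, preg_match() is the function that applies a regular expression, so the two are not alternatives. A minimal sketch of using the pattern above with the string from the question, storing the result in $EngineSize:

```php
<?php
$string = 'Ford : LTD 1988 Ford Station Wagon with HP 351 H Engine and Performance Transmission';

// The pattern from the answer, unchanged.
$pattern = '/(?<=\s|^)[a-zA-Z]{2,3} ?\d\d\d? ?[a-zA-Z]{1,3}?(?=\s|$)/';

// preg_match() returns 1 on the first match, 0 if nothing matched.
$EngineSize = null;
if (preg_match($pattern, $string, $matches) === 1) {
    $EngineSize = $matches[0]; // the full matched text
}

echo $EngineSize, "\n"; // HP 351 H
```

Note that "LTD 1988" is skipped because the four digits are not followed by a letter, and that this pattern as written requires at least one trailing letter.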
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 8 years ago.
I'm outputting database information that contains code such as <b> or <u> or <i>, but the text isn't styling. I'm using nl2br() to format it correctly and htmlspecialchars() when inserting textarea text into the database, if that helps.
What am I missing?
If the input string passed to this function and the final document share the same character set, this function is sufficient to prepare input for inclusion in most contexts of an HTML document. If, however, the input can represent characters that are not coded in the final document character set and you wish to retain those characters (as numeric or named entities), both this function and htmlentities() (which only encodes substrings that have named entity equivalents) may be insufficient. You may have to use mb_encode_numericentity() instead.
htmlspecialchars — Convert special characters to HTML entities
htmlentities — Convert all applicable characters to HTML entities
http://php.net/manual/en/function.htmlspecialchars.php
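A short sketch of what these functions do to markup, which is presumably why the stored tags show up as literal text instead of styling. The sample input is made up for illustration.

```php
<?php
$input = "<b>bold</b>\nsecond line";

// htmlspecialchars() turns "<" and ">" into entities, so a browser
// displays the tags as text instead of interpreting them.
$escaped = htmlspecialchars($input);

// nl2br() inserts a <br /> before each newline.
$withBreaks = nl2br($escaped);

// htmlspecialchars_decode() reverses the escaping. Only do this for
// content you trust, since it re-enables any markup in the string.
$restored = htmlspecialchars_decode($escaped);

echo $withBreaks, "\n";
```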
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
Closed 9 years ago.
Basically I need to match something like this:
0000-000 Text with spaces
Here each 0 in 0000-000 stands for any digit, followed by a space, followed by arbitrary text that may contain spaces.
I have the numbers down:
/^\d{4}(-\d{3})?$/
but I'm having a hard time getting the text...
It's close, but you would use this pattern to match the text as well:
/^\d{4}(-\d{3})? ([\w\s]+)$/
From the documentation:
\d any decimal digit
\s any whitespace character
\w any "word" character
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
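A minimal sketch of applying the pattern above with preg_match() and reading the captured groups. The sample inputs are illustrative.

```php
<?php
$pattern = '/^\d{4}(-\d{3})? ([\w\s]+)$/';

// Group 1 is the optional "-000" part, group 2 is the text.
preg_match($pattern, '0000-000 Text with spaces', $withSuffix);

// The suffix is optional; group 1 is then an empty string.
preg_match($pattern, '1234 Text', $withoutSuffix);

// Punctuation is not matched by \w or \s, so this input is rejected.
$rejected = preg_match($pattern, '0000-000 Text, with a comma');

echo $withSuffix[2], "\n"; // Text with spaces
```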
Try this regex
/^\d{4}(-\d{3})? .+$/
For people who DON'T assume everyone just uses the standard U.S. English charset:
/^\d{4}(-\d{3})? ([\p{L}\s]+)$/u
\p{L} matches any Unicode code point that is classified as a letter, regardless of language. The u flag at the end is required so that PHP's PCRE engine treats both the pattern and the subject string as UTF-8.
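A short sketch of the Unicode-aware pattern in use. The sample text is illustrative and written with Unicode escapes so the bytes are UTF-8.

```php
<?php
$pattern = '/^\d{4}(-\d{3})? ([\p{L}\s]+)$/u';

// "Café São Paulo" contains letters outside the ASCII range.
$subject = "1234-567 Caf\u{00E9} S\u{00E3}o Paulo";

$matched = preg_match($pattern, $subject, $m);

// Digits are not letters, so text containing them is rejected.
$rejected = preg_match($pattern, '1234-567 Text 42');

echo $matched === 1 ? $m[2] : 'no match', "\n";
```

One caveat: \p{L} covers letters only. Scripts that rely on combining marks (category \p{M}), such as Devanagari vowel signs, would need those added to the character class.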
If you want to match only text and spaces after the numbers, you can do:
/^\d{4}(-\d{3})?[ a-zA-Z]+$/
Here's an interactive regex editor (made for Ruby, but it works for PHP too):
http://rubular.com/r/ocbo5Sea8m
[0-9]{4}-[0-9]{3} .+
Seems to work