Which method is the most efficient for translating large amounts of text or web pages that include HTML? I want to translate the text, but keep the HTML.
Also, should I keep the words in a database or an array?
When you say "translating", do you mean from one language to another? If so, you can use regular expressions to capture the text between the opening and closing tags of your HTML without losing the markup. However, I'm not sure why you would want to store your data in a database, unless you are going to retrieve it at a later point.
If this is for translation on the fly, it will always be faster to keep your data in memory, in your array, or simply to update the HTML while you loop through the data and eliminate the need for an array altogether.
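A minimal sketch of the regular-expression approach, assuming the "translation" is a plain lookup in an in-memory array. The names translate_html and $dictionary are made up for illustration, and the word-for-word replacement only stands in for whatever real translation step you use:

```php
<?php
// Hypothetical in-memory dictionary; the real word list is not shown in the thread.
$dictionary = ['Hello' => 'Bonjour', 'world' => 'monde'];

/**
 * Translate only the text between tags, leaving the tags themselves untouched.
 */
function translate_html(string $html, array $dictionary): string
{
    // Match the text that sits between a ">" and the next "<".
    return preg_replace_callback(
        '/>([^<>]+)(?=<)/',
        function (array $m) use ($dictionary): string {
            // strtr() replaces every dictionary key found in this text run.
            return '>' . strtr($m[1], $dictionary);
        },
        $html
    );
}

// Prints: <p class="greeting">Bonjour <b>monde</b></p>
echo translate_html('<p class="greeting">Hello <b>world</b></p>', $dictionary);
```

Attribute values are left alone because they sit inside the tags. Note that this simple pattern would also rewrite text inside script or style elements, so it is only suitable for markup you control.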
What is the most secure way to save data from a textarea that contains <pre><code> text? Using strip_tags will remove all the tags from the text.
Is it safe to use this:
strip_tags($input, '<pre><code><other accepted tags except script,php,...');
or should I do other things too?
What is the most secure way to save data from a textarea that contains a <pre><code> text in it?
Save it as it is.
When you take that data back out of the database and put it into a web page, call htmlspecialchars on it first to escape it so that it looks like normal text on the page.
If you want the user to be able to input actual markup, but you only want to allow certain tags, then you've got a different problem and you want something like htmlpurifier.
Either way, the input or database layer is not the right place to be worrying about output formatting concerns.
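To illustrate escaping at output time, here is a minimal sketch. It assumes the page is served as UTF-8, and render_snippet is a made-up helper name:

```php
<?php
/**
 * Escape stored text when it is written into the page, not when it is saved.
 */
function render_snippet(string $stored): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid bytes.
    $escaped = htmlspecialchars($stored, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<pre><code>' . $escaped . '</code></pre>';
}

// Prints: <pre><code>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</code></pre>
echo render_snippet('<script>alert("x")</script>');
```

The stored value stays exactly as the user typed it, so the same row can later be rendered differently (for example as plain text in an email) without undoing anything.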
If you are saving the contents of the textarea to a MySQL database, you should use mysqli_escape_string (an alias of mysqli_real_escape_string) before saving the data.
You can also remove JavaScript tags from the posted data using a regular expression, e.g. with preg_replace.
I have an HTML table, generated by another website, that I'm trying to convert to a PHP array.
I cannot convert it using SimpleXML because the generated table's markup is not valid and causes a lot of errors. I also need to keep some attributes of the table's td elements and remove the others.
What would be the most efficient way of doing this? Or do you know of any PHP class that could help me achieve this?
BTW: What I'm trying to do is convert a school schedule to a PHP array that I can work with afterwards.
Here is an example of the data I retrieve: http://paste2.org/p/1869193
Also, using PHP's strip_tags, I already remove unnecessary tags such as span and font.
You can also use PHP's Tidy if installed (it is by default on some installs) - it not only cleans up the HTML, but also lets you traverse the DOM:
http://www.php.net/manual/en/book.tidy.php
You can find a list of HTML parsers in the answers to the following question on SO:
Robust and Mature HTML Parser for PHP
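As one possible illustration (using PHP's bundled DOM extension rather than Tidy, which is a substitution on my part), the sketch below loads sloppy markup and collects the cells into an array. The function name table_to_array, the sample markup, and the choice of rowspan as the attribute to keep are all assumptions:

```php
<?php
/**
 * Convert the rows of the first <table> in $html into a PHP array.
 * Each cell becomes ['text' => ..., 'rowspan' => ...]; rowspan is only an
 * example of an attribute worth keeping.
 */
function table_to_array(string $html): array
{
    $doc = new DOMDocument();
    // Collect parse warnings quietly; invalid markup is expected here.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return $rows;
    }
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                $cells[] = [
                    'text' => trim($cell->textContent),
                    // getAttribute() returns '' when the attribute is missing.
                    'rowspan' => $cell->getAttribute('rowspan') ?: '1',
                ];
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Deliberately sloppy sample: unclosed <td> and <tr> tags, unquoted attributes.
$html = '<table><tr><td rowspan=2>Math<td><font color=red>Room 12</font><tr><td>Physics</table>';
print_r(table_to_array($html));
```

Because textContent ignores inline markup, the span and font tags do not need to be stripped beforehand. Nested tables would need extra handling, since getElementsByTagName also returns rows of inner tables.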
I need to pull a section of text from an HTML page that is not on my local site, and then have it parsed as a string. Specifically, the last column from this page. I assume I would have to copy the source of the page to a variable and then set up a regex search to navigate to that table row. Is that the most efficient way of doing it? What PHP functions would that entail?
Fetch the page HTML with file_get_contents() (this requires the allow_url_fopen ini setting to be enabled) or with a tool such as curl or wget
Run a Regular Expression to match the desired part. You could just match any <td>s in this case, as these values are the first occurrences of table cells, e.g. preg_match("/<td.*?>(.*?)<\/td>/si",$html,$matches); (not tested)
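The two steps above can be sketched as follows. This is only an illustration: last_column is a made-up name, the inline sample table stands in for the fetched page, and the pattern is a slightly stricter variant of the one above, applied with preg_match_all so that every row is covered:

```php
<?php
/**
 * Return the text of the last <td> in every table row of $html.
 */
function last_column(string $html): array
{
    $values = [];
    // Split the page into rows first, then collect the cells of each row.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/si', $html, $rows);
    foreach ($rows[1] as $row) {
        if (preg_match_all('/<td\b[^>]*>(.*?)<\/td>/si', $row, $cells) > 0) {
            // end() picks the last captured cell; strip_tags() drops inner markup.
            $values[] = trim(strip_tags(end($cells[1])));
        }
    }
    return $values;
}

// In real use $html would come from file_get_contents() or cURL;
// an inline sample keeps the sketch runnable offline.
$html = '<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td class="num"><b>42</b></td></tr>
  <tr><td>Bob</td><td class="num">17</td></tr>
</table>';

print_r(last_column($html));
```

The header row is skipped because it contains no td cells. Nested tables or unclosed tags will defeat these patterns, which is where a real HTML parser becomes the safer choice.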
If you can use URL fopen, then a simple file_get_contents('http://somesite.com/somepage') would suffice. There are various libraries out there to do web scraping, which is the name for what you're trying to do. They might be more flexible than a bunch of regular expressions (regexes are known for having a tough time parsing complicated HTML/XML).
I'm creating my own blog in PHP and want to know your opinions on how I should format my post content.
Currently I store the post content as plain text, fetch it when necessary, then wrap each line in P tags. I did this in case I wanted to change the way I format my text in the future; it would save me the dilemma of having to remove all P tags from the posts in the DB.
The problem I have with this method is that if I want to add extra formatting, e.g. lists, those would also be wrapped in P tags, which is not correct.
How would you do this? Would you store the text as plain text in the DB, or would you add the HTML formatting and store that in the DB too?
I'd prefer not to store unnecessary HTML in the DB, but I'm not sure of a way around it.
I think the best way would be to keep the HTML in the DB. You would have too much work parsing the text if you don't use HTML.
See how it's done in other blog tools. I know that Joomla, for example, keeps all HTML in the DB. I know Joomla isn't a blog tool :) but still...
WordPress stores HTML in the DB. You say you are concerned about storing 'unnecessary' HTML in the DB. What makes it unnecessary? I think it is the opposite. You may have headings or bold or italic text in your post. If storing as plain text, how do you save this formatting? How are you saving the lists you mentioned?
I see it as a better practice to store raw user input in the database, and format it on output, caching the result if it is needed. That way you can change the way you are parsing things easily without having to regex-replace anything inside the database. You can also store the raw input in one column, and the formatted HTML in another one.
I assume that you are formatting your raw text with the Markdown or the Textile syntax?
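As a rough sketch of formatting on output, the function below turns raw text into paragraphs and simple lists. It is not Markdown or Textile; the rule that a block whose lines all start with "- " is a list is an assumption made for this example, and format_post and escape_html are made-up names:

```php
<?php
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

/**
 * Turn raw post text into HTML at output time.
 * Blocks are separated by one or more blank lines.
 */
function format_post(string $raw): string
{
    $raw = str_replace(["\r\n", "\r"], "\n", trim($raw));
    $blocks = preg_split('/\n{2,}/', $raw, -1, PREG_SPLIT_NO_EMPTY);
    $html = [];
    foreach ($blocks as $block) {
        $lines = explode("\n", $block);
        // Assumed convention: a block is a list when every line starts with "- ".
        $nonItems = array_filter($lines, fn(string $l): bool => strncmp($l, '- ', 2) !== 0);
        if (count($nonItems) === 0) {
            $items = array_map(fn(string $l): string => '<li>' . escape_html(substr($l, 2)) . '</li>', $lines);
            $html[] = '<ul>' . implode('', $items) . '</ul>';
        } else {
            // Escape first, then turn single newlines into <br>.
            $html[] = '<p>' . nl2br(escape_html($block), false) . '</p>';
        }
    }
    return implode("\n", $html);
}

echo format_post("First paragraph.\n\n- one\n- two\n\nA <b>bold</b> claim.");
```

Since the database only holds the raw text, changing these rules later means changing this one function; the formatted result can be cached in a second column if rendering becomes a bottleneck.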
If you store HTML in your DB, you will be just a few clicks away from your current situation:
you can use strip_tags() to remove the HTML formatting, and in case of bigger changes, you can run HTML Tidy on your code to remap tags and classes.
I am currently working on a project in which I need to switch the language from English to Japanese on a button click. The text is in a div, like this:
"<div id="sampletext"> here is the text </div>"
"<div id="normaltext"> here is the text </div>"
The text comes from a database. How can I convert this text easily?
Assuming that you have both the English and the Japanese version in the database, you can do two things:
1. Use AJAX to load the correct text from the database and replace the contents of the div. There are tons and tons of tutorials on the internet about AJAX content replacement.
2. Put both languages on the website and hide one using CSS display:none. Then use some JavaScript to hide/display the correct div when a button is clicked.
The first is technically more complex but keeps your page size small. The second one is very easy to do, but your page size is larger because you need to send both languages.
If the div is small and there is only one or two of these on the page, I recommend number two, the CSS technique. If the div is large (i.e. a complete article) or there are many of them then use the first method.
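A minimal sketch of the server side of the CSS technique, assuming both versions are already loaded from the database. The function name render_bilingual and the Japanese sample string are placeholders:

```php
<?php
/**
 * Render both language versions of one text; the inactive one is hidden with CSS.
 */
function render_bilingual(string $id, string $english, string $japanese): string
{
    $e = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    return '<div id="' . $e($id) . '">'
        . '<span lang="en">' . $e($english) . '</span>'
        . '<span lang="ja" style="display:none">' . $e($japanese) . '</span>'
        . '</div>';
}

// Placeholder texts; in the real page both would come from the database.
echo render_bilingual('sampletext', 'here is the text', 'ここにテキストがあります');
```

The button's click handler then only has to swap the display style of the two spans inside the div, so no further request to the server is needed.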
If you mean translating the text, you cannot do it easily. To get some idea of the best attempts that software can make at translating natural languages, go to Google Translate or Babelfish. It's not that good, but it's sometimes an intelligible starting point.
If you just mean setting the language attribute on an element, then assign a new language code to the lang property of the div element object.
document.getElementById("normaltext").lang = "en-US";
The language code for Japanese is ja (or ja-JP with a region subtag).
Assuming your literals have an ID in your database, you could put that ID as a class on your div. Then, with jQuery, fetch the ID, send it to your Ajax back-end, and fetch the translated text.
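The back-end half of that idea might look roughly like this. The $literals array stands in for the database table, and the ID 17, the field names, and translation_json are all invented for the example:

```php
<?php
// Stand-in for the database table: literal ID => language => text.
$literals = [
    17 => ['en' => 'here is the text', 'ja' => 'ここにテキストがあります'],
];

/**
 * Build the JSON body an Ajax back-end could return for one literal.
 */
function translation_json(array $literals, int $id, string $lang): string
{
    if (!isset($literals[$id][$lang])) {
        return json_encode(['error' => 'not found']);
    }
    return json_encode(
        ['id' => $id, 'lang' => $lang, 'text' => $literals[$id][$lang]],
        JSON_UNESCAPED_UNICODE // keep the Japanese text readable in the response
    );
}

// Prints: {"id":17,"lang":"ja","text":"ここにテキストがあります"}
echo translation_json($literals, 17, 'ja');
```

On the client, the returned text field would simply replace the contents of the div that carries the matching ID.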
First, if you have the texts in a database, it really doesn't matter whether you render them in divs, tables or whatever.
Second, you need a PHP API for some translation service. Here is just an example that might give you some ideas.
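<?php
// getTextForThisPage() and english_to_japanese() are placeholders for your own code.
$textArray = getTextForThisPage();
?>
...
<?php echo english_to_japanese($textArray["text1"]); ?>
...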