Moving html data from mysql (wordpress) to sqlserver - php

I am upgrading a website with at least 100K WordPress posts to ASP.NET. I could not find any related topic about moving the data, so I wanted to share my experience.
The volume of data is not a problem. However, some WordPress tables hold HTML containing quotes (both single and double), &nbsp; entities, tab characters and so on. I tried many ways of exporting, but exporting to an SQL file did not work for me (at least, I could not get it to work; it caused too many problems).

The best way to move the data is to export each table separately to a CSV file. When exporting in phpMyAdmin, you have to:
Select the "Custom" export method
Enter different, unique strings for both "Columns separated with" and "Columns enclosed with" (I used ############ and #######, respectively)
Check "Remove carriage return/line feed characters within columns"
Check "Put columns names in the first row"
Download the export file
After downloading, make the following replacements in the file, in this order:
Tabs to spaces: \t -> space
Double quotes to single quotes: " -> '
############ -> \t
####### -> "
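The replacement pass above could be scripted instead of done by hand. Here is a minimal Python sketch, assuming the export fits in memory as one string. The marker values `@@SEP@@` and `@@ENC@@` are hypothetical placeholders for whatever was typed into the export dialog; they are not the strings used above. Markers built from one repeated character can run together where a closing enclosure, a separator and an opening enclosure sit side by side, so the sketch uses markers that cannot overlap.

```python
# Hypothetical markers; substitute the strings chosen in the export dialog.
SEPARATOR_MARKER = "@@SEP@@"   # entered as "Columns separated with"
ENCLOSURE_MARKER = "@@ENC@@"   # entered as "Columns enclosed with"


def normalize_export(text: str) -> str:
    """Apply the four replacements in the order listed above."""
    text = text.replace("\t", " ")               # tabs inside the data -> spaces
    text = text.replace('"', "'")                # double quotes -> single quotes
    text = text.replace(SEPARATOR_MARKER, "\t")  # separator marker -> real tab
    text = text.replace(ENCLOSURE_MARKER, '"')   # enclosure marker -> real quote
    return text


sample = '@@ENC@@<p class="x">a\tb</p>@@ENC@@@@SEP@@@@ENC@@42@@ENC@@'
print(normalize_export(sample))
```

The order matters: the data is cleaned of real tabs and double quotes first, so the only tabs and double quotes left afterwards are the ones that delimit columns.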
Finally, an encoding conversion is required. The phpMyAdmin output is an ANSI file and it is corrupt; SQL Server might not be able to handle it. To fix it, first convert the file to UTF-8 and then convert it back to ANSI (Notepad++ has both options in its "Encoding" menu).
When importing into SQL Server, you must select a text file as the source. Select the relevant CSV file and, while importing, take care of the column lengths. SQL Server makes every column varchar(50) by default, and the exported data will have much larger columns, so you have to adjust them manually in the import wizard. Use DT_TEXT (not DT_NTEXT) for string values.
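One way to find the lengths you need before opening the wizard is to scan the converted file. This is a minimal Python sketch, assuming a tab-delimited file with a header row and double-quote enclosures, as produced by the replacements above. The function name is hypothetical, and it counts characters, not bytes.

```python
import csv
import io


def max_column_lengths(text: str, delimiter: str = "\t") -> dict:
    """Return the longest value length (in characters) seen for each column."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)                      # first row holds the column names
    widths = {name: 0 for name in header}
    for row in reader:
        for name, value in zip(header, row):
            widths[name] = max(widths[name], len(value))
    return widths


sample = 'ID\tpost_title\n"1"\t"Hello"\n"2"\t"A much longer title"\n'
print(max_column_lengths(sample))
```

Any column whose longest value exceeds 50 characters is one that has to be widened in the wizard.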
I know this process loses some data (tabs and double quotes); however, that is the fault of WordPress's HTML editor. HTML should be stored encoded in the database to avoid exactly this problem...

Go with a linked server created on the target system; this way the import process can be driven by SQL Server and will hopefully produce a ready-to-use result without too many steps and checks.
There are many SO posts about interacting with MySQL from SQL Server:
Can't create linked server - sql server and mysql
SELECT * FROM Linked MySQL server
Do I have to use OpenQuery to query a MySQL Linked Server from SQL Server?

Related

How to change the encoding of a file created by SQL Server Agent job to UTF-8?

I'm trying to create a daily job that queries the item inventory of my SAP db (SQL Server 17), saves the results in a .txt file, and FTPs it to my website server, where I'll parse it with PHP and use it to update the stock levels in the website's MySQL db.
I created a SQL Server Agent job that runs the following query:
SELECT
Cast (ItemCode as varchar(20)) + '##'+
Cast (OnHand as varchar(20))+ '##' AS Item_Stock
FROM OITW
WHERE OITW.WhsCode='01' AND OnHand>0
The result is outputted to a .txt file like so:
All of the above is working well.
The problem is that the file isn't saved as UTF-8, so after it is uploaded to the server (via ftp), any type of PHP parsing (explode, substr etc) fails.
So what I'm looking for is a way to force the txt file to be saved as UTF-8, or a way to make PHP read the file and be able to parse it as string.
I should add that both fields: OnHand and ItemCode, are numeric. OnHand is inventory field, so obviously only numbers. ItemCode may contain some non-ASCII characters, but if there are such items they are irrelevant anyway; I'm saying it to emphasize there's no fear of data loss in converting encoding, since numbers are numbers in any encoding (I think...).
Any help would be highly appreciated.
You can't, is the simple answer.
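The more verbose answer would be to add a further step to your job that runs a PowerShell script to do the conversion. Something like this should work:
# Path to the file written by the job step (placeholder left as in the question).
$Path = "C:\{the rest of the file path in your image}\office_stock.txt"
# Read the whole file into memory first, so it can safely be overwritten afterwards.
$Content = Get-Content -Path $Path
# Rewrite it as UTF-8. Note: Windows PowerShell 5.1 writes a BOM with -Encoding UTF8.
Set-Content -Path $Path -Value $Content -Encoding UTF8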
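The same conversion can also be done on whichever machine handles the file next. This is a minimal Python sketch of the idea, assuming the job's output is UTF-16 with a byte order mark. The question never states the actual encoding, so `source_encoding` is an assumption to adjust, and the function name is hypothetical.

```python
def reencode_to_utf8(src: str, dst: str, source_encoding: str = "utf-16") -> None:
    """Read src using source_encoding and write the same text to dst as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    # Python's "utf-8" codec writes no BOM, so parsers see the data from byte 0.
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
```

Writing to a separate destination file keeps the original intact in case the assumed source encoding turns out to be wrong.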

Correct way to handle CSV Files on PHP

Hi, I have the following brain-breaking problem. I'm developing a Laravel application that imports and exports CSV files. The data the application imports/exports (I/E from now on) has fields of various data types: text and numbers. The text can contain commas (,), so using the default CSV separator (,) in PHP can cause fields to be split incorrectly on import. The client suggested that I use ^ as the separator for the export and (,) again for the import. My question is: can I trust the default separator when importing/exporting data? Can anyone suggest the best way to handle the I/E process?
Edit
The client's main struggle is that he uses Excel on a Mac to edit the CSV files. On my Mac I can edit the files without any separator issues as long as the separator is a comma (,), but if we use ^ as the separator, my Excel makes a mess of the file and his omits some fields.
Thanks in advance.
Don't reinvent the wheel. Reuse a well-written, well-tested package. One good one is CSV from The PHP League.
(Historical note about delimiters: the most overlooked (for 50+ years) feature in computing is that the ASCII charset (and therefore UTF-8 too) assigns specific characters for delimiting fields (or units, as they called them) and records ... and even groups of records and entire files. See https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text. But instead folks didn't RTM and so used commas, etc. to separate fields and newlines (\r, \n, \r\n) to separate records. D'oh!!! So, if you are able to select your own delimiters and want to be safe by using a character not used for any other purpose, use the ASCII delimiters.)
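To make the historical note concrete, here is a minimal Python sketch of ASCII-delimited text, using the unit separator (0x1F) between fields and the record separator (0x1E) between records. The helper names are hypothetical, and the sketch assumes no field ever contains either control character.

```python
UNIT_SEP = "\x1f"    # ASCII 31, placed between fields
RECORD_SEP = "\x1e"  # ASCII 30, placed between records


def dump_records(rows):
    """Join rows of string fields into one ASCII-delimited string."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def load_records(text):
    """Split an ASCII-delimited string back into rows of fields."""
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Commas and newlines inside the data need no quoting or escaping.
rows = [["Smith, John", "12.50"], ["line one\nline two", "3"]]
round_tripped = load_records(dump_records(rows))
```

The trade-off is that these characters are invisible and cannot easily be typed, so a file in this form is not something a client can comfortably edit by hand.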
There is no such thing as a "CSV standard". Therefore, calling the comma the "default" is not exactly true. You can use basically whatever you like; the column and line separators, as well as the enclosures for values or complete lines, really depend on the data you are planning to put in.
TL;DR: It is totally up to you and your client, what you are using as those characters.
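Whatever separator is chosen, a proper CSV library quotes fields that contain it, so commas inside text are not a problem in themselves. Here is a minimal sketch using Python's standard `csv` module as a stand-in for the PHP package recommended above; the helper names are hypothetical.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows; fields containing commas or quotes get quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [
    ["id", "name", "price"],
    ["1", "Widget, large", "9.99"],   # comma inside a field
    ["2", 'He said "hi"', "1.50"],    # double quotes inside a field
]
encoded = to_csv(rows)
decoded = from_csv(encoded)
```

The field with the comma is written as `"Widget, large"`, and the reader returns it as a single value again, which is the behaviour to expect from any well-tested CSV package.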

mysql + php encoding

I know there have been plenty of questions like this, but I am creating a new one because, from my point of view, the problem is specific to each situation.
My page is displayed as UTF-8. The data is taken from MySQL, which uses the utf8_unicode_ci collation. The data I am displaying is the string - 1  Bröllops-Festkläder.
It contains some Unicode characters that should display fine, but they do not. On my page they are just a bunch of hieroglyphs.
Now, the interesting situation:
I am using phpMyAdmin to keep track of what is happening in the database. The website can import CSV documents containing customer data and modify each customer individually. If I import a CSV document containing these characters, they are written to the database, readable in phpMyAdmin and not readable on my page. If I use my script to modify the customer information and type those characters in the browser, then it is the other way round: they are readable on the page and not readable in phpMyAdmin, so clearly the encoding is different. I spent ages trying to figure out the right combination and could not.
UPDATE: Deceze posted a link below that I copy here to make it more noticeable. I am sure this will save hours and days to many people facing similar issues - Handling Unicode Front to Back in a Web App
There are a couple of things involved here. If your database encoding is fine and your HTML encoding is fine and you still see artefacts, most likely your db connection is not using the same encoding, which leads to data corruption. If you connect by hand, you can easily enforce UTF-8 by running the query SET NAMES UTF8 as the very first thing after you connect() to your database. It is sufficient to do this once per connection.
EDIT: one important note though - depending on how you put your data into the DB, the database content may need fixing, as it can be corrupted if it was written over a broken connection. So, if anyone is facing the same issue: once you have set everything up, make sure you test with a fresh data set, or you may still see incorrect output even though everything is now fine.
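The effect of a mismatched connection encoding can be reproduced without a database. This is a minimal Python sketch, assuming the mismatch is UTF-8 bytes being read as Latin-1, which is a common case but not confirmed for this question.

```python
original = "Bröllops-Festkläder"

# Bytes as a UTF-8 client would send them.
utf8_bytes = original.encode("utf-8")

# What a reader sees if those bytes are interpreted as Latin-1 instead:
# every non-ASCII character turns into two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

# Undoing the wrong step recovers the text, which is why content written
# over a mismatched connection can often be repaired instead of re-entered.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired)
```

This also illustrates why the same row can look right in one tool and wrong in another: each tool is applying a different interpretation to the same stored bytes.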

Find actual value of PHP variable

I am having a real headache with reading in a tab delimited text file and inserting it into a MySQL Database.
The tab delimited text file was generated (I think) from a MS SQL Database, and I have written a simple script to read in the file and insert it into an existing table in my MySQL database.
However, there seems to be some problem with the data in the txt file. When my PHP script parses the file and I output the INSERT statements, the values in each of the fields are longer than they should be. For example, the first field should be a simple two character alphanumeric value. If I echo out the INSERT statements, using Firebug (in Firefox), between each of the characters is a question mark in a black diamond. If I var_dump the values, I get the following:
string(5) "A1"
Now, this clearly shows a two character string, but var_dump tells me it is five characters long!!
If I trim() the value, all I get is the first character (in this case "A").
How can I get at the other characters, even if it is only to remove them? Additionally, this appears to be forcing MySQL to insert the value as a BLOB, not as a varchar as it should.
Simon
UPDATE
If I do:
echo mb_detect_encoding($arr[0]);
I get a result of 'ASCII'. This isn't multibyte, is it??
Sounds like an encoding issue.
Are you running any strings through PHP functions which are not multi byte safe?
You may need to look at multi byte aware functions in PHP.
OK, solved all these issues by opening the TXT file in notepad and saving it specifically as UTF-8.
I still don't know what encoding was used (maybe UNICODE??) but it's all sorted now
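The symptoms in this thread (more bytes than visible characters, odd symbols between letters, fixed by re-saving as UTF-8) are consistent with a UTF-16 file, though the thread never confirms it. Here is a minimal Python sketch for checking a file's first bytes for a byte order mark. The function name is hypothetical, and UTF-32 and BOM-less files are deliberately not handled.

```python
import codecs


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known byte order mark, else None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"   # this codec reads the BOM itself to pick the byte order
    return None


# A two-character value takes six bytes in UTF-16 with a BOM:
# two for the mark, then two per character.
raw = "A1".encode("utf-16")
encoding = sniff_bom(raw)
text = raw.decode(encoding)
```

Byte-oriented string functions see every one of those bytes, which would explain a length report larger than the number of characters on screen.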

MySQL import in phpmyadmin (CSV) chokes on quotes

I am trying to import a .csv file into a MySQL table via phpMyAdmin.
The .csv file is separated by pipes, formatted like this:
data|d'ata|d'a"ta|dat"a|
data|"da"ta|data|da't'a|
dat'a|data|da"ta"|da'ta|
The data contains quotes. I have no control over the format in which I receive the data -- it is generated by a third party.
The problem comes when there is a | followed by a double quote. I always get an "invalid field count in CSV input on line N" error.
I am uploading the file from the import page, using Latin1, CSV, terminated by |, separated by ".
I would like to just change the "enclosed by" character, but I keep getting "Invalid parameter for CSV import: Fields enclosed by". I have tried various characters with no success.
How can I tell MySQL to accept this format in phpMyAdmin?
Setting up these tables is the first step in writing a program that will use uploaded gzipped .csv files to maintain the catalog of an e-commerce site.
I've been having a similar problem for the last several hours and I've finally gotten an import to work so I'll share my solution, even though it may not help the original poster.
Short version:
1.) if an Excel file, save as ODS (open document spreadsheet) format.
1a.) If the file is some kind of text format with delimiters (like the original poster has), then open Excel, and once inside Excel use File/Open to open the file. There you will be able to select the appropriate delimiter to view the file. Make sure the file looks alright, THEN save as ODS format (and close the file).
2.) Open the file in OpenOffice Calc (free download from Oracle/Sun).
2a.) Press Ctrl-F to open the Find dialog box. Click More Options and make sure "Current Selection Only" is NOT checked.
2b.) Search for double quotes. If there are none in your file, you can skip steps 4 and 5.
3.) Save As -> Text CSV. Select options for UTF-8 format (press "u" 3 times to get there fast), select ";" (semi colon) as separator, and select double quotes for text.
4.) If there were any double quotes found in your file in step 2b, continue, otherwise just import the file as CSV with phpMyAdmin (see step 6). It should work.
5a.) Open in Word or any other text editor where you can do Find -> Replace All.
5b.) Find all instances of three double quotes in a row by searching for """ (if you do find any, you might even want to search for 4, 5, 6 etc. in a row until you come up empty).
5c.) Replace the """ with a placeholder that is not found anywhere else in your csv. I replaced them with 'abcdefg'.
5d.) Find -> Replace all instances of "" (two double quotes in a row) with \" (backslash and double quote).
5e.) Find -> Replace all instances of abcdefg (or your chosen placeholder from step 5c) with \"". Step 5c and this step ensure that any quotes occurring at the end of a field, just before the text-delimiting quote, are properly 'escaped'.
5f.) Finally, save the file, keeping in UTF-8 (or whatever format you need for import).
6a.) In phpMyAdmin, click the "import" tab, click the "choose file" button, and select the file you just saved.
6b.) Under 'Format of imported file', CSV should be selected. If column names are in the first row, make sure that checkbox is checked. Most importantly, 'Fields terminated by' should be set to ; (semicolon), 'Fields enclosed by' should be set to " (double quotes), and 'Fields escaped by' should be set to \ (backslash). You set that up in your file by following step 3, and if necessary by following steps 5a - 5f.
7.) Click "Go" and pray you didn't just waste another hour.
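The find-and-replace in steps 5a - 5f can also be done by a CSV parser, which avoids the placeholder trick. This is a minimal Python sketch, assuming the input is the semicolon-separated file from step 3 with quotes escaped by doubling. The function name is hypothetical, and fields that already contain backslashes are not considered.

```python
import csv
import io


def doubled_to_backslash(text: str, delimiter: str = ";") -> str:
    """Rewrite CSV that escapes quotes as "" into CSV that escapes them as \\"."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=delimiter,
        quoting=csv.QUOTE_ALL,   # enclose every field in double quotes
        doublequote=False,       # do not escape a quote by doubling it
        escapechar="\\",         # escape it with a backslash instead
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()


converted = doubled_to_backslash('"say ""hi""";"ends with """\n')
print(converted)
```

Because the reader understands the doubled-quote convention, a quote at the very end of a field is handled the same way as one in the middle.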
Now that the short version has turned out this long, I'll skip the long version.
Suffice it to say, there seem to be 2 major problems with importing through phpmyadmin.
1.) There's some kind of memory problem that prevents large Excel and ODS files (how large is large? not sure yet) being imported.
2.) Neither OpenOffice nor Excel seems to save its csv files in a way that's compatible with phpMyAdmin. They want to escape double quotes with double quotes. phpMyAdmin wants double quotes escaped with something else, like a backslash.
The first problem will hopefully be fixed in an update of phpmyadmin (and/or the Excel importing add-on 'PHPExcel').
The second one could be fixed if there was an easy way to change the escape character for Excel or ODS files saved as CSV, or if phpMyAdmin could be made compatible with their format (that should actually be pretty easy. Simply have it perform the same find-replace actions we performed manually above to skirt the double quote problem).
I hope this helps somebody, as I spent 3-4 hours discovering this solution and another hour writing it here. I hope it's not too long, but I was hoping to help people at all levels of expertise from zero to wherever I am (probably around 0.1).
I found a hack that works -- I use the $ as the "enclosed by" character and all is well. Since this is for a European site, I know that they'll never use it in the table content.
You could modify the csv files by adding a \ in front of every ', right?
Have you tried blanking the boxes that read "Fields enclosed by" and "Fields escaped by"? I have not used phpMyAdmin, but Google suggests others have had success with this method.
You might consider just writing your own LOAD DATA INFILE query, seems like you'll need one anyway since this process will be part of an application at some point.
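If the import ends up in application code anyway, the idea behind blanking "Fields enclosed by" is to treat quotes as ordinary characters. Here is a minimal Python sketch using the sample lines from the question. It assumes the trailing | on each line is a terminator and not an empty last column, and the function name is hypothetical.

```python
import csv
import io

sample = (
    "data|d'ata|d'a\"ta|dat\"a|\n"
    "data|\"da\"ta|data|da't'a|\n"
    "dat'a|data|da\"ta\"|da'ta|\n"
)


def parse_pipe_file(text: str):
    """Split on | only; quotes have no special meaning (csv.QUOTE_NONE)."""
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="|", quoting=csv.QUOTE_NONE
    )
    # The trailing "|" on each line yields one empty final field; drop it.
    return [row[:-1] if row and row[-1] == "" else row for row in reader]


rows = parse_pipe_file(sample)
```

Each line then yields four fields with the quotes preserved exactly as they were received.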
