How to split contents of a SQL table column stored as CSV (comma seperated values) into individual rows

I have these CSV files with store inventories which I’d like to import into a MS SQL Server database and use SQL to process them. The table will have three columns that contain CSV data. The problem is that I need to get this CSV data into individual rows in order to analyze it more. I’d like to end up with two tables in the end: the original table created by the import of the CSV file, and a table created by splitting out the CSV. Here is a representation of what the two tables would look like (in CSV format since I can’t upload a screen shot):

Table 1 (the original CSV file).  First row is column names:
StoreID,Date,StoreName,City,State,Category1CSV,Category2CSV,Category3CSV
1051,2/16/2014,Easton,Columbus,OH,"Flour,Yeast,Baking Powder","Milk,Water,Oil","Cinnamon,Sugar"
1425,1/14/2014,Crocker Park,Westlake,OH,"Baking Powder,Yeast,Four","Oil,Milk,Water","Rosemay,Cinnamon,Sugar"

Table2 (after splitting the CSV column contents). First row is column names:
StoreID,Date,StoreName,City,State,ItemName,ItemRank,ItemCategory
1051,2/16/2014,Easton,Columbus,OH,Flour,1,1
1051,2/16/2014,Easton,Columbus,OH,Yeast,2,1
1051,2/16/2014,Easton,Columbus,OH,Baking Powder,3,1
1051,2/16/2014,Easton,Columbus,OH,Milk,4,2
1051,2/16/2014,Easton,Columbus,OH,Water,5,2
1051,2/16/2014,Easton,Columbus,OH,Oil,6,2
1051,2/16/2014,Easton,Columbus,OH,Cinnamon,7,3
1051,2/16/2014,Easton,Columbus,OH,Sugar,8,3
1425,1/14/2014,Crocker Park,Westlake,OH,Baking Powder,1,1
1425,1/14/2014,Crocker Park,Westlake,OH,Yeast,2,1
1425,1/14/2014,Crocker Park,Westlake,OH,Flour,3,1
1425,1/14/2014,Crocker Park,Westlake,OH,Oil,4,2
1425,1/14/2014,Crocker Park,Westlake,OH,Milk,5,2
1425,1/14/2014,Crocker Park,Westlake,OH,Water,6,2
1425,1/14/2014,Crocker Park,Westlake,OH,Rosemary,7,3
1425,1/14/2014,Crocker Park,Westlake,OH,Cinnamon,8,3
1425,1/14/2014,Crocker Park,Westlake,OH,Sugar,9,3

The SQL column data types are:

Table 1
StoreID - int
Date - date
StoreName - nvarchar(50)
City- nvarchar(50)
State- nvarchar(50)
Category1CSV - nvarchar(MAX)
Category2CSV - nvarchar(MAX)
Category3CSV - nvarchar(MAX)



Table2
StoreID - int
Date - date
StoreName - nvarchar(50)
City- nvarchar(50)
State - nvarchar(50)
ItemName - nvarchar(50)
ItemRank - tinyint
ItemCategory -tinyint

The Table 1 columns labeled Category1CSV, Category2CSV, and Category3CSV contents map to Table 2 columns: ItemName, ItemRank, ItemCategory, where ItemName is the Item (example: Flour), ItemRank is the order of the item in the CSV list, and ItemCategory is either 1,2 or 3, depending on whether the data came from Category1CSV, Category2CSV or Category3CSV.

The most important aspect of this (other than splitting out the CSV column) is to maintain the order of items within the CSV columns. for example, StroreID 1051 has Category1CSV contents of “Flour,Yeast,Baking Powder”. Those will map to the columns ItemName, ItemRank, and ItemCategory such that ItemName = Flour, it’s ItemRank = 1, and the ItemCategory = 1. This would be the first row in Table 2. The second row would be ItemName = Yeast, it’s ItemRank = 2, and the ItemCategory = 1, and so on until you end up with what looks like Table 2 above. Also, you’ll notice that the ItemRank numbering starts with the contents of the column Category1CSV, then continues to Category2CSV and finally Category3CSV.

After that lengthy explanation, is it possible to have some SQL statement that will create Table 2 from Table 1 for me? If so, what would that look like? I’m planning to use MS SQL Server Express 2012.

Thank you!

What can I say? I would prepare the data using Excel first and import it into the database.

That’s exactly how I’m doing it… but it is very tedious, time consuming and repetitive… Sounds like a perfect task to be automated :slight_smile: I’d be open to some VBA that would accomplish that task in Excel. Thank you!

1 Like

If your familiar with programming you could write a small program of your own in your favorite language that would:

  1. Read the CSV files or table1
  2. Format data if needed
  3. Insert into database table2

Hey TeNDoLLA, you’re right, and I’m leaning towards Python to do this. I started to learn Python last year but had to drop it because of the need to learn other things for my job. I’m doing a Python crash course right now… It seems that one of Python’s strengths is parsing text, and working with CSV files. If you have any idea how to do that I’d welcome your input!
Thanks!

I don’t think that you can do that with SQL only. Yes with PL/SQL or, if you’re using Excel, that macro would be a piece of cake.

I would assume that Python would be easy too because I know that one of its strong points are regular expressions but since my knowledge of Python is Nill, zero, nada… can’t say anything

@molona… I posted the same question over in the Python group. If you could help me with an Excel macro, I’d be open to that as well… I don’t have any requirement to solve this problem in any particular way. Thanks!

I’ll try to see if I have the time to do it today. It is just loop after loop, basically.

1 Like

@molona, Thank you! Just to share what I have thus far in terms of an Excel macro (which was adapted from a post on stackoverflow: (cells A1, B1 and C1 contain the CSV data)

Sub splitSets()
Dim X As Variant
X = Split(Range("A1").Value, ",")
Range("A1").Resize(UBound(X) - LBound(X) + 1).Value = Application.Transpose(X)
On Error Resume Next

Dim Y As Variant
Y = Split(Range("B1").Value, ",")
Range("B1").Resize(UBound(Y) - LBound(Y) + 1).Value = Application.Transpose(Y)
On Error Resume Next

Dim Z As Variant
Z = Split(Range("C1").Value, ",")
Range("C1").Resize(UBound(Z) - LBound(Z) + 1).Value = Application.Transpose(Z)

On Error Resume Next
Exit Sub

This will split the contents of those columns, but the problem is that it just splits them in place in cells A1, B1, C1 which just overwrites the contents of the cells in the rows below them. So to get around that I am copying each row to a new sheet, performing the split with this macro, then selecting and moving the data from Column B and C to the last row in Column A, and using Excel’s fill functions to fill out the rest of the table, then selecting all the data and pasting it into a new Excel file to build the desired result.

you can, but the effort is not worth the time

I’d like to see an example. I don’t see a problem splitting the values… but I do see it to create more than one row from the same record

[quote=“molona, post:11, topic:115287”]
I’d like to see an example. [/quote]

sorry, it’s too much work!

do you have a copy of SQL For Smarties by Joe Celko? i think i would start there… you have to join each row to a numbers table, to get the 1st, 2nd, 3rd… csv value… and then of course you’d also have to write a function to basically do the same as mysql’s SUBSTRING_INDEX function

life’s too short, molona…

No but it seems that I will have to get it to see the full solution :smile:

I agree. Life is too short :wink:

@molona I got a hit on this over at stackoverflow. Check out what one person posted in terms of using a Number table and using the split function. stackoverflow.com link

So maybe I don’t have to buy the book after all! :smiley:

Thank you @justinmason72

I did use Excel’s split fucntion although it doesn’t work like in SQL :slight_smile:

i didn’t say the solution was in the book, i said i’d start with the book because i know the solution requires a numbers table and there are a couple good examples of using a numbers table in there

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.