I was recently shown a cool way to concatenate an unlimited number of rows into a single cell in SQL, but was left wondering how it worked under the hood. The trick involves using
FOR XML PATH('') to build multiple values and then cast them back into a single string. Starting from a simple query, and getting increasingly more complex, let’s look at how this actually works.
I’ll do everything on SQLFiddle. You can follow along by starting your own fiddle or clicking the results link for each step. Let’s start off by creating a basic table called
Person with two columns and three rows of data:
Create A Table
Start by making a table:
CREATE TABLE Person ( FirstName varchar(20), LastName varchar(20) ); INSERT INTO Person (FirstName, LastName) VALUES ('Kyle', 'Mit'), ('Bob', 'Builder'), ('Jimi', 'Hendrix');
Plain SQL Query
This is just a run of the mill SQL query returning 3 rows of data:
SELECT FirstName, LastName FROM Person
SQL Query as XML Grouped by ‘Person’
Here, the output of the Query is formatted into XML with each row represented as a Root node named ‘Person’ and each column as a child element:
SELECT FirstName, LastName FROM Person FOR XML PATH('Person')
<Person> <FirstName>Kyle</FirstName> <LastName>Mit</LastName> </Person> <Person> <FirstName>Bob</FirstName> <LastName>Builder</LastName> </Person> <Person> <FirstName>Jimi</FirstName> <LastName>Hendrix</LastName> </Person>
XML Query With Custom Named Child Nodes
The name of each child node under person is determined by the final column name (this is important later). When columns are selected, those are used by default, however you can manually give any column a specific name.
Note: The name of the child nodes has changed based on our
SELECT FirstName AS First, LastName AS Last FROM Person FOR XML PATH('Person')
<Person> <First>Kyle</First> <Last>Mit</Last> </Person> <Person> <First>Bob</First> <Last>Builder</Last> </Person> <Person> <First>Jimi</First> <Last>Hendrix</Last> </Person>
XML Query with comma and NAMED child node
In SQL, when you perform any kind of aggregation or selection function to a column, it no longer applies a default name. We’d like to concatenate our fields with a comma, so we’ll add a comma to the
select. As an intermediary step, we’ll explicitly specify a name just to show that the comma itself isn’t modifying the result set:
SELECT ',' + LastName AS Last FROM Person FOR XML PATH('Person')
<Person> <Last>,Mit</Last> </Person> <Person> <Last>,Builder</Last> </Person> <Person> <Last>,Hendrix</Last> </Person>
XML Query with comma and UNNAMED child node
By removing the explicit namimg, SQL isn’t able to guess the column name. Consequently, the data is just stuffed into the Person Element.
From the FOR XML PATH documentation:
Any column without a name will be inlined. For example, computed columns or nested scalar queries that do not specify column alias will generate columns without any name.
NOTE: This is a very important step towards concatenation. We need to somehow get rid of the xml markup around our data and we’ve dropped an entire node. In the next step we’ll find out how to get rid of the second one.
SELECT ',' + LastName FROM Person FOR XML PATH('Person')
<Person>,Mit</Person> <Person>,Builder</Person> <Person>,Hendrix</Person>
XML Query with no root node
The command we’ve been using all along is
FOR XML which has four different modes:
In this case, we’re using PATH, which will wrap the data elements in a parent element named for the table from which it came. Optionally, you can add a string parameter to Path to override the root element name. The last trick:
If you specify a zero-length string, the wrapping element is not produced.
By manually specifying the path as an empty string, all the data elements are shown right next to each other.
Note the file name: this is still returning XML (just poorly formatted XML)
SELECT ',' + LastName FROM Person FOR XML PATH('')
Get result of the XML sub query
By selecting the result of the entire query, we transform the XML into a value:
SELECT ( SELECT ',' + LastName FROM Person FOR XML PATH('') )
Remove the First Comma
To remove the leading comma, we’ll use
STUFF(character_expression, start, length, replaceWith_expression). The following query will start at position 1 and replace the 1st character with
SELECT STUFF(( SELECT ',' + LastName FROM Person FOR XML PATH('') ), 1, 1, '')
Cast into VARCHAR for type safety
Finally, we’ll take the whole query and make sure it’s of type
VARCHAR. Also, for good measure, we’ll give the returned column a name:
SELECT CAST(STUFF(( SELECT ',' + LastName FROM Person FOR XML PATH('') ), 1, 1, '') AS VARCHAR(MAX)) AS LastNames
And that’s how to go from constructing XML to a relatively simple concatenation. There are other ways to do this and other customizations you can add to this query, but this should help you understand a little bit more of the fundamentals underneath it all.