Asked  7 Months ago    Answers:  5   Viewed   31 times

I today ran into a really weird problem in SQL Server (both 2008R2 and 2012). I'm trying to build up a string using concatenation in combination with a select statement.

I have found workarounds, but I would really like to understand what's going on here and why it doesn't give me my expected result. Can someone explain it to me?

http://sqlfiddle.com/#!6/7438a/1

On request, also the code here:

-- base table
create table bla (
    [id] int identity(1,1) primary key,
    [priority] int,
    [msg] nvarchar(max),
    [autofix] bit
)

-- table without primary key on id column
create table bla2 (
    [id] int identity(1,1),
    [priority] int,
    [msg] nvarchar(max),
    [autofix] bit
)

-- table with nvarchar(1000) instead of max
create table bla3 (
    [id] int identity(1,1) primary key,
    [priority] int,
    [msg] nvarchar(1000),
    [autofix] bit
)

-- fill the three tables with the same values
insert into bla ([priority], [msg], [autofix])
values (1, 'A', 0),
       (2, 'B', 0)

insert into bla2 ([priority], [msg], [autofix])
values (1, 'A', 0),
       (2, 'B', 0)

insert into bla3 ([priority], [msg], [autofix])
values (1, 'A', 0),
       (2, 'B', 0)
;
declare @a nvarchar(max) = ''
declare @b nvarchar(max) = ''
declare @c nvarchar(max) = ''
declare @d nvarchar(max) = ''
declare @e nvarchar(max) = ''
declare @f nvarchar(max) = ''

-- I expect this to work and generate 'AB', but it doesn't
select @a = @a + [msg]
    from bla
    where   autofix = 0
    order by [priority] asc

-- this DOES work: convert nvarchar(4000)
select @b = @b + convert(nvarchar(4000),[msg])
    from bla
    where   autofix = 0
    order by [priority] asc

-- this DOES work: without WHERE clause
select @c = @c + [msg]
    from bla
    --where autofix = 0
    order by [priority] asc

-- this DOES work: without the order by
select @d = @d + [msg]
    from bla
    where   autofix = 0
    --order by [priority] asc

-- this DOES work: from bla2, so without the primary key on id
select @e = @e + [msg]
    from bla2
    where   autofix = 0
    order by [priority] asc

-- this DOES work: from bla3, so with msg nvarchar(1000) instead of nvarchar(max)
select @f = @f + [msg]
    from bla3
    where   autofix = 0
    order by [priority] asc

select @a as a, @b as b, @c as c, @d as d, @e as e, @f as f

 Answers

88

The KB article already linked by VanDerNorth does include the line

The correct behavior for an aggregate concatenation query is undefined.

but then goes on to muddy the waters a bit by providing a workaround that does seem to indicate deterministic behavior is possible.

In order to achieve the expected results from an aggregate concatenation query, apply any Transact-SQL function or expression to the columns in the SELECT list rather than in the ORDER BY clause.

Your problematic query does not apply any expressions to columns in the ORDER BY clause.

The 2005 article Ordering guarantees in SQL Server... does state

For backwards compatibility reasons, SQL Server provides support for assignments of type SELECT @p = @p + 1 ... ORDER BY at the top-most scope.

In the plans where the concatenation works as you expected the compute scalar with the expression [Expr1003] = Scalar Operator([@x]+[Expr1004]) appears above the sort.

In the plan where it fails to work the compute scalar appears below the sort. As explained in this connect item from 2006 when the expression @x = @x + [msg] appears below the sort it is evaluated for each row but all the evaluations end up using the pre assignment value of @x. In another similar Connect Item from 2006 the response from Microsoft spoke of "fixing" the issue.

The Microsoft Response on all the later Connect items on this issue (and there are many) state that this is simply not guaranteed

Example 1

we do not make any guarantees on the correctness of concatenation queries (like using variable assignments with data retrieval in a specific order). The query output can change in SQL Server 2008 depending on the plan choice, data in the tables etc. You shouldn't rely on this working consistently even though the syntax allows you to write a SELECT statement that mixes ordered rows retrieval with variable assignment.

Example 2

The behavior you are seeing is by design. Using assignment operations (concatenation in this example) in queries with ORDER BY clause has undefined behavior. This can change from release to release or even within a particular server version due to changes in the query plan. You cannot rely on this behavior even if there are workarounds. See the below KB article for more details:
http://support.microsoft.com/kb/287515 The ONLY guaranteed mechanism are the following:

  1. Use cursor to loop through the rows in specific order and concatenate the values
  2. Use for xml query with ORDER BY to generate the concatenated values
  3. Use CLR aggregate (this will not work with ORDER BY clause)

Example 3

The behavior you are seeing is actually by design. This has to do with SQL being a set-manipulation language. All expressions in the SELECT list (and this includes assignments too) are not guaranteed to be executed exactly once for each output row. In fact, SQL query optimizer tries hard to execute them as few times as possible. This will give expected results when you are computing the value of the variable based on some data in the tables, but when the value that you are assigning depends on the previous value of the same variable, the results may be quite unexpected. If the query optimizer moves the expression to a different place in the query tree, it may get evaluated less times (or just once, as in one of your examples). This is why we don't recommend using the "iteration" type assignments to compute aggregate values. We find that XML-based workarounds ... usually work well for the customers

Example 4

Even without ORDER BY, we do not guarantee that @var = @var + will produce the concatenated value for any statement that affects multiple rows. The right-hand side of the expression can be evaluated either once or multiple times during query execution and the behavior as I said is plan dependent.

Example 5

The variable assignment with SELECT statement is a proprietary syntax (T-SQL only) where the behavior is undefined or plan dependent if multiple rows are produced. If you need to do the string concatenation then use a SQLCLR aggregate or FOR XML query based concatenation or other relational methods.

Tuesday, June 1, 2021
 
revive
answered 7 Months ago
10

I think this takes care of all of the issues I spotted in other answers. No need to test the length of the output or check if the leading character is a comma, no worry about concatenating non-string types, no significant increase in complexity when other columns (e.g. Postal Code) are inevitably added...

DECLARE @x TABLE(Id INT, City VARCHAR(32), Province VARCHAR(32), Country VARCHAR(32));

INSERT @x(Id, City, Province, Country) VALUES
(1,'Vancouver','British Columbia','Canada'),
(2,'New York' , null             , null   ),
(3, null      ,'Adama'           , null   ),
(4, null      , null             ,'France'),
(5,'Winnepeg' ,'Manitoba'        , null   ),
(6, null      ,'Quebec'          ,'Canada'),
(7,'Seattle'  , null             ,'USA'   );

SELECT Id, Location = STUFF(
      COALESCE(', ' + RTRIM(City),     '') 
    + COALESCE(', ' + RTRIM(Province), '') 
    + COALESCE(', ' + RTRIM(Country),  '')
    , 1, 2, '')
  FROM @x;

SQL Server 2012 added a new T-SQL function called CONCAT, but it is not useful here, since you still have to optionally include commas between discovered values, and there is no facility to do that - it just munges values together with no option for a separator. This avoids having to worry about non-string types, but doesn't allow you to handle nulls vs. non-nulls very elegantly.

Tuesday, August 3, 2021
 
NIKHIL
answered 5 Months ago
73

Doesn't make any difference. Use the size appropiate for your data.

For instance SQL Server, if you look at the Anatomy of a Record you'll see that your size translates into record offsets that are dependent on the previous record in the table, null values and other factors, specially with row compression and page compression taken into account. By the time the field is accessed, any resemblance with the original declare size relation, vis-a-vis powers of 2 or powers of 10, is long gone. Also various elements higher on a query execution stack like join operators or sort operators or whatever, also would no benefit from powers of 2 sizes (I have no 'proof' linkes, but is OK if you take my word for it...). Neither does the TDS protocol when marshaling data back to client. And I see little benefit in the client too.

Sunday, August 15, 2021
 
Jeffrey Stilwell
answered 4 Months ago
39

Your subquery cannot return two values. If you just want to concatenate strings, you do not need the xml data type at all. You can do the stuff() and subquery in a single statement:

declare @Rep1Names nvarchar(max) = (
    stuff((select ', [' + report_name + ']' as name
           from (select distinct report_order, report_name
                 from #report
                ) x
           order by report_order
           for xml path('')
          )
         ), 1, 1, '');

declare @Rep2Names nvarchar(max) = (
    stuff(select ', isnull([' + report_name + '], 0) as [' + report_name + ']' as res
           from (select distinct report_order, report_name
                 from #report
                ) x
           order by report_order
           for xml path('')
          )
   ), 1, 1, '');
Thursday, August 19, 2021
 
JackTheKnife
answered 4 Months ago
55

Your title explains it - a 1d array does not have a 2nd axis!

But having said that, on my system as on @Oliver W.s, it does not produce an error

In [655]: np.concatenate((t1,t2),axis=1)
Out[655]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 11, 12, 13, 14, 15, 16, 17, 18,
       19])

This is the result I would have expected from axis=0:

In [656]: np.concatenate((t1,t2),axis=0)
Out[656]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 11, 12, 13, 14, 15, 16, 17, 18,
       19])

It looks like concatenate ignores the axis parameter when the arrays are 1d. I don't know if this is something new in my 1.9 version, or something old.

For more control consider using the vstack and hstack wrappers that expand array dimensions if needed:

In [657]: np.hstack((t1,t2))
Out[657]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 11, 12, 13, 14, 15, 16, 17, 18,
       19])

In [658]: np.vstack((t1,t2))
Out[658]: 
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [11, 12, 13, 14, 15, 16, 17, 18, 19]])
Saturday, September 25, 2021
 
Michael Romrell
answered 3 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :  
Share