Monday, March 7, 2011

Querying XML using XQuery or XPath in SQL Server

You will have realized that XML is stored differently from traditional relational
data. Not surprisingly, the way to query XML isn't quite the same as
querying relational data. However, because you're retrieving data from a SQL
Server column, you use T-SQL code as the framework.
 
Understanding XQuery
The language used to retrieve data in columns that are of the xml datatype is
called the XML Query Language (XQuery). Microsoft has chosen to use a
subset of the XQuery specification and is aware that the specification is,
at the time of writing, not yet finalized at the W3C. The hope is that the
subset of XQuery that Microsoft used doesn't change between the draft that
Microsoft used in SQL Server 2005 and the final release of XQuery but that is
not guaranteed.
 
XQuery is based on an earlier XML specification called the XML Path
Language (XPath). If you have worked with XML, you may have worked with
XPath 1.0. XQuery is based on XPath 2.0.
 
Among the XQuery syntax you can use to retrieve XML data are
 
 XPath expressions
 FLWOR expressions
 
A FLWOR expression has the following components:
 FOR
 LET
 WHERE
 ORDER BY
 RETURN
 
The LET keyword is not supported in SQL Server 2005.
 
Both XPath expressions and FLWOR expressions in SQL Server 2005 use the
query keyword. Broadly the syntax is as follows:
 
SELECT ColumnList
FROM ColumnName.query('XPathOrXQueryExpression')
 
To demonstrate how to use XPath expressions, I retrieved some data from
the PurchaseOrders table from the earlier example. This is untyped XML
but the same syntax is used for typed XML.
 
To retrieve selected parts of the PurchaseOrder column, follow these steps:
 
1. Open SQL Server Management Studio and click the Database Engine
Query button to create a new query.
2. Ensure that you're using the XMLREAD database by using this code:
USE Chapter7
3. Confirm that you can retrieve data from the PurchaseOrders table
by using the following code:
SELECT * FROM PurchaseOrders
4. Retrieve the PurchaseOrder element and its children using the following
code:
SELECT PurchaseOrder.query('/PurchaseOrder')
AS Result
FROM dbo.PurchaseOrders
 
5. Retrieve the Date element and its content by using the following code:
SELECT PurchaseOrder.query('/PurchaseOrder/Date')
AS Result
FROM dbo.PurchaseOrders
The Date element and its content displays in the results grid.
6. Retrieve the value contained in the Date element using the following
code:
SELECT
PurchaseOrder.query('/PurchaseOrder/Date/text(
)')
AS Result
FROM dbo.PurchaseOrders
The text() function is an XPath function that retrieves the value contained
in an XML element.
7. To test the use of XPath expressions in combination with the WHERE
clause, add a new purchase order to the table by using the following
code:
INSERT INTO PurchaseOrders VALUES(2, '<?xml
version="1.0"?>
<PurchaseOrder>
<Date>2006/01/01</Date>
<From>Some Other Fictional Company</From>
<Contact>John Jones</Contact>
<ContactPhone>234-567-8901</ContactPhone>
<LineItems>
<LineItem Quantity="3">Some article</LineItem>
<LineItem Quantity="12">Some other
article</LineItem>
<LineItem Quantity="300">Yet another kind of
article</LineItem>
</LineItems>
</PurchaseOrder>')
8. Confirm that the row has been successfully added using the following
code:
SELECT * FROM PurchaseOrders
Notice that two rows now display in the results grid.
9. To retrieve the date of the purchase order that has a value of 2 for the
PurchaseID, use the following code:
SELECT PurchaseID,
PurchaseOrder.query('/PurchaseOrder/Date')
AS Result
FROM dbo.PurchaseOrders
WHERE PurchaseID = 2
 
Creating indexes for the xml datatype
When you query for XML data as I did in the preceding examples, every row
of the PurchaseOrders table is queried in turn. In production size databases,
this way is very inefficient and is likely to produce poor performance.
 
To improve performance, you can create one or more XML indexes.
There are two types of XML index:
 Primary: A primary XML index must exist before you can create a secondary
XML index.
 Secondary: There are three types: PATH, VALUE, and PROPERTY.
 
A PATH secondary index can give improved performance when using XPath
expressions, such as those demonstrated in the preceding section. A VALUE
index can give improved performance when using value comparisons, for
example in XPath predicates.
 
Using the XML Data Modification Language
XQuery, which I briefly introduce in the previous section, is a W3C standard
in development. Version 1.0 of XQuery doesn't include any syntax to support
inserting, deleting, or updating of XML data. SQL Server 2005 contains a
proprietary extension to XQuery that provides this data modification functionality.
The XQuery Working Group at W3C are likely to adopt a syntax
very similar to the one that Microsoft has added to SQL Server 2005, but it
remains possible that there may be significant differences. As far as the initial
release of SQL Server 2005 is concerned, three new keywords allow you to
modify data:
 insert
 delete
 change value of
 
Each of the preceding keywords is case sensitive, so it's important that you
get the case correct to avoid errors when running your code.
To insert XML, you use the insert keyword. You can specify where to insert
the new piece of XML with the following keywords:
 after: Inserts after the specified point in the existing XML structure.
 before: Inserts before the specified point in the existing XML structure.
 into: Inserts into a position specified by using the as first or as
last keywords.
 
In the following example, I create a variable that is of xml datatype, set its value
to a Contact element, and then modify it to add a ContactName element. To
run the example, follow these steps. I explain the code after the steps.
1. Open the SQL Server Management Studio and click the Database
Engine Query button to create a new database engine query.
2. Specify to use the XMREAD database with the following code:
USE XMREAD
3. Enter the following code, highlight it all, and then press F5 to run it.
It is important that you run all the code in a single batch or you get
errors about undeclared variables.
DECLARE @insertXML xml
SET @insertXML = '<Contact>This contact was
inserted.</Contact>'
SELECT @insertXML
SET @insertXML.modify('
insert <ContactName>Jimmy Case</ContactName>
after
/Contact[1]
')
SELECT @insertXML
 
In the first line you declare an insertXML variable. The second line sets the
value of the insertXML variable. The SELECT statement in the third line of
code causes the value of the insertXML variable to display in the results
grid. The next SET statement uses the modify keyword (it must be all lowercase)
to modify the value of the insertXML variable. A ContactName element is
added. Notice the after keyword that specifies that the new element is
added after the Contact element.
 
The final SELECT statement retrieves the value of the insertXML variable
after it has been modified.
 
Converting Data to and from XML
The earlier parts of this chapter describe using the xml datatype. The SQL
Server 2000 functionality to retrieve relational data using the FOR XML statement
has been extended in SQL Server 2005. Also, the OPENXML keyword
functionality is used to shred XML into relational data, if that fits with your
storage needs.
 
Using the FOR XML statement
A T-SQL SELECT statement retrieves relational data as a rowset. The FOR
XML clause of a SELECT statement lets you optionally convert a retrieved
rowset as XML.
 
In SQL Server 2005, the FOR XML clause has the following options:
 AUTO: The AUTO option automatically creates an XML hierarchy that, for
the most part, is outside your control. You can use nested FOR XML
queries to take some control of the XML structure. The document element
is created automatically using the table name.
 
 EXPLICIT: The EXPLICIT option gives you enormous control over the
structure of returned XML, but at a cost of very complex syntax that's
not easy to understand nor to modify or debug.
 
 PATH: The PATH option is new to SQL Server 2005 and gives the full control
that the EXPLICIT option gives but with less complex syntax.
 
 RAW: The RAW option creates a single row element for each row returned
in the rowset. To create a hierarchy of XML elements, you can nest FOR
XML queries.
 
To retrieve some relational data from the AdventureWorks database and
display it by using the AUTO option, follow these steps:
1. Open SQL Server Management Studio and click the Database Engine
Query button to create a new database engine query.
 
2. Type the following code (also available as ForXML.sql in the code
download):
USE AdventureWorks
SELECT *
FROM Person.Contact
WHERE ContactID = 1
FOR XML AUTO
 
The first line of the code specifies to use the AdventureWorks database.
The SELECT statement retrieves all columns from the
Person.Contact table. The WHERE clause specifies to filter the data so
that only data where ContactID = 1 displays. The FOR XML AUTO
specifies to transform the filtered data into XML.
 
3. Click the data that returns and you see an appearance similar to
 
Using the OPENXML keyword
The OPENXML keyword takes an XML document and shreds it into rowsets
that can be stored in relational columns in SQL Server.
The sp_xml_preparedocument stored procedure is used to parse the XML
document. Then you specify which parts of the XML document are to constitute
rows in SQL Server.
 

1 comment :

  1. If you're looking for a good contextual ad company, I recommend you have a look at Clicksor.

    ReplyDelete