Question:
I need to load a single large XML file (40GB+) into a SQL Server 2012 table using SSIS. I'm having problems because SSIS seems to be trying to load the entire XML file into memory before loading records into the database, rather than reading and loading parts of the file and freeing memory.
At the moment what I have is a task with an XML Source, with the following properties:
- Data access mode: XML file from variable (but could have set to XML File Location)
- Variable name: name of the variable that specifies the name of the XML file
- XSD location: the path to the XML file validation XSD file.
The structure of my XML file is simple. It only has the following 3 hierarchical levels:
- Root element, with file header information (who uploaded the file, date, etc.)
- The next level defines collections of objects.
- The last level has the objects individually, with a fixed set of fields.
I need to insert one record per object at the last level of the XML, replicating the values of the elements above in the hierarchy. That is, I'm turning the XML hierarchy into a flat file.
My question is: How do I load this file with SSIS, without it trying to load the entire file into memory?
Answer:
XML Source
does not read the entire document, but an element of the hierarchy described in the XSD.
Ensure the document contains a single root element as per documentation in XML Source – at MSDN
Test with a smaller sample part of the document.
Here's a step-by-step where an XML document containing all Wikipedia articles EN was imported into a SQL Server database using Integration Services. This is a 20GB bank at the time.