====================================
1.6. Parsing documents from content
====================================

When you parse the contents of a file, a stream, or a reader, Scryber builds a full object model of the content plus any referenced content.
Because the parser is based on XML, all content must be well-formed: it will not accept unclosed tags or elements.

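
Because the parser expects well-formed XML, it can help to check a template before handing it over.
The following is a minimal sketch that uses only the standard ``System.Xml.Linq`` API. The ``TemplateCheck`` class and ``IsWellFormed`` helper are assumptions made for this illustration and are not part of Scryber.

.. code:: csharp

    using System;
    using System.Xml;
    using System.Xml.Linq;

    public static class TemplateCheck
    {
        // Hypothetical helper (not part of Scryber): returns true when the
        // content can be loaded as well-formed XML, otherwise reports the error.
        public static bool IsWellFormed(string content, out string error)
        {
            try
            {
                XDocument.Parse(content);
                error = null;
                return true;
            }
            catch (XmlException ex)
            {
                error = ex.Message;
                return false;
            }
        }

        public static void Main()
        {
            // <br> is never closed, so this is not well-formed XML.
            var bad = "<html xmlns='http://www.w3.org/1999/xhtml'><body><br></body></html>";

            // <br /> is self-closed, so this is well-formed XML.
            var good = "<html xmlns='http://www.w3.org/1999/xhtml'><body><br /></body></html>";

            Console.WriteLine(IsWellFormed(bad, out var message));
            Console.WriteLine(message);
            Console.WriteLine(IsWellFormed(good, out _));
        }
    }

A check like this only confirms that the XML is well-formed. It says nothing about whether the elements are known to the Scryber parser.
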
1.6.1. Content namespaces
--------------------------
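
As the most basic example, take the following template:

.. code:: html

    <html xmlns='http://www.w3.org/1999/xhtml'>
    <head>
        <title>Hello World</title>
    </head>
    <body>
        <div style='padding:10px'>Hello World.</div>
    </body>
    </html>

It is parsed and saved as a PDF by the following method:

.. code:: csharp

    //Scryber.UnitSamples/OverviewSamples.cs

    public void SimpleParsing()
    {
        var path = GetTemplatePath("Overview", "SimpleParsing.html");

        using (var doc = Document.ParseDocument(path))
        {
            using (var stream = GetOutputStream("Overview", "SimpleParsing.pdf"))
            {
                doc.SaveAsPDF(stream);
            }
        }
    }
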
The template would be parsed into the following Document Object Model:

.. figure:: ../images/doc_object_model.png
    :target: ../_images/doc_object_model.png
    :alt: Parsed Document Object Model
    :class: with-shadow

`Full size version <../_images/doc_object_model.png>`_

And the output would be:

.. figure:: ../images/samples_overviewSimple.png
    :target: ../_images/samples_overviewSimple.png
    :alt: Hello world output
    :class: with-shadow

`Full size version <../_images/samples_overviewSimple.png>`_

The namespace for an element must be known to the parser. For most XHTML templates this will be the standard XML namespace (xmlns) ``http://www.w3.org/1999/xhtml``.
This namespace is mapped directly onto the library namespace and assembly ``Scryber.Html.Components, Scryber.Components``.
In the library there is a class called ``HTMLDocument`` that is decorated with the ``PDFParsableComponent`` attribute with a name of 'html'.
This is how the parser knows that when it sees an XML element called *html* it should create an instance of the ``Scryber.Html.Components.HTMLDocument`` class.
This class has a couple of properties on it for *Head* and *Body* that are decorated with the ``PDFElement`` attribute with names *head* and *body* respectively.
So when the parser reads elements with these names, it knows the values should be set as instances of the classes ``HTMLHead`` and ``HTMLBody``.
The class also has a ``Language`` property decorated with the ``PDFAttribute`` attribute, so the value of the *lang* attribute will be set on it.

.. code:: csharp

    namespace Scryber.Html.Components
    {
        [PDFParsableComponent("html")]
        public class HTMLDocument : Document
        {
            [PDFElement("head")]
            public HTMLHead Head
            {
                get;
                set;
            }

            [PDFElement("body")]
            public HTMLBody Body
            {
                get;
                set;
            }

            [PDFAttribute("lang")]
            public string Language
            {
                get;
                set;
            }

            .
            .
            .
        }
    }

And so it goes on through the rest of the XML, reading elements and attributes, and trying to set the values to components or property values.

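
To make the attribute mapping more concrete, the following is a simplified sketch of how a parser could use reflection to get from an element name to a decorated class and property.
It is not Scryber's implementation: ``ParsableComponentAttribute``, ``ElementAttribute``, ``DemoDocument`` and ``FindComponent`` are stand-in names invented for this illustration.

.. code:: csharp

    using System;
    using System.Linq;
    using System.Reflection;

    // Stand-in attributes for illustration only; these are NOT the Scryber types.
    [AttributeUsage(AttributeTargets.Class)]
    sealed class ParsableComponentAttribute : Attribute
    {
        public string Name { get; }
        public ParsableComponentAttribute(string name) { Name = name; }
    }

    [AttributeUsage(AttributeTargets.Property)]
    sealed class ElementAttribute : Attribute
    {
        public string Name { get; }
        public ElementAttribute(string name) { Name = name; }
    }

    // A hypothetical component registered for the 'html' element name.
    [ParsableComponent("html")]
    class DemoDocument
    {
        [Element("body")]
        public string Body { get; set; }
    }

    static class Demo
    {
        // Find the type in this assembly that is registered for the element name.
        static Type FindComponent(string elementName) =>
            Assembly.GetExecutingAssembly().GetTypes()
                .FirstOrDefault(t =>
                    t.GetCustomAttribute<ParsableComponentAttribute>()?.Name == elementName);

        static void Main()
        {
            // An element called 'html' resolves to the decorated class.
            var type = FindComponent("html");
            var instance = Activator.CreateInstance(type);

            // A child element called 'body' resolves to the decorated property.
            var property = type.GetProperties()
                .First(p => p.GetCustomAttribute<ElementAttribute>()?.Name == "body");
            property.SetValue(instance, "Hello World.");

            Console.WriteLine(((DemoDocument)instance).Body);
        }
    }

The real parser works with component instances rather than strings, but the lookup idea is the same: names declared in attributes connect the XML to classes and properties.
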
1.6.2. Parsing documents from files
------------------------------------

The easiest way to parse any XML content is to use the various static methods on the ``Scryber.Components.Document`` class.
There are 2 variants called ``ParseDocument`` and ``Parse``.
``ParseDocument`` has 6 overloads, and the content parsed must have a root object that is (or inherits from) ``Scryber.Components.Document``.

The simplest is to load directly from a file:

.. code:: csharp

    //using Scryber.Components

    string filepath = GetPathToFile();
    var doc = Document.ParseDocument(filepath);

This reads the file and resolves any references to relative content (images, stylesheets, etc.) based on the *filepath*.

1.6.3. Parsing documents from streams
--------------------------------------

If you want to load content dynamically from a stream, you can use the overloads that take a stream.
A ``ParseSourceType`` enumeration value must be provided, along with an optional path value, so that the parser knows where other references may reside.

.. code:: csharp

    //from a stream with no references

    Document doc;

    using (var stream = GetMyDocumentContent())
    {
        doc = Document.ParseDocument(stream, ParseSourceType.DynamicContent);
    }

If the stream will contain relative path references to other content such as stylesheets or embedded content then a path should be provided.
If no path is provided then content will be looked for relative to any basePath specified in the source stream.
If no base path is provided then content will be looked for relative to the current executing assembly.

.. code:: csharp

    //from a stream where references are known to be stored

    Document doc;
    var path = "C:/MyFiles/BasePath";

    using (var stream = GetMyDocumentContent())
    {
        doc = Document.ParseDocument(stream, path, ParseSourceType.DynamicContent);
    }

The content can be provided as any of the following:

* A ``System.IO.Stream`` or one of its subclasses.
* A ``System.IO.TextReader`` or one of its subclasses.
* A ``System.Xml.XmlReader`` or one of its subclasses.

Ultimately the content should be valid XML that can be read.

For example, using an ``XmlReader``:

.. code:: csharp

    //using System.Xml.Linq
    //Scryber.UnitSamples/OverviewSamples.cs

    public void XLinqParsing()
    {
        XNamespace ns = "http://www.w3.org/1999/xhtml";

        var html = new XElement(ns + "html",
            new XElement(ns + "head",
                new XElement(ns + "title",
                    new XText("Hello World"))
            ),
            new XElement(ns + "body",
                new XElement(ns + "div",
                    new XAttribute("style", "padding:10px"),
                    new XText("Hello World."))
            )
        );

        using (var reader = html.CreateReader())
        {
            //passing an empty string to the path as we don't have images or other references to load
            using (var doc = Document.ParseDocument(reader, string.Empty, ParseSourceType.DynamicContent))
            {
                using (var stream = GetOutputStream("Overview", "XLinqParsing.pdf"))
                {
                    doc.SaveAsPDF(stream);
                }
            }
        }
    }

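Or from a string itself, using a ``StringReader``:

.. code:: csharp

    //using System.IO
    //Scryber.UnitSamples/OverviewSamples.cs

    public void StringParsing()
    {
        var title = "Hello World";

        //single quotes are used for the attributes so they do not need escaping in the verbatim string
        var src = @"<html xmlns='http://www.w3.org/1999/xhtml'>
    <head>
        <title>" + title + @"</title>
    </head>
    <body>
        <div style='padding:10px'>" + title + @".</div>
    </body>
    </html>";

        using (var reader = new StringReader(src))
        {
            using (var doc = Document.ParseDocument(reader, string.Empty, ParseSourceType.DynamicContent))
            {
                using (var stream = GetOutputStream("Overview", "StringParsing.pdf"))
                {
                    doc.SaveAsPDF(stream);
                }
            }
        }
    }
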
All 3 methods create exactly the same document.
Parsing from a reader or a string also allows dynamic documents to be built at runtime, but there are other ways (see :doc:`7_parameters_and_expressions`).

1.6.4. Building in code
------------------------
The template parsing engine is both flexible and extensible, but it does not have to be used.
Scryber components are **real** object classes: they have properties and methods along with inner collections.
We can just as easily create the document in a method:

.. code:: csharp

    //using Scryber.Components
    //using Scryber.Drawing
    //Scryber.UnitSamples/OverviewSamples.cs

    protected Document GetHelloWorld()
    {
        var doc = new Document();
        doc.Info.Title = "Hello World";

        var page = new Page();
        doc.Pages.Add(page);

        var div = new Div() { Padding = new PDFThickness(10) };
        page.Contents.Add(div);

        div.Contents.Add(new TextLiteral("Hello World"));

        return doc;
    }

    public void DocumentInCode()
    {
        using (var doc = GetHelloWorld())
        {
            using (var stream = GetOutputStream("Overview", "CodedDocument.pdf"))
            {
                doc.SaveAsPDF(stream);
            }
        }
    }

This works well, and may have benefits for your implementations, but ultimately could become very complex and difficult to maintain.
1.6.5. Embedding other content
-------------------------------
Including content from other sources (files) is easy within the template by using the ``