HTML File Parser For ASP.NET MVC
When creating dynamic views in MVC, you might need an API that helps you manipulate HTML from .NET code. I usually use the HTML Agility Pack for this job; it is a flexible, easy-to-use .NET API. In this post I will not discuss how to create dynamic views. I will only highlight the main features of the HTML Agility Pack and give some examples.
The HtmlAgilityPack is an open source library that you can use in your ASP.NET MVC or ASP.NET Core MVC web applications. It is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files, that is, real-world markup that is often malformed. The object model is very similar to what System.Xml offers, but for HTML documents (or streams).
You can also use this library to parse an HTML page to get the latest news when an RSS feed is not available.
Configure HtmlAgilityPack for ASP.NET MVC in 2 Steps:
Step 1: Installing HtmlAgilityPack
HtmlAgilityPack can be downloaded from NuGet.
You can use the GUI or the following command in the Package Manager Console:
PM> Install-Package HtmlAgilityPack -Version 1.7.2
You can also install the same library for .NET Core:
PM> Install-Package HtmlAgilityPack.NetCore -Version 1.5.0.1
That's it. You can now build and run your application and use HtmlAgilityPack.
Step 2: Using the HtmlAgilityPack Library
There are four ways to load the HTML content you want to manipulate:
Load Html From File
HtmlDocument.Load method loads an HTML document from a file.
using HtmlAgilityPack;
string filePath = @"c:\inetpub\wwwroot\mywebsite\views\site\home.cshtml";
var doc = new HtmlDocument();
doc.Load(filePath);
// SelectSingleNode returns null when no node matches the XPath expression
var node = doc.DocumentNode.SelectSingleNode("//body");
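The snippet above assumes that the file exists and that it contains a body element. Load throws when the file is missing, and SelectSingleNode returns null when the XPath matches nothing. A minimal defensive sketch follows; the ViewLoader class and its TryGetBody method are hypothetical names used only for illustration, not part of the library.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class ViewLoader
{
    // Hypothetical helper: returns the <body> node, or null if the file
    // does not exist or contains no <body> element.
    public static HtmlNode TryGetBody(string filePath)
    {
        if (!File.Exists(filePath))
        {
            return null;
        }

        var doc = new HtmlDocument();
        doc.Load(filePath);

        // SelectSingleNode returns null when nothing matches.
        return doc.DocumentNode.SelectSingleNode("//body");
    }
}
```

Returning null keeps the sketch short; a real application might prefer to throw or log so that a missing view file does not go unnoticed.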
Load Html From String
HtmlDocument.LoadHtml method loads the HTML document from the specified string.
var htmlString =
@"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var body = doc.DocumentNode.SelectSingleNode("//body");
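Once a document is loaded, its nodes can be read and modified in place. The sketch below shows one possible way to do that with markup like the sample string above. The HtmlEditor class, its two methods and the CSS class name passed in are made up for illustration; only the HtmlDocument and HtmlNode members come from the library.

```csharp
using HtmlAgilityPack;

public static class HtmlEditor
{
    // Reads the plain text of the first <h1>, or an empty string if none exists.
    public static string FirstHeadingText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var h1 = doc.DocumentNode.SelectSingleNode("//h1");
        // InnerText drops nested tags such as <b> and keeps only their text.
        return h1 == null ? string.Empty : h1.InnerText;
    }

    // Sets a class attribute on every <p> and returns the resulting markup.
    public static string AddClassToParagraphs(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null, not an empty collection, when nothing matches.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs != null)
        {
            foreach (var p in paragraphs)
            {
                p.SetAttributeValue("class", cssClass);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```

With the sample string above, HtmlEditor.FirstHeadingText(htmlString) should yield "This is bold heading", because InnerText concatenates the text on both sides of the nested b element.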
Load Html From Web
HtmlWeb.Load method gets an HTML document from an internet resource.
var url = @"http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);
var node = doc.DocumentNode.SelectSingleNode("//head/title");
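The news-scraping use case mentioned in the introduction can be sketched on top of a document loaded this way. The example below assumes, purely for illustration, that each headline is a link inside an h2 element with the class "headline"; a real site will need its own XPath. HeadlineExtractor is a hypothetical helper, not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HeadlineExtractor
{
    // Returns (title, url) pairs for every link found inside <h2 class="headline">.
    // The markup convention is an assumption; adapt the XPath to the target site.
    public static List<KeyValuePair<string, string>> Extract(HtmlDocument doc)
    {
        var results = new List<KeyValuePair<string, string>>();

        var links = doc.DocumentNode.SelectNodes("//h2[@class='headline']/a[@href]");
        if (links == null)
        {
            // No matching nodes: SelectNodes returns null rather than an empty list.
            return results;
        }

        foreach (var link in links)
        {
            // Decode entities such as &amp; and trim surrounding whitespace.
            var title = HtmlEntity.DeEntitize(link.InnerText).Trim();
            var href = link.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(title, href));
        }

        return results;
    }
}
```

Because the helper takes an HtmlDocument, it could be fed from the web with HeadlineExtractor.Extract(new HtmlWeb().Load(url)) or from a string loaded with LoadHtml, which also makes it easy to test without network access.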
Load Html From Browser
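HtmlWeb.LoadFromBrowser method gets an HTML document from a web browser. It makes it possible to wait for JavaScript to finish running by customizing the isBrowserScriptCompleted parameter. In the first example below the callback receives the browser control; in the second it receives the rendered HTML as a string.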
string url = "http://html-agility-pack.net/from-browser";
var web1 = new HtmlWeb();
var doc1 = web1.LoadFromBrowser(url, o =>
{
    var webBrowser = (WebBrowser) o;
    // WAIT until the dynamic text is set
    return !string.IsNullOrEmpty(webBrowser.Document.GetElementById("uiDynamicText").InnerText);
});
var t1 = doc1.DocumentNode.SelectSingleNode("//div[@id='uiDynamicText']").InnerText;

var web2 = new HtmlWeb();
var doc2 = web2.LoadFromBrowser(url, html =>
{
    // WAIT until the dynamic text is set
    return !html.Contains("<div id=\"uiDynamicText\"></div>");
});
var t2 = doc2.DocumentNode.SelectSingleNode("//div[@id='uiDynamicText']").InnerText;