HTML Manipulation using .NET - Manage Dynamic Views in ASP.NET MVC using Html Agility Pack

HTML File Parser For ASP.NET MVC

When you create dynamic views in ASP.NET MVC, you may need an API that lets you manipulate HTML from .NET code. I usually use the Html Agility Pack for this: it is a flexible, easy-to-use .NET API. This post does not cover how to create dynamic views; it only highlights the main features of the Html Agility Pack and gives some examples.

HtmlAgilityPack is an open source library that you can use in ASP.NET MVC and ASP.NET Core MVC web applications. It is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that lets you parse "out of the web" HTML files, that is, real-world markup that is not necessarily well formed. Its object model is very similar to the one System.Xml offers, but for HTML documents (or streams).

You can also use this library to parse an HTML page for the latest news when no RSS feed is available.
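
As a rough illustration of the news scenario, the sketch below pulls headline titles and links out of an HTML string. The markup shape (an `<a>` inside `<h2 class="headline">`) and the names `HeadlineReader` and `ReadHeadlines` are assumptions made for this example, not part of the library; a real page will need its own XPath expression.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HeadlineReader
{
    // Hypothetical helper: returns (title, href) pairs for every <a>
    // directly inside an <h2 class="headline">. The markup is assumed.
    public static List<KeyValuePair<string, string>> ReadHeadlines(string html)
    {
        var results = new List<KeyValuePair<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//h2[@class='headline']/a");
        if (links == null)
        {
            return results;
        }

        foreach (HtmlNode link in links)
        {
            // Decode entities such as &amp; before using the text.
            string title = HtmlEntity.DeEntitize(link.InnerText).Trim();
            string href = link.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(title, href));
        }

        return results;
    }
}
```

To run this against a live page, the same XPath query could be applied to a document obtained with HtmlWeb.Load instead of LoadHtml.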

Configure HtmlAgilityPack for ASP.NET MVC in 2 Steps:

Step 1: Installing HtmlAgilityPack

HtmlAgilityPack is available as a NuGet package.

You can use the GUI or the following command in the Package Manager Console:

PM> Install-Package HtmlAgilityPack -Version 1.7.2

You can also install the library for .NET Core:

PM> Install-Package HtmlAgilityPack.NetCore -Version 1.5.0.1

That's it. You can now compile and run your application and use HtmlAgilityPack.

Step 2: Using the HtmlAgilityPack Library

There are four ways to load the HTML content you want to manipulate:

Load Html From File

The HtmlDocument.Load method loads an HTML document from a file.

// Requires: using HtmlAgilityPack;
string filePath = @"c:\inetpub\wwwroot\mywebsite\views\site\home.cshtml";
var doc = new HtmlDocument();
doc.Load(filePath);
// SelectSingleNode returns null when no node matches the XPath expression.
var node = doc.DocumentNode.SelectSingleNode("//body");
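
Loading a file is usually only the first half of the job; the document then has to be changed and written back. The following is a minimal sketch of that round trip. It adds a CSS class to the `<body>` element and saves the file in place. The names `ViewEditor` and `AddBodyClass` are hypothetical, chosen for this example.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class ViewEditor
{
    // Hypothetical helper: appends className to the class attribute of <body>
    // and overwrites the file. Returns false when the file has no <body>.
    public static bool AddBodyClass(string filePath, string className)
    {
        var doc = new HtmlDocument();
        doc.Load(filePath);

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return false;
        }

        // GetAttributeValue falls back to the default when the attribute is missing.
        string existing = body.GetAttributeValue("class", string.Empty);
        string updated = (existing + " " + className).Trim();

        // SetAttributeValue creates the attribute if it does not exist yet.
        body.SetAttributeValue("class", updated);

        doc.Save(filePath);
        return true;
    }
}
```

One caveat worth checking in your own project: a .cshtml file contains Razor syntax that is not HTML, so it is a good idea to compare the saved file with the original before relying on this kind of in-place edit.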

Load Html From String

HtmlDocument.LoadHtml method loads the HTML document from the specified string.

var htmlString =
@"<!DOCTYPE html>
<html>
<body>
<h1>This is <b>bold</b> heading</h1>
<p>This is <u>underlined</u> paragraph</p>
<h2>This is <i>italic</i> heading</h2>
</body>
</html>";
var doc = new HtmlDocument();
doc.LoadHtml(htmlString);
var body = doc.DocumentNode.SelectSingleNode("//body");
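
Once the string is parsed, the nodes can be edited much like System.Xml nodes. The sketch below, under the assumption that the input resembles the sample above, gives every `<h1>` and `<h2>` an id attribute and appends a paragraph to the body. `ViewDecorator`, `Decorate`, the `section-N` id scheme, and the note text are all invented for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class ViewDecorator
{
    // Hypothetical helper: numbers the headings and appends a note to <body>,
    // then returns the modified markup as a string.
    public static string Decorate(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The XPath union selects both heading levels in document order.
        var headings = doc.DocumentNode.SelectNodes("//h1 | //h2");
        if (headings != null)
        {
            int index = 1;
            foreach (HtmlNode heading in headings)
            {
                heading.SetAttributeValue("id", "section-" + index);
                index++;
            }
        }

        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        if (body != null)
        {
            // CreateNode parses a fragment into a node that can be attached to the tree.
            HtmlNode note = HtmlNode.CreateNode("<p class=\"note\">Generated view</p>");
            body.AppendChild(note);
        }

        // OuterHtml of the document node serializes the whole document.
        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `ViewDecorator.Decorate(htmlString)` with the sample string returns the updated markup, which a controller could then pass on to the view.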

Load Html From Web

HtmlWeb.Load method gets an HTML document from an internet resource.

var url = @"http://html-agility-pack.net/";
var web = new HtmlWeb();
var doc = web.Load(url);
var node = doc.DocumentNode.SelectSingleNode("//head/title");

Load Html From Browser

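The HtmlWeb.LoadFromBrowser method gets an HTML document through a web browser control. It makes it possible to wait for JavaScript to run by supplying the isBrowserScriptCompleted callback, which receives either the browser object or the current HTML string and returns true once the page is ready.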

string url = "http://html-agility-pack.net/from-browser";

// Option 1: the callback receives the browser control (cast from object).
var web1 = new HtmlWeb();
var doc1 = web1.LoadFromBrowser(url, o =>
{
    var webBrowser = (WebBrowser) o;
    // WAIT until the dynamic text is set
    return !string.IsNullOrEmpty(webBrowser.Document.GetElementById("uiDynamicText").InnerText);
});
var t1 = doc1.DocumentNode.SelectSingleNode("//div[@id='uiDynamicText']").InnerText;

// Option 2: the callback receives the current HTML as a string.
var web2 = new HtmlWeb();
var doc2 = web2.LoadFromBrowser(url, html =>
{
    // WAIT until the dynamic text is set
    return !html.Contains("<div id=\"uiDynamicText\"></div>");
});
var t2 = doc2.DocumentNode.SelectSingleNode("//div[@id='uiDynamicText']").InnerText;
