How to convert a web page to a PDF using iText.Net and a WebBrowser object

by Heathesh 12. August 2010 06:09

Disclaimer: This is a hack: it populates the PDF with screen captures taken with the WebBrowser control. I did it in my spare time for fun, and I would not advise using it in a production environment without thorough testing and nerves of steel.

I've noticed a few websites that charge a fee to convert a web page to a PDF, so I was curious whether there was an open source alternative. I know there are libraries to create PDFs, but all my googling could not turn up one that would convert a web page or HTML to PDF. Having played around with the iText.Net (http://sourceforge.net/projects/itextdotnet/) libraries before, I wanted to see whether I could use them to do this for free.

To begin with, you need to download the relevant libraries from http://sourceforge.net/projects/itextdotnet/. For convenience's sake, I've added the DLLs I used here (zipped, 2,816 KB download):

http://heathesh.com/ftp/itextnet.zip

I'm going to be using Visual Studio 2008 for this and have not tried it with Visual Studio 2010. So, using Visual Studio 2008, create a Console Application project. The code needs to run in a single-threaded apartment (STA), so calling it from a web page or web service would require you to do more "hacking". See my previous post "Run a single-threaded apartment method with parameters that returns a value within a web service" for what is needed to achieve that.
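The earlier post isn't reproduced here. The general idea is to run the work on a dedicated thread that has been marked as STA and wait for that thread to finish. Below is a minimal sketch of that idea, not the code from that post. StaRunner and RunInSta are hypothetical names, and the sketch uses only the standard Thread.SetApartmentState and Thread.Join methods. Apartment states are a Windows/COM concept, so setting STA may throw on non-Windows runtimes.

```csharp
using System;
using System.Threading;

// Minimal sketch, not the code from the earlier post: runs a function on a new
// single-threaded apartment (STA) thread and returns its result to the caller.
// StaRunner and RunInSta are hypothetical names chosen for this example.
public static class StaRunner
{
    public static T RunInSta<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        Thread thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                // Capture the exception so it can be rethrown on the calling thread.
                error = ex;
            }
        });

        // The apartment state must be set before the thread starts.
        // Setting STA is Windows-specific and may throw elsewhere.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();

        // Block until the STA thread has finished its work.
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("The STA thread threw an exception.", error);

        return result;
    }
}
```

In a web service you could then write something like byte[] pdf = StaRunner.RunInSta(() => convertWebPageToPdf(url));, assuming the conversion methods from this post are made accessible to the service. Any exception thrown on the STA thread is rethrown on the calling thread, wrapped in an InvalidOperationException.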

Once your project has been created, add all the DLLs from the ZIP file as references to your project. Also add references to the following .NET assemblies:

System.Drawing
System.Windows.Forms

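Now add the following using directives to your console app's Program.cs file:

//NOTE: Do not add "using System.Drawing;" because its Image and Rectangle classes clash with com.lowagie.text.Image
//and com.lowagie.text.Rectangle in the PDF libraries, so System.Drawing types are fully qualified in the code below
using System; //for the STAThread attribute on Main
using System.IO;
using System.Windows.Forms;
using com.lowagie.text;
using com.lowagie.text.pdf;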


Next, add the following method to your code. It loads a WebBrowser control, navigates to the specified URL and takes a screenshot of the web page.

        /// <summary>
        /// Generate the screen shot image for the specified URL
        /// </summary>
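        /// <param name="url">The URL of the web page to capture</param>
        /// <param name="width">The width in pixels, or -1 to use the page's full width</param>
        /// <param name="height">The height in pixels, or -1 to use the page's full height</param>
        /// <returns>A bitmap of the rendered page; the caller is responsible for disposing it</returns>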
        public static System.Drawing.Bitmap generateScreenshotImage(string url, int width, int height)
        {
            // Load the webpage into a WebBrowser control
            using (WebBrowser webBrowser = new WebBrowser())
            {
                //disable the scroll bars and suppress script errors, then navigate to the url
                webBrowser.ScrollBarsEnabled = false;
                webBrowser.ScriptErrorsSuppressed = true;
                webBrowser.Navigate(url);

                //wait for the page to load
                while (webBrowser.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }

                // Set the size of the WebBrowser control
                webBrowser.Width = width;
                webBrowser.Height = height;

                if (width == -1)
                    // Take a screenshot of the web page's full width
                    webBrowser.Width = webBrowser.Document.Body.ScrollRectangle.Width;

                if (height == -1)
                    // Take a screenshot of the web page's full height
                    webBrowser.Height = webBrowser.Document.Body.ScrollRectangle.Height;

                // Get a Bitmap representation of the webpage as it's rendered in the WebBrowser control
                System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(webBrowser.Width, webBrowser.Height);
                webBrowser.DrawToBitmap(bitmap, new System.Drawing.Rectangle(0, 0, webBrowser.Width, webBrowser.Height));
                return bitmap;
            }
        }

Okay... so we now have a method that generates a screenshot of the web page. Next, we need to retrieve the byte[] for the bitmap that this method returns, so add the following method to your code:

        /// <summary>
        /// Gets the byte array for the specified bitmap
        /// </summary>
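        /// <param name="bitmap">The bitmap to encode</param>
        /// <returns>The PNG-encoded bytes of the bitmap</returns>
        private static byte[] getBytesForBitmap(System.Drawing.Bitmap bitmap)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                bitmap.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Png);
                //ToArray() returns only the bytes written; GetBuffer() would also return the unused, zero-filled capacity
                return memoryStream.ToArray();
            }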
        }
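A note on getting the bytes out of a MemoryStream: GetBuffer() returns the stream's entire internal buffer, including any unused capacity (zero bytes after the data). ToArray() returns a copy of exactly the bytes written, which makes it the safer choice when you want just the encoded PNG or PDF data. The small sketch below shows the difference without needing a bitmap. StreamBytes, ViaGetBuffer and ViaToArray are hypothetical names used for illustration.

```csharp
using System.IO;

// Hypothetical helper for illustration: writes the given bytes to a MemoryStream
// and returns what GetBuffer() and ToArray() each hand back.
public static class StreamBytes
{
    // The whole internal buffer, including any unused (zero-filled) capacity.
    public static byte[] ViaGetBuffer(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);
            return stream.GetBuffer();
        }
    }

    // A copy of exactly the bytes that were written.
    public static byte[] ViaToArray(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length);
            return stream.ToArray();
        }
    }
}
```

For example, if you write five bytes, ToArray() returns an array of length 5. The GetBuffer() array is longer and is zero-filled after the fifth byte.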

We're almost done. We now need to add methods that take the bitmap created above and build the actual PDF. For this, I created and added the following four methods. The code comments should be self-explanatory:

        /// <summary>
        /// Converts the specified url to a pdf file
        /// </summary>
        /// <param name="url">The URL of the web page to convert</param>
        /// <param name="fileName">The path of the PDF file to create (overwritten if it already exists)</param>
        private static void convertUrlToPdf(string url, string fileName)
        {
            byte[] pdfBytes = convertWebPageToPdf(url);

            //create (or overwrite) the file and write the PDF bytes to it
            File.WriteAllBytes(fileName, pdfBytes);
        }

        /// <summary>
        /// Converts the specified url to a PDF byte array
        /// </summary>
        /// <param name="url">The URL of the web page to convert</param>
        /// <returns>The bytes of the generated PDF</returns>
        private static byte[] convertWebPageToPdf(string url)
        {
            // step 1: creation of a document-object
            Document document = new Document();

            // step 2:
            // we create a writer that listens to the document
            // and directs a PDF-stream to an in-memory stream
            using (MemoryStream memoryStream = new MemoryStream())
            {
                PdfWriter.getInstance(document, memoryStream);

                // step 3: we open the document
                document.open();

                // step 4: get a screenshot of the web page, 1020 pixels wide and the page's full height
                using (System.Drawing.Bitmap screenshot = generateScreenshotImage(url, 1020, -1))
                {
                    //if the screenshot is taller than one page (1500 pixels), split it, otherwise just add the image
                    if (screenshot.Height > 1500)
                        separatePages(document, screenshot);
                    else
                        addImage(document, screenshot);
                }

                // step 5: we close the document
                document.close();

                //return the bytes of the PDF; ToArray() returns only the bytes written (GetBuffer() would add
                //zero-filled padding) and still works if the writer has already closed the stream
                return memoryStream.ToArray();
            }
        }

        /// <summary>
        /// Add the image to the PDF document
        /// </summary>
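        /// <param name="document">The PDF document to add the image to</param>
        /// <param name="screenshot">The bitmap to add to the document</param>
        private static void addImage(Document document, System.Drawing.Bitmap screenshot)
        {
            Image png = Image.getInstance(getBytesForBitmap(screenshot));
            //scale the image to 50% so the 1020 pixel wide screenshot fits the page width
            png.scalePercent(50);
            document.add(png);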
        }

        /// <summary>
        /// Separates the pages of the bitmap into the PDF document
        /// </summary>
        /// <param name="document">The PDF document to add the pages to</param>
        /// <param name="screenshot">The full-height bitmap to split into 1500 pixel high pages</param>
        private static void separatePages(Document document, System.Drawing.Bitmap screenshot)
        {
            const int pageHeight = 1500; //height in pixels of the slice placed on each PDF page
            int remainder = screenshot.Height % pageHeight;
            int pages = screenshot.Height / pageHeight + (remainder > 0 ? 1 : 0);
            int y = 0;
            int height = pageHeight;

            for (int i = 0; i < pages; i++)
            {
                //if this is the last page and there is a remainder, we need to adjust the height accordingly
                if (i == pages - 1 && remainder > 0)
                    height = screenshot.Height - y;

                //copy this slice of the screenshot, keeping its full width and original pixel format
                using (System.Drawing.Bitmap pageBitmap = screenshot.Clone(new System.Drawing.Rectangle(0, y, screenshot.Width, height), screenshot.PixelFormat))
                {
                    //add the image
                    addImage(document, pageBitmap);

                    //increment the height counter to move to the next page
                    y += pageHeight;
                }
            }
        }
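You can check the numbers 1020, 1500 and 50% with a little arithmetic. Assume iText's default Document page is A4 (595 x 842 points) with 36 point margins, and that an image's unscaled size in points equals its size in pixels. A 1020 x 1500 pixel slice scaled to 50% is then 510 x 750 points. That fits inside the 523 x 770 point area between the margins (595 - 2 x 36 = 523, 842 - 2 x 36 = 770). The sketch below moves the slicing loop and that fit check into pure functions, so you can test them without a browser or iText. PageSlicer, Slice, ComputeSlices and FitsOnA4 are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// A horizontal strip of the screenshot: where it starts and how tall it is, in pixels.
public struct Slice
{
    public readonly int Top;
    public readonly int Height;

    public Slice(int top, int height)
    {
        Top = top;
        Height = height;
    }
}

// Hypothetical helpers mirroring the arithmetic in separatePages and addImage.
public static class PageSlicer
{
    // Cuts totalHeight into sliceHeight strips; the last strip holds any remainder.
    public static List<Slice> ComputeSlices(int totalHeight, int sliceHeight)
    {
        if (sliceHeight <= 0) throw new ArgumentOutOfRangeException("sliceHeight");

        List<Slice> slices = new List<Slice>();
        for (int top = 0; top < totalHeight; top += sliceHeight)
            slices.Add(new Slice(top, Math.Min(sliceHeight, totalHeight - top)));
        return slices;
    }

    // Assumption: A4 page (595 x 842 pt), 36 pt margins, 1 pixel = 1 point before scaling.
    public static bool FitsOnA4(int widthPx, int heightPx, float scalePercent)
    {
        const float PageWidth = 595f, PageHeight = 842f, Margin = 36f;
        float scale = scalePercent / 100f;
        return widthPx * scale <= PageWidth - 2 * Margin
            && heightPx * scale <= PageHeight - 2 * Margin;
    }
}
```

For a 3,200 pixel high screenshot, ComputeSlices(3200, 1500) produces slices starting at 0, 1500 and 3000, with the last one 200 pixels high. This matches what separatePages does.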

Okay, so we've now got everything we need. In our Main method, we simply call convertUrlToPdf with a URL and a PDF file name, and it generates a PDF of the website for us. There is just one thing: the WebBrowser control can only run in a single-threaded apartment, so we need to decorate our Main method with the STAThread attribute:

        [STAThread] //run in single-threaded apartment state
        static void Main(string[] args)
        {
            convertUrlToPdf("http://iservice.co.za", "iservice.pdf");
        }


That's basically it. Run the code and you'll find iservice.pdf in your output folder (bin\Debug or bin\Release, depending on your build configuration).

Happy PDFing!


Development | .Net | Visual Studio 2008 | VS2008 | PDF

Creating a Visual Studio 2010 ASP.Net Reports Web Site with a LINQ DBML data source

by Heathesh 8. June 2010 02:48

With Visual Studio 2008, it appears the only way to use a LINQ DBML as a report data source was to create your own Data Processing Extension implementation. With Visual Studio 2010 it's a little simpler, but it can still cause some pain.

To begin with, create an ASP.Net Reports Web Site. I first tried this by creating a web application and, after much hair pulling, decided to use the preconfigured Reports Web Site project type to save myself the configuration issues.

Once you've created the website, the Data Source Configuration Wizard should start automatically. I simply cancelled it. The website also includes a "Report1.rdlc" file. I deleted it, added a new report and called it "PeopleDetails".

If you do delete "Report1.rdlc", be sure to open the markup of Default.aspx and change the report path accordingly:

        <rsweb:ReportViewer ID="ReportViewer1" runat="server" Font-Names="Verdana" Font-Size="8pt">
            <LocalReport ReportPath="PeopleDetails.rdlc">
            </LocalReport>
        </rsweb:ReportViewer>


Next, add a new "Class Library" project to your solution. This class library will hold your LINQ DBML. It needs to be a separate project so that you can make its DLL available on the machine, where the report designer can read it.

You can delete the sample "Class1.cs" file created by the IDE and add a new "LINQ to SQL Classes" item. Next, add the relevant tables etc. to the DBML as required. Then add the newly created project as a reference to your website, and make sure to add the connection string to your website's Web.config:

    <connectionStrings>
        <add name="PeopleDbml.Properties.Settings.SampleConnectionString"
            connectionString="Data Source=.\server;Initial Catalog=Sample;Integrated Security=True"
            providerName="System.Data.SqlClient" />
    </connectionStrings>


Next, add a class to your "App_Code" folder. I called mine DataReader simply because it seemed to be the most relevant name. In your DataReader class, add a method that retrieves a List of the data you wish to display on your report. In my case, I wanted a list of people from the Person table in the database, so I implemented a method called GetPeople that accepts the data context as a parameter:

    /// <summary>
    /// Gets a list of people from the database
    /// </summary>
    /// <param name="dx">The data context to query</param>
    /// <returns>A list of every Person in the database</returns>
    public List<Person> GetPeople(PeopleDataClassesDataContext dx)
    {
        return (from people in dx.Persons
                select people).ToList();
    }
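LINQ to SQL needs a live database, so the following is a LINQ to Objects sketch of the same query shape. It uses a hypothetical in-memory Person class as a stand-in for the entity generated from the DBML. The compiler translates a query of the form "from p in source select p" into source.Select(p => p). The trailing ToList() runs the query immediately, so the report receives a fully loaded list rather than a deferred query. PeopleQueries and its method names are also hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity class generated from the Person table.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class PeopleQueries
{
    // Same shape as GetPeople: the compiler turns this query into source.Select(people => people).
    public static List<Person> GetPeopleQuerySyntax(IEnumerable<Person> source)
    {
        return (from people in source
                select people).ToList();
    }

    // Equivalent method syntax; ToList() runs the query and copies the results into a new list.
    public static List<Person> GetPeopleMethodSyntax(IEnumerable<Person> source)
    {
        return source.Select(people => people).ToList();
    }
}
```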


Now comes the weird part. For some reason, the report designer cannot find the PeopleDbml DLL. For example, if you add a table and try to add a data source to it, you will get an error saying it can't find the DLL. To solve this, compile your PeopleDbml class library, find the DLL and copy it to the "C:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies" folder.

Once you've done that, close Visual Studio 2010 (I haven't found another way of doing this), reopen it and reopen your solution. Then try to add a table to your report. It should now work without any errors. Set the name of the DataSet as required (I called mine PeopleDataSet), set the DataSource to "global", and you should be able to select "DataReader (GetPeople)" in the "Available DataSets" drop-down.

Once the DataSet has been added, design your table (i.e. set up your fields as required; I simply dragged three different fields into the three available columns).

The last thing you need to do is set the ReportDataSource for the report. I did this by adding the following code to the code-behind of my Default.aspx page (Default.aspx.cs):

using System;
using System.Configuration; //for ConfigurationManager
using Microsoft.Reporting.WebForms; //for the ReportDataSource
using PeopleDbml;

public partial class _Default : System.Web.UI.Page
{
    /// <summary>
    /// Gets the connection string from the web.config
    /// </summary>
    private string _connectionString
    {
        get { return ConfigurationManager.ConnectionStrings["PeopleDbml.Properties.Settings.SampleConnectionString"].ConnectionString; }
    }

    /// <summary>
    /// Gets or sets the data context
    /// </summary>
    private PeopleDataClassesDataContext _dataContext
    {
        get;
        set;
    }

    /// <summary>
    /// Handles the page load event
    /// </summary>
    /// <param name="sender"></param>
    /// <param name="e"></param>
    protected void Page_Load(object sender, EventArgs e)
    {
        if (!IsPostBack)
        {
            if (_dataContext == null)
                _dataContext = new PeopleDataClassesDataContext(_connectionString);

            //create an instance of my data reader class
            DataReader dataReader = new DataReader();
            //create the report data source, specifying that it's for the PeopleDataSet
            ReportDataSource reportDataSource = new ReportDataSource("PeopleDataSet", dataReader.GetPeople(_dataContext));
            //add the data source to the report viewer's local report data sources
            ReportViewer1.LocalReport.DataSources.Add(reportDataSource);
        }
    }

    /// <summary>
    /// Overrides Dispose so that the data context is also cleaned up
    /// </summary>
    public override void Dispose()
    {
        base.Dispose();

        if (_dataContext != null)
            _dataContext.Dispose();
    }
}


That was it. Happy Reporting!

 


Development | .Net | Visual Studio 2010 | VS2010 | LINQ
