<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
     "http://docbook.org/xml/4.4/docbookx.dtd">
<article>
  
  <articleinfo>
    
    <title>C++ dlopen mini HOWTO</title>
    
    <author>
      <firstname>Aaron</firstname>
      <surname>Isotton</surname>
      <affiliation>
        <address><email>aaron@isotton.com</email></address>
      </affiliation>
    </author>
    
    <pubdate>2006-03-16</pubdate>

    <revhistory>
	
      <revision>
	<revnumber>1.10</revnumber>
	<date>2006-03-16</date>
	<authorinitials>AI</authorinitials>
	
	<revremark>Changed the license from the GFDL to the GPL. Fixed usage
	    of dlerror; thanks to Carmelo Piccione. Using a virtual destructor
	    in the example; thanks to Joerg Knobloch. Added Source Code
	    section. Minor fixes.</revremark>
	
      </revision>
 
      <revision>
	<revnumber>1.03</revnumber>
	<date>2003-08-12</date>
	<authorinitials>AI</authorinitials>
	<revremark>Added reference to the GLib Dynamic Module
	  Loader.  Thanks to G. V. Sriraam for the pointer.</revremark>
      </revision>
      
      <revision>
        <revnumber>1.02</revnumber>
        <date>2002-12-08</date>
        <authorinitials>AI</authorinitials>
        <revremark>Added FAQ.  Minor changes</revremark>
      </revision>

      <revision>
        <revnumber>1.01</revnumber>
	<date>2002-06-30</date>
        <authorinitials>AI</authorinitials>
        <revremark>Updated virtual destructor explanation.  Minor changes.</revremark>
      </revision>

      <revision>
	<revnumber>1.00</revnumber>
	<date>2002-06-19</date>
	<authorinitials>AI</authorinitials>
	<revremark>Moved copyright and license section to the
	beginning.  Added terms section.  Minor changes.</revremark>
      </revision>
      
      <revision>
	<revnumber>0.97</revnumber>
	<date>2002-06-19</date>
	<authorinitials>JYG</authorinitials>
	<revremark>Entered minor grammar and sentence level changes.</revremark>
      </revision>

      <revision>
        <revnumber>0.96</revnumber>
        <date>2002-06-12</date>
        <authorinitials>AI</authorinitials>
        <revremark>Added bibliography.  Corrected explanation of extern
          functions and variables.</revremark>
      </revision>

      <revision>
        <revnumber>0.95</revnumber>
        <date>2002-06-11</date>
        <authorinitials>AI</authorinitials>
        <revremark>Minor improvements.</revremark>
      </revision>

    </revhistory>
    
    <abstract>
      <para>How to dynamically load C++ functions and classes using
      the <function>dlopen</function> API.</para>
    </abstract>
    
  </articleinfo>
  
  
  <section id="intro">
    <title>Introduction</title>
    
    <para>
      A question which frequently arises among Unix C++ programmers is
      how to load C++ functions and classes dynamically using the
      <function>dlopen</function> API.
    </para>
    
    <para>In fact, that is not always simple and needs some
      explanation.  That's what this mini HOWTO does.</para>
    
    <para>An average understanding of the <systemitem>C</systemitem>
      and <systemitem>C++</systemitem> programming language and of the
      <function>dlopen</function> API is necessary to understand this
      document.</para>

    <para>This HOWTO's master location is <ulink
        url="http://www.isotton.com/howtos/C++-dlopen-mini-HOWTO/"/>.</para>
    
    <section id="copyright">
      <title>Copyright and License</title>

      <para>This document, <emphasis>C++ dlopen mini HOWTO</emphasis>, is
	  copyrighted (c) 2002-2006 by <emphasis>Aaron Isotton</emphasis>.
	  Permission is granted to copy, distribute and/or modify this
	  document under the terms of the GNU General Public License, Version
	  2, as published by the Free Software Foundation.</para>
  
  </section>
  
    <section id="disclaimer">
      <title>Disclaimer</title>
      
      <para>
        No liability for the contents of this document can be
        accepted.  Use the concepts, examples and information at your
        own risk.  There may be errors and inaccuracies, that could be
        damaging to your system.  Proceed with caution, and although
        this is highly unlikely, the author(s) do not take any
        responsibility.
      </para>
      
      <para>
        All copyrights are held by their by their respective owners,
        unless specifically noted otherwise.  Use of a term in this
        document should not be regarded as affecting the validity of
        any trademark or service mark.  Naming of particular products
        or brands should not be seen as endorsements.
      </para>
    </section>
    
    <section id="credits">
      <title>Credits / Contributors</title>
      
      <para>
        In this document, I have the pleasure of acknowledging (in
        alphabetic order):
      </para>
      
      <itemizedlist>

        <listitem>
          <para>Joy Y Goodreau <email>joyg (at) us.ibm.com</email> for
            her editing.</para>
        </listitem>

        <listitem>
          <para>D. Stimitis <email>stimitis (at) idcomm.com</email>
            for pointing out a few issues with the formatting and the
            name mangling, as well as pointing out a few subtleties of
	    <literal>extern "C"</literal>.</para>

	<para>Many unnamed others pointing out errors or giving tips to
	    improve this howto. You know who you are!</para>

        </listitem>

      </itemizedlist>
      
    </section>
    
    <section id="feedback">
      <title>Feedback</title>
      
      <para>
        Feedback is most certainly welcome for this document. Send
        your additions, comments and criticisms to the following email
        address: <email>aaron@isotton.com</email>.
      </para>
    </section>

    <section>
      <title>Terms Used in this Document</title>

      <variablelist>
        <varlistentry>
          <term><function>dlopen</function> API</term>
          <listitem>
            <para>The <function>dlclose</function>,
              <function>dlerror</function>,
              <function>dlopen</function> and
              <function>dlsym</function> functions as described in the
              <literal>dlopen(3)</literal> man page.</para>
            
            <para>Notice that we use
              <quote><function>dlopen</function></quote> to refer to
              the individual <function>dlopen</function>
              <emphasis>function</emphasis>, and
              <quote><function>dlopen</function> API</quote> to refer
              to the <emphasis>entire API</emphasis>.</para>
          </listitem>
        </varlistentry>
      </variablelist>

    </section>
    
    <!-- Translations - ->
    <section id="translations">
    <title>Translations</title>

    <para>
    Pointers to available translations are nice.
    Also your translators tend to give very important inputs.
  </para>

    <itemizedlist>
    <listitem>
    <para>
    <ulink url="http://tldp.org/">French Translation</ulink>
    provided by Individual <email>someone (at) somewhere.fr</email>
  </para>
  </listitem>
    <listitem>
    <para>
    <ulink url="http://tlpd.org/">German Translation</ulink>
    provided by Individual <email>someone (at) somewhere.de</email>
  </para>
  </listitem>
  </itemizedlist>
  </section>
    -->
  </section>

  <section id="theproblem">
    <title>The Problem</title>

    <para>At some time you might have to load a library (and use its
      functions) at runtime; this happens most often when you are
      writing some kind of plug-in or module architecture for your
      program.</para>

    <para>In the C language, loading a library is very simple (calling
      <function>dlopen</function>, <function>dlsym</function> and
      <function>dlclose</function> is enough), with C++ this is a bit
      more complicated.  The difficulties of loading a C++ library
      dynamically are partially due to <link linkend="mangling">name
      mangling</link>, and partially due to the fact that the
      <function>dlopen</function> API was written with C in mind, thus
      not offering a suitable way to load classes.</para>

    <para>Before explaining how to load libraries in C++, let's better
      analyze the problem by looking at name mangling in more
      detail. I recommend you read the explanation of name mangling,
      even if you're not interested in it because it will help you
      understanding why problems occur and how to solve them.</para>

    <section id="mangling">
      <title>Name Mangling</title>

      <para>In every C++ program (or library, or object file), all
        non-static functions are represented in the binary file as
        <emphasis>symbols</emphasis>. These symbols are special text
        strings that uniquely identify a function in the program,
        library, or object file.</para>

      <para>In C, the symbol name is the same as the function name:
        the symbol of <function>strcpy</function> will be
        <computeroutput>strcpy</computeroutput>, and so on. This is
        possible because in C no two non-static functions can have the
        same name.</para>

      <para>Because C++ allows overloading (different functions with
        the same name but different arguments) and has many features C
        does not &mdash; like classes, member functions, exception
        specifications &mdash; it is not possible to simply use the
        function name as the symbol name.  To solve that, C++ uses
        so-called <emphasis>name mangling</emphasis>, which transforms
        the function name and all the necessary information (like the
        number and size of the arguments) into some weird-looking
        string which only the compiler knows about.  The mangled name
        of <function>foo</function> might look like
        <computeroutput>foo@4%6^</computeroutput>, for example.  Or it
        might not even contain the word <quote>foo</quote>.</para>

      <para> One of the problems with name mangling is that the C++
        standard (currently <citation>ISO14882</citation>) does not
        define how names have to be mangled; thus every compiler
        mangles names in its own way. Some compilers even change their
        name mangling algorithm between different versions (notably
        g++ 2.x and 3.x).  Even if you worked out how your particular
        compiler mangles names (and would thus be able to load
        functions via <function>dlsym</function>), this would most
        probably work with your compiler only, and might already be
        broken with the next version.</para>

    </section>

    <section>
      <title>Classes</title>

      <para>Another problem with the <function>dlopen</function> API
        is the fact that it only supports loading
        <emphasis>functions</emphasis>. But in C++ a library often
        exposes a class which you would like to use in your
        program. Obviously, to use that class you need to create an
        instance of it, but that cannot be easily done.</para>

    </section>

  </section>

  <section id="thesolution">
    <title>The Solution</title>

    <section id="externC">
      <title><literal>extern "C"</literal></title>
      
      <para>C++ has a special keyword to declare a function with C
        bindings: <literal>extern "C"</literal>. A function declared
        as <literal>extern "C"</literal> uses the function name as
        symbol name, just as a C function. For that reason, only
        non-member functions can be declared as <literal>extern
        "C"</literal>, and they cannot be overloaded.</para>
      
      <para>Although there are severe limitations, <literal>extern
        "C"</literal> functions are very useful because they can be
        dynamically loaded using <function>dlopen</function> just like
        a C function.</para>

      <para>This does <emphasis>not</emphasis> mean that functions
        qualified as <literal>extern "C"</literal> cannot contain C++
        code. Such a function is a full-featured C++ function which
        can use C++ features and take any type of argument.</para>

    </section>

    <section id="loadingfunctions">
      <title>Loading Functions</title>

      <para>In C++ functions are loaded just like in C, with
        <function>dlsym</function>. The functions you want to load
        must be qualified as <literal>extern "C"</literal> to avoid
        the symbol name being mangled.</para>

      <example>
        <title>Loading a Function</title>

        <para>main.cpp:</para>
	<programlisting><![CDATA[#include <iostream>
#include <dlfcn.h>

int main() {
    using std::cout;
    using std::cerr;

    cout << "C++ dlopen demo\n\n";

    // open the library
    cout << "Opening hello.so...\n";
    void* handle = dlopen("./hello.so", RTLD_LAZY);
    
    if (!handle) {
        cerr << "Cannot open library: " << dlerror() << '\n';
        return 1;
    }
    
    // load the symbol
    cout << "Loading symbol hello...\n";
    typedef void (*hello_t)();

    // reset errors
    dlerror();
    hello_t hello = (hello_t) dlsym(handle, "hello");
    const char *dlsym_error = dlerror();
    if (dlsym_error) {
        cerr << "Cannot load symbol 'hello': " << dlsym_error <<
            '\n';
        dlclose(handle);
        return 1;
    }
    
    // use it to do the calculation
    cout << "Calling hello...\n";
    hello();
    
    // close the library
    cout << "Closing library...\n";
    dlclose(handle);
}]]></programlisting>

        <para>hello.cpp:</para>
        <programlisting><![CDATA[#include <iostream>

extern "C" void hello() {
    std::cout << "hello" << '\n';
}
]]></programlisting>
      </example>

      <para>The function <function>hello</function> is defined in
        <filename>hello.cpp</filename>as <literal>extern
          "C"</literal>; it is loaded in <filename>main.cpp</filename>
        with the <function>dlsym</function> call. The function must be
        qualified as <literal>extern "C"</literal> because otherwise
        we wouldn't know its symbol name.</para>

      <warning>
        <para>There are two different forms of the
          <literal>extern "C"</literal> declaration: <literal>extern
            "C"</literal> as used above, and <literal>extern "C" {
            &hellip; }</literal> with the declarations between the
          braces. The first (inline) form is a declaration with extern
          linkage and with C language linkage; the second only affects
          language linkage. The following two declarations are thus
          equivalent:
          
          <informalexample>
            <programlisting>extern "C" int foo;
extern "C" void bar();
            </programlisting>
          </informalexample>
            and
          <informalexample>
            <programlisting>extern "C" {
     extern int foo;
     extern void bar();
}</programlisting>
          </informalexample>

          As there is no difference between an
          <literal>extern</literal> and a
          non-<literal>extern</literal> <emphasis>function</emphasis>
          declaration, this is no problem as long as you are not
          declaring any variables. If you declare
          <emphasis>variables</emphasis>, keep in mind that

          <informalexample>
            <programlisting>extern "C" int foo;</programlisting>
          </informalexample>
            and
          <informalexample>
            <programlisting>extern "C" {
    int foo;
}</programlisting>
          </informalexample>
          
          are <emphasis>not</emphasis> the same thing.</para>
        
        <para>For further clarifications, refer to
          <citation>ISO14882</citation>, 7.5, with special attention
          to paragraph 7, or to <citation>STR2000</citation>,
          paragraph 9.2.4.</para>

        <para>Before doing fancy things with extern variables, peruse
          the documents listed in the <link linkend="seealso">see
          also</link> section.</para>

      </warning>

    </section>
    
    <section id="loadingclasses">
      <title>Loading Classes</title>

      <para>Loading classes is a bit more difficult because we need
        an <emphasis>instance</emphasis> of a class, not just a
        pointer to a function.</para>

      <para>We cannot create the instance of the class using
        <literal>new</literal> because the class is not defined in the
        executable, and because (under some circumstances) we don't
        even know its name.</para>

      <para>The solution is achieved through polymorphism. We define a
        base, <emphasis>interface</emphasis> class with virtual
        members <emphasis>in the executable</emphasis>, and a derived,
        <emphasis>implementation</emphasis> class <emphasis>in the
          module</emphasis>. Generally the interface class is
        abstract (a class is abstract if it has pure virtual
        functions).</para>

      <para>As dynamic loading of classes is generally used for
        plug-ins &mdash; which must expose a clearly defined interface
        &mdash; we would have had to define an interface and derived
        implementation classes anyway.</para>

      <para>Next, while still in the module,  we define two additional helper
        functions, known as <emphasis>class factory
	    functions</emphasis>. One of these functions creates an instance of 
	the class and returns a pointer to it. The other function takes a
        pointer to a class created by the factory and destroys
        it. These two functions are qualified as <literal>extern
          "C"</literal>.</para>

      <para>To use the class from the module, load the two factory
        functions using <function>dlsym</function> just <link
        linkend="loadingfunctions">as we loaded the the hello
        function</link>; then, we can create and destroy as many
        instances as we wish.</para>

      <example>
        <title>Loading a Class</title>

        <para>Here we use a generic <classname>polygon</classname>
          class as interface and the derived class
          <classname>triangle</classname> as implementation.</para>

        <para>main.cpp:</para>
	<programlisting><![CDATA[#include "polygon.hpp"
#include <iostream>
#include <dlfcn.h>

int main() {
    using std::cout;
    using std::cerr;

    // load the triangle library
    void* triangle = dlopen("./triangle.so", RTLD_LAZY);
    if (!triangle) {
        cerr << "Cannot load library: " << dlerror() << '\n';
        return 1;
    }

    // reset errors
    dlerror();
    
    // load the symbols
    create_t* create_triangle = (create_t*) dlsym(triangle, "create");
    const char* dlsym_error = dlerror();
    if (dlsym_error) {
        cerr << "Cannot load symbol create: " << dlsym_error << '\n';
        return 1;
    }
    
    destroy_t* destroy_triangle = (destroy_t*) dlsym(triangle, "destroy");
    dlsym_error = dlerror();
    if (dlsym_error) {
        cerr << "Cannot load symbol destroy: " << dlsym_error << '\n';
        return 1;
    }

    // create an instance of the class
    polygon* poly = create_triangle();

    // use the class
    poly->set_side_length(7);
        cout << "The area is: " << poly->area() << '\n';

    // destroy the class
    destroy_triangle(poly);

    // unload the triangle library
    dlclose(triangle);
}]]></programlisting>

        <para>polygon.hpp:</para>
	<programlisting><![CDATA[#ifndef POLYGON_HPP
#define POLYGON_HPP

class polygon {
protected:
    double side_length_;

public:
    polygon()
        : side_length_(0) {}

    virtual ~polygon() {}

    void set_side_length(double side_length) {
        side_length_ = side_length;
    }

    virtual double area() const = 0;
};

// the types of the class factories
typedef polygon* create_t();
typedef void destroy_t(polygon*);

#endif]]></programlisting>

        <para>triangle.cpp:</para>
        <programlisting><![CDATA[#include "polygon.hpp"
#include <cmath>

class triangle : public polygon {
public:
    virtual double area() const {
        return side_length_ * side_length_ * sqrt(3) / 2;
    }
};


// the class factories

extern "C" polygon* create() {
    return new triangle;
}

extern "C" void destroy(polygon* p) {
    delete p;
}
]]></programlisting>

      </example>

      <para>There are a few things to note when loading classes:</para>

      <itemizedlist>
        <listitem>
          <para>You must provide <emphasis>both</emphasis> a creation
            and a destruction function; you must
            <emphasis>not</emphasis> destroy the instances using
            <literal>delete</literal> from inside the executable, but
            always pass it back to the module. This is due to the fact
            that in C++ the operators <literal>new</literal> and
            <literal>delete</literal> may be overloaded; this would
            cause a non-matching <literal>new</literal> and
            <literal>delete</literal> to be called, which could cause
            anything from nothing to memory leaks and segmentation
            faults. The same is true if different standard libraries
            are used to link the module and the executable.</para>
        </listitem>

        <listitem>
          <para>The destructor of the interface class should be
            virtual in any case.  There <emphasis>might</emphasis> be
            very rare cases where that would not be necessary, but it
            is not worth the risk, because the additional overhead can
            generally be ignored.</para>
          <para>If your base class needs no destructor, define an
            empty (and <literal>virtual</literal>) one anyway;
            otherwise you <emphasis>will have problems</emphasis>
            sooner or later; I can guarantee you that.  You can read
            more about this problem in the comp.lang.c++ FAQ at <ulink
              url="http://www.parashift.com/c++-faq-lite/"/>, in
            section 20.</para>
        </listitem>
      </itemizedlist>

    </section>
  </section>

  <section id="source">
      <title>Source Code</title>

      <para>You can download all the source code presented in this howto as an
	  archive: <ulink url="examples.tar.gz"/>.</para>

  </section>
  
  <section id="faq">
    <title>Frequently Asked Questions</title>
    <qandaset>
      <qandaentry>
        <question>
          <para>I'm using Windows and I can't find the
          <filename>dlfcn.h</filename> header file!  What's the problem?</para>
        </question>
        <answer>
	    <para>The problem is that Windows doesn't have the
		<function>dlopen</function> API, and thus there is no
          <filename>dlfcn.h</filename> header.  There is a similar API
          around the <function>LoadLibrary</function> function, and
          most of what is written here applies to it, too. Please refer to the
	  <ulink url="http://msdn.microsoft.com/">Microsoft Developer Network
	      Website</ulink> for more information.</para>
      
       </answer>
        </qandaentry>

      <qandaentry>
	<question>
	  
	  <para>Is there some kind of <function>dlopen</function>-compatible
	    wrapper for the Windows <function>LoadLibrary</function>
	    API?</para>
	  
        </question>
	<answer>
	  
	  <para>I don't know of any, and I don't think there'll ever be one
	    supporting all of <function>dlopen</function>'s options.</para>
	
	  <para>There are alternatives though: libtltdl (a part of libtool),
	    which wraps a variety of different dynamic loading APIs, among
	    others <function>dlopen</function> and
	    <function>LoadLibrary</function>.  Another one is the <ulink
	      url="http://developer.gnome.org/doc/API/glib/glib-dynamic-loading-of-modules.html">Dynamic
	      Module Loading functionality of GLib</ulink>.  You can use one
	    of these to ensure better possible cross-platform compatibility.
	    I've never used any of them, so I can't tell you how stable they
	    are and whether they really work.</para>
	
          <para>You should also read section 4, <quote>Dynamically
          Loaded (DL) Libraries</quote>, of the <ulink
            url="http://www.dwheeler.com/program-library">Program Library
            HOWTO</ulink> for more techniques to load libraries and
          create classes independently of your platform.</para>
 	
        </answer>
      </qandaentry>
    </qandaset>
  </section>

  <section id="seealso">
    <title>See Also</title>

    <itemizedlist>
      <listitem>
        <para>The <function>dlopen(3)</function> man page. It explains
          the purpose and the use of the <function>dlopen</function>
          API.</para>
      </listitem>

      <listitem>
        <para>The article <ulink
            url="http://www.linuxjournal.com/article.php?sid=3687">
            <citetitle>Dynamic Class Loading for C++ on
              Linux</citetitle></ulink> by James Norton published on the
          <ulink url="http://www.linuxjournal.com/">Linux
            Journal</ulink>.</para>
      </listitem>

      <listitem>
        <para>Your favorite C++ reference about <literal>extern
            "C"</literal>, inheritance, virtual functions,
          <literal>new</literal> and <literal>delete</literal>. I
          recommend <citation>STR2000</citation>.</para>
      </listitem>

      <listitem>
        <para><citation>ISO14882</citation></para>
      </listitem>

      <listitem>
        <para>The <ulink
            url="http://www.dwheeler.com/program-library">Program Library
            HOWTO</ulink>, which tells you most things you'll ever need
          about static, shared and dynamically loaded libraries and how
          to create them.  Highly recommended.</para>
      </listitem>

      <listitem>
        <para>The <ulink
            url="http://tldp.org/HOWTO/GCC-HOWTO/index.html">Linux GCC
            HOWTO</ulink> to learn more about how to create libraries
            with GCC.</para>
      </listitem>

    </itemizedlist>

  </section>

  <bibliography>

    <bibliomixed>
      <abbrev>ISO14482</abbrev> <title>ISO/IEC 14482-1998 &mdash; The
        C++ Programming Language</title>. <releaseinfo>Available as
        PDF and as printed book from <ulink
          url="http://webstore.ansi.org/"/>.</releaseinfo>
    </bibliomixed>

    <bibliomixed>
      <abbrev>STR2000</abbrev>
      <author><surname>Stroustrup</surname> <firstname>Bjarne</firstname></author>
      <title>The C++ Programming Language</title>, <edition>Special
        Edition</edition>.
      ISBN <isbn>0-201-70073-5</isbn>.
      <publishername>Addison-Wesley</publishername>.
    </bibliomixed>

  </bibliography>

</article>
