Sunday, April 19, 2015

PHP iterating over Blogger posts from Atom Feed XML

Thanks to some decent PHP APIs, it is really easy to read XML from a URL and process the results.

Use case: I have written an expression of interest form that people can use to raise an application to adopt a rescued Chihuahua. Part of the form lists the dogs available - and this list comes from a Blogger feed. The blog is used as a list of all the Chihuahuas that have been rescued (one dog per post) and the ones that are currently available all have a specific label: Available now. The expression of interest form reads the Atom feed for this label and displays a list of the available now dogs on the form.

Retrieve XML from URL and access child elements

The function below reads an XML document from a URL (an Atom feed) and will return an array of the child elements that represent specific posts with a given label from the blog.

function retrieveAvailableNowPosts() {
   // Set URL to XML we want to read - Available now.
   $file="http://chihuahuarescue.blogspot.com.au/feeds/posts/default/-/Available%20now";
   // Load specified XML file or report failure
   $xml = simplexml_load_file($file);
   if (!$xml) {
      return false;
   }
   // Load blog entries
   $posts =  $xml -> entry;
   if (sizeOf($posts) > 0) {
      return $posts;
   } else {
      return null;
   }
}

Notes about this function.

  • $xml = simplexml_load_file($file);
      • Loading the contents of a URL and then parsing the XML it contains is done with simplexml_load_file. The return from this function is either a SimpleXMLElement object or a boolean false if there was an error reading XML from the file (URL in this case).
      • The parameter to simplexml_load_file can be a file or URL.
      • Blogger supports feeds either from RSS 2.0 or Atom 1.0, and you can switch between them simply with a different URL.
      • The URL I am using (http://chihuahuarescue.blogspot.com.au/feeds/posts/default/-/Available%20now) is for an Atom feed. The label is the part after the last forward slash, URL encoded: Available now is the label, which becomes Available%20now after URL encoding. This was easy in my case because I only had to swap the space with %20. If you have more complicated labels (or need to URL encode dynamic labels), you can use the PHP function rawurlencode, which encodes a space as %20. Note that urlencode encodes a space as + instead, which does not match the form used in this URL path.
  • $posts = $xml -> entry;
    • $xml is the variable containing the XML read from simplexml_load_file.
    • The Atom feed XML returned from the URL has feed as the root element and it contains a variable number of entry elements, each of which is a blog post that was made against the target label. The skeleton is shown below.
      <feed ...>
         ...
         <entry>
            ...
         </entry>
         <entry>
            ...
         </entry>
         <entry>
            ...
         </entry>
      </feed>
    • We access the entry elements on the feed using the "arrow" operator (T_OBJECT_OPERATOR for objects).
    • $posts is not a plain PHP array but a SimpleXMLElement that behaves like a list: it can be counted and iterated with foreach, and it might be empty.
  • if (sizeOf($posts) > 0)
    • If the array has 1 or more elements, return it.
    • Otherwise, return null.

Iterate through XML elements

The function below accepts the XML elements read from the earlier function and iterates through them to output HTML.

function createDogList($posts) {
   $list = '<ul class="availableNowList">';
   // Check if posts is undefined, null, false or empty.
   if (!$posts || sizeOf($posts) == 0) {
      $list .= '<li>Unfortunately there are no dogs available at this time.</li>';
   } else {
      // Go over each entry.
      foreach($posts as $post) {
         // Publish time
         $dateTime = date("l jS F, Y", strtotime(strtok($post->published, 'T')));
         // Link.
         $link = $post->link[4]['href'];
         // Title.
         $title = $post->title;
         // List the entry.
         $list .=
            '<li>
               <a href="' . $link . '" target="_blank">'
                     . $title . '</a> <em><small>(published ' . $dateTime . ')</small></em>.
            </li>';
      }
   }
   $list .= '</ul>';
   return $list;
}

Notes about this function.

  • This function builds up an un-ordered list (ul) of blog posts, with each list item having
    • A link to the blog post - the link text being the blog entry title.
    • The date on which the post was published. Example list:
      • RUDI (published Sunday 29th March, 2015).
      • MIMI (published Sunday 29th March, 2015).
      • BAXTER (published Sunday 29th March, 2015).
  • If there are no posts against this label, output only one list item with an explanation that there are no matches at this time.
  • The parameter to this function is a list of blog posts against a certain label, retrieved by the previous function: retrieveAvailableNowPosts().
  • if (!$posts || sizeOf($posts) == 0) { .. }
    • This is more than just a null-check: !$posts will be true if the variable $posts is null, not set (undefined, has no value) or false.
    • Here is a quick overview, showing that an IF is a good test for all three things.
      <?php
         echo '<pre>';
         $foo1;        if($foo1) { ?>foo1 is set and not null/false.<?php } else { ?>foo1 is not set/null/false.<br><?php }
         $foo2=null;   if($foo2) { ?>foo2 is set and not null/false.<?php } else { ?>foo2 is not set/null/false.<br><?php }
         $foo3=false;  if($foo3) { ?>foo3 is set and not null/false.<?php } else { ?>foo3 is not set/null/false.<br><?php }
         echo 'foo1 - ';
         var_dump(isset($foo1));
         echo 'foo2 - ';
         var_dump(isset($foo2));
         echo 'foo3 - ';
         var_dump(isset($foo3));
         echo '</pre>';
      ?>
      
      The output of the above is:
      foo1 is not set/null/false.
      foo2 is not set/null/false.
      foo3 is not set/null/false.
      foo1 - bool(false)
      foo2 - bool(false)
      foo3 - bool(true)
    • This allows us to respond to two sad cases from retrieveAvailableNowPosts() at once.
      • retrieveAvailableNowPosts() returns false if it couldn't read the Atom feed XML from the blog.
      • retrieveAvailableNowPosts() returns null if the list of posts is empty.
      • The sizeOf($posts) == 0 part is actually not needed, because retrieveAvailableNowPosts() returns null if the list of posts is empty, but I left it here in case I ever call this function from a different place and neglect to include the same rule.
  • $dateTime = date("l jS F, Y", strtotime(strtok($post->published, 'T')));
    • $post->published
      • The value of this looks like 2015-03-29T03:10:00.003-07:00.
      • I only want the year, month and day: I want to discard all the time information and just output the date. I will use a string tokenizer to do this in the next step.
    • strtok($post->published, 'T')
      • I use a string tokenizer with the letter "T" as the delimiter. Note that the first call to strtok returns the first token, and since I only need the first token, I make no further calls. Here is how you would use the tokenizer in another situation to go over all tokens:
        $string = "String to split";
        $delimiter = " \n\t";  // Split string on spaces, newlines and tabs.
        $token = strtok($string, $delimiter);
        while ($token !== false) {
            echo "Next token: $token <br />";
            $token = strtok($delimiter);
        }
        
      • The result of strtok($post->published, 'T') will be something like 2015-03-29 (note that it does not include the delimiter itself).
    • date("l jS F, Y", strtotime(strtok($post->published, 'T')))
      • I use strtotime to parse the date (from text like 2015-03-29) into a timestamp, and the date function to output it in a different format (like Sunday 29th March, 2015).
      • See the PHP page for date function to find the full list of date format options, but here is what my format uses.
        • l - A full textual representation of the day of the week: Sunday through Saturday.
        • j - Day of the month without leading zeros: 1 to 31.
        • S - English ordinal suffix for the day of the month, 2 characters: st, nd, rd or th.
        • F - A full textual representation of a month, such as January or March: January through December.
        • Y - A full numeric representation of a year, 4 digits: 1999 or 2003.
  • $link = $post->link[4]['href']
    • In a given entry element, get the href attribute of the fifth link element (using a zero based index).
    • The fifth link element holds a direct URL to the post, such as: <link rel="alternate" type="text/html" href="http://chihuahuarescue.blogspot.com/2015/03/mimi.html" title="MIMI"/>.
  • Google's description of what is in each post element shows you what things you can access this way for each post:
    • posts: A list of all posts for this page. Each post contains the following:
      • dateHeader: The date of this post, only present if this is the first post in the list that was posted on this day.
      • id: The numeric post ID.
      • title: The post's title.
      • body: The content of the post.
      • author: The display name of the post author.
      • url: The permalink of this post.
      • timestamp: The post's timestamp. Unlike dateHeader, this exists for every post.
      • labels: The list of the post's labels. Each label contains the following:
        • name: The label text.
        • url: The URL of the page that lists all posts in this blog with this label.
        • isLast: True or false. Whether this label is the last one in the list (useful for placing commas).

Error handling

The primary error condition is from the call to simplexml_load_file, which returns false if there was a failure reading XML from the URL. The secondary error condition occurs if we read the XML okay, but found it contained none of the elements we are interested in. On the page that uses these functions, both error conditions are treated as normal outputs from the retrieveAvailableNowPosts function and dealt with nicely, as you can see below. We output error messages if either error occurs, and display the "normal" content of the page otherwise.

$posts = retrieveAvailableNowPosts();
// If a boolean false is returned there was an error.
if ($posts === false) {
?>
   <p style="text-align: center; color: red;">Unable to load list of Available Now dogs from Chihuahua Rescue Victoria!</p>
<?php
// And null means there is nothing present.
} else if ($posts === null) {
?>
   <p style="text-align: center;">Unfortunately there are no dogs available at this time. Please try again later.</p>
<?php
// Otherwise, all good. Carry on.
} else {
?>
<?php
 ... normal page content goes here.
} // end else
?>

Just die!

We could have handled the error from simplexml_load_file in this way:

$xml = simplexml_load_file($file) or die("<p>An error message.</p>");

This offers a very poor experience, especially on a web page because it will cause PHP to immediately exit and no further code on the page will be processed. This will most likely result in an ugly page with broken HTML.


Saturday, April 18, 2015

UltraEdit macro to select HTML/XML tag

In a previous post from 2010, UltraEdit macro to select HTML/XML tag, I detailed two UltraEdit macros to select HTML/XML tags backwards and forwards. It had a couple of problems, such as not being able to distinguish between PRE and P when you start selecting P tags, so this version fixes that.

Here are the macros. The first is used to select the previous tag: I have it mapped to control+shift+,.

InsertMode
ColumnModeOff
HexOff
UltraEditReOn
Clipboard 2
IfSel
Find RegExp Up Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
Else
Find Up "<"
Find RegExp "[A-Za-z]"
SelectWord
Copy
Find Up "<"
Key LEFT ARROW
Find RegExp Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
EndIf
Clipboard 0

The second is used to select the next tag: I have it mapped to control+shift+..

InsertMode
ColumnModeOff
HexOff
UltraEditReOn
Clipboard 2
IfSel
Find RegExp Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
Else
Find "<"
Find RegExp "[A-Za-z]"
SelectWord
Copy
Find Up "<"
Key LEFT ARROW
Find RegExp Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
EndIf
Clipboard 0

A few notes about the macros.

  • Select previous tag.
    1. Use it by leaving the cursor within an opening tag (<p>), a closing tag (</p>) or a unary tag (<br>), or within the text content of a tag. Do not select any text.
    2. Press the shortcut (control+shift+,).
    3. The macro will begin running the Else part of the IfSel condition (because no text was selected).
      1. Find Up "<"
        • Looks for the first left angle bracket before the cursor.
      2. Find RegExp "[A-Za-z]"
        • Find the next letter after the left angle bracket - which will be the start of the tag name.
      3. SelectWord
        • Select the tag name.
      4. Copy
        • Copy it - to the second clipboard, which was selected earlier in the macro by the command Clipboard 2.
      5. Find Up "<"
        • Select the first left angle bracket before the cursor (again).
      6. Key LEFT ARROW
        • Make sure cursor is to the left of that angle bracket so the next command (a Find) will have that character in scope.
      7. Find RegExp Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
        1. Select the entire open/close/unary tag.
        2. Find - because we had previously moved to the left of the opening left angle bracket of the tag, the search will take this tag into account.
        3. RegExp - use regular expressions. An earlier macro command (UltraEditReOn) specified that UltraEdit regular expressions are turned on (as opposed to Perl or Unix ones).
        4. Select - whatever we find with the next expression will be selected in UltraEdit.
        5. A breakdown of the expression: </++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}
          1. </++
            • Find left angle bracket and zero or more forward slashes: matches < or </.
          2. ^c
            • Find text in clipboard 2 (which we selected previously).
          3. ^{>^}^{[ ^p^r^n^t]+[~>]++>^}
            • This is an OR expression. ^{A^}^{B^} says find A or B. So this expression says to find either one of:
              • >
                • The right angle bracket that closes a tag. This covers the simple cases, e.g. <p>.
              • [ ^p^r^n^t]+[~>]++>
                • [ ^p^r^n^t]+ one or more of: space or newline (DOS, Mac or Unix) or tab.
                • [~>]++ zero or more of any character other than the right angle bracket.
                • > the right angle bracket.
                • This covers tags with attributes, e.g. <p style=""> which may or may not be spread across multiple lines.
    4. Run the macro again with shortcut (control+shift+,).
    5. The macro will run the IfSel condition because now there is text selected from the previous run.
    6. It will run the exact same Find as was described above except for one difference.
      1. Find RegExp Up Select "</++^c^{>^}^{[ ^p^r^n^t]+[~>]++>^}"
      2. The Up part means that we will look for the next complete tag to the left of what we already have selected from the previous run.

Here is a sample of HTML that I used to test this on.

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
               id="divWithId">Nested <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

For the first run, I place the cursor as indicated below by | (either within the DIV open tag or within the DIV content).

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
               id="divWit|hId">Nested| <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

Run the select previous tag macro (I have it mapped to control+shift+,) and text will be selected as indicated below.

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
            id="divWithId">Nested <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

Now run the select next tag macro (I have it mapped to control+shift+.) and text will be selected as indicated below.

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
            id="divWithId">Nested <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

Run the select previous tag macro again and text will be selected as indicated below.

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
            id="divWithId">Nested <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

Run the select next tag macro again and text will be selected as indicated below.

<html>
   <head>
      <title>Some Title</title>
   </head>
   <body>
      <div>
         <div style="color: red;"
            id="divWithId">Nested <span>div</span>.
            <pre>
               monospaced
            </pre>
         </div>
      </div>
   </body>
</html>

If you use either macro again at this point, nothing will happen because there are no more DIV elements in the document not already selected.

Importantly, these macros work correctly on similarly named tags such as zip and zipfileset (which I have used in XML for Ant build files). If I am selecting zip tags, it skips nested zipfileset elements.

Two final notes.

  1. Something this macro cannot do is to select an entire element that contains nested elements of the same tag. For example, consider the HTML below.
    <div id="outer">
       <div id="inner">
          Inner DIV.
       </div>
    </div>
    
    Macros in UltraEdit cannot be used to select the entire outer DIV because you can't store state in a macro, which you would need to do in order to count nested elements to make sure you select the entire outer one. My workaround for this situation is to just make it easier to keep selecting next/previous DIV tag so that you can achieve the same effect with a bit of repetition.
  2. UltraEdit Find commands in macros can use Perl regular expressions, which are very powerful too. One thing they can do much more easily is to treat newline characters as part of the wildcard. In a Perl regex, (?s) tells the regular expression to include newline characters when matching a . wildcard. You can also use backreferences in Find and Replace expressions. However, backreferences don't persist between macro calls. So, these two macros store the tag name in clipboard 2 so that each time you call one of the macros afterwards, it "remembers" what tag you were searching for by looking at clipboard 2.

    I wrote this up on the UltraEdit forum here: macro to select HTML tag v2.

    Tuesday, April 14, 2015

    Add content to Blogger posts specific to a label

    Mum uses her Blogger to post about dogs she has rescued and uses labels to mark which rescued Chihuahuas are available to good homes, have been happily re-homed, and so on. I created an expression of interest form for people to apply for the dogs that are available, and I wanted a link to the form to appear below the blog entries labelled with "Available now" only. Here is how to do it.

    You need to edit your Blogger XML template by going to the dashboard, selecting your blog, going to the Template section and clicking on the "Edit HTML" button. Take a backup of this content before you change it so that you can always go back! I haven't found a single comprehensive reference on the XML grammar and really only worked this out by experimentation and reading examples.

    In the template, search for <data:post.body/>. If you use the template for your mobile version as well, you will see two instances of that tag. I put my content under both instances because I wanted the same thing to appear on the desktop and mobile versions.

    <div class='post-body entry-content' expr:id='"post-body-" + data:post.id' itemprop='articleBody'>
       <data:post.body/>
       <!-- If the post has labels. -->
       <b:if cond='data:post.labels'>
          <!-- Go through all the labels attached to the post. -->
          <b:loop values='data:post.labels' var='label'>
             <!-- If current label is our target one. -->
             <b:if cond='data:label.name == "Available now"'>
                <!-- Display content I want to appear after the post. -->
                <div style="text-align: center;">
                   <p><em>If you would like to adopt this dog, or any of our Chihuahua Rescue Victoria dogs, please fill out the <a href="http://www.chihuahuarescuevictoria.org/forms/adoption/Express-interest-in-adopting-from-Chihuahua-Rescue-Victoria.php">expression of interest form</a>.</em></p>
                </div>
             </b:if>
          </b:loop>
       </b:if>
       <div style='clear: both;'/> <!-- clear for photos floats -->
    </div>
    

    Elements to note here.

    • <data:post.body/>
      • Indicates the body of each blog post.
    • <b:if cond='data:post.labels'>
      • Tests if the blog post has labels attached to it.
    • <b:loop values='data:post.labels' var='label'>
      • Loop through every label attached to this single post. The loop will place the current label in a variable named "label".
    • <b:if cond='data:label.name == "Available now"'>
      • Test if the current label's name is the label I am targeting.
      • I place my label-specific content within this tag.

    So essentially what I am doing is looping through post.labels with each label in a variable called label, and for each one I examine label.name.

    Look carefully at Google's description of what is in each post element to see what else you can access for each post:

    • posts: A list of all posts for this page. Each post contains the following:
      • dateHeader: The date of this post, only present if this is the first post in the list that was posted on this day.
      • id: The numeric post ID.
      • title: The post's title.
      • body: The content of the post.
      • author: The display name of the post author.
      • url: The permalink of this post.
      • timestamp: The post's timestamp. Unlike dateHeader, this exists for every post.
      • labels: The list of the post's labels. Each label contains the following:
        • name: The label text.
        • url: The URL of the page that lists all posts in this blog with this label.
        • isLast: True or false. Whether this label is the last one in the list (useful for placing commas).


    Sunday, April 12, 2015

    Validation and error handling points

    I have been writing a form today to send off emails with a PHP back-end. All day I have been dealing with validation and trying to cover all the bases (so all your bases remain belonging to YOU).

    Things to keep in mind when validating data.

    1. Presence - if the field is mandatory, validate that it has a value.
      1. Always give some visual way of marking a field as mandatory.
    2. Size - check the size of the input i.e. that it is less than a certain number of characters.
      1. Do this for all values, whether they are coming from an input, text area, checkbox or radio button. One way a site can be attacked is for a malicious sender to ignore your HTML form and POST their own massive values.
    3. Escape and sanitise data before you use it to prevent cross site scripting or SQL injection attacks. Do this before you:
      1. Save to a DB.
      2. Output back to HTML.
      3. Write to a file.
      4. Send to an email.
      5. Send it to another part of your back-end for further processing.
    4. Type - check that an input is an integer or double or boolean as required.
    5. Format - check that input matches a required format, like a phone number or email etc.
      1. Can be helped by using input masks, but you still need to validate that the data you received matches the mask on the server side.
    6. Value - check input against your own business logic. For example:
      1. Is a number within a given range.
      2. Does the string match an element in a known list of choices.
    7. Related - apply any validation that requires examining multiple fields. For example, if country, state and postcode are given, make sure that they are a valid combination.
    8. Server side first.
      1. Client side validation is often easier, but server side validation is more important because JavaScript can be disabled, or bypassed completely if a malicious sender simply POSTs their own requests.
      2. Consider how to return errors in such a way that they can be easily communicated back to the user on the interface.
    9. Client side second
      1. While server side is more important, client side validation makes for a faster and more responsive user experience because you can point out errors before the user ever hits SEND.
      2. Consider things such as how to present errors to users and how to mark things like dynamic business rules (where field A is only mandatory if field B is given a value).
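As a rough illustration of the server-side checks above (presence, size, format, type and value), here is a minimal sketch. The form in this post had a PHP back-end; Java is used here only to show the order of the checks. The class name FormValidator, the field names (name, email, dogs), the length limits and the deliberately simple email pattern are all assumptions made up for this example, not part of the original form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator: field names and limits are assumptions for illustration.
class FormValidator {
    private static final int MAX_NAME_LENGTH = 100;
    private static final int MAX_EMAIL_LENGTH = 254;
    // Deliberately simple format check; not a full email grammar.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Returns a list of error messages; an empty list means the form passed.
    static List<String> validate(Map<String, String> form) {
        List<String> errors = new ArrayList<>();

        // Presence, then size.
        String name = form.get("name");
        if (name == null || name.trim().isEmpty()) {
            errors.add("name: required");
        } else if (name.length() > MAX_NAME_LENGTH) {
            errors.add("name: too long");
        }

        // Presence, then size, then format.
        String email = form.get("email");
        if (email == null || email.trim().isEmpty()) {
            errors.add("email: required");
        } else if (email.length() > MAX_EMAIL_LENGTH) {
            errors.add("email: too long");
        } else if (!EMAIL.matcher(email).matches()) {
            errors.add("email: invalid format");
        }

        // Type (must parse as an integer), then value (must be in range).
        try {
            int dogs = Integer.parseInt(form.get("dogs"));
            if (dogs < 1 || dogs > 3) {
                errors.add("dogs: must be between 1 and 3");
            }
        } catch (NumberFormatException e) {
            errors.add("dogs: must be a whole number");
        }
        return errors;
    }
}
```

Returning a list of messages rather than failing on the first problem makes it easier to report every error back to the user in one pass.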

    Wednesday, April 08, 2015

    Lambdas are not instance methods

    Here is a mistake I made. I had assumed that since a lambda becomes an instance of the functional interface, the lambda would be turned into an instance method.

    Runnable run = () -> System.out.println("This class: [" + this.getClass().getName() + "].");
    

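    No. I get the following compilation error: Cannot use this in a static context. The lambda was declared inside a static method, and within a lambda body this refers to the enclosing instance rather than to the lambda object itself, so in a static context there is nothing for this to refer to.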

    In response to the StackOverflow question What is a Java 8 Lambda Expression Compiled to? I found that Sotirios Delimanolis gives a very useful answer, referring to part of the Java 8 Language Specification - 15.27.4. Run-Time Evaluation of Lambda Expressions. He noted that the JLS doesn't say anything about how the code should be compiled, so it is up to the compiler creator to decide whether the lambda's body should be a static or instance method.
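To make the meaning of this inside a lambda concrete, here is a minimal sketch. The class and method names (LambdaThisDemo and its two methods) are hypothetical and exist only for this illustration. It compares a lambda with an anonymous class, both created inside an instance method.

```java
import java.util.function.Supplier;

// Hypothetical class, used only to illustrate what 'this' means in each case.
class LambdaThisDemo {
    // Inside a lambda, 'this' is the enclosing LambdaThisDemo instance.
    boolean lambdaSeesEnclosingInstance() {
        Supplier<Object> supplier = () -> this;
        return supplier.get() == this;
    }

    // Inside an anonymous class, 'this' is the anonymous object itself.
    boolean anonymousClassSeesEnclosingInstance() {
        final Object outer = this;
        Supplier<Object> supplier = new Supplier<Object>() {
            @Override
            public Object get() {
                return this;
            }
        };
        return supplier.get() == outer;
    }

    public static void main(String[] args) {
        LambdaThisDemo demo = new LambdaThisDemo();
        System.out.println(demo.lambdaSeesEnclosingInstance());         // prints true
        System.out.println(demo.anonymousClassSeesEnclosingInstance()); // prints false
    }
}
```

So the lambda in the failing line would compile if it were declared in an instance method, and this.getClass() would then report the enclosing class, not a class generated for the lambda.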

    Monday, April 06, 2015

    Functional interfaces in JDK 8

    After my previous post on Lambdas, I decided to have a closer look at what makes a functional interface.

    What is a functional interface?

    1. A functional interface is an interface that has exactly one abstract method.
      1. Functional interfaces used to be called Single Abstract Method (SAM) interfaces.
    2. A functional interface must be defined as an interface type, not an annotation type, enum, or class.
    3. A functional interface can optionally be annotated with the @FunctionalInterface annotation.
      1. This annotation is not necessary for Lambdas as the compiler will figure out if an interface is functional or not (by seeing if it has only one abstract method). The annotation is useful when you are writing a functional interface, because the compiler will generate an error if the interface you are writing does not meet the requirements of being a functional interface.

    There are a number of conditions around what counts as the single abstract method in a functional interface.

    1. JDK 8 allows you to add concrete methods to interfaces by using the default keyword. These default methods do not count.
    2. JDK 8 allows you to add static methods to interfaces. These do not count.
    3. Interfaces can also re-declare (override) methods from java.lang.Object and while these methods are also abstract in an interface, they don't count either because any implementation of the interface will automatically inherit those methods from Object. A good example of this is Comparator, which re-declares the equals method so that it can put special notes about it in the javadocs (see: why does comparator declare equals?).
    4. An advanced case involves a functional interface that extends from multiple interfaces that include override-equivalent methods (methods that have the same signature after type erasure). See below.
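The first three conditions can be sketched with a small interface. Greeter and GreeterDemo are hypothetical names invented for this example. The interface has a default method, a static method and a re-declared equals from Object, yet it still compiles with @FunctionalInterface because only greet counts as the abstract method.

```java
// Hypothetical functional interface, for illustration only.
@FunctionalInterface
interface Greeter {
    // The single abstract method that a lambda will implement.
    String greet(String name);

    // Default method: concrete, so it does not count.
    default String greetTwice(String name) {
        return greet(name) + " " + greet(name);
    }

    // Static method: does not count either.
    static Greeter polite() {
        return name -> "Hello, " + name + ".";
    }

    // Re-declared from java.lang.Object: abstract here, but does not count.
    boolean equals(Object other);
}

class GreeterDemo {
    public static void main(String[] args) {
        Greeter greeter = Greeter.polite();
        System.out.println(greeter.greet("Mimi"));      // prints Hello, Mimi.
        System.out.println(greeter.greetTwice("Mimi")); // prints Hello, Mimi. Hello, Mimi.
    }
}
```

Adding a second abstract method to Greeter (one not inherited from Object) would make the compiler reject the @FunctionalInterface annotation.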

    Functional interface as target type for a lambda

    Functional interfaces are much more powerful in JDK 8, where they are used as the target type for lambdas. Whenever you create a lambda in JDK 8:

    1. The compiler will figure out what functional interface fits the parameter list and return type for that lambda.
    2. It will create the lambda as an instance of that functional interface (an object whose type is that of the interface).
    3. The code within the lambda will be used as the concrete implementation of the sole abstract method.
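A small sketch of the steps above: the same lambda text can become an instance of different functional interfaces, depending on the target type it is assigned to. TargetTypeDemo and its method names are hypothetical; Supplier and Callable are standard JDK interfaces.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

// Hypothetical class showing one lambda body with two different target types.
class TargetTypeDemo {
    // Here the lambda implements Supplier.get().
    static String viaSupplier() {
        Supplier<String> supplier = () -> "Dracula";
        return supplier.get();
    }

    // Here the identical lambda implements Callable.call().
    static String viaCallable() {
        Callable<String> callable = () -> "Dracula";
        try {
            return callable.call();
        } catch (Exception e) {
            // call() declares a checked exception, so it has to be handled.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(viaSupplier()); // prints Dracula
        System.out.println(viaCallable()); // prints Dracula
    }
}
```

The lambda itself carries no type; the declared type of the variable (or method parameter) it is assigned to decides which interface it implements.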

    An example of a lambda being supplied where a functional interface is expected is below. First we have a simple POJO representing a book.

    public final class Book {
       private final String title;
       private final String author;
    
       public Book(final String theAuthor, final String theTitle) {
          title = theTitle;
          author = theAuthor;
       }
    
       public String getAuthor() { return author; }
    
       public String getTitle() { return title; }
    
       @Override
       public String toString() { return title + " by " + author; }
    }
    

    Then we have a class that will create a few Book instances and add them to a list. After that, the code uses two lambdas to sort and then print books. The last two lines invoke methods on ArrayList that accept functional interface type parameters. We use lambdas to provide them.

    import java.util.ArrayList;
    import java.util.List;
    
    public final class FunctionalInterfaceTest {
       public static void main(String[] args) {
          List<Book> books = new ArrayList<Book>();
          books.add(new Book("Stephen King", "The Shining"));
          books.add(new Book("Bram Stoker", "Dracula"));
          books.add(new Book("Thomas Harris", "The Silence of the Lambs"));
          books.add(new Book("Henry James", "The Turn of the Screw"));
          books.add(new Book("David Wong", "John Dies at the End"));
          books.add(new Book("Ryu Murakami", "Piercing"));
          books.add(new Book("Peter Straub", "Ghost Story"));
          // Sort books by title.
          books.sort((book1, book2) -> book1.getTitle().compareTo(book2.getTitle()));
          // Print books in their new order.
          books.forEach((book) -> System.out.println(book));
       }
    }
    

    The output of this code is:

    Dracula by Bram Stoker
    Ghost Story by Peter Straub
    John Dies at the End by David Wong
    Piercing by Ryu Murakami
    The Shining by Stephen King
    The Silence of the Lambs by Thomas Harris
    The Turn of the Screw by Henry James
    

    Here is the line that sorts the books.

    // Sort books by title.
    books.sort((book1, book2) -> book1.getTitle().compareTo(book2.getTitle()));
    

    The argument to ArrayList's sort method must be a Comparator, which is a functional interface. Its sole abstract method is compare(T o1, T o2), where T is a type parameter, so o1 and o2 must be of the same type; the method returns an int. This is what the compiler expects us to provide as a parameter and the lambda we have written will fit that type.

    Looking a bit closer into the lambda: (book1, book2) -> book1.getTitle().compareTo(book2.getTitle()).

    1. (book1, book2)
      1. The parameter list contains two arguments whose type we do not specify.
      2. The compiler will infer them as being two Book objects because we are calling sort on a list of books (List<Book>).
    2. book1.getTitle().compareTo(book2.getTitle())
      1. The code to execute in the lambda is an expression - a piece of code that will evaluate to a single value. The value in this case is an int because compareTo on String returns an int.
      2. This matches the sole abstract method in Comparator, which is compare. So the compiler will consider this lambda to be an instance of the functional interface Comparator.
      3. Somewhere in the sort code, compare will be called on our lambda, which is now a Comparator object.
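
    To make the match with Comparator concrete, here is a minimal sketch that writes the same comparison twice: once as an anonymous Comparator and once as a lambda. The Book class in it is a simplified stand-in that I am assuming for illustration (a constructor taking author then title, plus getTitle); it is not the full class from the example above, and the names ComparatorSketch, ANONYMOUS, LAMBDA and sortedTitles are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ComparatorSketch {

   // Hypothetical, simplified stand-in for the Book class used above.
   static final class Book {
      private final String author;
      private final String title;

      Book(final String author, final String title) {
         this.author = author;
         this.title = title;
      }

      String getTitle() {
         return title;
      }

      @Override
      public String toString() {
         return title + " by " + author;
      }
   }

   // What the lambda stands in for: an object implementing Comparator.compare.
   static final Comparator<Book> ANONYMOUS = new Comparator<Book>() {
      @Override
      public int compare(final Book book1, final Book book2) {
         return book1.getTitle().compareTo(book2.getTitle());
      }
   };

   // The same comparison written as a lambda.
   static final Comparator<Book> LAMBDA =
         (book1, book2) -> book1.getTitle().compareTo(book2.getTitle());

   // Sorts a small sample list with the given comparator and returns the titles in order.
   static List<String> sortedTitles(final Comparator<Book> comparator) {
      final List<Book> books = new ArrayList<>(Arrays.asList(
            new Book("Peter Straub", "Ghost Story"),
            new Book("Bram Stoker", "Dracula"),
            new Book("Ryu Murakami", "Piercing")));
      books.sort(comparator);
      final List<String> titles = new ArrayList<>();
      books.forEach(book -> titles.add(book.getTitle()));
      return titles;
   }

   public static void main(final String[] args) {
      System.out.println(sortedTitles(ANONYMOUS));
      System.out.println(sortedTitles(LAMBDA));
   }
}
```

    Both comparators should put the sample books in the same order, which is the point: the lambda is just a shorter way of supplying the compare method.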

    Here is the line that prints the books.

    // Print books in their new order.
    books.forEach((book) -> System.out.println(book));
    

    The argument to ArrayList's forEach method must be a Consumer, which is a functional interface. Its sole abstract method is accept(T t), where t is some non-specific type and the method has a void return type. This is what the compiler expects us to provide as an argument, and the lambda we have written fits that type.

    Looking a bit closer into the lambda: (book) -> System.out.println(book).

    1. (book)
      1. The parameter list contains one argument whose type we do not specify.
      2. The compiler will infer that it is a Book object because we are calling forEach on a list of books (List<Book>).
    2. System.out.println(book)
      1. The code to execute in the lambda is a single method call with a void return type. Since println returns nothing, the lambda returns nothing either, so the compiler treats the lambda code as having a void return type.
      2. This matches the sole abstract method in Consumer, which is accept. So the compiler will consider this lambda to be an instance of the functional interface Consumer.
      3. Somewhere in the forEach code, accept will be called on our lambda, which is now a Consumer object.
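
    The same idea can be sketched for Consumer. The example below is an illustration only: it uses plain String titles rather than Book objects, and the class name ConsumerSketch and the helper collect are names I have made up. It shows an anonymous Consumer whose accept method is what forEach ends up calling, next to the lambda form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerSketch {

   // Passes an anonymous Consumer to forEach and returns what accept received.
   static List<String> collect(final List<String> titles) {
      final List<String> seen = new ArrayList<>();
      // Anonymous class form: accept is the sole abstract method of Consumer.
      final Consumer<String> recorder = new Consumer<String>() {
         @Override
         public void accept(final String title) {
            seen.add(title);
         }
      };
      titles.forEach(recorder);
      return seen;
   }

   public static void main(final String[] args) {
      final List<String> titles = Arrays.asList("Dracula", "Piercing");
      // Lambda form: the compiler turns this into a Consumer<String> for us.
      titles.forEach((title) -> System.out.println(title));
      System.out.println(collect(titles));
   }
}
```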

    Functional interfaces and override-equivalent methods

    With respect to functional interfaces and lambdas, this is a corner case. However, I am going into it in further detail here because it reveals much about the implications of type erasure that came with Generics in JDK 5. The question: what happens to a functional interface that extends from multiple interfaces that contain override-equivalent methods i.e. methods that have the same signature after type erasure?

    First, a look at type erasure, which occurs when generic type information is removed when the compiler generates a class file from source Java. This is done so that code which uses generics will still be compatible with pre-JDK 5 code that doesn't use generics. Practically speaking, it means that you cannot have two methods like this in the same class:

    public void foo(List bar) { }
    public void foo(List<String> bar) { }
    

    This code will not compile because after type erasure, they would have exactly the same signature:

    public void foo(List bar) { }
    public void foo(List bar) { }
    

    The above methods are override-equivalent: their signatures are the same after type erasure. JLS (Java Language Specification), Chapter 8. Classes - 8.4.2. Method Signature says that two methods are override-equivalent if they have the same signature (name and parameter list) or if they have the same signature after type erasure.
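
    A quick way to see type erasure at run time is to compare the classes of two lists with different type arguments. This is a small sketch of my own (ErasureSketch and sameRuntimeClass are hypothetical names), not something taken from the JLS.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureSketch {

   // True when both lists report the same runtime class.
   static boolean sameRuntimeClass(final List<?> first, final List<?> second) {
      return first.getClass() == second.getClass();
   }

   public static void main(final String[] args) {
      final List<String> strings = new ArrayList<>();
      final List<Integer> numbers = new ArrayList<>();
      // Both are plain ArrayList at run time; the type arguments were erased.
      System.out.println(sameRuntimeClass(strings, numbers));
      System.out.println(strings.getClass().getName());
   }
}
```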

    While you can't put two override-equivalent methods in a single class, you can legally end up inheriting from multiple interfaces that contain override-equivalent methods. The result will be a method that can legally override all the inherited abstract methods (after type erasure). The example below shows what happens when an interface extends other interfaces (functional interfaces in this case because they have only one abstract method each) whose sole methods are all override-equivalent: in fact, two of them are exactly the same before type erasure.

    interface Foo1 { void bar(List<String> arg); }
    interface Foo2 { void bar(List<String> arg); }
    interface Foo3 { void bar(List arg); }
    @FunctionalInterface interface Foo extends Foo1, Foo2, Foo3 {}
    public class OverrideEquivalent implements Foo {
      // This compiles.
      @Override public void bar(List arg) { }
      // Does not compile if we use this one instead.
      // @Override public void bar(List<String> arg) { }
    }
    

    The example above shows that a method without generics can legally override generic methods that will have the same signature after type erasure, or non-generic methods that have the same signature. If you use Foo as a functional interface, the method you end up overriding with will be the one that can override all the others i.e. it will have all the generic types erased.
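
    Here is a sketch of what using Foo as the target type of a lambda might look like. The interfaces are copied from the example above and nested inside a class so that the block is self-contained; FooLambdaSketch and sizeSeenByLambda are names I have invented. The assumption being illustrated is that the lambda's explicit parameter type is the raw List, in line with the erased signature.

```java
import java.util.Arrays;
import java.util.List;

public class FooLambdaSketch {

   interface Foo1 { void bar(List<String> arg); }
   interface Foo2 { void bar(List<String> arg); }
   @SuppressWarnings("rawtypes")
   interface Foo3 { void bar(List arg); }
   @FunctionalInterface interface Foo extends Foo1, Foo2, Foo3 {}

   // Passes a list to a Foo created from a lambda and reports the size it saw.
   @SuppressWarnings({"rawtypes", "unchecked"})
   static int sizeSeenByLambda(final List<String> input) {
      final int[] seen = new int[1];
      // The explicit parameter type is the raw List, matching the erased signature.
      final Foo foo = (List arg) -> seen[0] = arg.size();
      foo.bar(input);
      return seen[0];
   }

   public static void main(final String[] args) {
      System.out.println(sizeSeenByLambda(Arrays.asList("a", "b", "c")));
   }
}
```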

    References about type erasure:

    1. JLS, Chapter 4. Types, Values, and Variables - 4.6. Type Erasure.
    2. The Java Tutorials - Type Erasure.
    3. This Stack Overflow post: Java generics - type erasure - when and what happens.
      1. It features this answer by WChargin which explains how code that uses generics like this:
        List<String> list = new ArrayList<String>();
        list.add("Hi");
        
        is compiled into the same code but with generic type information removed:
        List list = new ArrayList();
        list.add("Hi");
        
        It also points out that there is still metadata in the class file about generics, but it is not accessible to code that uses the class file: they are converted into compile-time checks and runtime casts.
    4. Another answer on the same question shows a way to get around type erasure with anonymous classes, which is elaborated on further here:
      1. Super Type Tokens.
      2. Using TypeTokens to retrieve generic parameters.

    Saturday, April 04, 2015

    Introduction to Lambdas in JDK 8

    We write lambdas essentially as a block of code and a parameter list. The JVM will make an object out of those for us - an object whose target type will be a functional interface. It's similar to creating an anonymous class, but removes a lot of the boilerplate: it's syntactic sugar for creating instances of functional interfaces.
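
    To show how much boilerplate goes away, here is a minimal sketch that creates a Runnable both ways. The class name SugarSketch and the method runBoth are made up for this illustration.

```java
public class SugarSketch {

   // Runs an anonymous-class Runnable and a lambda Runnable, returning what each appended.
   static String runBoth() {
      final StringBuilder log = new StringBuilder();
      // Anonymous class: we have to name the interface and the method.
      final Runnable anonymous = new Runnable() {
         @Override
         public void run() {
            log.append("anonymous;");
         }
      };
      // Lambda: only the parameter list and the code remain.
      final Runnable lambda = () -> log.append("lambda;");
      anonymous.run();
      lambda.run();
      return log.toString();
   }

   public static void main(final String[] args) {
      System.out.println(runBoth());
   }
}
```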

    Syntax of a Lambda

    Below is an example lambda being assigned to an instance of Runnable.

    Runnable run = () -> System.out.println("Lambda assigned to a Runnable.");
    

    From left to right:

    1. Runnable run =
      • The lambda is being assigned to an instance of Runnable. More on that later.
    2. ()
      • This is the lambda's parameter list (empty here). Parentheses may or may not be required. Types may or may not be required. For example:
        • x
          • Parentheses not needed because we have only one parameter.
          • Parameter type not needed because it is inferred from the functional interface.
        • (double x)
          • Need the parentheses because we included the type.
        • ()
          • Parentheses required because we have no parameters!
        • (x, y)
          • Parentheses required because we have more than one parameter.
          • Again, parameter types are inferred from the functional interface.
    3. ->
      • What follows this is the lambda code itself.
    4. { ... }
      • Actual code to be executed when the lambda is run. Could be any of these:
        • An anonymous code block (zero or more statements enclosed in curly braces). For example:
          • Empty block: { }
          • Just one statement:
            Runnable run = () -> {
               System.out.println("Hello World!");
            };
            
          • Multiple statements:
            Runnable run = () -> {
               System.out.println("Hello World!");
               System.out.println("Hello World!");
            };
            
        • A single statement (curly braces optional).
          Runnable run = () -> {
             System.out.println("Hello World!");
          };
          
          or
          Runnable run = () -> System.out.println("Hello World!");
          
        • An expression (no curly braces).
          BinaryCalculator division = (v1, v2) -> v1 / v2;
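
    The parameter list forms above can be tried out with a sketch like the one below. I am using interfaces from java.util.function (DoubleUnaryOperator, DoubleSupplier and DoubleBinaryOperator) as target types purely for convenience; the constant names are hypothetical.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleSupplier;
import java.util.function.DoubleUnaryOperator;

public class ParameterListSketch {

   // x : one parameter, no parentheses, type inferred.
   static final DoubleUnaryOperator HALF = x -> x / 2;

   // (double x) : parentheses needed because the type is included.
   static final DoubleUnaryOperator TWICE = (double x) -> x * 2;

   // () : parentheses needed because there are no parameters.
   static final DoubleSupplier ZERO = () -> 0.0;

   // (x, y) : parentheses needed because there is more than one parameter.
   static final DoubleBinaryOperator SUM = (x, y) -> x + y;

   public static void main(final String[] args) {
      System.out.println(HALF.applyAsDouble(8));
      System.out.println(TWICE.applyAsDouble(8));
      System.out.println(ZERO.getAsDouble());
      System.out.println(SUM.applyAsDouble(1.5, 2.5));
   }
}
```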
          

    Target type is a functional interface

    Lambdas are objects in Java, but we do not have to explicitly define their type (like we do with an anonymous class). Instead, Java will try to match a lambda to a target type. The target type of a lambda is a functional interface. A functional interface is an interface that has only one abstract method defined within it (JDK 8 now allows interfaces to contain static and default methods, but these don't count here). Optionally, a functional interface may be marked with the @FunctionalInterface annotation. For example: Runnable or Callable in JDK 8 are both annotated with @FunctionalInterface.

    Lambdas can be passed directly to constructors or methods and the compiler will automagically work out which functional interface to use as the type for the lambda: it is the type that the context expects (the declared type of the variable or parameter that receives the lambda). The compiler then checks that the parameter list of the lambda and the return type of the lambda code match the single abstract method of that functional interface. Consider this example:

    Runnable run = () -> System.out.println("Hello World!");
    

    The parameter list of the lambda is empty - (). The return type of the code block is void - System.out.println("Hello World!"). Plus, we are assigning this lambda to an instance of a Runnable, whose sole method (run) accepts no parameters and has a void return type, so this works.

    The interface does not have to be marked with the @FunctionalInterface annotation though. This also works.

    System.out.println("bbb compared to aaa: "
          + compareStrings((value) -> "bbb".compareTo(value), "aaa"));
    // ..
    static int compareStrings(final Comparable<String> comparator, final String value1) {
       return comparator.compareTo(value1);
    }
    

    The lambda (value) -> "bbb".compareTo(value) is the first argument to the compareStrings method, whose type is a Comparable<String>. The only method of that interface is compareTo, which accepts some type T (String here) and returns an int. This matches the lambda, which accepts a String and returns an int. The type argument matters: with a raw Comparable, the lambda's parameter would be treated as an Object, which String's compareTo does not accept. Therefore, the lambda can be assigned to a Comparable instance even though the Comparable interface is not marked with the @FunctionalInterface annotation.
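
    The same holds for interfaces we write ourselves. In the sketch below, Shouter is a hypothetical interface with one abstract method and no @FunctionalInterface annotation; it can still be the target type of a lambda. All names in it are invented for illustration.

```java
public class UnannotatedSketch {

   // Hypothetical interface: one abstract method, no @FunctionalInterface annotation.
   interface Shouter {
      String shout(String text);
   }

   // Accepts any Shouter, including one created from a lambda.
   static String apply(final Shouter shouter, final String text) {
      return shouter.shout(text);
   }

   // Passes a lambda where a Shouter is expected.
   static String demo() {
      return apply(text -> text.toUpperCase() + "!", "hello");
   }

   public static void main(final String[] args) {
      System.out.println(demo());
   }
}
```

    The annotation is still worth adding to your own interfaces, because the compiler will then report an error if someone later adds a second abstract method.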

    Using lambdas

    Here is an example of using lambdas, which I have adapted from this brilliant JavaWorld article: The essential Java language features tour, Part 6 - Getting started with lambdas and functional interfaces.

    public final class LambdaTest {
    
       public static void main(final String[] args) {
          final BinaryCalculator addition = (double v1, double v2) -> {
             return v1 + v2;
          };
          final BinaryCalculator division = (v1, v2) -> v1 / v2;
          final UnaryCalculator negation = v -> -v;
          final UnaryCalculator square = (double v) -> v * v;
          final double value1 = 18;
          final double value2 = 36.5;
          System.out.printf("%2.1f + %2.1f = %10.3f%n", value1, value2,
                calculate(addition, value1, value2));
    
          System.out.printf("%2.1f / %2.1f = %10.3f%n", value1, value2,
                calculate(division, value1, value2));
    
          System.out.printf("-%2.1f = %10.3f%n", value1,
                calculate(negation, value1));
    
          System.out.printf("%2.1f * %2.1f = %10.3f%n", value1, value1,
                calculate(square, value1));
    
       }
    
       static double calculate(final BinaryCalculator calculator,
             final double value1, final double value2) {
          return calculator.calculate(value1, value2);
       }
    
       static double calculate(final UnaryCalculator calculator,
             final double value) {
          return calculator.calculate(value);
       }
    
       @FunctionalInterface
       interface BinaryCalculator {
          double calculate(double value1, double value2);
       }
    
       @FunctionalInterface
       interface UnaryCalculator {
          double calculate(double value);
       }
    
    }

    A little bit of discussion on these follows.

    final BinaryCalculator addition = (double v1, double v2) -> {
       return v1 + v2;
    };
    

    The example above needs parentheses around the parameters because there is more than one. Parameter types are specified. The body of the lambda could be just an expression, but has been turned into a statement with the return keyword and a semi-colon.

    final BinaryCalculator division = (v1, v2) -> v1 / v2;
    

    The example above needs parentheses around the parameters because there is more than one. Parameter types are left out, because they can be inferred. The lambda is being assigned to an instance of BinaryCalculator: a functional interface whose sole method accepts two parameters of type double, so the compiler can infer the parameter types for the lambda. The body of the lambda here is just an expression (so no curly braces and no semi-colon).

    Note: an expression is something that evaluates to a single value. A statement forms a complete unit of execution that ends with a semi-colon. A block is a group of zero or more statements between balanced braces and can be used anywhere a single statement is allowed.

    final UnaryCalculator negation = v -> -v;
    

    The example above doesn't need parentheses around the parameter because there is only one. The lambda is being assigned to an instance of UnaryCalculator: a functional interface whose sole method accepts one parameter of type double, so the compiler can infer the parameter type for the lambda. The body of the lambda here is just an expression (so no curly braces and no semi-colon).

    final UnaryCalculator square = (double v) -> v * v;
    

    The above example shows the same things as the one above it except for one thing: even with just one parameter, you can still define the type and surround it with parentheses.

    Type inference is powerful

    This deserves a little further explanation. Type inference is syntactic sugar that lets us write shorter code and leave out a lot of boilerplate, because the compiler will figure out types without us having to explicitly declare them: both the interface type and the parameter types. In the example above, I directly assign lambdas to local variables i.e. the target type of the lambda is explicitly declared. Note that the parameter types are still being inferred.

    final BinaryCalculator division = (v1, v2) -> v1 / v2;
    

    Now let's look at an example where the lambda is sent directly as an argument, without being declared as a variable first.

    System.out.printf("%2.1f / %2.1f = %10.3f%n", value1, value2,
            calculate((v1, v2) -> v1 / v2, value1, value2));
    

    Nothing in the code explicitly says what type the lambda or its parameters are, and the lambda is not assigned to any local or instance variable. Here we are forcing the compiler to first work out what the target type of the lambda is, and then it has to figure out what the parameter types are.

    1. What is the target type? The compiler takes it from the context the lambda appears in, which here is the method the lambda is being sent to. It cannot type-check the lambda body until it knows the parameter types, so it starts with what it can see.
      1. The biggest clue is what we are sending the lambda to. We are calling a method called calculate with three arguments. That matches the version of calculate whose first parameter is a BinaryCalculator.
      2. Another clue is that the lambda accepts two parameters, which matches the number of parameters of BinaryCalculator's sole abstract method.
      3. The compiler does not work out that the divide operation returns a double in order to pick the target type. The parameter types are not declared, so the body can only be type-checked once the target type is known.
      4. The compiler does not search through every functional interface it knows about. It only considers the parameter types of the candidate calculate methods, and the chosen one must meet all these conditions:
        1. It is a functional interface, and it is the type of the parameter that the lambda is being sent to.
        2. Its sole abstract method has a parameter list that matches the parameter list of the lambda.
        3. Its sole abstract method has a return type that is compatible with what is being returned by the lambda code.
    2. So it picks BinaryCalculator as the functional interface - it has only one abstract method.
    3. BinaryCalculator's sole abstract method is calculate(double value1, double value2), which accepts two double parameters, so it knows what types to give the parameters too.
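
    The steps above can be sketched with a cut-down version of LambdaTest in which every lambda is passed inline. The interfaces and the two calculate overloads are copied from the earlier example; the class name InferenceSketch and the helper methods divide and negate are my own additions for illustration.

```java
public class InferenceSketch {

   @FunctionalInterface
   interface BinaryCalculator {
      double calculate(double value1, double value2);
   }

   @FunctionalInterface
   interface UnaryCalculator {
      double calculate(double value);
   }

   static double calculate(final BinaryCalculator calculator,
         final double value1, final double value2) {
      return calculator.calculate(value1, value2);
   }

   static double calculate(final UnaryCalculator calculator,
         final double value) {
      return calculator.calculate(value);
   }

   // Three arguments and a two-parameter lambda: the BinaryCalculator overload is chosen.
   static double divide(final double value1, final double value2) {
      return calculate((v1, v2) -> v1 / v2, value1, value2);
   }

   // Two arguments and a one-parameter lambda: the UnaryCalculator overload is chosen.
   static double negate(final double value) {
      return calculate(v -> -v, value);
   }

   public static void main(final String[] args) {
      System.out.println(divide(18, 36));
      System.out.println(negate(18));
   }
}
```

    In both calls the lambda's parameter types end up as double without being written anywhere in the lambda itself.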