Friday, August 31, 2012

Python development with Sublime Text 2 tips and tricks

For quite some time I have used Eclipse + PyDev for Python development. PyDev definitely makes Eclipse a good IDE for Python: it has a great debugger and build system, the autocompletion works well, and it is mature and ready out of the box.

I decided to switch, searching for a more lightweight IDE, mostly because I had become simply addicted to the huge editing capabilities of Sublime Text 2, like selection management or editing several regions at the same time. The outline on the right is simply great... basically I find Sublime extremely productive.

Now, besides the (not especially interesting) reasons why I started using Sublime for Python development, the goal of this post is to explain what to install, and how, to ease Python development, providing Sublime with things like Python autocompletion, lint support, pdb support, etc.

Python development with Sublime Text alone

The first nice surprise with Sublime is that it is already a somewhat reasonable Python IDE (without autocompletion). We already have syntax highlighting (quite common indeed) and a flexible build system.

The build system (discussed in more detail later) allows running simple Python scripts with no configuration at all: just choose the Python build system from the Tools menu and build. It is also easy to configure your own build system, as discussed later on.

One of the most interesting out-of-the-box features related to Python is the presence of code templates as Sublime commands, thus reachable from the menu with cmd+shift+P.
By the way, the amount of languages Sublime provides snippets for is huge; just check the Preferences - Browse Packages menu.

Now let's see what we can install as packages to provide Python autocompletion, debugger support, etc. To start with, a good way to install packages is through Sublime Package Control (downloadable here: http://wbond.net/sublime_packages/package_control). This provides a full package manager and helps discovering, installing and updating packages.

Python autocompletion

For Python autocompletion I installed a Sublime adaptation of the Rope library: SublimeRope (https://github.com/JulianEberius/SublimeRope).

In terms of autocompletion, this package plugs into the default Sublime completion system, adding Python symbols to the default ones.

The easiest way to install it is through Package Control: Shift-Cmd-P, choose the Package Control: Install Package command and select SublimeRope. The installation won't require Sublime Text to be restarted.

Once Rope is installed, each Python file will benefit from autocompletion. At this stage, autocompletion will search for symbols declared in the file you are editing, all default Python symbols, and all installed modules and Python symbols accessible from PYTHONPATH.

Still, Rope won't be able to find symbols declared in separate files of the same project. To fix this, a Rope project needs to be created explicitly. This can be done through the Rope: New Project command from the command palette.

Refactor and source management

Every reasonable IDE provides some kind of support for refactoring and some abilities to navigate through the sources of a project. SublimeRope provides these capabilities as well.

Once you install SublimeRope you have three commands for source navigation:
  • Rope: Go to Definition, which opens the definition of the symbol under the cursor
  • Rope: Show Documentation, which shows Python documentation for the symbol under the cursor
  • Rope: Jump to Global
(screenshot: Show Documentation output)
SublimeRope provides access through the Sublime command palette to most of the refactorings provided by the Rope library, like method extraction, inlining or extraction of variables, etc.

These functionalities come without a dedicated set of key bindings. These can be easily added by modifying Sublime key bindings through the Preferences - User Keybindings menu.
For example, this maps "show documentation" to cmd+R, cmd+D:
 //Rope key bindings
    { "keys": ["super+r", "super+d"], "command": "python_get_documentation", "context":
        [
            { "key": "selector", "operator": "equal", "operand": "source.python" }
        ]
    }


Python lint

There are several Python linters; for pyflakes and pep8 there is a Sublime package (SublimeLinter) that highlights potential issues in your code while editing. It can be installed from Package Control, while source and documentation can be found here: https://github.com/SublimeLinter/SublimeLinter.
Issues are highlighted directly in the code, while a description is shown in the status bar.

Sometimes, for (more or less) good reasons, we want to disable some of the linter checks (personally I found the limit on maximum line length a bit too strict). This can be done with SublimeLinter by editing SublimeLinter.sublime-settings (Preferences - Package Settings - SublimeLinter - Settings - User menu). Always pay attention to modify the user settings and not the default ones, which are overwritten at every update.

Here are my settings:
{
    //I don't want to be bothered by issues on spaces around operators
    "pep8_ignore":
    [
        "E251","E501","W191","E303"
    ]
}

Debugging

What I miss most about Eclipse + PyDev is the great Eclipse debugger, which is perfectly integrated with pdb through PyDev.

Still, pdb is not that hard to use alone; what is needed is proper integration in Sublime Text. I found a good one in SublimeREPL (https://github.com/wuub/SublimeREPL), which allows running a lot of interactive interpreters in a Sublime Text buffer.

These interpreters include pdb (a command is ready out of the box to run the current Python file with pdb: the REPL: PDB current file command).
At this point you are running a classical pdb session.
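As a refresher on such a session, here is a tiny script to step through (the file and function names are just illustrative, not from any package); the comments list the commands you would typically type at the (Pdb) prompt:

```python
# sample.py - a tiny script to exercise pdb with (illustrative example)

def average(values):
    total = sum(values)
    return total / len(values)

if __name__ == "__main__":
    # Typical commands once the (Pdb) prompt appears:
    #   b sample.py:5   set a breakpoint on a line
    #   n               execute the next line
    #   s               step into a function call
    #   p total         print the value of a variable
    #   c               continue until the next breakpoint
    print(average([1, 2, 3]))
```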

A couple of words about build systems

For simple scripts, out-of-the-box build systems are great, as is the script run through REPL. What happens instead when we want to provide input parameters to our script or change environment variables?

For running the scripts, Sublime Text allows you to configure your own build system in a very flexible way, documented here: http://sublime-text-unofficial-documentation.readthedocs.org/en/latest/reference/build_systems.html.

I find it useful to add the root of the project to PYTHONPATH, in order to allow test scripts to access the modules of my project even if they are in different directories and not yet installed in my Python distribution.

To do so I created my build systems inside my project configuration, instead of making them available to the whole IDE, in PROJECTNAME.sublime-project:

"build_systems":
    [
        {
            "cmd": ["python","${file}","command1"],
            "env": {"PYTHONPATH":"/Users/pyppo/Documents/workspacepython/pyCmdLiner/"},
            "name": "pythonProjectTestBase"
        }
    ]

This creates a build system called pythonProjectTestBase that invokes python on the currently open file, passing a command argument (command1) and adding the project root directory to PYTHONPATH before running the script.
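To see what such a build system actually hands to the script, here is a small sketch (the function name is mine, not part of any package) that reads the extra arguments and the PYTHONPATH the build system set:

```python
import os
import sys

def describe_invocation(argv=None, env=None):
    """Return the extra arguments and the PYTHONPATH a build system passed us."""
    argv = sys.argv if argv is None else argv
    env = os.environ if env is None else env
    return argv[1:], env.get("PYTHONPATH", "")

if __name__ == "__main__":
    args, path = describe_invocation()
    print("arguments:", args)       # e.g. ["command1"] with the build system above
    print("PYTHONPATH:", path)
```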

Things are a bit more complex when debugging with REPL. As of today, I am not aware of a project-specific way to provide REPL configuration.

As a workaround I provided my PYTHONPATH as REPL user configuration in SublimeREPL.sublime-settings:

{
    "default_extend_env": {"PYTHONPATH":"/Users/pyppo/Documents/workspacepython/pyCmdLiner/"}
}

This is accessible from Preferences - Package settings - SublimeREPL - Settings User menu.

For additional flexibility it is always possible to modify the SublimeREPL command definitions or to create additional ones. This configuration is accessible from the Preferences - Browse Packages - SublimeREPL menu, under the python directory:
  • Default.sublime-commands lists the commands and points to the definition file
  • Main.sublime-menu contains the command definitions.
Running a Python script with pdb is defined by default in this way:
{"command": "repl_open",
                     "caption": "Python - PDB current file",
                     "id": "repl_python_pdb",
                     "mnemonic": "d",
                     "args": {
                        "type": "subprocess",
                        "encoding": "utf8",
                        "cmd": ["python", "-i", "-u", "-m", "pdb", "$file_basename"],
                        "cwd": "$file_path",
                        "syntax": "Packages/Python/Python.tmLanguage",
                        "external_id": "python"
                        }
                    }

I never tried to provide a user-based command configuration for REPL instead of tweaking the default one, so I cannot say whether it is possible.

Conclusion

Here are my 2 cents on how to set up a Python development environment with Sublime Text. I am pretty sure there are plenty of other useful packages to install for an even better IDE; this is just a starting point...

Wednesday, May 23, 2012

Scala Profiling talk at Riviera Scala Clojure Group

Last night I gave a talk about Scala profiling at the Riviera Scala Clojure Group in Sophia Antipolis (http://www.meetup.com/riviera-scala-clojure/).
Great Experience !

Here my 25 readers can find the slides:
Scala profiling presentation

Stay tuned for more details

Saturday, May 12, 2012

Scala profiling - part 1 - data structures

This is the first post of a series of undetermined length about performance profiling of Scala applications. Most people will say that Scala applications run in a JVM and are compiled to Java byte code, thus the same techniques as for Java profiling should apply. This is mostly true; in fact the same profilers and the same heap dump analyzers can be safely used for Scala applications. Still, there are some topics which are specific to Scala applications, like:
  • Data structures to be searched for in a heap dump (we are not using the comfortable java.util.List and so on...)
  • Objects found in heap dumps for closures and so on...
  • How Scala control structures appear in a profiler
  • How to configure garbage collection to optimize Scala programs
This post deals with the first topic: given a heap dump, here you will find how to search for Scala Lists, Tuples, Sets, Maps and how to inspect the content.

Taking a heap dump and opening it

This is the first step for inspecting the heap of a Scala application, and it is not at all different from Java applications (the description here concerns Oracle JVMs; for other VMs equivalent procedures exist).
You can take a dump either through a memory analyzer like Eclipse MAT (http://eclipse.org/mat/, my favourite) or VisualVM, which is included in every Oracle JDK since Java 6. Alternatively, since Java 6 there is the command line tool jmap that allows you to take a dump.
In all these cases you have to know the pid of the Scala application. With jmap you first have to execute jps, which lists the Java processes with their pids on a given machine.

Let's see an example:
after starting the Scala program:
new-host:~ pyppo$ jps
12764 Jps
12763 ExplodingPermGen
12414
12378
our pid is 12763, so we can get a heap dump through
jmap -dump:live,format=b,file=heap.bin 12763

If you prefer a visual approach, both MAT and visualvm provide you the list of running java processes when you try to acquire the dump (File->Acquire Heap Dump on MAT).

Now we have a file (normally with the hprof extension) that we can open with MAT, VisualVM or most Java profilers.

Scala lists and arrays in a heap dump

Now that we have opened our heap dump, we may be interested in searching for lists, arrays and list wrappers inside it.
Here is a screenshot of our opened heap dump, which shows all local variables of our main: the <Java local> marked elements referenced by the main Thread object.

We can first search for arrays. This is easy: arrays in Scala are pure Java arrays, thus there is no difference in the way we search for them compared to Java. Here we can see an empty String array (java.lang.String[0]).
Something more interesting are immutable lists, built like this:
 val listOut = List(1,2,3)
This kind of list is an immutable linked list, and the corresponding Java class (the one we have to search for in the dump) is scala.collection.immutable.$colon$colon. This represents the head of the list. It contains two attributes: tl, of the same type, which is the pointer to the next element, and scala$collection$immutable$$colon$colon$$hd, which is the reference to the contained element.

Since the list is immutable, the :: and ::: Scala operators, which create a new list by adding an element to an existing list and concatenate two lists respectively, are very fast. No copy of elements between lists is done. Adding an element in front simply creates a new list with the new element as head and a reference to the previous list as tail. This also means that if the objects in the list are mutable, modifying an object will affect both the initial list and the newly built one.
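The cons-cell layout described above can be mimicked in a few lines of Python to make the structural sharing visible (the class and function names are mine; in the dump the fields are tl and scala$collection$immutable$$colon$colon$$hd):

```python
# A minimal cons-cell sketch mirroring scala.collection.immutable.$colon$colon
class Cons:
    def __init__(self, head, tail):
        self.head = head   # plays the role of ...$$hd: the contained element
        self.tail = tail   # plays the role of tl; None marks the end, like Nil

def prepend(x, lst):
    # O(1): nothing is copied, the old list becomes the tail of the new one
    return Cons(x, lst)

base = prepend(3, prepend(2, None))
extended = prepend(1, base)
assert extended.tail is base   # structural sharing, as with Scala's ::
```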

A mutable flavour of lists in Scala is ListBuffer. These instances can be found as scala.collection.mutable.ListBuffer. It wraps an immutable list defined through the previous data structure, but it also keeps a reference to the tail of the list through the last0 attribute. This allows attaching new elements to the end of the list, making it mutable.

Maps and sets

Both Sets and Maps in Scala have two different kinds of internal structure depending on the number of elements they contain (in the immutable version). Under 5 elements, the class implementing the Map/Set contains one attribute per element, while over this limit the underlying data structure is an array.
Let's first consider immutable Sets and Maps under five elements. They are straightforward in structure and are managed respectively through inner classes of scala.collection.immutable.Map and scala.collection.immutable.Set: for maps we have $Map1 to $Map4, and the same goes for sets. Both these data structures contain a list of attributes, called keyN/valueN for maps and elemN for sets.

For bigger sets we have instances of scala.collection.immutable.HashSet$HashTrieSet. This contains an array of scala.collection.immutable.HashSet$HashSet1, each of which contains a direct reference to an element of the set and an integer representing its hash.

For mutable Sets and Maps there is no difference in the data structure between big and small collections. They are represented respectively through scala.collection.mutable.HashSet and scala.collection.mutable.HashMap. In both cases the underlying data structure is an array. The difference is the content of the array: for the HashSet, the elements in the array are directly the elements provided by the user, while for the HashMap they are wrapped in scala.collection.mutable.DefaultEntry, each of which contains the key, the value and a reference to the next entry at the same position in case of hash collision.
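The DefaultEntry chaining can also be sketched in Python (a toy model of what you see in the dump, not the real implementation):

```python
# Toy model of scala.collection.mutable.DefaultEntry: entries in the same
# array slot are chained together when their keys' hashes collide.
class DefaultEntry:
    def __init__(self, key, value, next_entry=None):
        self.key = key
        self.value = value
        self.next = next_entry   # next entry in the same slot, or None

# two colliding keys end up as a linked chain inside one array slot
bucket = DefaultEntry("a", 1, DefaultEntry("b", 2))
assert bucket.next.key == "b"
```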

Tuples

Tuples have a straightforward structure, which resembles that of the small sets. They are implemented through scala.Tuple[CARDINALITY], where CARDINALITY is the size of the tuple. Inside, they contain a private final attribute per element. These attribute names are _NUM, where NUM is the sequential number of the element inside the tuple.


.... stay tuned for the next data structures ....

Tuesday, April 24, 2012

On Weblogic session replication messages

This post discusses two common error messages you may encounter when managing WebLogic clusters with in-memory replication: BEA-100089 and BEA-000117. The goal is to explain what to do (if anything has to be done) when you encounter them.

BEA-100089 The session id: sessionID has been accessed from currentServer, a server that is neither the primary (primaryServer) nor the secondary (secondaryServer). The request URL was: requestUrl

As the message says, a request for a given sessionId has been routed to a node of the cluster which was hosting neither the primary copy of the session nor the secondary one (when using in-memory replication, each HTTP session is created and served on a node of the cluster, and at the end of each request it is replicated to a secondary node).

The first question is: if I see this message should I think that the load balancer is routing requests to the wrong node? 
Not necessarily.
Although this message is likely to appear in case of bad routing, there are a lot of cases where it does not point out any specific issue.
Let's say the primary node for a session is shut down. At the following request, two behaviors can be seen. If the load balancer sends the request to the node that was the former secondary, no message will be shown: this node already has a copy of the session, thus it will serve it and will just elect a new secondary at the end of the transaction.
This is not always the case; in fact the load balancer (at least this is true for the wlproxy plugin), upon failure of the primary node, intentionally chooses the new node randomly instead of choosing the former secondary. This keeps the amount of sessions well balanced by redistributing the sessions previously held by the failing node.

How can the node know who was the primary and who was the secondary?
This is feasible since the session id contains the JVM ids of both primary and secondary in this form:
<SESSIONID>!<PRIMARY_JVM>!<SECONDARY_JVM>!<SESSION_CREATION_TIMESTAMP>
Thus each node receiving a request with a session id is able to know whether it is supposed to know the session and which nodes are primary and secondary.
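Given that layout, pulling the pieces apart is trivial; here is a quick sketch (the helper name and the sample id are invented for illustration):

```python
def parse_session_id(raw):
    """Split a session id of the form
    <SESSIONID>!<PRIMARY_JVM>!<SECONDARY_JVM>!<SESSION_CREATION_TIMESTAMP>."""
    session, primary, secondary, created = raw.split("!")
    return {"session": session, "primary": primary,
            "secondary": secondary, "created": created}

info = parse_session_id("ABCDEF!2703743310001936216S!8742509417973544740S!1335215614")
assert info["primary"] == "2703743310001936216S"
```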

Here is a real life example of this message to understand what we can extract from it:
<Apr 23, 2012 11:13:34 PM CEST>
<Warning>
<HTTP Session>
<BEA-100089>
<The session ID:
CJ1nPVGDlQCk2prqZ3hrChTQGZwXvpQG3FmnxLxC6TYZDlKJmVF7
has been accessed from
810226440766491326S::myCluster:serv1,
a server that is neither the primary
(2703743310001936216S:192.168.1.13:[7004,7004,-1,-1,-1,-1,-1]:myCluster:serv2)
nor the secondary
(8742509417973544740S:192.168.1.13:[7005,7005,-1,-1,-1,-1,-1]:myCluster:serv3).
The request URL was: http://192.168.1.13:7003/clusteredWebapp/UselessServlet>
  • 810226440766491326S::myCluster:serv1 is the identifier of the server that received the request: id::cluster name::server name
  • Primary and secondary are expressed in this form: 2703743310001936216S:192.168.1.13:[7004,7004,-1,-1,-1,-1,-1]:myCluster:serv2. Including server id, ip, port, replication port, cluster and server name.

BEA-000117 Received a stale replication request for object id.

This message literally means that a node (the one where the message is logged) received a copy of an object for replication from another node, but the receiving node already had a copy of the same object for the same session.

Most of the time this happens when there is a real issue on the node sending the replica, such that the primary takes an insanely long time to send the replica to the secondary. Meanwhile a new request is served and the load balancer ignores the former primary. In this case the replica of the second transaction can get to the secondary node before the replica of the first.

There is anyway a quite common case where this message can appear (together with BEA-100089) such that no real issue is present. Let's imagine we have a cluster with three servers: serv1, serv2 and serv3.
We have our session being served by serv1 and such that serv2 is the secondary node. Now, for any reason, the load balancer decides to make the session failover on serv3 and serv3 elects serv2 as secondary.
After a while, for some reason, the session is failed over again to serv1 (which still has an old version of the session). Since serv1 already has a copy of the session, it won't try to get the up-to-date content of the session from the former secondary. It will anyway try to elect a secondary node and send the stale session there during replication.
If the elected secondary already had a copy of the session (serv2 for example), this will trigger an occurrence of this message even if there is no real infrastructure issue.
This message is not only a warning: if it appears, it means that replication failed and that a secondary has not been elected. This can be verified through the WebLogic console, in the cluster monitoring section.

Sunday, March 25, 2012

Developing content assistant for Eclipse plugins (part 1)

This small tutorial is aimed at explaining how to implement the infrastructure for a Content Assistant system in Eclipse plugins or, more generically, in JFace applications.
In the following we will distinguish between two cases:
  • Eclipse plugin (or RCP application) editors. In this case we want to attach the Content Assistant to an Editor, which implements org.eclipse.ui.IEditorPart
  • Attaching a Content Assistant to a standard text control (which must anyway be an org.eclipse.jface.text.source.SourceViewer). This case can be found in a standalone application or in any Eclipse plugin where content assist is not provided to an editor
We will see that the Content Assistant is provided in the same way, but the infrastructure providing the proposals is attached to the text control in a different way.
Content Assistant is managed through three different components, which will be the topics of the next three sections.

IContentAssistProcessor
The main component of a content assistant is the implementation of IContentAssistProcessor. When this is attached to an editor or to a SourceViewer SWT component, it is invoked when content assist is requested; it finds the context and provides the content of the menu.

The method to be implemented to provide the list of suggestions is computeCompletionProposals. What has to be returned is an array of ICompletionProposal objects. computeCompletionProposals receives two parameters: a TextViewer, which is the SWT control the content assistant is attached to, and an integer which represents the cursor's position in the control.
The TextViewer is needed to extract the context used to produce the list of items. As an example, the context could be the prefix of a class name, used to filter the content of the content assistant.
A CompletionProposal object includes 4 elements which can be passed through its constructor:
  • replacementString is the String to be shown and to be written back into the TextViewer
  • replacementOffset is the destination position in the TextViewer where the chosen element will be put
  • replacementLength is the number of already present characters to be substituted (let's say the user wrote java. and chooses java.util.List from the content assistant: replacementLength is 5, the length of java.)
  • cursorPosition is the position of the cursor after writing the content of the proposal.
Example:
public class OQLContentAssistantProcessor implements IContentAssistProcessor
{

    /**
     * provides suggestions given the context
     */
    private SuggestionProvider suggestionProvider;

    /**
     * Extracts the context from TextViewer
     */
    private ContextExtractor extractor;

   .....


    public ICompletionProposal[] computeCompletionProposals(ITextViewer arg0, int arg1)
    {
        String context = extractor.getPrefix(arg0, arg1);
        List<ContentAssistElement> suggestions =
            suggestionProvider.getSuggestions(context);

        return buildResult(suggestions, arg1, context.length());

    }

   
    public char[] getCompletionProposalAutoActivationCharacters()
    {
        return new char[] { '.', '"' };
    }

    
    private ICompletionProposal[] buildResult(List<ContentAssistElement> suggestions,
                    int currentCursor, int replaceLength)
    {
        if (suggestions == null)
            throw new IllegalArgumentException("Cannot produce a suggestion. List is null");

        ICompletionProposal[] retProposals = new ICompletionProposal[suggestions.size()];
        Iterator<ContentAssistElement> it = suggestions.iterator();

        int c = 0;
        while (it.hasNext())
        {
            ContentAssistElement cp = it.next();
            String classname = cp.getClassName();
            ICompletionProposal completion = new CompletionProposal(classname, currentCursor - replaceLength,
                            replaceLength, currentCursor - replaceLength + classname.length());
            retProposals[c] = completion;
            c++;
        }

        return retProposals;
    }
}
In this example, computeCompletionProposals uses two helpers: the first (ContextExtractor) extracts the prefix of the string to be searched, while the second (SuggestionProvider) takes this prefix and computes the list of suggestions. Once this is done, the list of suggestions is converted into a list of CompletionProposal objects through the buildResult method.

SourceViewerConfiguration
In the last section we saw what provides the content for a content assistant system. Now it is time to attach our IContentAssistProcessor to a source viewer, which can be an Editor in Eclipse plugins or a SourceViewer SWT component in any JFace application.

In both cases, the Eclipse plugin or the standalone application, the IContentAssistProcessor is provided by an instance of SourceViewerConfiguration, which besides the content assist processor provides managers for document formatting, text partitioning and so on (see here: http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjface%2Ftext%2Fsource%2FSourceViewerConfiguration.html for details about the SourceViewerConfiguration class).

Typically a plugin would subclass it, overriding only the methods of interest, like:
public class OQLTextViewerConfiguration extends SourceViewerConfiguration
{
.....

@Override
    public IContentAssistant getContentAssistant(ISourceViewer sourceViewer)
    {
        ContentAssistant cAssist = new ContentAssistant();

.......
        OQLContentAssistantProcessor fromProcessor = new OQLContentAssistantProcessor(classSuggestions, classNameExtr);
         
        cAssist.setContentAssistProcessor(fromProcessor, IDocument.DEFAULT_CONTENT_TYPE);

        cAssist.enableAutoActivation(true);

        cAssist.setAutoActivationDelay(500);
        cAssist.setProposalPopupOrientation(IContentAssistant.CONTEXT_INFO_BELOW);
        cAssist.setContextInformationPopupOrientation(IContentAssistant.CONTEXT_INFO_BELOW);

        return cAssist;
    }

getContentAssistant provides an instance of ContentAssistant, which wraps the IContentAssistProcessor discussed above. The interesting reason why a processor is wrapped into a ContentAssistant is that it is possible to associate several processors with the same SWT control, providing different information in different contexts (for example, in an SQL content assistant the set of proposals will be different between the SELECT clause and the FROM clause, thus they may be served by different processors).

This is managed through setContentAssistProcessor which takes a second parameter (here IDocument.DEFAULT_CONTENT_TYPE) which identifies the context where the processor has to be used. 

Defining the contexts will be subject of the second part of this tutorial. 

Now that we have a SourceViewerConfiguration, we have to attach it to the text control we want to provide content assist for. In case we are developing an editor for an Eclipse plugin, each Editor class has a method setSourceViewerConfiguration that allows attaching the SourceViewerConfiguration. This method is protected and can be invoked from the editor constructor.
In the case of a standalone JFace application, or simply when we want to attach content assist to a control which is not a JFace editor, the control must provide a method to attach a SourceViewerConfiguration (the configure(SourceViewerConfiguration) method for org.eclipse.jface.text.source.ISourceViewer).

Key binding
In most cases you will want the content assistant to be activated not only by a specific character typed by the user (like the '.' in Java editors) but also by a key combination (like CTRL+space).


If you are writing an Eclipse plugin and you are attaching a content assistant to an Editor, this comes for free, in that your editor comes with an action that shows the content assistant, and this action is bound to a key binding (CTRL+space) that can be modified by the user through Eclipse key bindings management.


This is not the case in a stand alone JFace application or when adding the content assistant to a JFace control which is not an Editor.

To better understand this concept, let's see how these key bindings work and how they trigger the content assistant:


Starting the Content Assistant: the operation can be requested programmatically by invoking the doOperation method on the SourceViewer control (org.eclipse.jface.text.source.SourceViewer), passing ISourceViewer.CONTENTASSIST_PROPOSALS as parameter.


Providing an Action that starts the Content Assistant: Actions implement a Command pattern, allowing you to define a behavior of your application independently from the logic that triggers it. In this case we have a simple action that opens the content assistant:
contentAssistAction = new Action()
        {
            @Override
            public void run()
            {
                queryViewer.doOperation(ISourceViewer.CONTENTASSIST_PROPOSALS);
            }
        };


Defining a Command for content assistant: Commands associate an id with a user behavior. They are kept separate from Actions in order to be able to associate different actions with the same user behavior in different plugins.

Commands are defined in plugin.xml; they do not define the semantics of the action to be performed (it is the action that is then attached to a command). They just define an id for a user behavior.

 
      
Defining a key binding: this allows attaching a key binding to a command. This is performed in plugin.xml as well.
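Both declarations live in plugin.xml. Here is a minimal sketch of what the two fragments could look like (the command id, name and key sequence are hypothetical examples; the extension points are the standard org.eclipse.ui.commands and org.eclipse.ui.bindings):

```xml
<!-- Hypothetical plugin.xml fragment: id, name and sequence are examples -->
<extension point="org.eclipse.ui.commands">
   <command
         id="com.example.myplugin.contentAssist"
         name="Content Assist"/>
</extension>
<extension point="org.eclipse.ui.bindings">
   <key
         commandId="com.example.myplugin.contentAssist"
         sequence="CTRL+SPACE"
         schemeId="org.eclipse.ui.defaultAcceleratorConfiguration"/>
</extension>
```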


The previous four steps are not necessary for Eclipse plugin editors, in that they are already defined. It is only necessary to provide the correct SourceViewerConfiguration.