Welcome to Py4Science at the Vienna Biocenter

The Python language is an excellent tool for scientific computing and rapidly growing in popularity. It is easy to read, easy to write and it features a great selection of powerful libraries.

Py4Science VBC (Vienna Bio Center) is a monthly meeting series facilitating cross-pollination between interested parties who use Python as part of their scientific tool set. Attendees do not need to be expert Python programmers, but familiarity with the language and some popular libraries such as numpy and matplotlib is encouraged. Complete beginners are welcome too: the talks are short (15 minutes) and followed by interactive discussions and demonstrations, so you will not be bored.

Google Groups
Subscribe to py4science VBC
Visit this group
Check out our repository at github.com/strawlab/py4science-vbc/

A virtual machine featuring all the necessary software will be provided. You can download this from here (coming soon).

The GMI Orange Seminar Room (room 9.36)
The 1st Friday of the month, 12:30pm.

Next Talk

Bioimage analysis with ilastik

Ullrich Koethe

Friday, 7 February 2014, 12:30pm, IMP Lecture Hall

Tools for bioimage analysis presently face two major challenges: (i) they should work reliably in the hands of end users, and (ii) they should be able to handle massive amounts of data. To solve the first problem, our group’s major open-source project “ilastik” (www.ilastik.org) offers generic image analysis workflows (pixel and object classification, interactive segmentation, and tracking) in up to five dimensions (space, time and spectral) which can be adapted to new experiments using modern machine learning methods. ilastik’s intuitive user interface and immediate feedback on all interactions enable biologists to train these methods themselves, without consulting an image analysis expert. The second problem is addressed by means of an execution graph architecture called “lazyflow” which determines the minimum computation required to fulfill any user request and executes each request strictly on demand and in parallel. The talk will give an introduction to the underlying algorithms and software design as well as a short online demonstration of the software’s capabilities.

Previous Talks

Python and Google App Engine and Google Drive and Google Calendar and Google Spreadsheets

Andreas Poehlmann

Friday, 8 November 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

A cross-platform interface to your favourite Python tools that is reachable from anywhere with an internet connection can easily be built using the services provided by Google App Engine. This Friday we’ll go through some minimal working examples to get you started with App Engine, and I’ll show you a tool that I wrote for flystock management, which uses Google App Engine, Google Drive and Google Calendar.

If you already have some special questions regarding app engine, let me know in advance!

Using Python for fun and profit in machine learning competitions

Santi Villalba

Friday, 6 September 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

Machine learning competitions are becoming a mainstream sport, with many skilled data analysts competing to have fun, show off their skills, earn some money or just improve themselves. In this talk I will introduce crowdsourced gamification for solving predictive analytics problems, praise the strengths of Python as a language for data analysis and, time allowing, show a hands-on example of a “python competition framework”, enriched with machine learning algorithms coming from a Java machine learning library, but without a single line of Java hurting your eyes.

Yapsy - a lightweight plugin framework for Python.

Rudolf Hoefler, IMBA

Friday, 9 August 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

For software projects it is often convenient to implement certain functionality as plugins (as ImageJ does, for example) to avoid re-installation of the whole package. Yapsy (Yet Another Plugin System) provides such a framework for Python. Its developers focused on simplicity, i.e. no dependencies except the standard library. It provides various classes for plugin management that can be easily extended.

Pandas for practical, real world data analysis in Python

Andreas Poehlmann

Friday, 7 June 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
  • Ordered and unordered (not necessarily fixed-frequency) time series data
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure.

I’ll give you an overview of the available data structures, the different file backends and some of the wonderful magical powers that pandas possesses. See you then!

Complex Experiment Configuration, Control, Automation, and Analysis using Robot Operating System (ROS)

John Stowers

Friday, 3 May 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

The Robot Operating System (ROS), and its Python bindings, are well known and used in the engineering and robotics communities for the many high level tools and algorithms they provide. Less appreciated are the lower levels of the ROS stack; libraries for inter-process-communication, parameter and configuration management, and distributed process launching and control.

In the Straw laboratory we use ROS to automate the operation of, and experiments using, virtual reality systems for fixed and freely flying Drosophila. This includes real-time 10-camera tracking (100Hz), 5 projector panoramic virtual reality (120Hz), and real-time visual stimulus generation and control (80Hz). Operation of this system requires the launching of over 30 processes on 4 computers, and the associated configuration of each in a known state. In addition, the progress of the experiment must be monitored over its entire 12 hour duration.

In this talk I will describe how ROS makes this complex system manageable and reproducible by implicitly recording the state of the system at all times, and by automating the pre-configuration and launching of the multiple processes which control the experiment. I will also describe how we tag all experimental data with unique identifiers to facilitate live monitoring, post-experiment analysis, and long-term archival in case later forensics are required.

This talk will show that ROS is a very powerful tool that should be considered not only for engineering and robotics applications, but by any scientist who needs to manage complex scientific experiments robustly and reproducibly.

(This will be an abbreviated practice version of the talk to be presented at the SciPy2013 conference in June.)

CellCognition: A tool for time-resolved phenotype annotation in high-throughput live cell imaging

Christoph Sommer, IMBA

Friday, 5 April 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

Automated microscopy has become an enabling technology to monitor and quantify properties of cells. A typical workflow to analyze microscopy data comprises the segmentation of cells, linking of cell objects over time, and quantification of cellular phenotypes. CellCognition is published as open source software, enabling automated analysis of live-cell image-based screening data.

C# for Computer Graphics and hence for Scientific Computing

Robert Tobler, VRVis Research Center

Friday, 1 March 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

Although C# does not have as dedicated a community for scientific computing as Python, its modern language features allow for a high level of code reuse and extend some of the features found in Python. I will show some examples from Computer Graphics which demonstrate how the combination of generic and functional programming with a strong type system can benefit the implementation of algorithms for scientific computing.

Since 2000, Dr. Tobler has been a senior researcher at the VRVis Research Center for Virtual Reality and Visualization in Vienna, Austria. His professional interests include real-time and photorealistic rendering, efficient global illumination algorithms, and procedurally generated geometry and generalized subdivision. His web page is http://rftobler.at/Personal/.

A Bayesian t test using PyMC

Andrew Straw, IMP

Friday, 1 February 2013, 12:30pm, GMI Orange Seminar Room (room 9.36)

Across many branches of biology, null hypothesis significance testing is the tried-and-true method of establishing whether an effect is “real”. Nevertheless, many pitfalls must be avoided to correctly evaluate statistical significance and it is easy to make mistakes that render the analysis invalid. I will summarize a recent paper, “Bayesian estimation supersedes the t test” (“BEST”), by John Kruschke (2012, Journal of Experimental Psychology: General). Kruschke’s Bayesian approach purports to achieve the same goals as the t test with fewer potential pitfalls. Furthermore, his approach has several advantages, such as the ability to accept the null hypothesis. Computationally, credible intervals of important parameters such as population means and effect size are found using Monte-Carlo techniques such as the Metropolis-Hastings algorithm. I will discuss a Python implementation of the BEST algorithm written using PyMC.
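To give a feel for the Monte-Carlo step mentioned in the abstract, here is a minimal sketch of a random-walk Metropolis sampler (a special case of Metropolis-Hastings) for the difference between two group means. It is not the BEST model and does not use PyMC: it assumes a normal likelihood with a known standard deviation of 1 and a flat prior, and the data, function names and step size are all made up for illustration.

```python
import math
import random


def log_likelihood(data, mu, sigma=1.0):
    """Normal log-likelihood up to a constant; sigma is assumed known."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def sample_mean_difference(group_a, group_b, n_samples=10000, step=0.3, seed=0):
    """Random-walk Metropolis over (mu_a, mu_b) with a flat prior.

    Returns one sampled value of mu_a - mu_b per iteration.
    """
    rng = random.Random(seed)
    # Start at the sample means so that no burn-in period is needed.
    mu_a = sum(group_a) / len(group_a)
    mu_b = sum(group_b) / len(group_b)
    current = log_likelihood(group_a, mu_a) + log_likelihood(group_b, mu_b)
    differences = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        prop_a = mu_a + rng.gauss(0.0, step)
        prop_b = mu_b + rng.gauss(0.0, step)
        proposed = log_likelihood(group_a, prop_a) + log_likelihood(group_b, prop_b)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposed - current)):
            mu_a, mu_b, current = prop_a, prop_b, proposed
        differences.append(mu_a - mu_b)
    return differences


# Synthetic data (hypothetical): group A is shifted by 1.0 relative to group B.
data_rng = random.Random(42)
group_a = [data_rng.gauss(1.0, 1.0) for _ in range(30)]
group_b = [data_rng.gauss(0.0, 1.0) for _ in range(30)]

differences = sorted(sample_mean_difference(group_a, group_b))
lower = differences[int(0.025 * len(differences))]
upper = differences[int(0.975 * len(differences))]
posterior_mean = sum(differences) / len(differences)
print(f"mean difference ~ {posterior_mean:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

The 95% credible interval is read directly off the sorted samples. The full BEST model additionally estimates the standard deviations and uses a heavy-tailed t distribution for the data.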

Python-powered voice analysis

Christian Herbst, University of Vienna

Friday, 7 December 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

profiling: why is my script running so slow?

Uemit Seren, Gregor Mendel Institute

Friday, 28 September 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

Profiling is an important but often neglected aspect in programming, especially in the area of scientific computing. I will show various tools that can be used to identify performance hotspots in your code. We will use timeit to measure the execution time, move on to cProfile and pstats for detailed profiling. Based on the results of cProfile I will show how to interpret the results visually and finally demonstrate line_profiler.py for retrieving line by line profiling information.
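As a rough illustration of the first steps described above, the following sketch times two small functions with timeit and then profiles one of them with cProfile and pstats. The two functions are invented for this example; line_profiler is a third-party tool and is not shown.

```python
import cProfile
import io
import pstats
import timeit


def slow_sum_of_squares(n):
    """Deliberately naive: builds a list first, then sums it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def compact_sum_of_squares(n):
    """Same result using a generator expression."""
    return sum(i * i for i in range(n))


# Step 1: timeit measures the total execution time of repeated calls.
slow_time = timeit.timeit(lambda: slow_sum_of_squares(10_000), number=50)
compact_time = timeit.timeit(lambda: compact_sum_of_squares(10_000), number=50)
print(f"list version: {slow_time:.4f} s, generator version: {compact_time:.4f} s")

# Step 2: cProfile records call counts and timings for every function called.
profiler = cProfile.Profile()
profiler.enable()
slow_sum_of_squares(100_000)
profiler.disable()

# Step 3: pstats sorts the collected statistics and prints the top entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

Sorting by cumulative time shows which top-level calls dominate the run; sorting by "tottime" instead highlights the functions that spend the most time in their own code.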

Scientific GUIs on Linux using Gtk

John Stowers, IMP

Friday, 14 September 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

Presentation and source code here

Python + Gtk tutorial here

Comparison of plotting tools in Gtk

Interfacing with C-code, bindings for free

We will look at using the GTK GUI toolkit from Python with special reference to two challenges faced when doing scientific programming: plotting and interfacing between Python and C. We’ll run examples that will work on a modern Ubuntu Linux install, and as the session will be hands-on, you are invited to follow along.

matplotlib: tips for getting from quick plots to publication-quality figures

Andrew Straw, IMP

Friday, 24 August 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

matplotlib is great for making plots of your data in a few lines of code. I’ll share with you some tips I’ve developed over the years about how to efficiently go from those few lines into what is needed later in the publication cycle. This includes simplifying what is drawn, saving plots with mixed vector and raster data, and post-drawing composition and editing with Inkscape. The talk is based off our lab styleguide.

branches, merging and rebasing: more git (hands-on: bring your laptop)

Andrew Straw, IMP

Friday, 13 July 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

Link to notes

As a follow-up to our first meeting on git, we’re now going to move to the skills required to share your changes with a larger group of people working on the same software. This is a beginner-intermediate level talk. (If you’re already a git guru, you will not likely learn much, but your attendance would be very welcome to others in the room, who will likely appreciate your expertise.) Bring your laptop with git pre-installed or with access to a server running git. At this hands-on py4science VBC, we will actively make changes to an online repository and submit them back.

Fiji, a user and developer friendly framework for image processing

Wolfgang Busch, Gregor Mendel Institute

Friday, 22 June 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

Providing experimentalists with usable open source software for computational analysis of their image data is a serious effort. Fiji is a cross-platform package that combines the advantages of open source, a high degree of usability and a very broad range of applications. With Fiji, it is possible to provide packages to biologists that are very easy to install, as well as to maintain using Git version control features. I will demonstrate the usage of Fiji and the broad possibilities to create packages for it. http://fiji.sc/

get control of your source code with git (hands-on: bring your laptop)

Andrew Straw, IMP

Friday, 25 May 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

presentation and source code here

As scientists, it is critical to maintain accurate records of exactly what software we run when we perform a particular experiment or analysis. The standard way to do this is with revision control software which keeps track of what changed and when within a repository of files. In the past several years, a new class of such software has emerged, dramatically reducing the workload to set up a new source code repository, transfer version controlled files from one computer to another, and to collaborate with others. Git is amongst the best of these new solutions, and our lab is using git and an online service (github.com) to facilitate these actions to great effect. Bring your laptop, because at this py4science VBC, we will walk you through installing git, cloning a repository from github, making changes to the repository, and pushing your changes back.

short introduction to opencv in python

Andreas Poehlmann, IMP

Friday, 11 May 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision. It has C++, C, Python and soon Java interfaces running on Windows, Linux, Android and Mac. The library has >2500 optimized algorithms. I’ll give an introduction to the python opencv interface and show in a few examples how easy image processing can be nowadays.

data persistence in python

Uemit Seren, Gregor Mendel Institute

Friday, 27 April 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

Persisting data is typically done in an RDBMS (relational database management system). However, programmers work with domain models to represent the same data in their applications. ORM layers like SQLAlchemy help to overcome the impedance mismatch between these two systems and deal with implementation differences of the underlying data storage. Not every kind of data fits into an RDBMS, however; bulk data/big data are not suitable for an RDBMS and have to be stored in a different way (e.g. PyTables, HDF5). I will provide some examples for both storage systems and also try to show how to combine them.
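The mismatch between domain objects and relational rows can be sketched with the standard library alone. The example below maps a hypothetical Sample class to an in-memory SQLite table by hand; this hand-written save/load code is the kind of glue an ORM layer is meant to generate. It does not use SQLAlchemy's API, and the table and field names are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Sample:
    """Domain object as the application sees it (hypothetical)."""
    name: str
    concentration: float


def save(conn, sample):
    """Translate one domain object into a table row."""
    conn.execute(
        "INSERT INTO samples (name, concentration) VALUES (?, ?)",
        (sample.name, sample.concentration),
    )


def load_all(conn):
    """Translate table rows back into domain objects."""
    rows = conn.execute("SELECT name, concentration FROM samples ORDER BY name")
    return [Sample(name, concentration) for name, concentration in rows]


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT PRIMARY KEY, concentration REAL)")
save(conn, Sample("b2", 0.5))
save(conn, Sample("a1", 1.25))
conn.commit()

samples = load_all(conn)
print(samples)
```

Every new field on Sample has to be added in three places here (the class, the INSERT and the SELECT), which is the bookkeeping an ORM is designed to remove.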

interfacing with native code from python

John Stowers, IMP

Friday, 13 April 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

presentation here

source code here

There are many ways to integrate with native code (i.e. C/C++ libraries) from Python: ctypes, cython/pyrex, swig, native Python modules, etc. Each method has different trade-offs in terms of performance, maintainability and extensibility. I will discuss these trade-offs and give strategies for wrapping object-oriented and performance-critical native code.
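Of the approaches listed, ctypes is the only one that ships with the standard library, so it makes for the shortest illustration. The sketch below calls the C function strlen from the system C library. It assumes a POSIX system (Linux or macOS); on other platforms the library lookup would need to change.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If the lookup returns None, CDLL(None)
# on POSIX falls back to the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s).
# Without this, ctypes would assume int arguments and an int return value.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"py4science")
print(length)
```

Declaring argtypes and restype is the main maintenance cost of ctypes: the declarations have to be kept in sync with the C headers by hand, which is one of the trade-offs against generated wrappers.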

structured arrays in numpy

Andrew Straw, IMP

Friday, 23 March 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

presentation here

source code here

Numpy’s structured arrays are a hybrid between a bare numpy array (just a bunch of numbers) and a table in which each column has a name and a particular data type. As a data container, they make massive amounts of data self-documenting and are increasingly used as an interchange format for reading and writing CSV files, HDF5 files, and performing R-like DataFrame analysis (with the Pandas library).
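A small sketch of what such an array looks like in practice, assuming numpy is installed. The field names and values are invented for this example.

```python
import numpy as np

# Each record has named, typed fields (hypothetical tracking data).
point_dtype = np.dtype([("frame", np.int32), ("x", np.float64), ("y", np.float64)])
track = np.array(
    [(0, 0.0, 1.0), (1, 0.5, 1.5), (2, 1.0, 2.5)],
    dtype=point_dtype,
)

# Field names document the data and travel with the array.
print(track.dtype.names)

# Indexing by field name returns an ordinary numpy array for that column.
mean_y = track["y"].mean()

# Boolean masks built from one field select whole records.
later = track[track["frame"] >= 1]
print(mean_y, later)
```

Because the column names live in the dtype, code that receives the array does not need to know which position holds which quantity.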

15 short python programs in 15 minutes

John Stowers and Andreas Pöhlmann, IMP

Friday, 9 March 2012, 12:30pm, GMI Orange Seminar Room (room 9.36)

announcement flyer PDF

presentation here

source code here

A whirlwind tour of the capabilities of Python, the Python standard library, and common Python modules including numpy, matplotlib and opencv. We will show you how to read and write common files, perform analysis of images, audio, and numerical data, plot the results, talk to lab equipment, and much more.

Future Talks

  • NumPy for MATLAB programmers
  • The Zen of Python
  • Interfacing to native code from Python