Introduction to the Linux perf tool

By Marcus Folkesson on 5 September 2015

I’ve been using perf for a few years now but I’m still often surprised at how useful it is.

For the uninitiated, perf is a profiling tool that comes with the Linux kernel. Unlike other profiling tools such as gprof or callgrind, perf profiles the entire system, which means both kernel and userspace code. Another big advantage is that the functionality is fully integrated into the kernel, so no daemons or anything of the sort are needed (as oprofile, for example, requires).

Perf is used through several subcommands:

  • stat: measure total event count for single program or for system for some time
  • top: top-like dynamic view of hottest functions
  • record: measure and save sampling data for single program
  • report: analyze file generated by perf record; can generate flat, or graph profile
  • annotate: annotate sources or assembly
  • sched: tracing/measuring of scheduler actions and latencies
  • list: list available events

    Get started

    The first step is to compile perf. Navigate to your Linux kernel source and move down to tools/perf/. Compile with make; the build system will automatically detect your configuration and build in support for the libraries that are available. I recommend that you have libelf on your system. Without libelf, perf is not able to resolve symbols for your application.

    Compile your application

    We will profile the application below.

    /* Assumed call chain: main() -> a() -> b() -> c(). */
    void c()
    {
        volatile int x = 0;
        volatile int z = 0;

        for(x = 0; x < 1000000; x++) {
            z = x + x;
        }
    }

    void b()
    {
        volatile int x = 0;
        for(x = 0; x < 100; x++) {
            c();
        }
    }

    void a()
    {
        b();
    }

    int main(int argc, const char *argv[])
    {
        a();
        return 0;
    }

    The flags used when compiling the application are -g and -fno-omit-frame-pointer. The former adds debug symbols and the latter makes it possible to generate call graphs.

    gcc -g -fno-omit-frame-pointer ./main.c -o app

    Record a run

    Start the application with perf:
    perf record -g ./app

    Perf may also be attached to an existing process with -p PID (may be combined with -t TID for a specific thread).

    This will result in a file containing the profiling data (perf.data in the current directory by default).

    Examine the result

    By default, a Linux system does not allow ordinary users to resolve kernel addresses. Change the permissions by writing 0 to /proc/sys/kernel/kptr_restrict. See man proc for more details.
    echo 0 | sudo tee /proc/sys/kernel/kptr_restrict

    The simplest way to examine the result is to run perf report, which starts a nice ncurses interface.



    The interface will let you see the CPU usage for each function for both kernel and userspace. It is also possible to expand each function to see the call sequence. As we see in the image above, c() is taking 99.77% of the CPU time.

    It is also possible to annotate with source code. Here we can see that the instruction that compares x with 1000000 is the most time-consuming operation. (I guess the next instruction (jle, jump if less or equal) spoils the instruction pipeline and is therefore the *real* bottleneck.)
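
    The steps above can also be run without the ncurses interface, which is handy on a target with only a serial console. The sketch below is an illustration only: profile_app is a hypothetical helper name, it assumes perf is on the PATH, and it uses perf report --stdio to print the report as plain text.

```shell
# Hypothetical helper (not part of perf): profile one binary and print a text report.
# Assumes perf is installed and the binary was built with -g -fno-omit-frame-pointer.
profile_app() {
    app=$1                  # path to the binary to profile, e.g. ./app
    data=${2:-perf.data}    # file where perf record stores its samples

    # Count basic events for one complete run of the program.
    perf stat "$app" || return 1

    # Sample the run with call graphs and write the samples to $data.
    perf record -g -o "$data" "$app" || return 1

    # Print the profile as plain text; head keeps only the top of the report.
    perf report -i "$data" --stdio | head -n 40
}
```

    With the application built as above, it would be called as: profile_app ./app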



    Going further

    This was just the most common scenario where perf may be really useful. Perf has a lot more features, such as triggering on cache misses, hardware events, software events, support for dynamic tracing and so on. See …/tools/perf/Documentation for further reading.

    ASP.NET 5

    By Mats Sjövall on 30 August 2015

    During a few rainy days this summer I amused myself by trying out ASP.NET 5 (beta6). The project I built was a simple blog engine. On the whole, I can say that it is a very pleasant framework. It did not take particularly long to build the functionality I wanted, and the performance exceeded my expectations. The only thing that really felt like ”beta” was the tooling support in Visual Studio: a fair amount of buggy behavior, and a lot that had to be done from the command line.

    Some of the technologies I used were:

    • Azure AD Authentication
    • SQL Server
    • Entity Framework
    • Bootstrap
    • Bower
    • Gulp

    The whole thing is hosted in Microsoft Azure.

    Some observations:

    • New project file format, xproj, where you do not need to list the included files; it compiles whatever is in the directories.
    • New JSON-based NuGet handling where you no longer add DLL references to the project file; DLLs from the NuGet packages are referenced automatically.
    • Entity Framework is now managed via the .NET Execution Environment instead of via NuGet PowerShell as before.
    • Roslyn analyzers do not work yet :(
    • I often had to restore packages manually to get things to build.
    • The only unit test framework supported is xunit.
    • Azure authentication did not work with CoreCLR (the open source variant).
    • Hosting in IIS Express makes it crash with ”Access Violation”; dnx works fine, though.

    Beta 7 is being released any day now, and it focuses on finishing the cross-platform support, so my plan going forward is to host my blog on a Linux machine instead of in IIS as I do now.

    The result of my hacking can be viewed at

    YES!! We are going, are you?

    By Johan Deimert on 20 August 2015

    As the title says, Linux Development Center will be going to Dublin in October to attend Embedded Linux Conference Europe.

    What can be expected? A lot of embedded Linux experience that can hopefully be applied to our own projects and feed new thoughts on how things could be done.

    As several Linux conferences run in parallel, there are a lot of seminars to choose from, and judging by the schedule they cover a large variety of fields. However, at a quick glance the emphasis seems to be on connectivity. There is also some robotics and drone material, which was a large part of the Embedded Linux Conference held this March in San Francisco. That conference was called ”Drones, Things and Automobiles”, which points in the direction that the Linux dists are going.

    See you there!

    Eudyptula challenge

    By Marcus Folkesson on 30 July 2015

    For those who have never heard of the Eudyptula Challenge, it is a series of programming tasks for the Linux kernel. The format of these exercises is inspired by the crypto challenges that have been around for a while.

    Before we go any further we need to sort out one thing that I know you are thinking about: why choose a name that is as hard to spell and impossible to pronounce as Eudyptula?
    When signing up for this challenge, all communication is done by mail with Little, who is ”a set of convoluted shell scripts that are slow to anger and impossible to debug”. Eudyptula is the genus of the little penguin, the smallest species of penguin, and there is the connection.

    To join the challenge, all you need to do is send an email to Little and say that you want to participate. This requires that you have a sane email client, which surprisingly few people have. Nasty things such as HTML in emails are totally forbidden; no one wants it anyway, and it will be rejected by all kernel mailing lists.
    I personally use Mutt for all my emailing.

    There are currently 20 tasks, starting from a very basic ”Hello world” kernel module and moving up in complexity. A couple of the tasks are to get patches accepted into the main Linux kernel source tree, which involves some ”real work”: interacting with people on the mailing list and getting to know the development procedures for the Linux kernel.

    But what do you need to know to take the challenge?
    A basic understanding of the C programming language is required; that’s all.

    I have been doing Linux kernel development for quite some time, but I have still learnt a lot. Mostly because I had not had reason to touch areas such as the implementation of the FAT filesystem, getting my modules to load automatically when I plug in a USB keyboard, or writing a netfilter hook to search all incoming TCPv4 packets for a specific ID.
    As you see, the tasks touch many parts of the Linux kernel. The tasks also cover really useful parts such as using DebugFS, creating sysfs entries, making character devices and so on.

    One side effect is that you will get good at using Git (I really do love git..) since you will be using it a lot. But one of the biggest gains from taking this challenge is that you will get really good at navigating the Linux kernel source code, which is necessary when doing kernel development. The concepts in the kernel stay the same, but the implementation varies between versions and so does the internal API, so you have to look up code all the time. Fast navigation helps for sure.

    Personally I use VIM as editor with plugins for cscope/ctags/global for fast navigation and it works like a charm, really.
    A tip is to use ”git grep” for searching within the project. It is much faster than a recursive ”grep” since it only searches files that are part of the repository (and skips the auto-generated crap).
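
    To see the difference between the two searches, here is a small sketch in a throwaway repository. The file names and the search word ”needle” are made up for the illustration; the only assumption is that git is installed.

```shell
# Throwaway repository with one tracked source file and one untracked artifact.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'int needle(void);\n' > tracked.c
printf 'int needle(void);\n' > generated.txt   # stands in for build output, never added
git add tracked.c

# git grep only looks at files Git tracks, so the artifact is skipped.
git_hits=$(git grep -l needle)

# A recursive grep walks everything on disk, including the artifact.
grep_hits=$(grep -rl needle .)

echo "git grep: $git_hits"
echo "grep -r:  $grep_hits"
```

    In the kernel tree the same idea applies to object files and other build output, which ”git grep” never has to read.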

    I hope you find this challenge interesting and that you give it a chance. The time is well spent if you ask me, and it will take time. The challenge will require a lot of studying and reading source code, but let it take the time it needs; this is no race.

    See ya on the lkml!

    PS: Here is just a quote from a random mail in my inbox… :-)

    Very nice job, you are now done!
    If you were curious, you are the 102nd person to complete the series of
    tasks, with over 11000 people currently attempting it.

    ldd without ldd

    By Marcus Folkesson on 14 June 2015

    I sometimes meet colleagues at work who get frustrated when they try to print the shared library dependencies of an executable or library, and the ldd command has simply been stripped from the target. (I do often strip targets :-) )

    As if that would be a big problem.

    The ldd command is not a binary executable, but a script that simply calls the runtime dynamic linker with a few environment variables set, and you may do the same!
    The essential environment variable in this case is LD_TRACE_LOADED_OBJECTS, which should be set to something != 0.

    In short, you may do:
    LD_TRACE_LOADED_OBJECTS=1 /lib/ ./my_application.

    The --list option may also be used, but it does not work on all targets.

    /lib/ --list ./my_application.

    Example output with ldd:

    [11:16:58]marcus@tuxie:/tmp/a$ ldd ./main
     =>  (0x00007fffc8bff000)
     => /lib/x86_64-linux-gnu/ (0x00007f564e254000)
    /lib64/ (0x00007f564e647000)

    Example output without ldd:

    [11:17:23]marcus@tuxie:/tmp/a$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ ./main
     =>  (0x00007fffeecbc000)
     => /lib/x86_64-linux-gnu/ (0x00007fcce7700000)
    /lib64/ (0x00007fcce7af3000)
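
    The same trick can be tried on any dynamically linked binary. In the sketch below /bin/ls is only a stand-in target, and the binary is started directly instead of naming the dynamic linker on the command line. This assumes a glibc-based system, where the kernel starts the loader for the binary and the loader, seeing the variable, prints the dependencies and exits without running the program.

```shell
# Stand-in target; replace with your own dynamically linked binary.
target=/bin/ls

# With the variable set, glibc's loader lists the libraries it would load
# and exits instead of running the program.
deps=$(LD_TRACE_LOADED_OBJECTS=1 "$target")
printf '%s\n' "$deps"
```

    Other C libraries may ignore the variable and simply run the program, so only try this with a binary you trust.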

    So, do not get frustrated, be happy.