How to write a genfromtxt with f2py?

203 views Asked by At

I've found that the function genfromtxt from numpy in python is very slow.

Therefore I decided to wrap a module with f2py to read my data. The data is a matrix.

subroutine genfromtxt(filename, nx, ny, a)
implicit none
    character(100):: filename
    real, dimension(ny,nx) :: a 
    integer :: row, col, ny, nx
    !f2py character(100), intent(in) ::filename
    !f2py integer, intent(in) :: nx
    !f2py integer, intent(in) :: ny
    !f2py real, intent(out), dimension(nx,ny) :: a

    !Opening file
    open(5, file=filename)

    !read data again
    do row = 1, ny
        read(5,*) (a(row,col), col =1,nx) !reading line by line 
    end do
    close (5)
end subroutine genfromtxt

The length of the filename is fixed to 100 because if f2py can't deal with dynamic sizes. The code works for sizes shorter than 100, otherwise the code in python crashes.

This is called in python as:

import Fmodules as modules
w_map=modules.genfromtxt(filename,100, 50)

How can I do this dynamically without passing nx, ny as parameters nor fixing the filename length to 100?

2

There are 2 answers

5
tmdavison On

I think you can just use:

open(5, file=trim(filename))

to deal with filenames shorter than the length of filename. (i.e. you could make filename much longer than it needs to be and just trim it here)

I don't know of any nice clean way to remove the need to pass nx and ny to the fortran subroutine. Perhaps if you can determine the size and shape of the data file programatically (e.g. read the first line to find nx, call some function or have a first pass over the file to determine the number of lines in the file), then you could allocate your a array after finding those values. That would slow everything down though, so may be counter productive

0
DavidW On

The filename length issue is easily dealt with: you make filename: character(len_filename):: filename, have len_filename as a parameter of the fortran function, and use the f2py intent(hide) to hide it from your Python interface (see the code below).

Your issue of not wanting to have to pass in nx and ny is more complicated, and is essentially is the problem of how to return an allocatable array from f2py. I can see two methods, both of which require a (small and very light) Python wrapper to give you a nice interface.

I've haven't actually addressed the problem of how to read the csv file - I've just generated and returned some dummy data. I'm assuming you have a good implementation for that.

Method 1: module level allocatable arrays

This seems to be a fairly standard method - I found at least one newsgroup post recommending this. You essentially have an allocatable global variable, initialise it to the right size, and write to it.

module mod
    real, allocatable, dimension(:,:) :: genfromtxt_output
contains
    subroutine genfromtxt_v1(filename, len_filename)
    implicit none
        character(len_filename), intent(in):: filename
        integer, intent(in) :: len_filename
        !f2py intent(hide) :: len_filename
        integer :: row, col

        ! for the sake of a quick demo, assume 5*6
        ! and make it all 2
        if (allocated(genfromtxt_output)) deallocate(genfromtxt_output)
        allocate(genfromtxt_output(1:5,1:6))

        do row = 1,5
           do col = 1,6
             genfromtxt_output(row,col) = 2
           end do
        end do        

    end subroutine genfromtxt_v1
end module mod

The short Python wrapper then looks like this:

import _genfromtxt

def genfromtxt_v1(filename):
    _genfromtxt.mod.genfromtxt_v1(filename)
    return _genfromtxt.mod.genfromtxt_output.copy()
    # copy is needed, otherwise subsequent calls overwrite the data

The main issue with this is that it won't be thread safe: if two Python threads call genfromtxt_v1 at very similar times the data could get overwritten before you've had a change to copy it out.

Method 2: Python callback function that saves the data

This is slightly more involved and a horrendous hack of my own invention. You pass your array to a callback function which then saves it. The Fortran code looks like:

subroutine genfromtxt_v2(filename,len_filename,callable)
implicit none
    character(len_filename), intent(in):: filename
    integer, intent(in) :: len_filename
    !f2py intent(hide) :: len_filename
    external callable
    real, allocatable, dimension(:,:) :: result
    integer :: row, col
    integer :: rows,cols

    ! for the sake of a quick demo, assume 5*6
    ! and make it all 2
    rows = 5
    cols = 6
    allocate(result(1:rows,1:cols))
    do row = 1,rows
        do col = 1,cols
            result(row,col) = 2
        end do
    end do        

    call callable(result,rows,cols)

    deallocate(result)
end subroutine genfromtxt_v2

You then need to generate the "signature file" with f2py -m _genfromtxt -h _genfromtxt.pyf genfromtxt.f90 (assuming genfromtxt.f90 is the file with your Fortran code). You then modify the "user routines block" to clarify the signature of your callback:

python module genfromtxt_v2__user__routines 
    interface genfromtxt_v2_user_interface 
        subroutine callable(result,rows,cols) 
            real, dimension(rows,cols) :: result
            integer :: rows
            integer :: cols
        end subroutine callable
    end interface genfromtxt_v2_user_interface
end python module genfromtxt_v2__user__routines

(i.e. you specify the dimensions). The rest of the file is left unchanged. Compile with f2py -c genfromtxt.f90 _genfromtxt.pyf

The small Python wrapper is

import _genfromtxt

def genfromtxt_v2(filename):
    class SaveArrayCallable(object):
        def __call__(self,array):
            self.array = array.copy() # needed to avoid data corruption

    f = SaveArrayCallable()
    _genfromtxt.genfromtxt_v2(filename,f)
    return f.array

I think this should be thread safe: although I think Python callback functions are implemented as global variables, but default f2py doesn't release the GIL so no Python code can be run between the global being set and the callback being run. I doubt you care about thread safety for this application though...