[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[dennou-ruby:003649] Re: JRuby NetCDF-3 support
- To: rodrigo@xxxxxxxxxxxxxxxxxxx
- Subject: [dennou-ruby:003649] Re: JRuby NetCDF-3 support
- From: Takeshi Horinouchi <horinout@xxxxxxxxxxxxxxxxx>
- Date: Fri, 09 Aug 2013 10:07:18 +0900
- Cc: horinout@xxxxxxxxxxxxxxxxx, dennou-ruby@xxxxxxxxxxx
Dear Mr Botafogo,
Thank you for sharing the info. It is nice to hear that
NetCDF can be accessed via JRuby. (so I forward your message
to the GFD Dennou Ruby mailing list; cc'ed)
By the way, is the user inteface of MDArray is basically the
same as NArray? Is Colt a good library?
regards,
Takeshi Horinouchi
> Dear Mr. Takeshi Horinouchi,
>
> I've used Dennou Club NetCDF-3 previously and I was inspired by your work.
> I've just implemented a first version of NetCDF-3 for JRuby, based on
> NetCDF-java libraries from UCAR. I thought you might be interested in
> knowing it, so I'm including the announcement about this software.
>
> Sincerely,
>
> Rodrigo Botafogo
>
>
>
>
> Announcement
> ============
>
> MDArray version 0.5.4 has Just been released. MDArray is a multi
> dimensional array implemented
> for JRuby inspired by NumPy (www.numpy.org) and Masahiro Tanaka's Narray (
> narray.rubyforge.org).
> MDArray stands on the shoulders of Java-NetCDF and Parallel Colt. At this
> point MDArray has
> libraries for mathematical, trigonometric and descriptive statistics
> methods.
>
> NetCDF-Java Library is a Java interface to NetCDF files, as well as to many
> other types of
> scientific data formats. It is developed and distributed by Unidata
> (http://www.unidata.ucar.edu).
>
> Parallel Colt (
> http://grepcode.com/snapshot/repo1.maven.org/maven2/net.sourceforge.parallelcolt/
> parallelcolt/0.10.0/) is a multithreaded version of Colt (
> http://acs.lbl.gov/software/colt/).
> Colt provides a set of Open Source Libraries for High Performance
> Scientific and Technical
> Computing in Java. Scientific and technical computing is characterized by
> demanding problem
> sizes and a need for high performance at reasonably small memory footprint.
>
> For more information and (some) documentation please go to:
> https://github.com/rbotafogo/mdarray/wiki
>
> What's new:
> ===========
>
> NetCDF-3 File Support
> ---------------------
>
> From Wikipedia, the free encyclopedia:
>
> "NetCDF (Network Common Data Form) is a set of software libraries and
> self-describing,
> machine-independent data formats that support the creation, access, and
> sharing of array-oriented
> scientific data. The project homepage is hosted by the Unidata program at
> the University
> Corporation for Atmospheric Research (UCAR). They are also the chief source
> of netCDF software,
> standards development, updates, etc. The format is an open standard. NetCDF
> Classic and 64-bit
> Offset Format are an international standard of the Open Geospatial
> Consortium.
>
> The project is actively supported by UCAR. Version 4.0 (released in 2008)
> allows the use of the
> HDF5 data file format. Version 4.1 (2010) adds support for C and Fortran
> client access to
> specified subsets of remote data via OPeNDAP.
>
> The format was originally based on the conceptual model of the Common Data
> Format developed by
> NASA, but has since diverged and is not compatible with it."
>
> This version of MDArray implements NetCDF-3 file support only. NetCDF-4 is
> not yet supported. At
> the end of this announcement we show the MDArray implementation of the
> NetCDF-3 file writing
> from the tutorial at:
> http://www.unidata.ucar.edu/software/netcdf-java/tutorial/NetcdfWriting.html
>
>
> MDArray and SciRuby:
> ====================
>
> MDArray subscribes fully to the SciRuby Manifesto (http://sciruby.com/).
>
> 迭uby has for some time had no equivalent to the beautifully constructed
> NumPy, SciPy, and
> matplotlib libraries for Python.
>
> We believe that the time for a Ruby science and visualization package has
> come. Sometimes
> when a solution of sugar and water becomes super-saturated, from it
> precipitates a pure,
> delicious, and diabetes-inducing crystal of sweetness, induced by no more
> than the tap of a
> finger. So is occurring now, we believe, with numeric and visualization
> libraries for Ruby.〓>
> MDArray main properties are:
> ============================
>
> + Homogeneous multidimensional array, a table of elements (usually
> numbers), all of the
> same type, indexed by a tuple of positive integers;
> + Easy calculation for large numerical multi dimensional arrays;
> + Basic types are: boolean, byte, short, int, long, float, double, string,
> structure;
> + Based on JRuby, which allows importing Java libraries;
> + Operator: +,-,*,/,%,**, >, >=, etc.;
> + Functions: abs, ceil, floor, truncate, is_zero, square, cube, fourth;
> + Binary Operators: &, |, ^, ~ (binary_ones_complement), <<, >>;
> + Ruby Math functions: acos, acosh, asin, asinh, atan, atan2, atanh, cbrt,
> cos, erf, exp,
> gamma, hypot, ldexp, log, log10, log2, sin, sinh, sqrt, tan, tanh, neg;
> + Boolean operations on boolean arrays: and, or, not;
> + Fast descriptive statistics from Parallel Colt (complete list found
> bellow);
> + Easy manipulation of arrays: reshape, reduce dimension, permute,
> section, slice, etc.;
> + Support for reading and writing NetCDF-3 files;
> + Reading of two dimensional arrays from CSV files (mainly for debugging
> and simple testing
> purposes);
> + StatList: a list that can grow/shrink and that can compute Parallel Colt
> descriptive
> statistics;
> + Experimental lazy evaluation (still slower than eager evaluation).
>
> Descriptive statistics methods imported from Parallel Colt:
> ===========================================================
>
> + auto_correlation, correlation, covariance, durbin_watson, frequencies,
> geometric_mean,
> + harmonic_mean, kurtosis, lag1, max, mean, mean_deviation, median, min,
> moment, moment3,
> + moment4, pooled_mean, pooled_variance, product, quantile,
> quantile_inverse,
> + rank_interpolated, rms, sample_covariance, sample_kurtosis,
> sample_kurtosis_standard_error,
> + sample_skew, sample_skew_standard_error, sample_standard_deviation,
> sample_variance,
> + sample_weighted_variance, skew, split, standard_deviation,
> standard_error, sum,
> + sum_of_inversions, sum_of_logarithms, sum_of_powers,
> sum_of_power_deviations,
> + sum_of_squares, sum_of_squared_deviations, trimmed_mean, variance,
> weighted_mean,
> + weighted_rms, weighted_sums, winsorized_mean.
>
> Double and Float methods from Parallel Colt:
> ============================================
>
> + acos, asin, atan, atan2, ceil, cos, exp, floor, greater, IEEEremainder,
> inv, less, lg,
> + log, log2, rint, sin, sqrt, tan.
>
> Double, Float, Long and Int methods from Parallel Colt:
> =======================================================
>
> + abs, compare, div, divNeg, equals, isEqual (is_equal), isGreater
> (is_greater),
> + isles (is_less), max, min, minus, mod, mult, multNeg (mult_neg),
> multSquare (mult_square),
> + neg, plus (add), plusAbs (plus_abs), pow (power), sign, square.
>
> Long and Int methods from Parallel Colt
> =======================================
>
> + and, dec, factorial, inc, not, or, shiftLeft (shift_left),
> shiftRightSigned
> (shift_right_signed), shiftRightUnsigned (shift_right_unsigned), xor.
>
> MDArray installation and download:
> ==================================
>
> + Install Jruby
> + jruby 亡 gem install mdarray
>
> MDArray Homepages:
> ==================
>
> + http://rubygems.org/gems/mdarray
> + https://github.com/rbotafogo/mdarray/wiki
>
> Contributors:
> =============
> Contributors are welcome.
>
> MDArray History:
> ================
>
> + 07/08/2013: Version 0.5.4 - Support for reading and writing NetCDF-3
> files
> + 24/06/2013: Version 0.5.3 Over 90% Performance improvements for
> methods imported
> from Parallel Colt and over 40% performance improvements for all
> other methods
> (implemented in Ruby);
> + 16/05/2013: Version 0.5.0 - All loops transferred to Java with over 50%
> performance
> improvements. Descriptive statistics from Parallel Colt;
> + 19/04/2013: Version 0.4.3 - Fixes a simple, but fatal bug in 0.4.2. No
> new features;
> + 17/04/2013: Version 0.4.2 - Adds simple statistics and boolean
> operators;
> + 05/04/2013: Version 0.4.0 Initial release.
>
> NetCDF-3 Writing with MDArray API
> =================================
>
> require 'mdarray'
>
> class NetCDF
>
> attr_reader :dir, :filename, :max_strlen
>
>
> #---------------------------------------------------------------------------------------
> #
>
> #---------------------------------------------------------------------------------------
>
> def initialize
> @dir = "~/tmp"
> @filename1 = "testWriter"
> @filename2 = "testWriteRecord2"
> @max_strlen = 80
> end
>
>
> #---------------------------------------------------------------------------------------
> # Define the NetCDF-3 file
>
> #---------------------------------------------------------------------------------------
>
> def define_file
>
> # We pass the directory, filename, filetype and optionaly the
> outside_scope.
> #
> # I'm implementing in cygwin, so the need for method cygpath that
> converts the
> # directory name to a Windows name. In another environment, just pass
> the directory
> # name.
> #
> # Inside a block we have another scope, so the block cannot access any
> variables, etc.
> # from the ouside scope. If we pass the outside scope, in this case we
> are passing self,
> # we can access variables in the outside scope by using
> @outside_scope.<variable>.
> NetCDF.define(cygpath(@dir), @filename1, "netcdf3", self) do
>
> # add dimensions
> dimension "lat", 64
> dimension "lon", 128
>
> # add variables and attributes
> # add Variable double temperature(lat, lon)
> variable "temperature", "double", [@dim_lat, @dim_lon]
> variable_att @var_temperature, "units", "K"
> variable_att @var_temperature, "scale", [1, 2, 3]
>
> # add a string-value variable: char svar(80)
> # note that this is created as a scalar variable although in NetCDF-3
> there is no
> # string type and the string has to be represented as a char type.
> variable "svar", "string", [], {:max_strlen =>
> @outside_scope.max_strlen}
>
> # add a 2D string-valued variable: char names(names, 80)
> dimension "names", 3
> variable "names", "string", [@dim_names], {:max_strlen =>
> @outside_scope.max_strlen}
>
> # add a scalar variable
> variable "scalar", "double", []
>
> # add global attributes
> global_att "yo", "face"
> global_att "versionD", 1.2, "double"
> global_att "versionF", 1.2, "float"
> global_att "versionI", 1, "int"
> global_att "versionS", 2, "short"
> global_att "versionB", 3, "byte"
>
> end
>
> end
>
>
> #---------------------------------------------------------------------------------------
> # write data on the above define file
>
> #---------------------------------------------------------------------------------------
>
> def write_file
>
> NetCDF.write(cygpath(@dir), @filename1, self) do
>
> temperature = find_variable("temperature")
> shape = temperature.shape
> data = MDArray.fromfunction("double", shape) do |i, j|
> i * 1_000_000 + j * 1_000
> end
> write(temperature, data)
>
> svar = find_variable("svar")
> write_string(svar, "Two pairs of ladies stockings!")
>
> names = find_variable("names")
> # careful here with the shape of a string variable. A string
> variable has one
> # more dimension than it should as there is no string type in
> NetCDF-3. As such,
> # if we look as names' shape it has 2 dimensions, be we need to
> create a one
> # dimension string array.
> data = MDArray.string([3], ["No pairs of ladies stockings!",
> "One pair of ladies stockings!",
> "Two pairs of ladies stockings!"])
> write_string(names, data)
>
> # write scalar data
> scalar = find_variable("scalar")
> write(scalar, 222.333 )
>
> end
>
> end
>
>
> #---------------------------------------------------------------------------------------
> # Define a file for writing one record at a time
>
> #---------------------------------------------------------------------------------------
>
> def define_one_at_time
>
> NetCDF.define(cygpath(@dir), @filename2, "netcdf3", self) do
>
> dimension "lat", 3
> dimension "lon", 4
> # zero sized dimension is an unlimited dimension
> dimension "time", 0
>
> variable "lat", "float", [@dim_lat]
> variable_att @var_lat, "units", "degree_north"
>
> variable "lon", "float", [@dim_lon]
> variable_att @var_lon, "units", "degree_east"
>
> variable "rh", "int", [@dim_time, @dim_lat, @dim_lon]
> variable_att @var_rh, "long_name", "relative humidity"
> variable_att @var_rh, "units", "percent"
>
> variable "T", "double", [@dim_time, @dim_lat, @dim_lon]
> variable_att @var_t, "long_name", "surface temperature"
> variable_att @var_t, "units", "degC"
>
> variable "time", "int", [@dim_time]
> variable_att @var_time, "units", "hours since 1990-01-01"
>
> end
>
> end
>
>
> #---------------------------------------------------------------------------------------
> # Define a file for writing one record at a time
>
> #---------------------------------------------------------------------------------------
>
> def write_one_at_time
>
> NetCDF.write(cygpath(@dir), @filename2, self) do
>
> lat = find_variable("lat")
> lon = find_variable("lon")
>
> # write non recored data to the variables
> write(lat, MDArray.float([3], [41, 40, 39]))
> write(lon, MDArray.float([4], [-109, -107, -105, -103]))
>
> # get record variables from file
> rh = find_variable("rh")
> time = find_variable("time")
> t = find_variable("T")
>
> # there is no method find_dimension for NetcdfFileWriter, so we need
> to get the
> # dimension from a variable.
> rh_shape = rh.shape
> dim_lat = rh_shape[1]
> dim_lon = rh_shape[2]
>
> (0...10).each do |time_idx|
>
> # fill rh_data array
> rh_data = MDArray.fromfunction("int", [dim_lat, dim_lon]) do |lat,
> lon|
> time_idx * lat * lon
> end
> # reshape rh_data so that it has the same shape as rh variable
> # Method reshape! reshapes the array in-place without data copying.
> rh_data.reshape!([1, dim_lat, dim_lon])
>
> # fill temp_data array
> temp_data = MDArray.fromfunction("double", [dim_lat, dim_lon]) do
> |lat, lon|
> time_idx * lat * lon / 3.14159
> end
> # reshape temp_data array so that it has the same shape as temp
> variable.
> temp_data.reshape!([1, dim_lat, dim_lon])
>
> # write the variables
> write(time, MDArray.int([1], [time_idx * 12]), [time_idx])
> write(rh, rh_data, [time_idx, 0, 0])
> write(t, temp_data, [time_idx, 0, 0])
>
> end # End time_idx loop
>
> end
>
> end
>
> end
>
> netcdf = NetCDF.new
> netcdf.define_file
> netcdf.write_file
> netcdf.define_one_at_time
> netcdf.write_one_at_time
>
>
>
>
> --
> Rodrigo Botafogo
> Integrando TI ao seu neg〓io
Takeshi Horinouchi
Faculty of Environmental Earth Science, Hokkaido University
N10W5 Sapporo, Hokkaido 060-0810, Japan