Why learn the R programming language

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys, and studies of scholarly literature databases show substantial increases in popularity in recent years. As of August 2018, R ranks 18th in the TIOBE index, a measure of the popularity of programming languages.
R is a GNU package. The source code for the R software environment is written primarily in C, Fortran, and R itself and is freely available under the GNU General Public License. Pre-compiled binary versions are provided for various operating systems. Although R has a command-line interface, there are several graphical user interfaces, such as RStudio, an integrated development environment.

History

R is an implementation of the S programming language combined with lexical scoping semantics, inspired by Scheme. S was created by John Chambers in 1976, while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team (of which Chambers is a member). R is named partly after the first names of the first two R authors and partly as a play on the name of S. The project was conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.

Statistical features

R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and extensions, and the R community is noted for its active contributions in terms of packages.

Many of R’s standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly.[26] R is highly extensible through the use of user-submitted packages for specific functions or specific areas of study.

Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending R is also eased by its lexical scoping rules. Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages.
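
To make the point about lexical scoping more concrete, here is a minimal sketch of a closure. The names make_counter and counter are hypothetical and chosen only for illustration; the sketch assumes nothing beyond base R.

```r
# make_counter() returns a function that remembers `count`,
# a variable from the environment in which that function was created.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # update the enclosing `count`, not a global variable
    count
  }
}

counter <- make_counter()
counter()  # first call
counter()  # second call; the state persists between calls
```

Each call to make_counter() creates a fresh environment, so two counters built this way would not share state.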

R has Rd, its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both online in a number of formats and in hard copy.

Programming features

R is an interpreted language; users typically access it through a command-line interpreter.

If a user types 2+2 at the R command prompt and presses enter, the computer replies with 4, as shown below:

> 2 + 2
[1] 4

This calculation is interpreted as the sum of two single-element vectors, resulting in a single-element vector.

The prefix [1] indicates that the list of elements following it on the same line starts with the first element of the vector (a feature that is useful when the output extends over multiple lines). Like other similar languages such as APL and MATLAB, R supports matrix arithmetic. R’s data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists.
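
The data structures listed above can be sketched briefly as follows. All variable names here are hypothetical, and the values are made up purely for illustration.

```r
v  <- c(10, 20, 30)                      # numeric vector
m  <- matrix(1:6, nrow = 2)              # 2 x 3 matrix
a  <- array(1:24, dim = c(2, 3, 4))      # three-dimensional array
df <- data.frame(id = 1:3, score = v)    # data frame: columns of equal length
l  <- list(numbers = v, label = "demo")  # list: elements may differ in type

m %*% t(m)  # matrix product of m with its transpose (a 2 x 2 result)
dim(a)      # dimensions of the array
nrow(df)    # number of rows in the data frame
seq(1, 100) # long output wraps; each line is prefixed with the index of its first element
```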

R’s extensible object system includes objects for (among others): regression models, time-series and geospatial coordinates. The scalar data type was never a data structure of R. Instead, a scalar is represented as a vector with length one.
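
A quick way to see that a scalar is just a length-one vector is to inspect a single number at the prompt. This is a small sketch; the variable name x is arbitrary.

```r
x <- 5
is.vector(x)        # TRUE: a single number is still a vector
length(x)           # 1
x[1]                # indexing works as on any other vector
identical(x, c(5))  # TRUE: c(5) builds the same length-one vector
```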

R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. A generic function acts differently depending on the classes of arguments passed to it. In other words, the generic function dispatches the function (method) specific to that class of object. For example, R has a generic print function that can print almost every class of object in R with a simple print(objectname) syntax.
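
The dispatch mechanism can be sketched with a small user-defined generic. This example uses R's S3 system as one way to illustrate the idea; the generic area and the classes circle and rectangle are hypothetical names invented for this sketch.

```r
# A generic function: UseMethod() picks a method based on the class of `shape`.
area <- function(shape) UseMethod("area")

# Methods are named <generic>.<class>.
area.circle    <- function(shape) pi * shape$r^2
area.rectangle <- function(shape) shape$w * shape$h

# Fallback used when no class-specific method exists.
area.default <- function(shape) stop("no area method for this class of object")

circle_obj    <- structure(list(r = 1), class = "circle")
rectangle_obj <- structure(list(w = 2, h = 3), class = "rectangle")

area(circle_obj)     # dispatches to area.circle
area(rectangle_obj)  # dispatches to area.rectangle
```

The built-in print function works along the same lines: calling print on an object selects the print method that matches the object's class.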

Although used mainly by statisticians and other practitioners requiring an environment for statistical computation and software development, R can also operate as a general matrix calculation toolbox, with performance benchmarks comparable to GNU Octave or MATLAB. Arrays are stored in column-major order.
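
Column-major storage is easy to observe from the prompt. The following sketch uses a small made-up matrix m to show the order in which values are laid out.

```r
m <- matrix(1:6, nrow = 2)  # values fill the first column, then the second, and so on
m
as.vector(m)  # flattening returns 1 2 3 4 5 6: each column is contiguous
m[3]          # a single index walks the same layout; this is the element m[1, 2]
matrix(1:6, nrow = 2, byrow = TRUE)  # byrow = TRUE fills row by row instead
```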

Packages

The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr, Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++, and Fortran.

The R packaging system is also used by researchers to create compendia that organize research data, code and report files in a systematic way for sharing and public archiving. A core set of packages is included with the installation of R, with more than 15,000 additional packages (as of September 2018) available at the Comprehensive R Archive Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.
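
As a rough sketch of how packages are typically installed and loaded, the snippet below uses knitr (mentioned above) as the example package. It assumes a default CRAN mirror is configured; the install step is commented out because it needs network access.

```r
# Installing from CRAN needs network access, so it is left commented out here.
# install.packages("knitr")

# requireNamespace() reports whether a package is installed without attaching it.
if (requireNamespace("knitr", quietly = TRUE)) {
  library(knitr)                  # attach the package for this session
  print(packageVersion("knitr"))
} else {
  message("knitr is not installed; run install.packages(\"knitr\") first")
}

# Packages that ship with R, such as stats, are available without installation.
library(stats)
```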

The “Task Views” page (subject list) on the CRAN website lists a wide range of tasks (in fields such as Finance, Genetics, High-Performance Computing, Machine Learning, Medical Imaging, Social Sciences, and Spatial Statistics) to which R has been applied and for which packages are available.

R has also been identified by the U.S. Food and Drug Administration (FDA) as suitable for interpreting data from clinical research. Other R package resources include Crantastic, a community site for rating and reviewing all CRAN packages, and R-Forge, a central platform for the collaborative development of R packages, R-related software, and projects.

R-Forge also hosts many unpublished beta packages and development versions of CRAN packages. The Bioconductor project provides R packages for the analysis of genomic data, such as Affymetrix and cDNA microarray object-oriented data-handling and analysis tools, and has started to provide tools for analysis of data from next-generation high-throughput sequencing methods.

Milestones

A list of changes in R releases is maintained in various “news” files at CRAN.
Some highlights are listed below for several major releases.
| Release | Date | Description |
|---|---|---|
| 0.16 | | This is the last alpha version developed primarily by Ihaka and Gentleman. Much of the basic functionality from the “White Book” (see S history) was implemented. The mailing lists commenced on April 1, 1997. |
| 0.49 | 1997-04-23 | This is the oldest source release which is currently available on CRAN. CRAN is started on this date, with 3 mirrors that initially hosted 12 packages. Alpha versions of R for Microsoft Windows and the classic Mac OS are made available shortly after this version. |
| 0.60 | 1997-12-05 | R becomes an official part of the GNU Project. The code is hosted and maintained on CVS. |
| 0.65.1 | 1999-10-07 | First versions of update.packages and install.packages functions for downloading and installing packages from CRAN. |
| 1.0 | 2000-02-29 | Considered by its developers stable enough for production use. |
| 1.4 | 2001-12-19 | S4 methods are introduced and the first version for Mac OS X is made available soon after. |
| 2.0 | 2004-10-04 | Introduced lazy loading, which enables fast loading of data with minimal expense of system memory. |
| 2.1 | 2005-04-18 | Support for UTF-8 encoding, and the beginnings of internationalization and localization for different languages. |
| 2.11 | 2010-04-22 | Support for Windows 64 bit systems. |
| 2.13 | 2011-04-14 | Adding a new compiler function that allows speeding up functions by converting them to byte-code. |
| 2.14 | 2011-10-31 | Added mandatory namespaces for packages. Added a new parallel package. |
| 2.15 | 2012-03-30 | New load balancing functions. Improved serialization speed for long vectors. |
| 3.0 | 2013-04-03 | Support for numeric index values 2^31 and larger on 64 bit systems. |
| 3.4 | 2017-04-21 | Just-in-time compilation (JIT) of functions and loops to byte-code enabled by default. |
| 3.5 | 2018-04-23 | Packages byte-compiled on installation by default. A compact internal representation of integer sequences. Added a new serialization format to support compact internal representations. |


Interfaces

The most commonly used graphical integrated development environment for R is RStudio. A similar development interface is R Tools for Visual Studio. Interfaces with more of a point-and-click approach include Rattle GUI, R Commander, and RKWard. Some of the more common editors with varying levels of support for R include Eclipse, Emacs (Emacs Speaks Statistics), Kate, LyX, Notepad++, Visual Studio Code, WinEdt, and Tinn-R. R functionality is accessible from several scripting languages such as Python, Perl, Ruby, F#, and Julia. Interfaces to other high-level programming languages, such as Java and .NET C#, are available as well.

Implementations

The main R implementation is written in R, C, and Fortran, and there are several other implementations aimed at improving speed or increasing extensibility. A closely related implementation is pqR (pretty quick R) by Radford M. Neal, with improved memory management and support for automatic multithreading. Renjin and FastR are Java implementations of R for use in a Java Virtual Machine. CXXR, rho, and Riposte are implementations of R in C++. Renjin, Riposte, and pqR attempt to improve performance by using multiple processor cores and some form of deferred evaluation. Most of these alternative implementations are experimental and incomplete, with relatively few users, compared to the main implementation maintained by the R Development Core Team.

TIBCO built a runtime engine called TERR, which is part of Spotfire. Microsoft R Open is a fully compatible R distribution with modifications for multi-threaded computations.

R communities

R has vibrant and active local communities worldwide for users to network, share ideas and learn.

There are regular R-user meetups and more focused R-Ladies groups which promote gender diversity.

Examples

Basic syntax

The following examples illustrate the basic syntax of the language and use of the command-line interface.

In R, the generally preferred assignment operator is an arrow made from two characters, <-, although = can usually be used instead.

> x <- 1:6  # Create vector.
> y <- x^2  # Create vector by formula.
> print(y)  # Print the vector’s contents.
[1]  1  4  9 16 25 36

> mean(y)  # Arithmetic mean of vector.
[1] 15.16667

> var(y)  # Sample variance of vector.
[1] 178.9667

> model <- lm(y ~ x)  # Linear regression model y = A + B * x.
> print(model)  # Print the model’s results.

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
     -9.333        7.000  

> summary(model)  # Display an in-depth summary of the model.

Call:
lm(formula = y ~ x)

Residuals:
      1       2       3       4       5       6 
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -9.3333     2.8441  -3.282 0.030453 *  
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583,	Adjusted R-squared:  0.9478 
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662

> par(mfrow = c(2, 2))  # Create a 2 by 2 layout for figures.
> plot(model)  # Output diagnostic plots of the model.

Structure of a function

One of R’s strengths is the ease of creating new functions. Objects in the function body remain local to the function, and any data type may be returned. Here is an example user-created function:

# Declare function “f” with parameters “x”, “y”
# that returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y
  return(z)
}

> f(1, 2)
[1] 11
> f(c(1,2,3), c(5,3,4))
[1] 23 18 25
> f(1:3, 4)
[1] 19 22 25
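
The claim that objects in a function body stay local can be checked with a small sketch. The function scale_and_shift and the variable tmp_local are hypothetical names used only for this illustration.

```r
scale_and_shift <- function(x) {
  tmp_local <- 2 * x  # exists only while the function runs
  tmp_local + 1       # the value of the last expression is returned
}

result <- scale_and_shift(1:3)
result               # a numeric vector here, but any data type could be returned
exists("tmp_local")  # FALSE, unless a variable of that name was defined outside the function
```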
