cloudtone posted on 2017-4-13 00:01:31

R data structures

Vectors
The fundamental R data structure is the vector, which stores an ordered set of values called elements. A vector can contain any number of elements, but all of the elements must be of the same type.
Several vector types are commonly used in machine learning: integer (numbers without decimals), double (numbers with decimals), character (text data), and logical (TRUE or FALSE values). There are also two special values: NULL, which is used to indicate the absence of any value, and NA, which indicates a missing value.
It is tedious to enter large amounts of data manually, but small vectors can be created by using the c() combine function. The vector can also be given a name using the <- arrow operator, which is R's way of assigning values, much like the = assignment operator is used in many other programming languages.
> subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
> temperature <- c(98.1, 98.6, 101.4)
> flu_status <- c(FALSE, FALSE, TRUE)
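To check the claim that a vector holds only one type, a small sketch can ask R for each vector's storage type with typeof(), and show what c() does when types are mixed. The `mixed` vector is a hypothetical example, not from the book.

```r
# Recreate the three example vectors from the notes
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)

# typeof() reports the storage type shared by every element
typeof(subject_name)  # "character"
typeof(temperature)   # "double"
typeof(flu_status)    # "logical"
typeof(c(1L, 2L))     # "integer": the L suffix marks integer literals

# Hypothetical mixed input: c() coerces everything to a common type
# (here character), so the result still has a single type
mixed <- c(1, "two", TRUE)
typeof(mixed)         # "character"
mixed                 # "1" "two" "TRUE"

# NA is a missing element that still occupies a position;
# NULL is the absence of a value and adds nothing
length(c(98.1, NA, 101.4))    # 3
length(c(98.1, NULL, 101.4))  # 2
```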

Because R vectors are inherently ordered, the records can be accessed by counting the item's number in the set, beginning at one, and surrounding this number with square brackets (that is, [ and ]) after the name of the vector. For example, to obtain the body temperature of the second patient, Jane Doe:
> temperature[2]
98.6
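Because positions start at one, a few edge cases follow directly. This short sketch (not from the book) shows the first and last elements, plus what R returns for position 0 and for a position past the end.

```r
temperature <- c(98.1, 98.6, 101.4)

temperature[1]                    # 98.1, the first element (indexing starts at 1)
temperature[length(temperature)]  # 101.4, the last element

# Position 0 selects nothing: an empty numeric vector
temperature[0]                    # numeric(0)

# A position beyond the end is not an error; R returns NA
temperature[4]                    # NA
```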

R offers a variety of convenient methods to extract data from vectors. A range of values can be obtained using the (:) colon operator; here, the temperatures of the second and third patients:
> temperature[2:3]
98.6 101.4
Items can be excluded by specifying a negative item number; here, the second temperature is left out:
> temperature[-2]
98.1 101.4
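The colon operator simply builds an integer sequence, so temperature[2:3] is the same as indexing with c(2, 3). A minimal sketch also drops several items at once with a negative vector. Note that R refuses to mix positive and negative indices in one subscript (the exact error wording depends on the R version, so the sketch only checks that an error occurs).

```r
temperature <- c(98.1, 98.6, 101.4)

2:3                                                # 2 3, an integer sequence
identical(temperature[2:3], temperature[c(2, 3)])  # TRUE

# Drop the first and third items, keeping only the second
temperature[-c(1, 3)]                              # 98.6

# Mixing signs is an error; tryCatch() returns the condition instead of stopping
mixed_result <- tryCatch(temperature[c(-1, 2)], error = function(e) e)
inherits(mixed_result, "error")                    # TRUE
```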
Finally, it is also sometimes useful to specify a logical vector indicating whether each item should be included:
> temperature[c(TRUE, TRUE, FALSE)]
98.1 98.6
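In practice the logical vector is rarely typed by hand; it usually comes from a comparison. The sketch below (an illustration, not from the book) selects temperatures above 100 and uses flu_status to pick the matching patient names, which works because all three vectors list the patients in the same order.

```r
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)

# The comparison yields one TRUE/FALSE per element
temperature > 100               # FALSE FALSE TRUE
temperature[temperature > 100]  # 101.4

# A logical vector from one vector can filter another of the same length
subject_name[flu_status]        # "Steve Graves"
subject_name[!flu_status]       # "John Doe" "Jane Doe"
```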
Factors

A factor is a special case of vector that is solely used to represent categorical or ordinal variables. In the medical dataset we are building, we might use a factor to represent gender, because it uses two categories: MALE and FEMALE.
To create a factor from a character vector, simply apply the factor() function.
> gender <- factor(c("MALE","FEMALE","MALE"))
> gender
MALE   FEMALE MALE
Levels: FEMALE MALE
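Internally a factor stores each value as an integer code pointing into its levels, which R sorts alphabetically by default. A quick sketch with levels() and as.integer() makes this visible.

```r
gender <- factor(c("MALE", "FEMALE", "MALE"))

levels(gender)        # "FEMALE" "MALE" (alphabetical by default)
nlevels(gender)       # 2
as.integer(gender)    # 2 1 2: each value is a code into levels(gender)

# Converting back to text recovers the original labels
as.character(gender)  # "MALE" "FEMALE" "MALE"
```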

Notice that when the gender data was displayed, R printed additional information about the gender factor. The levels comprise the set of possible categories the factor could take, in this case MALE or FEMALE.
When we create factors, we can add additional levels that may not appear in the data:
> blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))
> blood
O  AB A
Levels: O AB A B
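One practical effect of declaring unseen levels is that counting functions report them: table() on the blood factor lists type B with a count of zero, in the order given by levels = rather than alphabetically. The value "C" below is a hypothetical invalid blood type, used only to show that values outside the declared levels become NA.

```r
blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))

counts <- table(blood)
counts         # O AB A B -> 1 1 1 0
names(counts)  # "O" "AB" "A" "B", in the order given by levels =

# Hypothetical value "C" is not a declared level, so it becomes NA
invalid <- factor(c("O", "C"), levels = c("O", "AB", "A", "B"))
invalid        # O <NA>
```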

Notice that when we defined the blood factor for the three patients, we specified an additional vector of four possible blood types using the levels parameter. As a result, even though our data included only types O, AB, and A, all four types are stored with the blood factor, as indicated by the output.
The factor data structure also allows us to include information about the order of a categorical variable's categories, which provides a convenient way to store ordinal data.
> symptoms <- factor(c("SEVERE", "MILD", "MODERATE"), levels = c("MILD", "MODERATE", "SEVERE"), ordered = TRUE)
> symptoms
SEVERE   MILD   MODERATE
Levels: MILD < MODERATE < SEVERE
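With ordered = TRUE, the integer codes follow the order of the levels argument, not the order in which the values appear in the data. This sketch checks that with is.ordered() and as.integer().

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

is.ordered(symptoms)  # TRUE
class(symptoms)       # "ordered" "factor"

# SEVERE is the 3rd level, MILD the 1st, MODERATE the 2nd
as.integer(symptoms)  # 3 1 2
```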

The resulting symptoms factor now includes information about the order we requested. Unlike our prior factors, the levels of this factor are separated by < symbols to indicate the presence of a sequential order from mild to severe.

A helpful feature of ordered factors is that logical tests work as you expect. For instance, we can test whether each patient's symptoms are more severe than moderate:
> symptoms > "MODERATE"
TRUE FALSE FALSE
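Building on that comparison, the logical result can filter the factor itself, and order-aware summaries such as max() and sort() also work. By contrast, the same > test on the unordered gender factor is not meaningful: R returns NA and issues a warning. A minimal sketch:

```r
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"),
                   ordered = TRUE)

at_least_moderate <- symptoms[symptoms >= "MODERATE"]
at_least_moderate  # SEVERE MODERATE
max(symptoms)      # SEVERE
sort(symptoms)     # MILD MODERATE SEVERE

# On an unordered factor, ">" gives NA for every element (with a warning)
gender <- factor(c("MALE", "FEMALE", "MALE"))
unordered_test <- suppressWarnings(gender > "FEMALE")
unordered_test     # NA NA NA
```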
Lists

A list is a data structure, much like a vector, in that it is used for storing an ordered set of elements. However, where a vector requires all its elements to be the same type, a list allows different types of elements to be collected. Due to this flexibility, lists are often used to store various types of input and output data and sets of configuration parameters for machine learning models.
Similar to creating a vector with c(), a list is created using the list() function, as shown in the following example. One notable difference is that when a list is constructed, each component in the sequence is almost always given a name. The names are not technically required, but they allow the list's values to be accessed later on by name rather than by numbered position. Here, the list holds all of the values for the first patient:
> subject1 <- list(fullname = subject_name[1], temperature = temperature[1], flu_status = flu_status[1], gender = gender[1], blood = blood[1], symptoms = symptoms[1])
> subject1
$fullname
"John Doe"

$temperature
98.1

$flu_status
FALSE

$gender
MALE
Levels: FEMALE MALE

$blood
O
Levels: O AB A B

$symptoms
SEVERE
Levels: MILD < MODERATE < SEVERE
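To confirm that each list component keeps its own type, a sketch can rebuild subject1 for the first patient and inspect it with names() and a per-component class check; str() gives a compact one-line-per-component summary. The helper variable component_classes is introduced here only for illustration.

```r
subject_name <- c("John Doe", "Jane Doe", "Steve Graves")
temperature <- c(98.1, 98.6, 101.4)
flu_status <- c(FALSE, FALSE, TRUE)
gender <- factor(c("MALE", "FEMALE", "MALE"))
blood <- factor(c("O", "AB", "A"), levels = c("O", "AB", "A", "B"))
symptoms <- factor(c("SEVERE", "MILD", "MODERATE"),
                   levels = c("MILD", "MODERATE", "SEVERE"), ordered = TRUE)

# The first patient's values, one named component per variable
subject1 <- list(fullname = subject_name[1], temperature = temperature[1],
                 flu_status = flu_status[1], gender = gender[1],
                 blood = blood[1], symptoms = symptoms[1])

length(subject1)  # 6
names(subject1)   # "fullname" "temperature" "flu_status" "gender" "blood" "symptoms"

# First class of each component: one list mixes several types
component_classes <- vapply(subject1, function(x) class(x)[1], character(1))
component_classes

str(subject1)     # compact summary of every component
```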

Note that the values are labeled with the names we specified in the preceding command. However, a list can still be accessed using methods similar to a vector. To access the temperature value by its position:
> subject1[2]
$temperature
98.1

The result of using vector-style operators on a list object is another list object, which is a subset of the original list.
To return a single list item in its native data type, use double brackets ([[ and ]]) when selecting the list component.
> subject1[[2]]
98.1
For clarity, it is often easier to access list components directly, by appending a $ and the value's name to the name of the list object:
> subject1$temperature
98.1
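The difference between single and double brackets is easiest to see by checking the class of each result. A minimal sketch, using a shortened hypothetical two-component version of subject1 so it runs on its own:

```r
# A cut-down, hypothetical version of subject1 (first two components only)
subject1 <- list(fullname = "John Doe", temperature = 98.1)

class(subject1[2])         # "list": single brackets return a sub-list
length(subject1[2])        # 1: a list holding one component
class(subject1[[2]])       # "numeric": double brackets return the element itself
subject1[["temperature"]]  # 98.1: double brackets also accept a name

# $ and [[ ]] return the same underlying value
identical(subject1[[2]], subject1$temperature)  # TRUE
```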

It is possible to obtain several items in a list by specifying a vector of names:
> subject1[c("temperature", "flu_status")]
$temperature
98.1

$flu_status
FALSE
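Selecting several components by name returns a smaller list, the same result as selecting them by position. This sketch (again on a hypothetical cut-down subject1) checks that equivalence.

```r
# Hypothetical cut-down version of subject1 with three components
subject1 <- list(fullname = "John Doe", temperature = 98.1, flu_status = FALSE)

by_name <- subject1[c("temperature", "flu_status")]
by_position <- subject1[2:3]

identical(by_name, by_position)  # TRUE
is.list(by_name)                 # TRUE: still a list, not a vector
names(by_name)                   # "temperature" "flu_status"
```

Names are usually the safer choice, since code that selects by name keeps working if the components are later reordered.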
To be continued...
Reference:
Packt, Machine Learning with R, 2nd Edition
computerniu posted on 2017-4-13 00:10:26

What web page did you copy this from?

牛蛙 posted on 2017-4-13 00:13:39

I really can't understand this.

亦宁 posted on 2017-4-13 09:25:58

These are notes, just notes! If you can follow them, read along; if you can't, just give everyone a like!

cloudtone posted on 2017-4-13 09:51:52

computerniu posted on 2017-4-13 00:10
What web page did you copy this from?

These are hand-typed notes. The formatting doesn't display well; it got jumbled.

computerniu posted on 2017-4-13 09:53:54

cloudtone posted on 2017-4-13 09:51
These are hand-typed notes. The formatting doesn't display well; it got jumbled.


tyy posted on 2017-4-13 10:50:35

What is this about?

莞尔 posted on 2017-4-13 22:56:44

Big thumbs up!