# Probability and Statistics Seminar

11/10/2006, 14:30 — Room P3.31, Mathematics Building
Ana M. Bianco, IC/FCEN - Universidade de Buenos Aires

### Robust Tests in the Logistic Regression Model

In this work we propose a robust test for the regression parameter of a logistic model. The proposed test is a Wald-type test based on a weighted version of the estimator proposed by Bianco and Yohai (1996), as implemented by Croux and Haesbroeck (2003). We study the asymptotic distribution of the test statistic under the null hypothesis and under contiguous alternatives. A Monte Carlo study investigates the stability of the level and power of the test under contamination, and compares the finite-sample behaviour of the proposed test with that of the classical test and of other robust proposals. Finally, the performance of the proposed test is illustrated on a real data set.
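
The abstract describes a Wald-type test. As a minimal sketch of the generic Wald construction for a single coefficient (not the robust estimator of the talk), assume an estimate `beta_hat` and its estimated asymptotic variance `var_hat` are already available; in the robust test they would come from the weighted Bianco and Yohai estimator, which is not implemented here. Under the null hypothesis the statistic is compared with a chi-squared distribution with one degree of freedom. The function names and numbers are hypothetical.

```python
import math


def wald_statistic(beta_hat: float, var_hat: float, beta_null: float = 0.0) -> float:
    """Wald statistic for H0: beta = beta_null.

    beta_hat and var_hat (the estimated asymptotic variance of beta_hat) are
    assumed to be supplied by some estimator; computing them is not shown.
    """
    if var_hat <= 0.0:
        raise ValueError("var_hat must be positive")
    return (beta_hat - beta_null) ** 2 / var_hat


def chi2_1_pvalue(w: float) -> float:
    """Upper-tail probability P(chi^2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))."""
    if w < 0.0:
        raise ValueError("w must be non-negative")
    return math.erfc(math.sqrt(w / 2.0))


# Hypothetical numbers, for illustration only.
w = wald_statistic(beta_hat=0.8, var_hat=0.09)
p_value = chi2_1_pvalue(w)
reject_at_5_percent = p_value < 0.05
```

With several coefficients the statistic becomes a quadratic form in the estimated covariance matrix and is compared with a chi-squared distribution with as many degrees of freedom as restrictions tested.
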
21/06/2006, 11:30 — Room P3.31, Mathematics Building
Guy Latouche, Université Libre de Bruxelles

### Structured Markov Chains in Applied Probability and Numerical Analysis

About thirty years ago, Quasi-Birth-and-Death processes and Skip-Free Markov chains came to the attention of applied probabilists. One of their prominent features is that their analysis requires the solution of nonlinear equations involving matrix polynomials or matrix power series. At first, these equations were tackled 'in-house', and several algorithms soon appeared whose justification was grounded, to a large extent, in probabilistic thinking. The equations then caught the attention of numerical analysts, who brought their own way of thinking to bear on such problems and, not surprisingly, obtained algorithms with improved convergence speed or numerical accuracy. The interaction between the two lines of approach is very exciting, and this talk attempts to illustrate how the one meshes with the other.
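
For a discrete-time QBD with one-step transition blocks for moving down a level, staying in the same level and moving up (named `a_down`, `a_local`, `a_up` here, our own labels), the matrix $G$ of first-passage probabilities to the level below is the minimal nonnegative solution of $G = A_{\mathrm{down}} + A_{\mathrm{local}} G + A_{\mathrm{up}} G^2$. A minimal sketch of the simplest algorithm of this kind, functional iteration from $G = 0$ (whose iterates increase monotonically to $G$), follows; faster schemes such as logarithmic reduction and cyclic reduction exist but are not shown. The example blocks are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(*ms: Matrix) -> Matrix:
    """Element-wise sum of equally sized matrices."""
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))]
            for i in range(len(ms[0]))]


def qbd_g_matrix(a_down: Matrix, a_local: Matrix, a_up: Matrix,
                 tol: float = 1e-12, max_iter: int = 100_000) -> Matrix:
    """Minimal nonnegative solution of G = A_down + A_local G + A_up G^2.

    Natural functional iteration G_{k+1} = A_down + A_local G_k + A_up G_k^2,
    starting from G_0 = 0; convergence is only linear.
    """
    n = len(a_down)
    g = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        g_next = matadd(a_down, matmul(a_local, g), matmul(a_up, matmul(g, g)))
        diff = max(abs(g_next[i][j] - g[i][j]) for i in range(n) for j in range(n))
        g = g_next
        if diff < tol:
            return g
    raise RuntimeError("functional iteration did not converge")


# Hypothetical QBD with two phases that never interact (rows of the three
# blocks together sum to one), so each phase behaves like a simple random walk.
A_down = [[0.3, 0.0], [0.0, 0.2]]
A_local = [[0.2, 0.0], [0.0, 0.2]]
A_up = [[0.5, 0.0], [0.0, 0.6]]
G = qbd_g_matrix(A_down, A_local, A_up)
```

Because the phases are decoupled, each diagonal entry solves a scalar quadratic whose minimal root is (probability down)/(probability up), which makes the example easy to check by hand.
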
21/06/2006, 10:00 — Room P3.31, Mathematics Building
Wolfgang Schmid, Europe University, Frankfurt (Oder)

### Comparison of Different Estimation Techniques for Portfolio Selection

The main obstacle in applying mean-variance portfolio selection is that the moments of the asset returns are unknown. In practice, the optimal portfolio weights are estimated by replacing these moments with the classical unbiased sample estimators. We compare the exact and the asymptotic distributions of these estimated portfolio weights and provide a sensitivity analysis with respect to shifts in the moments of the asset returns. Furthermore, the paper compares the classical estimators of the moments of the asset returns with the recently proposed shrinkage estimators within the framework of portfolio selection. It is shown how the uncertainty about the portfolio weights can be incorporated into the performance measurement of trading strategies. The methodology explains the poor out-of-sample performance of the classical Markowitz procedures.
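
As a minimal illustration of the plug-in approach, take the simplest mean-variance portfolio, the global minimum variance (GMV) portfolio, whose weights $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1})$ depend only on the covariance matrix $\Sigma$. The sketch below replaces $\Sigma$ by the unbiased sample covariance (divisor $n-1$). The function names and the return data are hypothetical, and the exact and asymptotic distributions studied in the talk are not computed.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(returns: Matrix) -> Matrix:
    """Unbiased sample covariance (divisor n - 1); rows are periods, columns are assets."""
    n, k = len(returns), len(returns[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(row[j] for row in returns) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in returns) / (n - 1)
             for j in range(k)] for i in range(k)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gmv_weights(cov: Matrix) -> List[float]:
    """Global minimum variance weights: Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [xi / total for xi in x]


# Hypothetical returns for two assets (rows = periods).
returns = [[0.01, 0.02], [0.03, -0.01], [-0.02, 0.01], [0.00, 0.03]]
weights = gmv_weights(sample_covariance(returns))
```

Solving the linear system instead of inverting $\Sigma$ explicitly is the usual numerically preferable choice; the estimated weights inherit all the sampling error of the covariance estimate, which is the issue the talk quantifies.
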
08/06/2006, 15:00 — Room P3.31, Mathematics Building

### Robust Bootstrap Based on the Influence Function

01/06/2006, 15:00 — Room P3.31, Mathematics Building
Isabel Rodrigues, DMIST

### Robust Tests for the Common Principal Components Model

In some statistical methods, such as discriminant analysis, it may be important to compare the covariance structures of two or more populations. Often the assumption of equal covariance matrices is clearly inappropriate, while estimating the matrices separately does not respect the principle of parsimony. As an alternative, some authors, such as Flury (1988), have studied models with common covariance structures. One of these models is known as Common Principal Components (Flury, 1984), since it generalizes principal components to k groups; it assumes that the k covariance matrices have different eigenvalues but identical eigenvectors. More restrictive is the proportional model, in which the covariance matrices are assumed to differ only by a constant factor. Flury (1988) derived and studied the maximum likelihood estimators of the parameters of these models and constructed likelihood ratio tests to assess relationships among the covariance structures of the populations. However, both the classical parameter estimators and the likelihood ratio tests are, in many situations, sensitive to outlying observations. Robust estimation alternatives via "plug-in" (PI) and "projection-pursuit" (PP) methods were studied by Boente, Pires and Rodrigues (2002, 2005a, 2005b). In this work, we propose several robust procedures for testing relationships among the covariance structures of k populations.
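
For reference, the two models can be written compactly as follows, in our own notation (not necessarily that of the talk), for $k$ population covariance matrices $\Sigma_1,\dots,\Sigma_k$ of dimension $p \times p$:

$$
\text{CPC:}\quad \Sigma_i = \Gamma \Lambda_i \Gamma^{\top}, \qquad i = 1,\dots,k,
$$

where $\Gamma$ is an orthogonal $p \times p$ matrix whose columns are the common eigenvectors and each $\Lambda_i$ is a diagonal matrix holding the group-specific eigenvalues; and

$$
\text{Proportional:}\quad \Sigma_i = \rho_i \Sigma_1, \qquad \rho_1 = 1,\ \rho_i > 0, \quad i = 2,\dots,k.
$$

The proportional model is a special case of the CPC model: writing $\Sigma_1 = \Gamma \Lambda_1 \Gamma^{\top}$ gives $\Sigma_i = \Gamma (\rho_i \Lambda_1) \Gamma^{\top}$.
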
15/02/2006, 14:30 — Room P3.31, Mathematics Building
Graciela Boente, Universidad de Buenos Aires

### Robust Multivariate Tolerance Regions

24/01/2006, 14:30 — Room P3.31, Mathematics Building
Jorge Alberto Achcar, Universidade Federal de São Carlos

### Estimators of Sensitivity and Specificity in the Presence of Verification Bias: A Bayesian Approach

Verification bias can occur when some of the patients with test results are not selected to receive the gold standard procedure. The unverified cases are frequently those whose test results are not suggestive of disease. Consequently, the set of verified cases overestimates the number of true positives and underestimates the number of true negatives, and sensitivity and specificity estimates based only on the patients with verified disease status are often biased. In this work, we derive unbiased estimators for sensitivity and specificity using a Bayesian approach. The marginal posterior densities of all parameters are estimated using the Gibbs sampler. An application to the study of the accuracy of Hybrid Capture II in the diagnosis of cervical intraepithelial neoplasia grades 2 and 3 illustrates the proposed methodology.
11/10/2005, 14:30 — Room P4.35, Mathematics Building
Rudolf Dutter, Vienna University of Technology

### Development of a Data Analysis System in R, with Graphical Interface

The R system already offers a great many powerful data analysis tools, but the development of a general graphical user interface is still in its early stages (Fox, 2004). We discuss the migration of an older data analysis system (DAS), with many "new" and powerful features, into an R package. The package is designed for graphically oriented analysis, with special emphasis on geochemical data. Practical examples from the Kola project are presented.
04/10/2005, 11:30 — Room P3.10, Mathematics Building
Antonis Economou, University of Athens

### Exact computations and approximations for the stationary distributions of Markov chains in random environments and applications in queueing and population growth models

We consider a general model for a continuous-time Markov chain in a random environment. We study a certain form of interaction between the process of interest and the environmental process under which the stationary joint distribution is tractable. More specifically, we obtain necessary and sufficient conditions for a generalized product-form stationary distribution. When these conditions fail, we propose an alternative technique that transforms the original system of balance equations into an equivalent system. Applications in queueing and population growth models illustrate the scope and efficiency of the methods.
13/09/2005, 16:00 — Room P4.35, Mathematics Building
Yarema Okhrin, Department of Statistics, University of Frankfurt (Oder), Germany

### Distributional properties and estimation of optimal portfolios

The Markowitz theory of portfolio selection is a classical part of asset allocation. Under the assumption of Gaussian asset returns and investor preferences given by a quadratic utility function, the optimal portfolio weights can be expressed as a function of the first two moments of the asset returns. The true moments are unknown to the investor and must be estimated from a sample. Because of this, practical applications often suffer from very large or negative portfolio weights. The aim of this project is to assess the distributional properties of estimated portfolio weights and to develop improved estimation procedures. Okhrin and Schmid (2005a) consider maximum-likelihood estimation of the moments of asset returns. They provide expressions for the mean and variance of four different types of estimated portfolio weights. It appears that the estimated weights are heavily biased in small samples and have very large variance, which explains the empirical evidence from practical applications. It is also shown that the estimated global minimum variance portfolio weights follow a multivariate t-distribution, which is of special interest in testing problems. For the portfolio weights that maximize the Sharpe ratio, it appears that moments of order one or greater do not exist. This calls into question the usefulness of such an estimator and makes the results intractable. A classical approach to decreasing the volatility of an estimator is shrinkage. Using the result of Stein, Jorion (1986) first applied shrinkage estimation of the expected asset returns to portfolio selection. More recently, Ledoit and Wolf (2003, 2004) constructed a shrinkage estimator of the covariance matrix that is robust against singularity of the sample covariance matrix. Okhrin and Schmid (2005b) applied the shrinkage methodology directly to the optimal portfolio weights, shrinking the classical portfolio weights towards the weights obtained from a linear factor model.
The optimal shrinkage intensity is derived by minimizing the mean squared error. It appears that the shrinkage estimator is also very successful in reducing the variance of the portfolio return. Additionally, a new estimator is constructed using predictive moments from a Bayesian framework with a zero-mean prior distribution for the slopes of the factor model.
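
Shrinking portfolio weights towards a target amounts to a convex combination of the two weight vectors. The sketch below shows only that step, assuming the classical weights, the factor-model target weights and the shrinkage intensity are already available; the MSE-optimal intensity formula of Okhrin and Schmid (2005b) is not given in the abstract and is not reproduced. Names and numbers are hypothetical.

```python
from typing import List, Sequence


def shrink_weights(w_classical: Sequence[float], w_target: Sequence[float],
                   intensity: float) -> List[float]:
    """Return intensity * w_target + (1 - intensity) * w_classical."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    if len(w_classical) != len(w_target):
        raise ValueError("weight vectors must have the same length")
    return [intensity * t + (1.0 - intensity) * c for c, t in zip(w_classical, w_target)]


# Hypothetical inputs: an extreme classical estimate and a factor-model target.
w_hat = [1.4, -0.4]
w_factor = [0.5, 0.5]
w_shrunk = shrink_weights(w_hat, w_factor, intensity=0.5)
```

If both inputs sum to one, so does the result, so the full-investment constraint is preserved for any intensity in $[0, 1]$; an intensity of 0 returns the classical weights and 1 returns the target.
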
30/05/2005, 11:30 — Room P3.10, Mathematics Building

### Traffic Modeling of Voice Over IP

Voice over IP (VoIP) uses TCP/IP as the transport network for voice conversations, taking advantage of its statistical multiplexing and allowing the use of 'free' transport networks such as the Internet. The main shortcomings of VoIP are the possibility of data loss and increased transport delay. These impairments can severely affect the perceived quality of a conversation and must be kept under control through appropriate queueing analysis, connection admission control algorithms and dimensioning methods. All of these require accurate VoIP traffic models. This seminar introduces the modelling of VoIP traffic, taking into account the effect of two features of modern VoIP codecs: Voice Activity Detection and Comfort Noise Generation. The single-source case is introduced first. Next, the multiplexing of several sources is addressed, and the main well-known models (the fluid model and the Markov-modulated Poisson process) are adapted to the VoIP scenario. Finally, a comparative study of the adequacy of the existing models concludes the seminar.
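
A common way to represent a single voice source with Voice Activity Detection, assumed here rather than taken from the talk, is a two-state ON/OFF process alternating between talkspurts (ON) and silences (OFF). If $N$ such sources are independent, the stationary number of sources that are ON is binomial with parameter $p = \bar T_{\mathrm{on}} / (\bar T_{\mathrm{on}} + \bar T_{\mathrm{off}})$, the long-run fraction of time a source is ON; this is the starting point of fluid-model analyses. The sketch computes that distribution and the mean aggregate rate, with an optional low rate during silences standing in for Comfort Noise Generation. All names and parameter values are hypothetical.

```python
from math import comb
from typing import List


def on_probability(mean_on: float, mean_off: float) -> float:
    """Long-run probability that a two-state ON/OFF source is ON."""
    if mean_on <= 0.0 or mean_off <= 0.0:
        raise ValueError("mean durations must be positive")
    return mean_on / (mean_on + mean_off)


def active_sources_distribution(n_sources: int, mean_on: float,
                                mean_off: float) -> List[float]:
    """P(k sources ON), k = 0..n_sources, for independent identical ON/OFF sources."""
    p = on_probability(mean_on, mean_off)
    return [comb(n_sources, k) * p ** k * (1.0 - p) ** (n_sources - k)
            for k in range(n_sources + 1)]


def mean_aggregate_rate(n_sources: int, mean_on: float, mean_off: float,
                        rate_on: float, rate_off: float = 0.0) -> float:
    """Mean offered rate; rate_off > 0 mimics comfort-noise packets sent during silences."""
    p = on_probability(mean_on, mean_off)
    return n_sources * (p * rate_on + (1.0 - p) * rate_off)


# Hypothetical link carrying 10 sources (durations in seconds, rates in kbit/s).
dist = active_sources_distribution(10, mean_on=1.0, mean_off=1.5)
mean_rate = mean_aggregate_rate(10, mean_on=1.0, mean_off=1.5, rate_on=64.0)
```

The binomial distribution only describes how many sources are active at a given instant; queueing behaviour additionally depends on the ON/OFF dynamics, which is what the fluid and MMPP models capture.
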
04/02/2005, 14:30 — Room P3.31, Mathematics Building
Frank Critchley, The Open University, UK

### Skewness a la mode?

A new approach to measuring skewness of univariate distributions is developed. A corresponding notion of kurtosis follows naturally. Further developments are briefly indicated.
02/02/2005, 14:00 — Room P3.10, Mathematics Building
Isabel Pereira, Departamento de Matemática, Universidade de Aveiro / U&D Matemática e Aplicações

### Properties, Estimation and Prediction in Bilinear Models with Exponential Errors

In many real situations, besides dealing with phenomena that exhibit jumps at random instants, the observations in the series may be highly skewed and strictly positive, with very small values close to zero. A process that can model this type of situation, and which is considered in this work, is the bilinear model $\mathrm{BL}(1,0,1,1)$ with exponential errors. In particular, we obtain the conditions under which the model is strictly stationary and present some properties of the stationary distribution in terms of its moments. Two methodologies are suggested for parameter estimation, one in the time domain and one in the frequency domain: the Bayesian approach and the Whittle criterion, respectively. The proposed procedures are illustrated and compared through a simulation study. Finally, a brief analysis of prediction is carried out, using the Bayesian methodology to forecast a future observation.
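
To make the model concrete, the sketch below simulates a path, assuming the common parameterization $X_t = aX_{t-1} + bX_{t-1}\varepsilon_{t-1} + \varepsilon_t$ of $\mathrm{BL}(1,0,1,1)$ with i.i.d. exponential errors $\varepsilon_t$ (sign and naming conventions differ between authors). With $a, b \ge 0$ the simulated values are positive, as in the data described above. The function name and parameter values are hypothetical, and the stationarity conditions, Bayesian estimation and Whittle estimation from the talk are not reproduced.

```python
import random
from typing import List, Optional


def simulate_bl1011(n: int, a: float, b: float, rate: float,
                    burn_in: int = 500, seed: Optional[int] = None) -> List[float]:
    """Simulate X_t = a X_{t-1} + b X_{t-1} e_{t-1} + e_t with e_t ~ Exp(rate) i.i.d.

    The first burn_in values are discarded to reduce the effect of the
    arbitrary start X_0 = e_0 = 0; stationarity is NOT checked here.
    """
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0
    path: List[float] = []
    for t in range(burn_in + n):
        e = rng.expovariate(rate)  # exponential error with mean 1 / rate
        x = a * x_prev + b * x_prev * e_prev + e
        if t >= burn_in:
            path.append(x)
        x_prev, e_prev = x, e
    return path


# Hypothetical parameters, not taken from the talk.
series = simulate_bl1011(n=1000, a=0.3, b=0.2, rate=1.0, seed=1)
```

Passing a `seed` makes runs reproducible, which is convenient when comparing estimators in a simulation study.
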
14/01/2005, 14:00 — Room P3.31, Mathematics Building

### Statistical Characterization of Internet Traffic

12/01/2005, 14:00 — Room P3.31, Mathematics Building
Graciela Boente, Departamento de Matemática, Universidade de Buenos Aires

### Robust bandwidth selectors in semiparametric partly linear regression models

Consider a semiparametric partly linear model with response variable $y$ and covariates $x_1,\dots,x_p$ and $t$. Such a model can be a suitable choice when one suspects that the response depends linearly on $x$ but is nonlinearly related to $t$. Least squares estimators have been studied by several authors. Like all nonparametric estimators, these estimators depend on a smoothing parameter that must be chosen by the practitioner. As is well known, large bandwidths produce estimators with small variance but high bias, while small values produce more wiggly curves. This trade-off between bias and variance has led to several proposals for selecting the smoothing parameter, such as cross-validation procedures and plug-in methods. It is also well known that, both in linear regression and in nonparametric regression, least squares estimators can be seriously affected by anomalous data, and the same holds for partly linear models. To avoid this problem, Bianco and Boente (2003) considered a three-step robust estimate of the regression parameter and the regression function. In this talk, we introduce a robust plug-in bandwidth selector for a partly linear model with fixed design, which converges to the optimal bandwidth and leads to robust data-driven estimates of the regression function and the regression parameter. Our plug-in proposal is based on robust nonparametric estimates of the $j$-th derivatives, extending the proposals given for $j=2$. We define an empirical influence measure for data-driven bandwidth selectors and use it to study the sensitivity of the plug-in selector. A Monte Carlo study compares the performance of the classical approach and of the resistant selectors under normality and under contamination. The robust selector appears to compare favourably with its competitor, despite the need to select a pilot bandwidth.
When combined with the three-step procedure proposed by Bianco and Boente (2003), the selector yields robust data-driven estimates of both the regression function and the regression parameter.
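
For reference, a partly linear model of the kind described above is commonly written, for observations $i = 1,\dots,n$, as

$$
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + g(t_i) + \varepsilon_i,
$$

where $\mathbf{x}_i = (x_{i1},\dots,x_{ip})^{\top}$, $\boldsymbol{\beta}$ is the regression parameter, $g$ is an unknown smooth regression function and $\varepsilon_i$ are random errors. The bandwidth is the smoothing parameter used when estimating $g$ nonparametrically; the notation here is ours and may differ from that of the talk.
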
22/11/2004, 17:15 — Room P3.10, Mathematics Building
Carlos Alberto B. Pereira, Instituto de Matemática e Estatística, Universidade de São Paulo, Brasil

### Statistical Information and Preposterior Analysis

This talk is academic rather than scientific in character. Our aim is to discuss the concept of statistical information as it was presented to us by Professor Dev Basu. We will present a simple example of balls in urns. With this example, we will discuss the concept of information and show that information can be lost when new experiments are carried out. We will use DeGroot's concept of information and Blackwell sufficiency to choose between experiments. Finally, we will show how to determine the minimum sample size needed to achieve well-defined objectives.
03/11/2004, 15:00 — Room P3.31, Mathematics Building
Vilius Benetis, Technical University of Denmark

### Capacity Management in Cellular Hierarchical Networks

Efficient utilisation of radio resources is of great importance in mobile communications. In general, a high degree of sharing is efficient, but it requires service protection mechanisms to guarantee the Quality of Service for all customers. We study the effect of cell breathing and overlapping together with hierarchical cell structures, and show that call packing yields high utilisation. A transformation from a cell-based network model to a direct-routing network model is used to carry out the calculations. The models discussed generalise the Erlang-B formula to include general arrival processes and multi-rate (multimedia) traffic for second- and third-generation systems.
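
The baseline that these models generalise is the classical Erlang-B formula, $B(E,n) = \dfrac{E^n/n!}{\sum_{k=0}^{n} E^k/k!}$, the blocking probability of a loss system with Poisson arrivals, offered traffic $E$ (in erlangs) and $n$ channels. A minimal sketch of this single-rate case only, not the multi-rate or general-arrival generalisations of the talk, uses the standard recursion, which avoids large factorials; the function names are our own.

```python
import math


def erlang_b(traffic: float, servers: int) -> float:
    """Erlang-B blocking probability for offered traffic E (erlangs) and n servers.

    Recursion: B(E, 0) = 1, B(E, k) = E * B(E, k-1) / (k + E * B(E, k-1)).
    """
    if traffic < 0.0 or servers < 0:
        raise ValueError("traffic and servers must be non-negative")
    b = 1.0
    for k in range(1, servers + 1):
        b = traffic * b / (k + traffic * b)
    return b


def erlang_b_closed_form(traffic: float, servers: int) -> float:
    """Direct evaluation of the closed form; fine for small n, used as a cross-check."""
    terms = [traffic ** k / math.factorial(k) for k in range(servers + 1)]
    return terms[-1] / sum(terms)


# Hypothetical cell: 10 erlangs offered to 15 channels.
blocking = erlang_b(traffic=10.0, servers=15)
```

For large $n$ the closed form overflows or loses precision, whereas the recursion stays within $[0, 1]$ at every step.
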
03/06/2004, 11:00 — Room P10, Mathematics Building
Margarida Rocheta, Liliana Marum and Susana Tereso, IBET - Grupo Pinus

### A Look at the Plant Kingdom: Biotechnology Applied to Maritime Pine

This seminar focuses on general aspects of plant development. In particular, it discusses the cloning of pines from improved trees of proven forestry value. Following on from this, it discusses how pine embryos can be cryopreserved indefinitely. Since the key term is genetic transformation, the seminar will discuss what it is, how it is carried out and what results it has produced.
27/04/2004, 14:30 — Room P4.35, Mathematics Building
Marcel F. Neuts, Department of Systems and Industrial Engineering, The University of Arizona

### Exploring Finite Markov Chains by the Systematic Computation of Descriptors

We try to gain insight into the deeper physical behavior of a finite Markov chain by systematically computing quantities related to the visits to a string of nested sets of states. The choice of the successive states added to the nested sets is called an exploratory strategy, and it is constructed by focusing on the physical property to be explored. Quantities that serve as criteria in one strategy are reported as descriptors for the other strategies. This is a promising tool for exploring finite discrete-time Markov chains. Similar methods can be developed for continuous-time chains and Markov renewal processes, but the required computational methods are substantially different. We believe that this methodology may find applications in, among other areas, genetics and linguistics. Existing Markov chain analysis should be complemented by data-analytic procedures applied to real or simulated databases. Exploring the Markov chains and suitable data sets in parallel can help develop the skills needed to gain reliable insights from both the models and the data.

This seminar is jointly organized by CEMAT (Group 3) and the Sociedade Portuguesa de Estatística (Portuguese Statistical Society).

20/04/2004, 11:00 — Room P3.10, Mathematics Building
Nuno Borralho, Forestry Director, Instituto de Investigação da Floresta e Papel, RAIZ

### Determining the Genetic Merit of a Tree

One of the central elements in the success of genetic improvement is the ability to predict the genetic value of an individual, based on a simple genetic model and appropriate statistical methods. This science is called Quantitative Genetics. Its goal is to determine the merit of the genes an individual carries (relative to the mean of the population to which it belongs) that lead it to perform better. I will briefly present the underlying genetic model, the most common statistical methods, and the statistical challenges we encounter when analysing real data.

