We present an algorithm to find imprecise fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds. This results in a substantially faster search, because it eliminates the need for frequent, computationally expensive re-embeddings and suppresses redundant search. An extension of the search algorithm allows finding fragments that also incorporate chemical expert knowledge about which structural similarities can be tolerated. We demonstrate the usefulness of our algorithm by using it to discover activity-related groups of chemical compounds in the National Cancer Institute's HIV and cancer-screening datasets.
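The search itself (parallel embedding with local-order pruning) is beyond a short example, but the discrimination criterion it serves can be sketched. The Python below assumes fragment occurrences are already known and keeps fragments whose support in the focus class (e.g. active compounds) is at least `min_focus` while their support in the complement stays at most `max_complement`. The function name, thresholds and fragment labels are hypothetical; this is a focus/complement filter of the kind commonly used in discriminative fragment mining, not necessarily the exact criterion of the paper.

```python
def discriminative_fragments(occurrences, active, molecules,
                             min_focus=0.5, max_complement=0.1):
    """Select fragments frequent among active molecules (focus) and rare elsewhere.

    occurrences: dict mapping a fragment label to the ids of molecules containing it
    active:      ids of the molecules in the focus class (e.g. active compounds)
    molecules:   ids of all molecules in the data set
    Thresholds are illustrative assumptions.
    """
    active = set(active)
    inactive = set(molecules) - active
    selected = []
    for fragment, mols in occurrences.items():
        mols = set(mols)
        focus = len(mols & active) / len(active)
        complement = len(mols & inactive) / len(inactive) if inactive else 0.0
        if focus >= min_focus and complement <= max_complement:
            selected.append((fragment, focus, complement))
    # Highest focus support first; ties broken by lower complement support.
    return sorted(selected, key=lambda item: (-item[1], item[2]))

# Hypothetical data: fragment labels stand in for real substructures.
molecules = ["m1", "m2", "m3", "m4", "m5", "m6"]
occurrences = {
    "frag_A": set(molecules),     # present everywhere: not discriminative
    "frag_B": {"m1", "m2"},       # only in active molecules
    "frag_C": {"m4", "m5"},       # only in inactive molecules
}
result = discriminative_fragments(occurrences, {"m1", "m2", "m3"}, molecules)
```

Only `frag_B` passes both thresholds here; the real algorithm would obtain the occurrence sets from its embedding lists while growing fragments.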
The theory of Fuzzy Sets has been recognized as a suitable tool to model
several kinds of patterns that can hold in data. In particular, Fuzzy
Sets Theory has been shown to be a very useful tool in Data Mining for
representing, in a natural and human-like way, the so-called Association Rules. The objective of this paper is to
present a review of the most relevant results about the use of Fuzzy
Sets in Data Mining, especially in relation to the discovery of Association
Rules. First of all, we introduce the basic concepts of Data
Mining to justify the need for Fuzzy Sets Theory. A historical review
of developments in this field is also given. Next, we present
our research on Fuzzy Association Rules, starting with the formulation
of a model to discover association rules among items in a (crisp) set
of fuzzy transactions. This general model can be particularized in several ways;
each particular instance corresponds to a different kind of pattern
and/or repository of data. We describe some applications of this scheme,
paying special attention to the discovery of fuzzy association rules
in relational databases. The paper finishes with some suggestions
about future research and problems to be solved.
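As a point of reference for the fuzzy-transaction setting, the sketch below computes support and confidence under one common convention: the support of an itemset is the average, over transactions, of the minimum membership degree of its items (min as the t-norm), and confidence is support(A ∪ C) / support(A). This convention is an assumption for illustration; the model described in the paper may evaluate rules differently. Item names and degrees are hypothetical.

```python
def fuzzy_support(itemset, transactions):
    """Sigma-count support: average over transactions of the min (t-norm)
    of the membership degrees of the items in `itemset` (missing items count as 0)."""
    total = 0.0
    for t in transactions:
        total += min((t.get(item, 0.0) for item in itemset), default=1.0)
    return total / len(transactions)

def fuzzy_confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent: support(A ∪ C) / support(A)."""
    support_a = fuzzy_support(antecedent, transactions)
    if support_a == 0.0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return fuzzy_support(both, transactions) / support_a

# Hypothetical fuzzy transactions: item -> degree to which it belongs to the transaction.
transactions = [
    {"bread": 1.0, "milk": 0.8},
    {"bread": 0.5, "milk": 0.5, "eggs": 1.0},
    {"milk": 1.0},
]
support_bread = fuzzy_support({"bread"}, transactions)
confidence_bread_milk = fuzzy_confidence({"bread"}, {"milk"}, transactions)
```

With crisp transactions (all degrees 0 or 1) these formulas reduce to the usual support and confidence of association rules.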
Cluster analysis partitions data points into disjoint groups to quickly
gain first-order knowledge from data. Recently we developed several principal-component-based
clustering algorithms with well-defined clustering objective
functions. For the widely used K-means clustering, we proved that the continuous
solutions of the cluster membership indicator vectors are the principal components [1],
leading to effective implementations of K-means clustering. For spectral
graph clustering, we proved that scaled principal components form a dynamic
process of self-aggregation in which data objects move towards each other
to form clusters, revealing the inherent pattern of similarity [2]. The MinMaxCut
spectral method follows a min-max clustering principle: between-cluster
associations are minimized, while within-cluster associations are maximized.
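For K = 2 the relaxed K-means result can be illustrated directly: after centering the data, splitting the points by the sign of their first principal component score gives a two-way partition. The NumPy sketch below shows only that relaxation; it does not implement the methods of [1], the self-aggregation process of [2] or the MinMaxCut objective, and the function name and toy data are hypothetical.

```python
import numpy as np

def pca_two_way_split(X):
    """Two-way partition from the sign of each point's first principal component score."""
    Xc = X - X.mean(axis=0)                      # center the data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                          # first principal component scores
    return (scores > 0).astype(int)              # labels 0/1; which group gets 1 is arbitrary

# Hypothetical toy data: two well-separated groups of points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = pca_two_way_split(X)
```

Because the sign of a singular vector is arbitrary, only the grouping is meaningful, not which group receives label 1.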
DNA microarrays can simultaneously monitor expression levels of thousands
of genes. Expression profiles of patients' tissue samples provide a molecular
rather than morphological signature of cancer. We apply data clustering to
expression profiles of human lymphoma and demonstrate that data clustering
can effectively discover different pathological stages of B-cell lymphoma, thus
providing an effective diagnostic methodology [3].
Proteins carry out many cellular functions, such as metabolism, communication and
growth, often acting together in protein complexes. Systematic identification of these complexes provides essential
knowledge linking proteome dynamics to cellular function and phenotype. We
describe a unified representation of protein complex datasets based on bipartite
graphs, from which protein-protein interactions and protein-complex - protein-complex
associations are deduced naturally. Applying data clustering to the protein-protein
network, we obtain statistically significant protein clusters. Applying data
clustering to the protein-complex - protein-complex association network,
we obtain protein supercomplexes that provide a systems-level understanding
of cellular processes through concepts from the Gene Ontology [4].
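The bipartite representation can be sketched as follows, assuming complexes are given as sets of protein identifiers: projecting onto the protein side links two proteins whenever they share a complex, and projecting onto the complex side links two complexes whenever they share a protein. Using shared-membership counts as edge weights is an assumption made here for illustration, and the identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_bipartite(complexes):
    """complexes: dict complex id -> set of protein ids (the two sides of a bipartite graph).

    Returns two weighted edge dicts:
      ppi: (protein_a, protein_b) -> number of complexes containing both proteins
      cca: (complex_1, complex_2) -> number of proteins the two complexes share
    """
    ppi = defaultdict(int)
    for members in complexes.values():
        for a, b in combinations(sorted(members), 2):
            ppi[(a, b)] += 1
    cca = {}
    for c1, c2 in combinations(sorted(complexes), 2):
        shared = len(complexes[c1] & complexes[c2])
        if shared:
            cca[(c1, c2)] = shared
    return dict(ppi), cca

# Hypothetical complexes.
complexes = {"C1": {"A", "B", "C"}, "C2": {"B", "C", "D"}, "C3": {"E"}}
ppi, cca = project_bipartite(complexes)
```

Either projection can then be handed to a graph clustering method to look for protein clusters or supercomplexes.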
This paper presents the Soft Computing technique of Fuzzy Cognitive Maps (FCMs), which is a knowledge-based
methodology suitable to describe and model complex systems and handle information
from an abstract point of view [1]. Soft computing techniques such as FCMs
have been successfully used to model complex systems that involve various disciplines,
factors, states, variables, inputs, outputs, events and trends. These modeling
techniques can integrate into the decision-making process the partial
influence of controversial factors, can take into consideration causal effects
among factors, and can evaluate the influence of different sources, factors and
other characteristics using fuzzy logic reasoning. Thus,
such methods are ideal for the development of
Decision Support Systems in Medical Informatics, because in this application
area humans make decisions, for example in differential diagnosis,
mainly on the basis of fuzzy factors. Some of these factors are complementary, others similar
and others conflicting, and all of them are taken into consideration when a decision
is reached [2, 3, 4]. The factors involved have different
degrees of importance in determining (or influencing) the decision, and
the fact that they can be additive and/or conflicting increases the complexity
of the problem and the vagueness of the decision.
Because of the way they are constructed, Fuzzy Cognitive Maps develop a behavioral
model of the system that exploits the experience and knowledge of experts. The
applicability of Fuzzy Cognitive Maps to modeling complex systems has been successfully demonstrated
[5]. FCMs originated from the synergism of Fuzzy Logic and Neural Networks,
taking advantage of both theories. An FCM is a signed fuzzy graph with feedback,
consisting of concept nodes and weighted interconnections. Nodes of the graph
stand for concepts that are used to describe the main behavioral characteristics
of the modeled system. Nodes are connected by signed, fuzzy-weighted arcs
representing the cause-and-effect relationships that exist among concepts. Thus,
an FCM is a fuzzy-graph structure that allows systematic causal propagation,
in particular forward and backward chaining [6]. Fuzzy Cognitive Maps have
been successfully used to develop a Decision Support System for differential
diagnosis [7], to determine the success of the radiation therapy process by estimating
the final dose delivered to the target volume [8], and in many other application
areas.
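A commonly used FCM inference rule updates each concept as A_i(t+1) = f(A_i(t) + Σ_j w_ji A_j(t)) with a sigmoid f, iterating until the concept values settle. The sketch below assumes that form, a steepness parameter `lam`, and the convention that `W[j, i]` is the weight of the arc from concept j to concept i. The 3-concept map is hypothetical and is not taken from the cited applications.

```python
import numpy as np

def run_fcm(W, a0, lam=1.0, max_iter=100, tol=1e-5):
    """Iterate a Fuzzy Cognitive Map until the concept values stop changing.

    W[j, i] is the signed causal weight from concept j to concept i (zero diagonal).
    Update rule (assumed form): A_i(t+1) = f(A_i(t) + sum_j W[j, i] * A_j(t)),
    with f(x) = 1 / (1 + exp(-lam * x)) squashing values into (0, 1).
    Returns (final concept values, converged flag)."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        a_next = 1.0 / (1.0 + np.exp(-lam * (a + W.T @ a)))
        if np.max(np.abs(a_next - a)) < tol:
            return a_next, True
        a = a_next
    return a, False

# Hypothetical map: concept 0 increases concept 1, which in turn decreases concept 2.
W = np.array([[0.0, 0.7, 0.0],
              [0.0, 0.0, -0.6],
              [0.0, 0.0, 0.0]])
values, converged = run_fcm(W, [1.0, 0.0, 0.0])
```

Whether such an iteration reaches a fixed point, a limit cycle or a chaotic attractor depends on the weights and on `lam`; this small example happens to converge.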
When an FCM is used as part of a decision-making system, some of the concepts are considered output
nodes, each corresponding to one of the possible decisions. In general, an FCM could converge to a fixed point, a limit cycle,
or a chaotic attractor, but when FCMs are used for decision making,
it is desirable that they converge to a region corresponding to the selection of
a single decision. Therefore, the output nodes must
"compete" against each other so that only one of them dominates and
is considered the decision. Here a new idea is
proposed for achieving this "competition" between concepts: the interaction of each output node with
the other output nodes should be strongly inhibitory. That is, the higher the value of a given output node,
the lower the values of the competing nodes should become. In such a case the FCM,
referred to as a Competitive Fuzzy Cognitive Map (CFCM), will always converge
to an appropriate fixed region.
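The competition mechanism can be illustrated by giving every pair of output concepts a strongly negative weight and iterating a sigmoid FCM update of the form A_i(t+1) = f(A_i(t) + Σ_j w_ji A_j(t)). The inhibitory weight of -1.0, the steepness `lam = 5.0` and the example map are assumptions chosen for illustration. The sketch does not prove the convergence property claimed for CFCMs; it only shows one run in which a single output comes to dominate.

```python
import numpy as np

def run_cfcm(W, a0, outputs, inhibition=-1.0, lam=5.0, max_iter=200, tol=1e-6):
    """Competitive FCM sketch: output (decision) concepts inhibit each other.

    W[j, i] is the causal weight from concept j to concept i. Every pair of
    distinct output concepts receives the (assumed) strongly negative weight
    `inhibition`, so a rising output pushes its competitors down.
    Returns (final concept values, index of the winning output concept)."""
    W = np.array(W, dtype=float)
    for i in outputs:
        for j in outputs:
            if i != j:
                W[i, j] = inhibition
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        a_next = 1.0 / (1.0 + np.exp(-lam * (a + W.T @ a)))
        done = np.max(np.abs(a_next - a)) < tol
        a = a_next
        if done:
            break
    winner = max(outputs, key=lambda k: a[k])
    return a, winner

# Hypothetical map: one symptom concept (0) supports decision 1 more than decision 2.
W0 = np.array([[0.0, 0.8, 0.2],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]])
values, decision = run_cfcm(W0, [1.0, 0.0, 0.0], outputs=[1, 2])
```

Without the mutual inhibition, both outputs in this example would end up high; with it, decision 2 is driven close to zero.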
In this research work we further develop
CFCMs by combining them with methods and approaches that have been used
in Case-Based Reasoning (CBR), a successful methodology for managing
implicit knowledge that has also been used in
medical informatics. CBR systems embed a considerable number of previously solved
instances of problems (called cases). The problem-solving experience is explicitly
taken into account by storing past cases in a database (the case base) and by suitably
retrieving them when a new problem has to be tackled [9]. CBR solves
a new case by reusing the solutions of similar old cases stored in its case base,
rather than by using some derived representation, as is done, for example,
in adaptive-type methodologies. However, if the new case does not match any of the
stored cases, CBR has no solution. Like
FCMs and CFCMs, CBR systems have been applied to medical diagnosis and to patient
treatment outcomes. Despite their limitations,
CBR systems are usually assumed to have a certain richness of
stored knowledge and a certain degree of complexity due to the way they
are organized.
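Case retrieval, the step CBR relies on, can be sketched as nearest-neighbour search over stored cases. The Python below assumes each case is a dictionary of feature values paired with a stored solution and uses Euclidean distance; the optional `max_distance` cut-off (a hypothetical parameter) illustrates the situation in which a new case has no match and CBR returns no solution. The case contents are hypothetical.

```python
import math

def retrieve(case_base, query, k=1, max_distance=None):
    """Return up to k stored cases nearest to `query`.

    case_base: list of (features, solution) pairs; features is a dict name -> value.
    Distance is Euclidean over the union of feature names (missing values count as 0).
    If max_distance is given, farther cases are ignored, so an empty list means
    that no stored case matches the new one."""
    def distance(features):
        names = set(features) | set(query)
        return math.sqrt(sum((features.get(n, 0.0) - query.get(n, 0.0)) ** 2
                             for n in names))

    ranked = sorted(case_base, key=lambda case: distance(case[0]))
    if max_distance is not None:
        ranked = [case for case in ranked if distance(case[0]) <= max_distance]
    return ranked[:k]

# Hypothetical case base.
case_base = [
    ({"symptom_a": 1.0, "symptom_b": 0.0}, "diagnosis X"),
    ({"symptom_a": 0.0, "symptom_b": 1.0}, "diagnosis Y"),
]
best = retrieve(case_base, {"symptom_a": 0.9})
unmatched = retrieve(case_base, {"symptom_c": 1.0}, max_distance=0.5)
```

In practice the features would usually be normalized or weighted before computing distances, since raw Euclidean distance treats all of them as equally important.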
Although successful
medical decision-support CFCMs have been developed [7], [10], there are situations
in which the patient data input to the system presents a very rare configuration
of symptoms, so that most of the nodes of the CFCM would not be active. For example, although the CFCM model
of a Medical Decision Support System has been designed to include all possible
symptoms and causative factors (concept nodes) and the relationships between
them (weights) for some medical condition, in a particular situation very
few symptoms may be available and taken into consideration. Thus, in such
a diagnostic or prognostic decision-support FCM, the decision would be
made using only a very small subset of the concepts of the entire system.
Such a system could lead either to an erroneous decision or to difficulty in
reaching stability, since the weighting of the active nodes reflects only a
small amount of the experts' stored knowledge. In such situations, a CBR-augmented CFCM
decision support system would draw upon the cases that are most similar according to distance measures
and would use the CBR subsystem to generate a sub-CFCM that emphasizes the nodes
activated by the patient data, thus redistributing the causal weights
between the concept nodes.
The advantage of
CBR-augmented CFCMs lies in the ability to represent rare occurrences of medical
conditions/symptoms, which may not be adequately represented in an FCM due
to its design methodology, which is dependent on human experts and learning
algorithms.
In this paper, we describe the algorithm that generates CBR-augmented CFCMs and present their use and applicability in a Medical Decision System for the Speech and Language Impaired. Specifically, the CBR-augmented CFCM is proposed for the differential diagnosis of Specific Language Impairment from Dyslexia and Autism, as an extension of a previously developed CFCM [7], [10].
Drug delivery in chronic therapy is widely recognized as a complex control problem. The non-stationary character of the human body, resulting from tolerance to the drug and from other pharmacokinetic parameters that change over time, requires intermittent therapy adjustment. Furthermore, it is virtually impossible to regularly access all the variables needed to accurately represent a system as complex as the human body. Hence, the development of a parametric model becomes difficult and the need to deal with imprecision arises. This presentation demonstrates how these two aspects are addressed by employing Soft Computing methods in the long-term dosing of erythropoietin, a drug used to treat patients with chronic anemia. An Artificial Neural Network-based adaptive controller for erythropoietin dose adjustment will be presented and evaluated against the currently used protocol. A modified Fuzzy Rule-Based System, capable of handling imprecise input and output information, will be introduced, and its application to predicting patients' response to erythropoietin will be demonstrated.
Computer and communication technologies are very
rapidly shrinking the world. Manufacturing and supply chains, the service
industry and the infrastructure are becoming highly distributed and also highly
integrated. This creates new challenges from the perspectives of systems,
humans, and cybernetics. Fortunately, there is a significant emergence of
technologies including monitoring and active identification, wired and wireless
communications, and intelligent agent systems. Brought together effectively,
these technologies could complement each other and be integrated to provide
unique solutions that meet the needs of decision-makers.
In the past decade a number
of international groups have been formed to develop collaborative agent, holonic,
and multi-agent systems. Within the Intelligent Manufacturing Systems Program,
for example, there have been several such projects. However, many of these
initiatives have already concluded and others are coming to conclusion.
This presentation will
describe a new initiative of the IEEE SMC Society that consists of four core
competency areas and four horizontal market application areas.
We will provide examples of the research challenges
in the core competency and application areas from manufacturing and distributed
energy systems. Because agent communication systems require a distributed
and scalable hardware infrastructure, we will also describe our research in
distributed wireless 802.11 communication that provides multi-hop, peer-to-peer
connectivity with adaptive routing and priority scheduling. Finally,
the presentation will identify opportunities for using soft computing techniques
and related technologies in our research projects.
We shall
encourage dialogue and discussions among the participants to explore opportunities
for collaboration.
Variable and feature selection have become the focus
of much research in areas of application for which data sets with tens or
hundreds of thousands of variables are available, particularly in text processing
and medical diagnosis with genomics and proteomics data. The problem consists
in removing the input variables that are either not informative or redundant.
The objectives include improving the performance of predictors, providing
faster and more cost-effective predictors, and providing a better understanding
of the underlying process that generated the data.
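One simple family of methods for this problem is the univariate filter: score each variable by its association with the target and keep the top-ranked ones. The NumPy sketch below ranks features by absolute Pearson correlation with `y`. It is an illustrative assumption, not a method singled out by the talk, and because it scores variables one at a time it does not by itself remove redundant variables.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Univariate filter: rank the columns of X by |Pearson correlation| with y.

    Returns (column indices from most to least correlated, signed correlations).
    Constant columns get a score of 0 instead of causing a division by zero."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(scores), kind="stable")
    return order, scores

# Hypothetical data: column 0 equals the target, column 1 is constant, column 2 is weakly related.
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])
X = np.column_stack([y, np.ones(6), [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]])
order, scores = rank_features_by_correlation(X, y)
```

Wrapper and embedded methods, which evaluate subsets of variables together with a predictor, address redundancy and interactions that such a filter misses.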
We will present
a historical perspective and outline the advantages and limitations of various
approaches. We will show small examples that illustrate particular difficulties
of the problem. Our examples will include DNA microarray analysis and protein
profiling with mass spectrometry. We will explain the design of the NIPS2003
feature selection challenge and analyze the results of the benchmark.
Today, three Laws
are used to explain how the value of a network, comprised of computational
power and informational content, increases as the network expands: Moore's
Law, Metcalfe's Law, and Reed's Law. In order to achieve the exponential benefits
promised by the Three Laws, nodes in the value network must possess a shared
communication/information model, and there is a cost to acquiring this knowledge.
In this presentation a new perspective is proposed that explores this effect,
evolves the above three laws to take into account a knowledge-acquisition
cost constant C, and concludes that technologies that bring this constant
as close to zero as possible in real practice are the key to unlocking the
exponential value inherent in information networks.
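For reference, two of the laws named above are usually stated in terms of counts: Metcalfe's Law in terms of the n(n-1)/2 possible pairwise connections among n nodes, and Reed's Law in terms of the 2^n - n - 1 possible subgroups of two or more nodes. The sketch below only evaluates these standard counts; it does not model the knowledge-acquisition cost constant C, whose form is not given here.

```python
def metcalfe_value(n):
    """Metcalfe's Law: value grows with the number of possible pairwise links, n(n-1)/2."""
    return n * (n - 1) // 2

def reed_value(n):
    """Reed's Law: value grows with the number of possible subgroups of two or more nodes."""
    return 2 ** n - n - 1

# Compare how quickly the two counts grow as the network expands.
growth = [(n, metcalfe_value(n), reed_value(n)) for n in (2, 5, 10, 20)]
```

The gap between quadratic and exponential growth is what makes any per-node cost of acquiring a shared information model matter as networks grow.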
Development of a
biological model is driven by the experimental context in which it will be
used. Hence, computer models are often overfitted
to a single, unique experimental context and fail to be useful in other
situations. This severely limits the model's
usefulness, effectively blocking inferential extensions to somewhat different
conditions, such as a hypothesized new treatment intervention. As a result, multiple separate models
of a biological system at different levels of organization are required to
understand and adequately represent its behaviors. In this poster, we present
the basics of a new modeling method, FURM (Functional Unit Representation
Method) that attempts to address this problem. Here we focus on the primary
functional unit of the liver within an in silico isolated perfused rat liver
(IS-IPRL). FURM [http://biosystems.ucsf.edu/Researc/furm/index.html]
decouples the various aspects of functional units. It uses a middle-out model
design strategy that enables and encourages selection of different models.
It is an example of a new class of generative biological simulation models
whose components are easily joined and disconnected and are replaceable and
reusable. It works by decentralizing the modeling process without requiring
that all of the data be of a specific type. FURM does not require any particular
formalism. Rather, the experimental framework
is formulated using Partially Ordered Sets. We follow four fundamental guidelines: 1) standardize interfaces to multiparadigm, multimode,
and trans-domain models, 2) use discrete interactions, 3) enable knowledge
discovery by designing for an extended model life cycle, and 4) define observables
that will submit to a similarity measure. A data
model represents the biological system. An established in vitro liver perfusion
protocol is the source of our experimental data [J Pharmacokin Biopharm 27:343-82,
1999]. Two in silico system models are implemented. RefModel is the accepted reference mathematical model [JPET 297:780-789,
2001]. ArtModel is our functional unit model
that assumes that liver function, as a whole, is an aggregate of lobule function,
that sinusoids are primarily vascular objects, and that transit time for
perfusate is governed by stochastic interactions between various agents inside
the vascular structures in combination with the perfusion pressure at the
inlet catheter and lobule portal vein. The IS-IPRL strives to replicate
the experimental procedure that has provided the experimental data. The four
primary assumptions are: (1) Outflow profiles alone are lossy projections
of liver behavior. Thus, physiologically accurate models are necessary to
begin fully exploring the liver behavior space. (2)
Hepatic vascular structure and the arrangement of lobules within a lobe can
be represented by a directed graph. (3) The primary
functional unit is the lobule. (4) Outflow for
sucrose, but not metabolized or transported solutes, is solely a function
of the extracellular (vascular cavity) space and its geometry. Within lobules,
agents representing sinusoidal segments (SS) are located at each graph node. Agents within each SS represent functionalities within
cellular and subcellular spaces. The explicit
hypothesis being tested is that the selected parameter vectors cause the
model to generate output that is experimentally indistinguishable from that
seen in the in vitro data. A similarity measure
is used to automate the evaluation of the solution sets put forth by the
models. Those results make possible automatic
searches of the parameter space for regions that solve the problem by matching
the in vitro data. To date, different parameterizations of a mathematical model
have been needed to account for outflow data for two different solutes. One IS-IPRL model now accounts for the hepatic outflow
profiles of both sucrose and diltiazem and is being extended to account simultaneously
for five additional drugs and to be able to shift to represent either
of two disease states.
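The evaluation loop described above (simulate, compare the output with a similarity measure, search the parameter space) can be illustrated with a deliberately small toy. Everything in the Python below is hypothetical: a four-node directed graph standing in for a lobule, particles that randomly stay or move to a successor at each step as a crude stand-in for stochastic transit, a similarity measure defined as the fraction of time points within a tolerance band, and a reference profile generated by the toy model itself rather than taken from experimental data. It is not FURM, ArtModel, or the similarity measure used in this work.

```python
import random

def simulate_outflow(graph, inlet, outlet, p_stay, n_particles=2000,
                     max_steps=200, seed=0):
    """Toy stochastic transit on a directed graph (hypothetical model).

    Each particle starts at `inlet`; at every time step it stays put with
    probability p_stay, otherwise it moves to a randomly chosen successor.
    Returns the fraction of particles reaching `outlet` at each time step;
    particles still inside after max_steps are dropped."""
    rng = random.Random(seed)
    counts = [0] * max_steps
    for _ in range(n_particles):
        node = inlet
        for t in range(max_steps):
            if node == outlet:
                counts[t] += 1
                break
            if rng.random() >= p_stay:
                node = rng.choice(graph[node])
    return [c / n_particles for c in counts]

def similarity(simulated, reference, tol):
    """Assumed measure: fraction of time points where simulated is within +/- tol of reference."""
    hits = sum(1 for s, r in zip(simulated, reference) if abs(s - r) <= tol)
    return hits / len(reference)

def search(graph, inlet, outlet, reference, candidates, tol=0.02):
    """Grid search: return (best similarity, best p_stay) over the candidate values."""
    return max((similarity(simulate_outflow(graph, inlet, outlet, p), reference, tol), p)
               for p in candidates)

# Hypothetical lobule graph: portal inlet -> two parallel segments -> shared segment -> central vein.
lobule = {"portal": ["ss1", "ss2"], "ss1": ["ss3"], "ss2": ["ss3"], "ss3": ["central"]}
# Synthetic reference generated by the toy itself (not experimental data).
reference = simulate_outflow(lobule, "portal", "central", p_stay=0.5)
best = search(lobule, "portal", "central", reference, candidates=[0.0, 0.5, 0.9])
```

Because the reference was produced with `p_stay=0.5` and a fixed seed, the search recovers that value exactly; with real outflow data the best score would be below 1 and the measure's tolerance would carry the experimental variability.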
One of the biggest
challenges of any control paradigm is being able to handle complex systems
under unforeseen uncertainties. A system may be called complex here if its
dimension (order) is too high and its model (if available) is nonlinear, interconnected,
and information on the system is uncertain such that classical techniques
cannot easily handle the problem. Knowledge about such systems is a key attribute
which is not often exploited for design and synthesis. Soft computing, a
consortium of fuzzy logic, neuro-computing, genetic algorithms and genetic
programming, has proven to be a powerful tool for the analysis and design of many
complex systems. For such systems, the size of a soft computing control architecture
will be nearly infinite. Examples of complex systems are power networks,
space robotic colonies, the national air traffic control system, an integrated
manufacturing plant, the Hubble Telescope, a swarm of robotic agents, etc. In
this talk a survey of many soft computing based applications in control,
robotics, simulation and image processing will be given.
In this lecture we present a series of what the speaker considers excellent application opportunities for soft computing tools with national and international implications. Economy and security are two of the most important issues not just of today, but also of tomorrow and the years to come. These opportunities include intelligent multi-agent systems (e.g. V-Lab®, presented the day before) and autonomous agents for monitoring, reconnaissance and border security. Another opportunity, not unrelated to national security, is the application of soft computing to energy efficiency in the so-called “Industries of the Future”, where SC tools can be used to save energy, reduce waste, help avoid the production of defective products, reduce energy requirements for automation and remote operations, and increase the efficiency of existing processes via sensing and information technology systems in which demand and automated production modeling can be integrated. Another opportunity exists in chemical process systems, where relatively little effort has been made to use SC tools; water treatment plants and oil extraction are candidate applications. Finally, two other opportunities are diagnostics/prognostics of national defense systems, and image processing and remote sensing applications for the study of the earth.
Information Retrieval from the WWW through the use of search engines is known
to be unable to effectively capture the information needs of users. The approach
taken in our work is to add intelligence to information retrieval from the
WWW by modelling users, in order to improve the
interaction between the user and information retrieval systems, that is, to improve the
user's performance in retrieving information from the information source. To effect such an improvement,
a retrieval system must be able to make inferences about the information
the user might want. The system can then aid
the user, for instance by giving suggestions or by adapting any query based
on predictions furnished by the model.
We have taken two separate
approaches. First, by combining user modelling and fuzzy logic, we developed a
prototype system, the Fuzzy Modelling Query Assistant (FMQA), which modifies a user’s query based on a fuzzy user
model. The FMQA was tested in a user study which clearly indicated that,
for the limited domain chosen, the modified queries are better than those
that are left unmodified.
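One way to picture a query assistant driven by a fuzzy user model is an alpha-cut: terms whose membership in the user model is at least `alpha` become candidates for addition to the query, strongest first. The sketch below assumes that scheme, the parameter names and the example terms; it is not the FMQA's actual modification procedure, which is not described here.

```python
def expand_query(query_terms, user_model, alpha=0.6, max_new=3):
    """Hypothetical query expansion from a fuzzy user model.

    user_model: dict term -> membership degree in [0, 1] describing the user's interests.
    Adds, in decreasing order of membership, up to max_new terms from the alpha-cut
    of the user model that are not already in the query."""
    candidates = [(mu, term) for term, mu in user_model.items()
                  if mu >= alpha and term not in query_terms]
    candidates.sort(reverse=True)
    return list(query_terms) + [term for _, term in candidates[:max_new]]

# Hypothetical fuzzy user model.
user_model = {"treaty": 0.9, "parliament": 0.7, "sport": 0.2, "law": 0.95}
expanded = expand_query(["european", "law"], user_model)
```

Raising `alpha` makes the assistant more conservative, adding only terms the model is very confident the user cares about.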
As part of a European project,
ELVIL, we have developed a Virtual Librarian. The
European Legislative Virtual Library (ELVIL) is an Internet-based portal of
information sources on European law and politics (http://elvil.sub.su.se)
which provides software gateways with a uniform WWW-interface to national
and European parliamentary databases, a searchable index to other sources
of legal and political information on the WWW, and collections of learning
resources on European law and politics. Here, too, the approach is to add intelligence to
information retrieval from the WWW and online databases by modelling
users with fuzzy logic, in order to improve the interaction between the user
and the ELVIL information retrieval system. By combining user modelling and fuzzy logic,
we developed a system, the Virtual Librarian (VL), which uses a fuzzy
user model and search information to choose the most appropriate databases
or web sites within the ELVIL portal.
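Choosing sources from a fuzzy user model can likewise be sketched as ranking candidate databases by the similarity between two fuzzy sets over topic terms. The Python below assumes each database has a fuzzy topic profile and uses the fuzzy Jaccard index (sum of minima over sum of maxima). The profiles and database names are hypothetical, and this is not necessarily how the Virtual Librarian makes its choice.

```python
def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity of two fuzzy sets given as dicts term -> membership."""
    keys = set(a) | set(b)
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return inter / union if union else 0.0

def choose_sources(user_model, source_profiles, top=2):
    """Rank sources by fuzzy similarity between the user model and each source profile."""
    ranked = sorted(source_profiles,
                    key=lambda s: fuzzy_jaccard(user_model, source_profiles[s]),
                    reverse=True)
    return ranked[:top]

# Hypothetical user model and database profiles.
user = {"eu_law": 0.9, "elections": 0.3}
profiles = {
    "DB_legal": {"eu_law": 1.0},
    "DB_politics": {"elections": 1.0, "parties": 0.8},
    "DB_sport": {"football": 1.0},
}
ranked = choose_sources(user, profiles)
```

Other fuzzy set similarity measures could be swapped in without changing the ranking step.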
Problems in bioinformatics and medical decision support are defined
and solved with the use of various soft-computing methods and results are
compared. The problems include: gene expression data analysis for cancer and
other disease profiling; promoter recognition; modeling gene regulatory networks;
integrating gene and clinical data; cardio-vascular risk prognosis; renal
function evaluation; survival prognosis; gene regulatory networks for brain
function analysis. There are five main phases of information processing and
problem solving in most bioinformatics systems that are covered for each of
the above problems, namely: data pre-processing and filtering; feature evaluation
and feature selection; model creation and evaluation; knowledge extraction;
adaptation to new data.
A comprehensive software environment NeuCom (www.theneucom.com)
for data mining, modeling and discovery will be demonstrated and will be made
available to participants. NeuCom includes various methods for data visualization,
feature selection, data analysis, modeling, rule extraction and adaptation.
As a special method for adaptive learning of data and knowledge integration,
evolving connectionist systems (ECOS) are included [1]. ECOS are multi-modular
connectionist architectures that facilitate modeling of evolving processes
and knowledge discovery. An ECOS is a collection of
neural networks that operate continuously in time and adapt their structure
and functionality through continuous interaction with the environment and
with other systems. As many processes in biology
are dynamically evolving, their modeling requires evolving methods and systems,
i.e., evolving intelligence. In bioinformatics, for example,
new data are being made available at tremendous speed, which
requires models to be continuously adaptive. The talk adds to the traditional
statistical and artificial intelligence methods the new methods of evolving
systems, so that participants can compare the various approaches and apply
them to their problems.
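The idea of a network that grows its structure as data arrive can be illustrated with a one-pass clustering sketch in the spirit of ECOS: a new node is created whenever a sample is farther than a distance threshold from every existing node, and otherwise the nearest node moves a little toward the sample. This is a simplified assumption for illustration, not the Evolving Clustering Method or any algorithm in NeuCom; `threshold` and `lr` are hypothetical parameters.

```python
import numpy as np

def evolving_clustering(stream, threshold, lr=0.1):
    """One-pass clustering that adds nodes on demand (simplified, ECOS-inspired).

    For each incoming sample: if the nearest existing node is within `threshold`,
    that node moves toward the sample by a fraction `lr` of the difference;
    otherwise the sample becomes a new node. Returns (node centers, sample labels)."""
    centers, labels = [], []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            k = int(np.argmin(d))
            if d[k] <= threshold:
                centers[k] += lr * (x - centers[k])
                labels.append(k)
                continue
        centers.append(x.copy())
        labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical data stream arriving one sample at a time.
stream = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1], [5.1, 5.0]]
centers, labels = evolving_clustering(stream, threshold=1.0)
```

Because each sample is processed once and the structure grows only when needed, the model can keep adapting as new data keep arriving.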
One of the most important objects in bioinformatics is a gene product (a protein or an RNA). In clustering and subsequent knowledge discovery on unknown gene products, the principal features are the gene sequence and the expression values found in a microarray experiment. The question arises as to what the function of this gene product is, and whether it is similar in function or structure to other up-regulated or down-regulated gene products. Many (dis)similarity measures have been proposed to measure the closeness of sequences. However, for many gene products, additional functional information comes from the set of Gene Ontology (GO) annotations and the set of journal abstracts related to the gene product. For these genes, it is reasonable to include similarity measures based on the terms found in the GO and/or the index term sets of the related documents (MeSH annotations). In both cases we compare two sets of terms arranged in a taxonomy (GO or MeSH). Some measures have been constructed to assess the closeness of terms in a taxonomy, including the shortest path length between terms and information-theoretic values in which node probabilities are estimated from a corpus of relevant documents. Using such factors in addition to sequence and expression should aid the process of knowledge discovery. For example, it will be easier to annotate clusters when they share common descriptive terms. When an unknown gene product joins a group on the basis of sequence and expression, it is reasonable to conjecture that it will also share the cluster annotations (at least partially).
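The information-theoretic measures mentioned above can be illustrated with the Resnik similarity: each term t gets an information content IC(t) = -log p(t), where p(t) is estimated from annotation counts propagated up the taxonomy, and the similarity of two terms is the IC of their most informative common ancestor. The toy taxonomy and counts below are hypothetical, and the sketch assumes a single root.

```python
import math

def ancestors(term, parents):
    """The term itself plus all its ancestors; `parents` maps a term to its parent terms."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def information_content(annotation_counts, parents):
    """IC(t) = -log p(t), where p(t) counts annotations to t or any of its descendants."""
    totals = {}
    for term, n in annotation_counts.items():
        for a in ancestors(term, parents):
            totals[a] = totals.get(a, 0) + n
    root_total = max(totals.values())   # assumes a single root that sees every annotation
    return {t: -math.log(n / root_total) for t, n in totals.items()}

def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max(ic[c] for c in common) if common else 0.0

# Hypothetical taxonomy (child -> parents) and annotation counts.
parents = {"b": ["root"], "c": ["root"], "d": ["b"], "e": ["b"]}
ic = information_content({"d": 2, "e": 2, "c": 4}, parents)
sim_de = resnik("d", "e", parents, ic)   # shared ancestors: b and root
sim_dc = resnik("d", "c", parents, ic)   # only the root is shared
```

Terms that meet only at the root get similarity 0, and the raw Resnik value is unbounded above, which is why a rescaled variant is useful when a value in [0,1] is needed.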
In this paper we propose a fuzzy measure-based similarity (FMS) for
computing the similarity of two sets of terms found in a taxonomy (and hence,
the two gene products annotated with terms from the taxonomy). The advantage of FMS is that it takes into consideration
the context of the whole set when computing the similarity. For objects annotated with terms from an
ontology, we propose a method for handling the likely case that
two gene products are not described by identical ontology terms. In addition, we propose a modification of the Resnik
similarity measure such that the similarity between two ontology terms is
a number in [0,1]. In dealing with large groups
of documents describing the objects under consideration, not only do we determine
the similarity between document pairs, but, by introducing the Choquet
integral, we can also fuse this partial agreement function on pairs
of documents into a single value relating the gene products. The measures for the final integral fusion can be
tailored to produce order weighted average (OWA) operators (e.g., at least
two documents must support the connection) or can be based on assessments
of the "worth" of individual documents and subsets of documents towards building the
strength of connection. We present examples of
FMS calculation for specific situations where two genes are described by
a set of terms from the Gene Ontology, for two abstracts related to those
genes, and for multiple abstract fusion. We also
compare our measures to others from the literature.
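The fusion step relies on the discrete Choquet integral: with scores sorted so that x_(1) ≥ ... ≥ x_(n) and A_i the set of the i highest-scoring sources, C = Σ_i x_(i) [g(A_i) - g(A_{i-1})], where g is a fuzzy measure with g(∅) = 0. When g depends only on the size of the set, this reduces to an OWA operator; the measure g(A) = 1 if |A| ≥ k (else 0) returns the k-th largest score, which matches the "at least two documents must support the connection" example for k = 2. The document names and scores below are hypothetical.

```python
def choquet(values, measure):
    """Discrete Choquet integral of `values` (dict source -> score in [0, 1]) with
    respect to a fuzzy measure given as a function on frozensets of sources
    (measure(empty set) = 0, measure(all sources) = 1, monotone)."""
    order = sorted(values, key=values.get, reverse=True)   # sources by decreasing score
    total, prev = 0.0, 0.0
    for i in range(1, len(order) + 1):
        g = measure(frozenset(order[:i]))
        total += values[order[i - 1]] * (g - prev)
        prev = g
    return total

def at_least(k):
    """Cardinality-based measure g(A) = 1 if |A| >= k else 0; its Choquet integral
    is the k-th largest score, i.e. an OWA operator with all weight on position k."""
    return lambda subset: 1.0 if len(subset) >= k else 0.0

# Hypothetical pairwise agreement scores from three documents.
docs = {"d1": 0.9, "d2": 0.4, "d3": 0.7}
fused_at_least_two = choquet(docs, at_least(2))
fused_mean = choquet(docs, lambda s: len(s) / len(docs))   # additive measure -> arithmetic mean
```

Measures that are not cardinality-based let individual documents, or subsets of documents, carry different "worth" in the fused value.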
The objective of
our present paper is to formulate a fuzzy optimal control model that explicitly
derives the optimal rebalancing weights (i.e., dynamic hedge ratios) needed to engineer
a structured financial product out of a multi-asset best-of option. The objective
function is the total cost of hedging (tracking) error over the investment
horizon from t = 0 to T, which is assumed to depend partly on the governing
Markovian process underlying the individual asset returns and partly on
pure randomness, i.e., white noise. To derive the necessary conditions for
the fuzzy optimal control, we consider the problem

$$\min \int_{0}^{T} f_0(r, v, t)\,dt \quad \text{subject to} \quad \frac{dr_j}{dt} = f_j(r, v, t), \quad j = 1, \dots, n,$$

where r is the vector of n state variables, i.e., the vector of returns on
the n assets underlying the best-of option, and v is the vector of m fuzzy
control variables, i.e., the values of the replicating portfolio for m
different choices of the dynamic hedge ratios. As the choice set is fuzzy,
the fuzzy control vector is a subnormal fuzzy subset
$[p_1/v_1, p_2/v_2, \dots, p_m/v_m]$, where $p_j$ is the membership grade
of the jth portfolio at time t.
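For orientation, the classical (crisp) first-order necessary conditions for this problem follow from Pontryagin's minimum principle, assuming a fixed horizon T, a free terminal state r(T) (which gives the transversality condition λ_j(T) = 0) and the normal case. How the fuzzy choice set and the membership grades p_j modify these conditions is the subject of the paper and is not reproduced here.

```latex
\begin{aligned}
H(r, v, \lambda, t) &= f_0(r, v, t) + \sum_{j=1}^{n} \lambda_j(t)\, f_j(r, v, t), \\
\frac{dr_j}{dt} &= \frac{\partial H}{\partial \lambda_j} = f_j(r, v, t), \qquad
\frac{d\lambda_j}{dt} = -\frac{\partial H}{\partial r_j}, \qquad j = 1, \dots, n, \\
v^*(t) &\in \arg\min_{v}\, H\bigl(r^*(t), v, \lambda(t), t\bigr), \qquad \lambda_j(T) = 0 .
\end{aligned}
```

When v enters H smoothly and without constraints, the minimization condition becomes ∂H/∂v = 0 along the optimal path.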
A major aim of bioinformatics is to contribute to the understanding of the
interrelationships between protein sequence, structure and function. In this
context, approaches that can suggest structures and processes associated with
sequences may be of special interest. Moreover, if such a structure and
process turn out to be consistent with observations, it even becomes
possible to regard them as laws of the phenomena. It is then natural
to ask why the structure and the process are special to the extent
that they are able to describe the nature of living things, and, in particular,
whether they are fundamental or can be explained in terms
of deeper structures and processes. This may explain why irreducible structures
and processes would come to play a special role in the modeling: only
in their case is no further reduction possible.
In the paper we present an approach within which
sequences become associated with a hierarchical structure whose elements are
integer relations. The ultimate building blocks of the structure are
the integers; the other elements are built from them by an organizing principle.
The structure is irreducible in the sense that its existence is based on the
integers alone and its other elements are produced entirely under the control of arithmetic.
This reduces the explanation of the elements, and of why they are
made the way they are, to explanations that reveal why integers exist
and why arithmetic holds the way it does.
A new type of process can be revealed
by interpreting the relationships between the
elements of the structure. In particular, the relationships between the integer
relations are specified by the organizing principle, which describes how an
element of one level of the structure can be produced from elements of the
previous one. Notably, this may be interpreted as a formation process. The
structure has a natural ability to integrate processes, which may help
to model how the function of protein sequences as a new whole emerges from
the functions of these sequences as separate entities.
In general, the
approach is distinctive because it relies on structures with integer relations
as elements in order to model the structures and functions of protein sequences.
Although the structures and the processes are rigorously defined mathematically,
in computations they resist easy description and processing.
In the talk we discuss how fuzzy logic can offer an efficient solution
to this problem.
Recently,
Lotfi Zadeh suggested that a search engine able
to synthesize an answer to a query from different information elements would
be required. In such a search, the information elements are specified by the query,
and their integration into an aggregated information element provides
a basis for the answer.
In general, the problem of information
integration is very challenging and remains unresolved. In the Internet search
context it may be characterized by the huge number of information elements involved.
Therefore, identifying and understanding possible patterns related
to information integration may prove useful in the development of
a search engine.
In the paper we study this problem in
terms of the eigenvalue spectra of variance-covariance matrices, which
are applied to binary sequences as information elements to model information
integration. The eigenvalue spectrum of the variance-covariance matrix is
used to investigate the result of the integration. It is suggested
that some patterns of the eigenvalue spectra may help in understanding
information integration.
We demonstrate that there exists a certain
pattern dynamics of the eigenvalues within which the spectra of the variance-covariance
matrices may be described. We reveal a mechanism that underlies the eigenvalue
dynamics and show that its functioning can be expressed efficiently in computational
terms by the quadratic trace of the variance-covariance matrix. We propose how
the pattern of the eigenvalue spectra, specified by the dynamics and the mechanism,
may be interpreted to help us understand information integration.
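Why a quadratic trace can stand in for the spectrum is shown by a standard identity: for a symmetric variance-covariance matrix C with eigenvalues λ_1, ..., λ_k, tr(C²) = Σ λ_i², so this sum over the spectrum can be computed from C without an eigendecomposition. The sketch below assumes NumPy and uses randomly generated binary sequences as stand-ins for information elements. It checks the identity numerically and is not the mechanism reported in the paper.

```python
import numpy as np

def covariance_spectrum(sequences: np.ndarray) -> np.ndarray:
    """Eigenvalues of the variance-covariance matrix of binary sequences.

    `sequences` has shape (k, n): k binary sequences of length n, each row
    treated as one variable observed over n positions.
    """
    cov = np.cov(sequences)            # k x k variance-covariance matrix
    return np.linalg.eigvalsh(cov)     # real eigenvalues of a symmetric matrix

def quadratic_trace(sequences: np.ndarray) -> float:
    """tr(C^2) computed directly from the covariance matrix C."""
    cov = np.cov(sequences)
    return float(np.trace(cov @ cov))

rng = np.random.default_rng(0)
seqs = rng.integers(0, 2, size=(5, 200))   # hypothetical: 5 binary sequences
eigs = covariance_spectrum(seqs)
print(np.sum(eigs ** 2), quadratic_trace(seqs))  # equal up to rounding
```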
We also argue
that the fuzzification of the eigenvalues has played a crucial role in
identifying the dynamics and the mechanism. This may support the view that fuzziness
is an integral part of information integration.
To a great extent the future welfare of the world
depends on developing new materials and processes. In all fields of endeavour
there is a need for materials with new properties that support the necessary
advances. These fields encompass among others: robotics, computer hardware/software,
automobiles and transportation, communication systems, medicine and biological
systems, environmental protection, energy production and distribution, aircraft
and aerospace, rapid transit systems and railroads, undersea exploration,
building construction and other structures, missiles and weapon systems, commercial
"white" products, etc. As each of these fields
evolves into new areas of science and technology, demands are made for increased
speed, increased productivity or bandwidth, increased storage capacities,
increased strength or mechanical resistance, increased efficiency or effectiveness,
adaptability to increased complexity, decreased size, decreased pollution
levels and emissions, decreased costs, decreased cycle times, integration
of existing processes, and doing all of this while incorporating "intelligence"
within specific materials, processes or products.
This paper
reviews some recent advances in the application of "intelligent" methods to
the production of new materials and the manufacture of new products. In some cases,
the material embodies its "SMARTness" in an ability to self-adapt or respond
to environmental changes. In other cases, the intellect is embodied in
the processes used to derive the materials: primary processing, extraction,
finishing, assembling, and delivery. Finally, the use of "intelligence" to
perform or support simulation models of new materials and/or their processes
is presented. Both empirical, data-derived models and first-principles
modeling are discussed.
Image analysis techniques
have been broadly used in computer-aided medical analysis and diagnosis in
recent years. Computer-aided image analysis is an increasingly popular tool
in medical research and practice, especially as medical images grow
in modality, number, size and dimension. Image segmentation, a process
that aims at identifying and separating regions of interest in an image,
is crucial in many medical applications, such as localizing pathological regions,
providing objective quantitative assessment, monitoring the onset and
progression of diseases, and analyzing anatomical structures. For clinical applications of segmentation,
a compromise between the accuracy and the computational speed of segmentation
techniques is needed. Optimal segmentation processes
based on statistical and adaptive approaches, and their applicability to clinical
settings, will be addressed using diverse image modalities. Current drawbacks of automated segmentation methodologies
stem mostly from non-uniform illumination, inhomogeneous structures, and
the presence of noise in acquired images. The
effect of preprocessing on the accuracy of segmentation will be discussed. The superior performance of advanced clustering algorithms
based on statistical and adaptive approaches over traditional algorithms
in medical image segmentation will be presented.
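To illustrate clustering-based segmentation, the following sketch applies fuzzy c-means (FCM), one widely used clustering algorithm in medical image segmentation, to pixel intensities. The choice of FCM, the fuzzifier m = 2, and the synthetic image are assumptions for illustration only; the talk does not say which algorithms it compares. Each pixel receives a graded membership in every cluster, and a hard segmentation assigns each pixel to its highest-membership cluster.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Fuzzy c-means on a 1-D array of values (e.g. pixel intensities).

    Returns (centers, u), where u[i, k] is the membership of sample k in cluster i.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    u = rng.random((c, x.size))
    u /= u.sum(axis=0)                          # memberships of each sample sum to 1
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)       # membership-weighted cluster means
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12  # distances, kept > 0
        # FCM update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break
    um = u ** m
    centers = um @ x / um.sum(axis=1)           # centers consistent with final u
    return centers, u

# Hypothetical 32x32 "image": dark background, brighter square, Gaussian noise.
rng = np.random.default_rng(1)
img = rng.normal(50.0, 5.0, size=(32, 32))
img[8:24, 8:24] += 100.0
centers, u = fuzzy_c_means(img.ravel(), c=2)
labels = u.argmax(axis=0).reshape(img.shape)   # hard segmentation map
print(np.sort(centers))
```

The fuzzy memberships in `u` are what make FCM attractive for noisy or inhomogeneous images: boundary pixels keep intermediate grades instead of being forced into one class.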
Common sense causal reasoning
occupies a central position in human reasoning. It plays
an essential role in human decision-making. Considerable
effort has been spent examining causation. Philosophers, mathematicians,
computer scientists, cognitive scientists, psychologists, and others have
formally explored questions of causation beginning at least three
thousand years ago with the Greeks.
Whether causality can be recognized at all has long been a subject of theoretical
speculation among scientists and philosophers. At the same time, in our daily
lives, we operate on the commonsense belief that causality exists.
Causal relationships exist in the commonsense world. If an automobile
fails to stop at a red light and there is an accident, it can be said that
the failure to stop was the accident's cause. However, failing
to stop at a red light is not a certain cause of a fatal accident; sometimes
no accident of any kind occurs. So knowledge of some
causal effects is imprecise. Perhaps complete knowledge of all possible factors
might lead to a crisp description of whether a causal effect will occur.
But in our commonsense world, it is unlikely that all possible factors
can be known. What is needed is a method for building imprecise causal models.
Another way to think of causal relationships is counterfactually.
For example, if a driver dies in an accident, it might be said that had the
accident not occurred, they would still be alive.
Our common sense
understanding of the world tells us that we have to deal with imprecision,
uncertainty and imperfect knowledge. This is also the case for our scientific
knowledge of the world. Clearly, we need an algorithmic way of handling imprecision
if we are to handle causality computationally. Models are needed to reason about
causes algorithmically. These models may be symbolic or graphical. A difficulty is
striking a good balance between precise formalism and commonsense imprecise
reality.
The objective of our present paper is to develop a computationally efficient genetic pattern learning algorithm that evolutionarily derives the optimal rebalancing weights (i.e., dynamic hedge ratios) for engineering a structured financial product out of a multi-asset, best-of option. The stochastic target function is formulated as an expected squared cost of hedging (tracking) error, which is assumed to depend partly on the governing Markovian process underlying the individual asset returns and partly on randomness, i.e., pure white noise. A simple haploid genetic algorithm is advanced as an alternative numerical scheme, which is deemed computationally more efficient than numerically deriving an explicit solution to the formulated optimization model. An extension of the proposed scheme is suggested that adapts the genetic algorithm parameters by means of fuzzy logic controllers.
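A simple haploid genetic algorithm of the kind described can be sketched as follows. Everything specific here is an assumption made for illustration: returns are simulated as i.i.d. Gaussian rather than driven by a Markovian process, the target is the one-period best-of payoff max(r_1, ..., r_n), each individual carries a single real-coded chromosome (its vector of hedge weights), and fitness is the mean squared tracking error. It shows only the selection, crossover and mutation loop, not the authors' model.

```python
import numpy as np

def tracking_error(w, returns, target):
    """Mean squared hedging (tracking) error of a portfolio with weights w."""
    return float(np.mean((target - returns @ w) ** 2))

def haploid_ga(returns, target, pop_size=60, generations=200,
               mutation_scale=0.1, seed=0):
    """Real-coded GA with one chromosome (a weight vector) per individual."""
    rng = np.random.default_rng(seed)
    n_assets = returns.shape[1]
    pop = rng.uniform(-1.0, 2.0, size=(pop_size, n_assets))
    n_elite = pop_size // 2
    for _ in range(generations):
        errors = np.array([tracking_error(w, returns, target) for w in pop])
        elite = pop[np.argsort(errors)[:n_elite]]          # truncation selection
        n_child = pop_size - n_elite
        parent_a = elite[rng.integers(0, n_elite, n_child)]
        parent_b = elite[rng.integers(0, n_elite, n_child)]
        mask = rng.random(parent_a.shape) < 0.5            # uniform crossover
        children = np.where(mask, parent_a, parent_b)
        children = children + rng.normal(0.0, mutation_scale, children.shape)  # mutation
        pop = np.vstack([elite, children])                 # elitism keeps the best
    errors = np.array([tracking_error(w, returns, target) for w in pop])
    best = int(np.argmin(errors))
    return pop[best], float(errors[best])

# Hypothetical one-period scenarios: 500 draws of Gaussian returns on 3 assets.
rng = np.random.default_rng(42)
returns = rng.normal(0.01, 0.05, size=(500, 3))
target = returns.max(axis=1)                   # best-of payoff in each scenario
w_best, err = haploid_ga(returns, target)
w_ls, *_ = np.linalg.lstsq(returns, target, rcond=None)  # exact least-squares benchmark
print(w_best, err, tracking_error(w_ls, returns, target))
```

Because this toy objective is a plain quadratic, least squares gives an exact benchmark to check the GA against; the case for a GA is objectives where no such closed form is convenient.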
Intelligent Web personalization
aims at adapting a user's interaction with the Web information space based
on information gathered about the user. A completely automated Web personalization
system is generally based on Web usage mining to discover useful
knowledge about user access patterns, followed by a recommendation
system to act on this knowledge in order to respond to the users'
individual interest, in a manner transparent to the user, and while
protecting the user's privacy and anonymity.
Web usage mining has recently attracted attention
as a viable framework for extracting useful access pattern information, such
as user profiles, from massive amounts of Web log data for the purpose of
Web site personalization and organization. These efforts have relied mainly
on clustering or association rule discovery as the enabling data mining technologies.
Typically, data mining has to be completely re-applied periodically and offline
on newly generated Web server logs in order to keep the discovered knowledge
up to date. Also, to date, the clustering techniques that have been used for
Web usage mining have scored rather low on scalability, both in terms of time
and memory requirements, and simply cannot be expected to keep up with the
huge flux of Web clickstream data on today's busy websites. In addition to
their lack of scalability and difficulty in adapting to continuously
evolving patterns, current clustering techniques, such as most K-Means variants,
also suffer from one or more of the following limitations: the need to specify
the correct number of clusters/profiles in advance, sensitivity
to initialization, and sensitivity to noise and outliers in the
data. Other techniques relying on association rule discovery suffer from
their sensitivity to various thresholds, such as support and confidence, as
well as to the sparsity of the data. Hence, there is a crucial need for scalable,
noise-insensitive, initialization-independent techniques that can continuously
discover possibly changing/evolving Web user profiles without any stoppages
or reconfigurations.
We overview some recent evolutionary computation
techniques for Web mining. Then, we present a new scalable clustering methodology
that gleans inspiration from the natural immune system to continuously
learn and adapt to new incoming patterns. The Web server plays the role of
the human body, and the incoming requests play the role of foreign antigens/bacteria/viruses
that need to be detected/recognized by the proposed immune-based clustering
technique. Hence, our clustering algorithm plays the role of the cognitive
agent of an artificial immune system, whose goal is to continuously perform
an intelligent organization of the incoming noisy data into clusters/patterns.
Our approach exhibits superior learning abilities while
requiring modest memory and computational costs. As with the natural immune
system, the strongest expected advantage of immune-based learning over current
approaches is its ease of adaptation to the dynamic environments
that characterize several applications, particularly mining data streams.
We illustrate the ability of the proposed approach in detecting clusters
in noisy data sets, and in mining user profiles from Web clickstream data
in a single pass under different usage trend sequence scenarios.
Finally, we present several soft computing approaches
to high-performance, mass profile-based Web recommender engines that we have
recently developed.
Acknowledgements:
This work is supported by a
National Science Foundation CAREER Award (NSF-IIS-0133948) to O. Nasraoui.
I will present work that
represents a departure from the master-slave metaphor of human-system interaction,
in which a user first (precisely) communicates a request that is then
followed by a response, toward one based on collaboration, in which a user
and an application work jointly toward some goal. Such collaboration
leads to a somewhat different view of communication, particularly in information-gathering
tasks such as those prevalent on the web. In such tasks, the fruits
of collaboration take the form of the establishment of what we call an access
perspective. An access perspective reflects the inherently multifaceted
way in which information at one location may be accessed from another, depending
on the intended use of that information and the required level of detail.
Access perspectives are dynamic entities that are informed by the structure
of a user's activity. Conventional, nonadaptive user interfaces characteristically
provide a static access method. I will describe methods for representing
a context tree which establishes a particular access perspective; as a user
explores a particular branch of a context tree, the context becomes more
and more constrained and the access perspective becomes, correspondingly,
narrower.
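A context tree of this kind can be sketched as a tree whose nodes each add a constraint, so that the access perspective at a node is the conjunction of every constraint on the path from the root. The node labels, the predicate representation and the example items below are hypothetical; the sketch only shows how descending a branch narrows what is accessible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Item = Dict[str, object]

@dataclass(eq=False)
class ContextNode:
    label: str
    constraint: Callable[[Item], bool] = lambda item: True   # root accepts everything
    parent: Optional["ContextNode"] = field(default=None, repr=False)
    children: List["ContextNode"] = field(default_factory=list, repr=False)

    def child(self, label: str, constraint: Callable[[Item], bool]) -> "ContextNode":
        """Create a narrower context below this one."""
        node = ContextNode(label, constraint, self)
        self.children.append(node)
        return node

    def accessible(self, items: List[Item]) -> List[Item]:
        """Items satisfying every constraint from this node up to the root."""
        node, result = self, items
        while node is not None:
            result = [it for it in result if node.constraint(it)]
            node = node.parent
        return result

docs = [
    {"topic": "sports", "year": 2003},
    {"topic": "sports", "year": 2001},
    {"topic": "science", "year": 2003},
]
root = ContextNode("all documents")
sports = root.child("sports", lambda d: d["topic"] == "sports")
recent = sports.child("recent", lambda d: d["year"] >= 2003)
print(len(root.accessible(docs)), len(sports.accessible(docs)),
      len(recent.accessible(docs)))  # 3 2 1
```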
In this talk we shall discuss the use of graph
data management for a variety of biological applications. Simple graphs
are composed of a set of nodes and a binary relation comprising the edges
which connect pairs of nodes.
We shall emphasize applications related to the
representation and querying of biopathways databases, e.g., metabolic pathways,
signal transduction pathways, and genetic regulatory networks. Other potential
applications of graph data management to biology include: chemical structure
graphs, protein interaction networks, phylogenetic trees, taxonomies of chemicals,
proteins, enzymes and diseases, partonomies (e.g., in anatomy), topological
adjacency relations (in image analysis), contact graphs (for 3D protein structure),
bibliographic citation graphs, food webs, biogeochemical cycles, gene clusterings,
partial order graphs for DNA multiple sequence alignments, genetic maps,
operon and regulon structures, sequence overlap graphs for shotgun DNA sequencing,
database schemas and mappings among schemas, data provenance (lineage), hypertext,
semantic web applications, laboratory protocols, etc.
Graph data models for biology come in a number
of variants: undirected and directed graphs, simple graphs, nested graphs,
multigraphs, and hypergraphs. We will mention these variants and illustrate
their applications.
Graph data management offers two major advantages
for biopathways applications: naturalness of representation of pathway data,
and ease of querying pathway data. It is the latter issue which is more
important.
Graph data management systems permit users to
frame queries in terms of graph operations, e.g., subgraph isomorphism, shortest
paths, etc., which would be difficult to express or compute in conventional
(relational) DBMSs. We discuss a number of graph queries in the talk,
e.g., subgraph homomorphism queries.
Graph data management systems typically treat
individual fragments of the database more homogeneously (as either nodes or
edges) than relational databases which partition the database into many specialized
relations. In a graph data management system (GDMS), the analog of relational structure is encoded as edges
which indicate types of nodes. While the relational storage structures offer
advantages in performance on fixed structure queries, the homogeneous graph
data model is much easier to use in posing queries which allow paths to span
many different possible relations (node types). It is this storage homogeneity
which facilitates pattern matching and path queries in graph databases. In
contrast, similar queries in a relational DBMS involve large numbers of union
queries over the various possible relations which might participate in a
path or subgraph pattern match. We will discuss this issue and (briefly) some
related comparisons to logic and object oriented database management systems.
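The storage-homogeneity point can be made concrete with a small sketch: when every entity is a typed node and every relationship an edge, a single breadth-first search answers a path query no matter how many node types the path crosses, whereas a relational schema would need a union over each table that might participate. The toy glycolysis fragment below is hypothetical illustration data, not drawn from any of the biopathways databases mentioned.

```python
from collections import deque
from typing import Dict, List, Optional

# Homogeneous graph: node -> type, plus directed adjacency lists.
node_type: Dict[str, str] = {
    "glucose": "compound", "hexokinase": "enzyme", "G6P": "compound",
    "HK1": "gene", "PGI": "enzyme", "F6P": "compound",
}
edges: Dict[str, List[str]] = {
    "HK1": ["hexokinase"],      # gene encodes enzyme
    "glucose": ["hexokinase"],  # substrate of
    "hexokinase": ["G6P"],      # produces
    "G6P": ["PGI"],
    "PGI": ["F6P"],
}

def shortest_path(src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; node types play no role in the traversal itself."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:        # walk predecessors back to src
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

path = shortest_path("glucose", "F6P")
print([(n, node_type[n]) for n in path])
```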
We will illustrate the talk with references to
some major biopathways databases. Time permitting we will also mention the
role of RDF (Resource Description Framework) as a graph data model. We
will conclude with a brief survey of some alternative approaches to implementation
of graph database management systems.
This is joint work with Kevin D. Keck and Vijaya Natarajan (both at LBNL). Further information on the Biopathways Graph Data Manager Project may be found at http://www.lbl.gov/~olken/graphdm/graphdm.htm. The work is funded by the DARPA Biocomp Program (via the Biospice Project at LBNL (PI: A. Arkin)) and the DOE Genomes to Life Program (via the VIMSS GTL project at LBNL (PIs: A. Arkin and T. Hazen) and the Synechococcus GTL project at Sandia National Lab (PI: G. Heffelfinger; LBNL PI: A. Shoshani)). Submitted by Frank Olken on 2003-12-08.
The emergence of data
mining and knowledge discovery is illustrated from a pattern recognition point of view. The significance of integrating various
soft computing tools
for efficient learning is described, with particular emphasis on the role of granular computing in data mining. Two examples, demonstrating
integrations of fuzzy sets, artificial neural networks, genetic algorithms
and rough sets for efficient classification, rule generation and rule evaluation, and for granular case generation
in case-based reasoning problems, are provided along with their application-specific
merits on real-life data. The significance of rough-fuzzy granulation
for both reducing computation time and improving performance is explained.
The talk concludes by explaining the relation of
rough-fuzzy case generation to the recently
emerged computational theory of perception (CTP)
and their applications in Web mining problems.
A short survey of the evolution of logics, from Boolean to fuzzy and neutrosophic, is presented. Afterwards, the neutrosophic logic components are
introduced, followed by the definition of neutrosophic logic (NL), based on
non-standard analysis, and the neutrosophic logic connectors, which are based
on set operations.
Neutrosophic Logic Components:
Let T, I, F be standard or non-standard real subsets of ]-0, 1+[,
with sup T = t_sup and inf T = t_inf (and similarly for I and F). The sets T, I, F are not necessarily intervals,
but may be any real sub-unitary subsets: discrete
or continuous; single-element, finite, or (countably or uncountably) infinite;
union or intersection of various subsets; etc. They may also overlap. The real subsets could represent the relative errors
in determining t, i, f (in the case when the subsets T, I, F are reduced
to points). In the paper, T, I, F, called neutrosophic components,
represent the truth value, indeterminacy value, and falsehood value, respectively,
in neutrosophy, neutrosophic logic, neutrosophic set, neutrosophic
probability, and neutrosophic statistics. This representation is closer to
human reasoning. It characterizes/catches
the imprecision of knowledge or linguistic inexactitude
received by various observers (that's why T, I, F are subsets, not necessarily
single elements), uncertainty due to incomplete knowledge
or acquisition errors or stochasticity (that's why the subset I exists),
and vagueness due to lack of clear contours or limits (that's
why T, I, F are subsets and I exists; in particular for the appurtenance
to the neutrosophic sets). One has to specify the superior (x_sup) and inferior
(x_inf) limits of the subsets because many problems require
computing them.
Definition of Neutrosophic Logic: A logic in which each proposition is estimated to have a percentage of truth in a subset T, a percentage of indeterminacy in a subset I, and a percentage of falsity in a subset F, where T, I, F are defined above, is called Neutrosophic Logic. We use a subset of truth (or indeterminacy, or falsity) instead of a single number because in many cases we are not able to determine the percentages of truth and falsity exactly but can only approximate them: for example, a proposition may be between 0.3 and 0.4 true and between 0.6 and 0.7 false, or, even more imprecisely, between 0.3 and 0.4 or between 0.45 and 0.50 true (according to various analyzers), and 0.6 or between 0.66 and 0.70 false. The subsets are not necessarily intervals, but any sets (discrete, continuous, open, closed or half-open/half-closed intervals, intersections or unions of the previous sets, etc.) in accordance with the given proposition.
A subset may have one element only in special cases of this logic.
The differences between intuitionistic fuzzy logic (IFL) and NL [and the corresponding Intuitionistic Fuzzy Set (IFS) and Neutrosophic Set (NS)] are:
a) Neutrosophic Logic can distinguish between absolute truth (truth in all possible worlds, according to Leibniz) and relative truth (truth in at least one world), because NL(absolute truth)=1+ while NL(relative truth)=1. This has application in philosophy (see the neutrosophy). That's why the unitary standard interval [0, 1] used in IFL has been extended to the unitary non-standard interval ]-0, 1+[ in NL. Similar distinctions for absolute or relative falsehood, and absolute or relative indeterminacy are allowed in NL.
b) In NL there is no restriction on T, I, F other than that they are subsets
of ]-0, 1+[; thus $^{-}0 \le \inf T + \inf I + \inf F \le \sup T + \sup I + \sup F \le 3^{+}$.
This lack of restriction allows
paraconsistent, dialetheist, and incomplete information to be characterized
in NL {i.e., the sum of all three components, if they are defined as points,
or the sum of the superior limits of all three components, if they are defined as subsets,
can be > 1 (for paraconsistent information coming from different sources)
or < 1 (for incomplete information)}, whereas such information cannot be
described in IFL, because in IFL the components T (truth), I (indeterminacy),
F (falsehood) are restricted either to $t + i + f = 1$ or to $t^2 + f^2 \le 1$ if T,
I, F are all reduced to the points t, i, f respectively, or to
$\sup T + \sup I + \sup F = 1$ if T, I, F are subsets of [0, 1].
c) In NL the components T, I, F can also be non-standard subsets included in the unitary non-standard interval ]-0, 1+[, not only standard subsets included in the unitary standard interval [0, 1] as in IFL.
d) NL, like dialetheism, can describe paradoxes, NL(paradox) = (1, I, 1), while IFL cannot describe a paradox because the sum of the components must be 1 in IFL.
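The bounds in (b) are easy to check mechanically. The sketch below represents each neutrosophic component as a finite union of closed intervals. This is a simplification: the definition also allows non-standard and arbitrary real subsets, which the code does not model. It computes the inferior and superior sums and reports whether the superior sum exceeds 1 (paraconsistent information) or falls below 1 (incomplete information). The class and function names are hypothetical, and the indeterminacy interval in the usage example is an assumed value.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]

@dataclass
class Component:
    """A neutrosophic component as a finite union of closed real intervals."""
    parts: List[Interval]

    @property
    def inf(self) -> float:
        return min(lo for lo, _ in self.parts)

    @property
    def sup(self) -> float:
        return max(hi for _, hi in self.parts)

def classify(t: Component, i: Component, f: Component, eps: float = 1e-9) -> str:
    s_inf = t.inf + i.inf + f.inf
    s_sup = t.sup + i.sup + f.sup
    if not (0.0 <= s_inf <= s_sup <= 3.0):   # standard-real version of the bound
        raise ValueError("components outside the admissible range")
    if s_sup > 1.0 + eps:
        return "paraconsistent"              # superior sum above 1
    if s_sup < 1.0 - eps:
        return "incomplete"                  # superior sum below 1
    return "sum = 1"                         # the only case IFL admits

# The truth and falsity subsets from the definition above; I is hypothetical.
t = Component([(0.30, 0.40), (0.45, 0.50)])
i = Component([(0.0, 0.05)])
f = Component([(0.60, 0.60), (0.66, 0.70)])
print(classify(t, i, f))  # paraconsistent (0.50 + 0.05 + 0.70 = 1.25)
```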
Applications of Fuzzy and Neutrosophic Logics: Examples are then given of fuzzy and neutrosophic logics used in reconciling theoretical and market prices of long-term options contracts, in extending the MASS model (a cost-optimal relative allocation of facilities technique) by incorporating neutrosophic statistics and the DSm (Dezert-Smarandache) combination rule, and in the conditional probability of actually detecting a financial fraud, a neutrosophic extension of the application of Benford's first-digit law.
Original work consists
of the definition of neutrosophic logic (NL) and neutrosophic logical connectors
as an extension of intuitionistic fuzzy logic (IFL), the comparison between
NL and other logics, especially IFL, and their applications.
This article describes the main ideas of Information Monitoring Systems (IMS) and applications of IMS to real-world problems. Information monitoring systems belong to a class of hierarchical fuzzy discrete dynamic systems. The theoretical basis of this class of systems is formed by fuzzy set theory, discrete mathematics, and methods for the analysis of hierarchies, which were developed independently in the works of Zadeh, Mesarovic, Saaty and others. IMS are designed to process, in a uniform way, diverse, multi-level, fragmentary, unreliable, and time-varying information about some problem/process. Based on this type of information, IMS make it possible to monitor the evolution of the problem/process and to work out strategic plans for its development. These capabilities open a broad area of applications in business (marketing, management, strategic planning), socio-political problems (elections, control of bilateral and multilateral agreements, terrorism), etc. One such application, a system for monitoring and evaluating a state's nuclear activities (Department of Safeguards, IAEA), is briefly described in the report.
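Of the ingredients named above, Saaty's analysis of hierarchies (the Analytic Hierarchy Process) has a compact computational core: the priority weights of alternatives are the normalized principal eigenvector of a positive reciprocal pairwise-comparison matrix, and the consistency index CI = (λ_max - n)/(n - 1) measures how far the judgments are from perfectly consistent. The sketch below computes both by power iteration. The comparison matrix is hypothetical, and how IMS combine such weights with their fuzzy and discrete dynamic components is not specified in the abstract.

```python
import numpy as np

def ahp_priorities(a: np.ndarray, iters: int = 100, tol: float = 1e-12):
    """Principal eigenvector of a positive reciprocal matrix by power iteration.

    Returns (weights, lambda_max, consistency_index).
    """
    n = a.shape[0]
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        w_new = a @ w
        w_new /= w_new.sum()                 # keep weights summing to 1
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    lam = float(np.mean((a @ w) / w))        # estimate of the principal eigenvalue
    ci = (lam - n) / (n - 1)                 # Saaty's consistency index
    return w, lam, ci

# Hypothetical judgments for three factors: a[i, j] = importance of i over j.
a = np.array([[1.0, 3.0, 5.0],
              [1 / 3, 1.0, 2.0],
              [1 / 5, 1 / 2, 1.0]])
w, lam, ci = ahp_priorities(a)
print(w.round(3), round(lam, 3), round(ci, 3))
```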
The structure and precision
of the data affect the type of relationships that can be analyzed by knowledge
discovery and data mining algorithms. Data items
are frequently sets, tuples in relational databases, or elements from a temporal
stream. The first two representations consider
individual data items as independent, while temporality induces an ordering
on the data. When data items are independent, data mining algorithms seek
to discover frequently recurring associations within individual items. When
distinct data items are related, the task of knowledge discovery is to identify
relationships between the items.
In this presentation we review
the types of relationships produced by standard data mining techniques. In particular we are interested in the representation
of imprecision in both the data and the resulting associations. For independent data, fuzzy sets have been used to
partition attribute domains to extend quantitative association rules to
imprecise categories. Temporal relationships
consider data from one or multiple sources in which the sole link between
the data may be the time value. Association rules
for temporal data may include imprecise durations and temporal constraints. Temporal durations specify the time an event is required
to continue to satisfy a proposition, for example, event A occurs for
"a long period" of time. A constraint specifies
the temporal relationship between the occurrences of distinct events, for example, B
occurs shortly after A. We will
discuss the constraints, data representations, and search strategy needed
for analyzing fuzzy temporal associations in data from multiple sources.
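A fuzzy temporal constraint such as "B occurs shortly after A" can be evaluated on two event streams by grading each observed gap with a membership function for "shortly". In the sketch below, the trapezoidal shape and its breakpoints (fully satisfied up to 5 time units, not at all from 15 on), matching each A to the first following B, and averaging over occurrences of A are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List

def shortly(gap: float, full: float = 5.0, zero: float = 15.0) -> float:
    """Membership of a time gap in the fuzzy term 'shortly'."""
    if gap <= 0:
        return 0.0                            # B must come after A
    if gap <= full:
        return 1.0
    if gap >= zero:
        return 0.0
    return (zero - gap) / (zero - full)       # linear decrease between breakpoints

def degree_b_shortly_after_a(a_times: List[float], b_times: List[float]) -> float:
    """Average, over occurrences of A, of how 'shortly' the next B follows."""
    b_sorted = sorted(b_times)
    degrees = []
    for t in a_times:
        k = bisect_right(b_sorted, t)         # index of the first B strictly after t
        degrees.append(shortly(b_sorted[k] - t) if k < len(b_sorted) else 0.0)
    return sum(degrees) / len(degrees) if degrees else 0.0

# Hypothetical event streams from two sources (time stamps in minutes).
a_events = [0, 20, 40, 60]
b_events = [3, 30, 70]
print(degree_b_shortly_after_a(a_events, b_events))  # 0.5
```

Here the four occurrences of A are followed by B after gaps of 3, 10, 30 and 10 minutes, graded 1.0, 0.5, 0.0 and 0.5.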
Today, much product feedback
is provided by customers/critics online through websites, discussion boards,
mailing lists, and blogs. When trying to make strategic decisions (a product
launch, a purchase), a web search will return many useful but heterogeneous
and, increasingly, multilingual opinions on a product. Generally, the user
will find it very difficult and time-consuming to assimilate all available
information and make an informed decision. To date, most work in automating
this process has focused on a monolingual setting. Existing monolingual approaches
will be overviewed. In addition, I will describe our preliminary work on mining
product ratings in a multilingual setting. The proposed approaches are automatic,
using a combination of techniques from classification and translation, thereby
alleviating human-intensive construction and maintenance of linguistic resources.
Bioinformatics is about providing biologists
and those in allied disciplines with the ability to exploit information, information
that is increasingly distributed, heterogeneous and massive. Bioinformatics has already succeeded in utilizing
database management, workflow and information retrieval technologies, which
have provided syntactic search, heterogeneous data access and sharing, and
limited forms of integration. A good amount of effort has also involved the use
of statistical and syntactic techniques to support the essential tasks of
finding patterns, similarities, and matches to identify building block structures.
Looking to the future, we can realize more of the exciting potential of bioinformatics
if we have more automated means of analysis leading to insight and discovery:
understanding cellular components, molecular functions and biological processes,
and, more importantly, the complex interactions and interdependencies between them.
And while much of the effort in the last decade focused on genes, the next set of challenges
involves the more complex structures of proteins and carbohydrates.
This talk focuses on semantics-enabled bioinformatics. We outline the increasing use of semantic techniques in bioinformatics
for search, browsing, integration, analytics and discovery. Ontologies provide the underpinning for most of today's
semantic techniques and Semantic Web research, and bioinformatics is
one of the most aggressive adopters of ontologies among science and industry
domains. We provide examples of the use of multiple
ontologies and Semantic Web Processes to investigate greater automation in discovery
and analysis. We will also weave in a brief overview of research and the exciting
commercial state of the art in semantic technology (specifically, ontology-driven
information systems).
Supercomputing today in
2003/2004 is no longer based on the custom-built, expensive vector supercomputers
which were associated with the field more than a decade ago. Many of the high-end
computers today are built from inexpensive commodity parts, using open source
software. Consequently the application of this high-end computing technology
to floating-point intensive applications in science and engineering has become
a dynamic and vibrant field. Clusters and related technology are used everywhere
from science and engineering departments in universities, to commercial applications
in banking, telecoms, and biotech. Surprisingly, the field of soft computing,
as represented by the topics of this workshop, has not been a strong participant
in this revolution of high end computing. With some notable exceptions such
as Google's search engine that runs on commodity clusters, parallel supercomputing
and soft computing seem to be living in parallel universes.
In this talk, I will first survey the state of the field in supercomputing and analyze current developments based on data from the TOP500 list. I will demonstrate, with a few examples from NERSC's user community, how powerful compute resources have transformed science, and how computational science has become accepted as the third leg of science in many disciplines. Then I will offer an analysis of why these developments have had little impact on computational problems in soft computing. Based on this analysis, I will conclude with some suggestions on how both fields could interact and learn from each other in the future.
Advances in AI and Soft Computing are key to
realizing the visions of the intelligent enterprise, iBusiness (intelligent
e-Business), and web intelligence. BT Exact's
Research and Venturing division is driven by applications and development
of these key technologies for business solutions. This
presentation describes projects in the Computational Intelligence Group,
which focuses on software systems and tools for intelligent data analysis
and information management. In particular, we
discuss the use of soft computing for software agents and adaptive user profiling,
data representation, information search and retrieval, document classification,
text mining, and intelligent scheduling. Our
goals are to increase automated reasoning, improve resource/information utilization,
and assist human-machine interaction in the realm of natural language/unstructured
data. The systems are targeted at both internal
customers (e.g. Contact Centers), and external users.
The cDNA and Oligonucleotide microarray chips have
become the most popular techniques for investigating gene expression profiles.
High-density DNA microarray technology can simultaneously monitor the
expression levels of thousands of genes. How to analyze the resulting huge volume of experimental
data and discover useful biological knowledge is an important topic. In
this research we present a framework, namely GeneFilter, for effectively analyzing
gene expression data and performing knowledge discovery. The main goals of
the framework are as follows: 1) Design an effective model and flow for high-throughput
data analysis, 2) Develop an integrated and efficient platform for gene
expression analysis including data preprocessing, gene expression patterns
mining and visualization modules, 3) Develop methodologies for biological
knowledge discovery based on the gene expression analysis results.
The system architecture of GeneFilter is shown in Figure 1. In the wet lab,
biologists design experiments for the targeted diseases and conduct them
with cDNA microarrays, oligonucleotide microarrays, Q-PCR, or 2D gel chips. The
experimental results are then submitted to GeneFilter for dry-lab analysis,
including data preprocessing, statistical analysis, gene expression pattern
mining, gene ranking and Gene Ontology analysis. Finally, the analysis results
are validated with more precise biological experiments such as Q-PCR, and
the validation results are fed back to refine further analysis. In
the following, we describe the main functions of each analysis process:
1. Data Preprocessing and Statistical Analysis
In this process, the quality of the microarray data is first examined with
various statistical methods and visualization techniques such as those in [1]. Data
of poor quality are then filtered out. Next, various
normalization methods, such as within-slide normalization, paired-slide
normalization and multiple-slide normalization [2, 3], are provided for application
to the data. Statistical graphs such as MA plots are also provided for
examining the normalized results.
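Within-slide normalization is commonly carried out on the log-ratio scale that an MA plot displays. The sketch below is a minimal illustration, assuming a two-channel (cDNA) slide and a simple global median normalization; GeneFilter's actual normalization options may be more elaborate (e.g., intensity-dependent), and the spot intensities are hypothetical.

```python
import math
import statistics


def ma_values(red, green):
    """Return (M, A) lists for two-channel spot intensities.

    M = log2(R/G) is the log-ratio; A = 0.5 * log2(R*G) is the mean log-intensity.
    These are the coordinates of an MA plot.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def global_median_normalize(m):
    """Within-slide global normalization: shift log-ratios so their median is 0."""
    center = statistics.median(m)
    return [x - center for x in m]


# Hypothetical intensities for five spots on one slide (not real data).
red = [1200.0, 800.0, 4000.0, 300.0, 950.0]
green = [1000.0, 820.0, 1500.0, 310.0, 700.0]

m, a = ma_values(red, green)
m_norm = global_median_normalize(m)
for mi, ai in zip(m_norm, a):
    print(f"A={ai:6.2f}  M={mi:+.3f}")
```

Plotting `m_norm` against `a` gives the normalized MA plot; a residual trend along A would suggest that an intensity-dependent method is needed instead of a single global shift.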
2. Gene Expression Patterns Mining
In this process, interesting patterns hidden in the microarray
data are discovered automatically with the following data mining methods, which we
developed based on our previous work [7, 8, 9]: 1) automatic and customized
discovery of gene expression patterns (e.g., up-regulated or down-regulated
expression patterns), 2) validation-based clustering for finding a near-optimal
clustering of the gene expression data in a very short time, 3) time-series analysis
for discovering activation relations between genes that tolerates noisy
data, 4) classification analysis for modeling or contrasting gene expression
patterns under different disease types (e.g., the dominant genes related
to bladder cancer).
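The first method, discovering up- or down-regulated genes, can be illustrated with a simple fold-change rule. This is a minimal sketch, assuming normalized log2 ratios from replicate slides and a hypothetical 2-fold threshold that must hold on every replicate; it does not reproduce the customized pattern discovery of [7, 8, 9].

```python
def regulation_pattern(log_ratios, threshold=1.0):
    """Label a gene from its replicate log2 expression ratios.

    'up' if every replicate is >= threshold (a 2-fold increase when threshold=1),
    'down' if every replicate is <= -threshold, otherwise 'none'.
    """
    if all(x >= threshold for x in log_ratios):
        return "up"
    if all(x <= -threshold for x in log_ratios):
        return "down"
    return "none"


# Hypothetical normalized log2 ratios across three replicate slides.
genes = {
    "gene_A": [1.4, 1.1, 1.8],
    "gene_B": [-1.2, -2.0, -1.5],
    "gene_C": [1.3, 0.2, 1.6],
}
patterns = {name: regulation_pattern(vals) for name, vals in genes.items()}
print(patterns)  # {'gene_A': 'up', 'gene_B': 'down', 'gene_C': 'none'}
```

Requiring agreement across all replicates is a deliberately conservative choice; gene_C is rejected because one replicate falls below the threshold.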
3. Gene Ranking
For the genes that the data mining methods identify as showing patterns of interest, a ranking mechanism is then applied to estimate the relative importance of each gene. The ranking mechanism takes various kinds of information as input and calculates a score for each gene based on the weight given to each type of information. The input information includes the magnitude of the gene's expression ratio in the microarray, its degree of relevance to the targets according to the published literature (e.g., the Gene Expression Omnibus (GEO) web site (http://www.ncbi.nlm.nih.gov/geo/), which releases the genes related to various kinds of cancers, and [5]), Q-PCR results, and so on.
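The weighted scoring just described can be sketched as a normalized weighted sum. The evidence source names, the assumption that each source is pre-scaled to [0, 1], and the weights are all hypothetical; GeneFilter's actual inputs and weighting may differ.

```python
def rank_genes(evidence, weights):
    """Rank genes by a weighted sum of evidence scores.

    evidence: {gene: {source: score in [0, 1]}}; a missing source counts as 0.
    weights:  {source: non-negative weight}; weights are normalized to sum to 1.
    Returns a list of (gene, score) pairs, highest score first.
    """
    total = sum(weights.values())
    norm = {src: w / total for src, w in weights.items()}
    scores = {
        gene: sum(norm[src] * ev.get(src, 0.0) for src in norm)
        for gene, ev in evidence.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical evidence sources and weights (not GeneFilter's actual settings).
weights = {"expression_ratio": 0.5, "literature": 0.3, "qpcr": 0.2}
evidence = {
    "gene_A": {"expression_ratio": 0.9, "literature": 0.2, "qpcr": 0.8},
    "gene_B": {"expression_ratio": 0.6, "literature": 0.9},
    "gene_C": {"expression_ratio": 0.4, "literature": 0.1, "qpcr": 0.3},
}
for gene, score in rank_genes(evidence, weights):
    print(f"{gene}: {score:.2f}")
```

Treating a missing source (gene_B has no Q-PCR result) as zero penalizes untested genes; renormalizing over the available sources would be an equally reasonable alternative.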
4. Gene Ontology Analysis
For the interesting genes discovered through the above processes, three organizing networks of Gene Ontology (http://www.geneontology.org/)
information, namely molecular function, biological process and cellular
component, are provided for exploring
deeper knowledge about the genes. In the
GeneFilter platform, an agent program
was designed to access the latest Gene Ontology information, and a query
system was provided that allows users to find the
Gene Ontology information related to
specific genes.
5. Analysis Validations and Cyclic Refinement
Through the above analysis process, the most interesting genes related
to the analysis targets are discovered automatically and efficiently. To validate the analysis results, further biological experiments should
be conducted with more precise techniques such as Q-PCR [6]. The
validation results are then fed back into the analysis platform to
refine the analysis by adjusting the policies and parameters of data preprocessing,
expression pattern mining and gene ranking. Through this cyclic refinement
process, increasingly accurate analysis results are obtained.
In a real application, we used the GeneFilter platform to conduct an extensive analysis of bladder
cancer microarray data in order to discover the genes most influential
in the progression through different stages of bladder cancer. Multiple
microarray slides with more than 10,000 genes were analyzed, and genes with interesting
expression patterns (e.g., up-regulated) were discovered successfully. In particular,
the whole analysis process could be completed within a very short period owing to the
platform's high degree of integration and
automation.
References
[1] S. Dudoit, Y. H. Yang, T. P. Speed, and M. J. Callow, "Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments," Statistica Sinica, vol. 12, no. 1, pp. 111-139, 2002.
[2] Y. H. Yang, S. Dudoit, P. Luu, and T. Speed, "Normalization for cDNA Microarray Data," in SPIE BiOS, San Jose, California, January 2001.
[3] X. Wang, M. J. Hessner, Y. Wu, N. Pati, and S. Ghosh, "Quantitative quality control in microarray experiments and the application in data filtering, normalization and false positive rate prediction," Bioinformatics, vol. 19(0), pp. 1-7, 2003.
[4] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Prentice Hall, 1988.
[5] J.-H. Chiang and H.-C. Yu, "MeKE: discovering the functions of gene products from biomedical literature via sentence alignment," Bioinformatics, vol. 19, pp. 1417-1422, 2003.
[6] M. S. Rajeevan, S. D. Vernon, N. Taysavang, and E. R. Unger, "Validation of Array-Based Gene Expression Profiles by Real-time (Kinetic) RT-PCR," J. Mol. Diagnostics, vol. 3, no. 1, pp. 26-30, 2001.
[7] S. M. Tseng and L. J. Chen, "An Empirical Study of the Validity of Gene Expression Clustering," in Proc. 2002 Int'l Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS), Las Vegas, USA, 2002.
[8] S. M. Tseng and C.-P. Kao, "Mining and Validation of Gene Expression Patterns: An integrated approach and applications," Informatica, vol. 27, no. 1, pp. 21-27, 2003.
[9] S. M. Tseng and Y. L. Chen, "An Effective Approach for Resolving Fundamental Problems in Time-Series Gene Expression Mining," in Proc. Int'l Workshop on Foundation and Direction in Data Mining (held with IEEE ICDM), USA, Nov. 2003.
We briefly review
the basic components of the philosophical grounding of fuzzy theory, from
ontology to epistemology to applications. Within the context of epistemology,
the notion of “belief” plays an essential role in foundationalism. In order
to generalize such foundational concerns of epistemology, we propose that
epistemology be restructured to include the language of fuzzy theory. Thus, “Precisiated
Natural Language” (PNL), proposed by Lotfi A. Zadeh, ought to be considered the
sixth link in the evolutionary chain of scientific languages, in which meta-languages,
speech, writing, mathematics and computing languages represent the five usual
evolutionary links. It is well understood that all forms of language have
both a communication and an informatics dimension that facilitate human thought
and decision making, and therefore have an impact on
computing with words and perceptions. In this regard, we review the
connection between “belief” and “fuzzy sets”, as well as between classical probability
and fuzzy sets.
Multi-agent
systems are currently one of the most exciting research areas in computer science.
In recent years there has been a growing interest in the
application of agent-based systems in health care, especially in medical
diagnosis. This stems mainly from the fact that, in today’s global world,
fast and reliable medical diagnosis is of paramount importance, as
the recent problems with SARS illustrate. Such highly
contagious and lethal diseases can threaten the population if they are not
fought immediately and with high efficiency. However, medical diagnosis can
be a complex process involving considerable uncertainty and fuzziness. Thus, the integration
of multi-agent system technology with soft computing technologies appears
very promising, especially since soft computing is tolerant of imprecision,
uncertainty and partial truth. In this talk we will discuss whether and how
multi-agent system technology can be used to improve and support the medical
diagnosis process, especially in situations where there is high uncertainty
about the right diagnosis. Moreover, we will show how soft computing
technologies can contribute to further significant improvements in the diagnosis
process and in related tasks.
We propose a fuzzy-evolutionary approach
to self-organization that emulates social behavior and immunity in Cyberspace
and has applications to emergency response management (Fig. 1), distributed
manufacturing, medical informatics and cybersecurity. Organized
in a nested hierarchy (Fig. 2) distributed throughout the network, the system
consists of a hybrid mixture of static and mobile agents behaving like a Cyberorganism capable of reacting to unexpected changes or attacks
in an optimal manner. Computational intelligence techniques endow the multi-agent system (MAS)
with learning and discovery capabilities. By ‘cloning’ real-life entities
into software agents, the proposed paradigm can be easily extended to the
creation of emergent dynamic information infrastructures that are autonomous
and proactive, capable of ensuring ubiquitous (optimal) resource discovery
and allocation while at the same time self-organizing their resources to optimally
accomplish the desired objectives.
Fig. 1: Emergency Scenario
Fig. 2: Holonic System
This decade will probably be remembered as the “genome decade”. Almost a dozen microorganism sequences, including bacteria, and genomic sequences for man, mouse, etc. have already been completed. However, the post-genome sequencing era has just begun. The focus now is on uncovering the functional organization of cells. Systematic global studies of gene expression and DNA-protein interactions under different conditions are a topic of considerable interest.
Numerous approaches to modeling genetic networks have already been proposed, but the Boolean network model, introduced about three decades ago, remains influential among researchers in the field. In this paper, the “fuzzy logic network” (FLN) is proposed as a viable model for genetic network studies. The basic properties and characteristics of the FLN are presented in this paper, and the Boolean network model can be shown to be a special case of the FLN. As pointed out by numerous researchers, the Boolean network is only a coarse-grained, symbolic model, despite its faithful modeling of the gene self-regulating process. The FLN not only provides more qualitative information but also provides some essential quantitative information about the self-regulating mechanism of the genes in the network.
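The claim that the Boolean network is a special case of the FLN can be checked on a toy example. The sketch below assumes Zadeh's min/max/complement operators for fuzzy AND/OR/NOT and a hypothetical three-gene wiring with synchronous updates; the FLN in the paper may use other fuzzy operators and update rules.

```python
from itertools import product


def f_and(*xs):
    """Fuzzy AND (Zadeh t-norm): minimum of the inputs."""
    return min(xs)


def f_or(*xs):
    """Fuzzy OR (Zadeh t-conorm): maximum of the inputs."""
    return max(xs)


def f_not(x):
    """Fuzzy NOT (standard complement)."""
    return 1.0 - x


def fln_step(state):
    """One synchronous update of a hypothetical 3-gene fuzzy logic network.

    Each gene's expression level is a degree in [0, 1].
    """
    g1, g2, g3 = state
    return (f_not(g3), f_and(g1, g3), f_or(g1, g2))


def boolean_step(state):
    """The same wiring evaluated with Boolean operators on {0, 1}."""
    g1, g2, g3 = state
    return (int(not g3), int(g1 and g3), int(g1 or g2))


# Restricted to crisp states, the FLN reproduces the Boolean network exactly.
for s in product((0, 1), repeat=3):
    assert fln_step(s) == boolean_step(s)

# With graded states, the FLN also carries quantitative information.
state = (0.8, 0.3, 0.6)
for t in range(4):
    print(t, tuple(round(x, 2) for x in state))
    state = fln_step(state)
```

Because min, max and 1 - x coincide with AND, OR and NOT on {0, 1}, any Boolean network wiring carries over unchanged; only graded (non-crisp) states expose the extra quantitative behavior.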
One important aspect of the improvement is the possible control of the genetic network. A node may be added to the graph that represents the genetic network. Mathematically, this node represents a biological stimulus, where a stimulus is any relevant physical or chemical factor that influences the network and is not itself a gene or gene product. In the systems and control literature, this node is called an input function or excitation function.
Theoretically, we are able to show that this control node, under certain constraints, is capable of regulating the genetic network, which is already self-regulating. At the very least, this theoretical result may prove useful in experimental design. This control strategy may potentially be useful in obtaining desired “target network states”. Achieving these preferred “target network states” may have significance in cell transformation, cell repair or drug target design.
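How an external stimulus can steer an already self-regulating network toward a target state is illustrated by the toy sketch below. It assumes min/max fuzzy operators, a hypothetical two-gene wiring and a constant stimulus `u`; it demonstrates the idea only and is not the constrained controllability result stated above.

```python
def controlled_step(state, u):
    """Update a hypothetical 2-gene FLN with an external stimulus u in [0, 1].

    u is the control (input) node: it influences gene 1 but is not itself a gene.
    Gene 1 is activated by the stimulus OR by (gene 1 AND gene 2); gene 2 copies gene 1.
    """
    g1, g2 = state
    return (max(u, min(g1, g2)), g1)


def run(state, u, steps=10):
    """Apply the update repeatedly with a constant stimulus."""
    for _ in range(steps):
        state = controlled_step(state, u)
    return state


start = (0.7, 0.2)
print("no stimulus:  ", run(start, u=0.0))   # settles at (0.2, 0.2)
print("with stimulus:", run(start, u=1.0))   # driven to the target state (1.0, 1.0)
```

Without the stimulus, the network settles into its own fixed point; with it, the same self-regulating wiring is pushed into the chosen target state, which is the kind of behavior a control node is meant to provide.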
Much more remains to be explored, and we hope the fuzzy logic network will eventually be accepted as a viable model in the genomic research community. This possibility certainly exists, owing to the FLN’s better fit with biological reality as well as its computational simplicity.
Pruning of multi-input/output neural networks is discussed, and a pruning algorithm called CSDF is described. CSDF induces internal models as a result of redundancy elimination and selective bindings. CSDF is used in a new independent component analysis (ICA) method based on an auto-encoder performing sensor-signal identity mapping. An internal model of the external signal-mixing situation emerges from the CSDF pruning, and the hidden units that survive the pruning reconstruct the blind source signals. This ICA method, which requires no preprocessing such as whitening, is characterized by high adaptability and robustness, as demonstrated in difficult cases such as a sudden increase in the source signals, sudden failure of sensors, and so on. As another example, CSDF is applied in a neural network for analogical learning/inference. Internal abstraction models, together with abstraction/de-abstraction bindings, are generated as a result of CSDF structural learning coupled with backpropagation training. The internal abstraction model acts as an attractor for new relevant datasets, a process corresponding to analogical memory retrieval.
Finding the desired information on the World Wide Web is not easy, because the information available on the WWW is inherently unordered, distributed and heterogeneous. As a result, the ability to search for and retrieve information from the Web efficiently and effectively is a key technology for realizing the Web’s full potential.
The first problem to solve is how to express a search request. In almost all search scenarios, the desired information is on some Web pages. For similar search requests, the desired Web pages share some similar content characteristics and/or structure characteristics. Traditional search methods let users submit "keywords" that may appear on the desired Web pages to express their search requests, so we call them “keyword-based search” or “content-based search”. Such methods have limited expressive power, for two reasons. (1) In many cases, a search request is inherently fuzzy and thus difficult or even impossible to express with "crisp" keywords; furthermore, many “partially related” Web pages cannot be retrieved in a crisp way. (2) In many, if not all, cases, some keywords provide "structure characteristics" while others provide only "content characteristics". Unfortunately, traditional search methods do not differentiate between the two kinds of keywords. In the following, “keywords” refers only to words that provide content characteristics.
Retrieving information effectively is another issue we need to consider. Current search engines are known for poor accuracy: they have both low recall and low precision. Furthermore, the most relevant documents are not always displayed at the top of the result list; that is, query results are not listed in the desired order.
We proposed an intelligent
web information search and retrieval model called Web Information Search
Task (WIST). The WIST model has two goals: one is to make the interface of a
search service more expressive, and the other is to make information retrieval
more effective. Many search requests have different content characteristics
but share similar structure characteristics. These structure characteristics
are expressed by simple structure
rules. Basically, a
structure rule is an IF-THEN formula defined on a web page's URL, Title, Text,
Input Links, Output Links, or other related sections. If the IF part is satisfied
by a web page, the page may be desired; otherwise, it may not be. These structure
rules function as an input fuzzifier, so that fuzzy reasoning can be used
to derive the “desirability” of a Web page, that is, whether the page is likely to be desired and how strong that possibility is. In this
way, all search requests
with similar (and usually fuzzy) structure characteristics can be grouped into
a WIST, which is implemented as an intelligent software agent using fuzzy logic (FL), neural networks (NN),
genetic algorithms (GA), and other technologies to automatically find all relevant Web pages based
on the relevance inferred from structure rules and user-submitted keywords,
and to rank them in the desired way. Essentially, a WIST agent uses a TSK-based
Fuzzy Neural Network (FNN) to infer the desirability. The
agent is intelligent because
it can (1) learn better FNN parameters, (2) learn a better FNN structure,
and (3) learn to define structure characteristics by adding or modifying
structure rules.
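Inferring desirability from fuzzified structure rules with TSK-style reasoning can be sketched as follows. This assumes a zero-order TSK model with product firing strengths and hand-set consequents; the feature names, rules and page values are hypothetical, and in WIST the FNN would learn these parameters (and the rule structure) rather than fix them by hand.

```python
def tsk_desirability(features, rules):
    """Zero-order TSK inference: weighted average of rule consequents.

    features: {name: membership degree in [0, 1]} produced by the fuzzifier.
    rules: list of (antecedent feature names, consequent desirability in [0, 1]).
    The firing strength of a rule is the product of its antecedent memberships.
    """
    num = den = 0.0
    for antecedents, consequent in rules:
        strength = 1.0
        for name in antecedents:
            strength *= features.get(name, 0.0)
        num += strength * consequent
        den += strength
    return num / den if den > 0 else 0.0


# Hypothetical structure rules: (IF-part feature names, THEN-part desirability).
RULES = [
    (("url_looks_like_homepage", "title_has_keyword"), 0.95),
    (("text_has_keyword",), 0.6),
    (("many_output_links",), 0.2),   # e.g., link-list pages judged less desirable
]

# Hypothetical fuzzified features for two candidate pages.
pages = {
    "page_1": {"url_looks_like_homepage": 0.8, "title_has_keyword": 0.9,
               "text_has_keyword": 0.7, "many_output_links": 0.3},
    "page_2": {"url_looks_like_homepage": 0.1, "title_has_keyword": 0.2,
               "text_has_keyword": 0.5, "many_output_links": 0.9},
}
ranked = sorted(pages, key=lambda p: tsk_desirability(pages[p], RULES), reverse=True)
for p in ranked:
    print(p, round(tsk_desirability(pages[p], RULES), 3))
```

In an FNN realization, the consequents (and, in a first-order TSK model, linear consequent coefficients) become trainable parameters, which corresponds to the "learn better FNN parameters" capability listed above.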
This paper presents a new fuzzy neural network method for predicting protein secondary structures. Three protein secondary structures (alpha helix, beta sheet and coil) are used for protein classification. The traditional orthogonal encoding scheme, however, requires an inordinate amount of training time for neural networks to converge. To solve this problem, a new encoding scheme is proposed that maps the 20 amino acid symbols to numerical values in [0, 1] based on relevance degrees among the 20 amino acids. For comparison, a conventional multi-layer neural network with Back Propagation (BP) learning is also used for protein secondary structure prediction. Separate training and testing data sets are generated from public protein data in order to evaluate prediction performance objectively.
The hybrid neural network system consists of (1) three independent neural networks for predicting alpha helix, beta sheet and coil, respectively, and (2) a final decision-making system that assigns the current protein sequence to the class with the maximum of the three prediction outputs generated by the three independent neural networks.
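The difference in input size between orthogonal and scalar encodings, and the maximum-output decision stage, can be sketched as follows. The sliding-window input, the window length, the scalar code values and the network outputs are all hypothetical; the paper's relevance-based mapping of the 20 amino acids to [0, 1] is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def orthogonal_encode(window):
    """Traditional orthogonal (one-hot) encoding: 20 inputs per residue."""
    vec = []
    for aa in window:
        vec.extend(1.0 if aa == other else 0.0 for other in AMINO_ACIDS)
    return vec


# Placeholder scalar codes in [0, 1]. The paper derives these values from
# relevance degrees among amino acids; that table is not given here, so
# evenly spaced values are used purely for illustration.
SCALAR_CODE = {aa: i / (len(AMINO_ACIDS) - 1) for i, aa in enumerate(AMINO_ACIDS)}


def scalar_encode(window):
    """Compact encoding: one input in [0, 1] per residue."""
    return [SCALAR_CODE[aa] for aa in window]


def decide(outputs):
    """Final decision stage: pick the class whose network output is largest."""
    return max(outputs, key=outputs.get)


window = "MKTAYIAKQRQIS"  # hypothetical 13-residue input window
print(len(orthogonal_encode(window)), "inputs vs", len(scalar_encode(window)))
# Hypothetical outputs of the three independent networks for this window.
print(decide({"helix": 0.72, "sheet": 0.15, "coil": 0.41}))
```

With a 13-residue window, the orthogonal scheme needs 260 inputs against 13 for a scalar code, which is one plausible reason a compact encoding can shorten training.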
The hybrid conventional neural network has three independent neural networks using BP learning. Its accuracy for predicting alpha helix and beta sheet reaches 70% and 60%, respectively, for training data sets of 200 or more sequences, and its prediction accuracy for coil reaches 77.7% for a training data set of 70 sequences. The new hybrid fuzzy neural network has three independent fuzzy neural networks using the new fuzzy learning, and initial simulation results have been generated with it. Its accuracy for predicting alpha helix and beta sheet reaches 80.0% and 70.0%, respectively, for training data sets of 200 or more sequences, and its prediction accuracy for coil reaches 88.8% for a training data set of 70 sequences.
In summary, the new hybrid fuzzy neural network is more effective than the hybrid conventional neural network in terms of prediction accuracy and learning speed. In the future, seven protein secondary structures will be used in prediction. The hybrid fuzzy neural network will be improved by adding new intelligent techniques such as genetic algorithms and kernel-based learning.
Informally, world knowledge is the knowledge acquired through experience, education and communication. World knowledge has a position of centrality in human cognition, and especially in summarization, assessment of relevance, deduction and search.
The centrality of world knowledge in human cognition entails its centrality in issues related to web intelligence. In the existing literature, world knowledge is dealt with through knowledge-representation methods based on bivalent logic. The problem is that much of world knowledge, e.g., "California has a temperate climate" and "Most Finns are honest", is perception-based and/or dispositional. Such knowledge does not fit the conceptual structure of bivalent logical systems, which are intolerant of imprecision, uncertainty and partial truth.
The thesis advanced in this talk is that dealing with world knowledge requires fuzzy logic and, more particularly, the computational theory of perceptions, dispositional logic and precisiated natural language. An outline of a fuzzy-logic-based approach to world knowledge is presented and illustrated by examples.
Search engines, with Google at the top, have many remarkable capabilities. But what is not among them is deduction capability: the capability to synthesize an answer to a query by drawing on bodies of information that reside in various parts of the knowledge base. It is this capability that differentiates a question-answering system, Q/A system for short, from a search engine. Upgrading a search engine to a Q/A system is a complex, effort-intensive, open-ended problem. The Semantic Web and related systems for upgrading the quality of search may be viewed as steps in this direction. But what may be argued, as is done in the following, is that existing tools, based as they are on bivalent logic and probability theory, have intrinsic limitations. The principal obstacle is the nature of world knowledge.
The centrality of world knowledge in human cognition, and especially in reasoning and decision-making, has long been recognized in AI. The Cyc system of Douglas Lenat is a repository of world knowledge. The problem is that much of world knowledge consists of perceptions. More specifically, perceptions are f-granular in the sense that (a) the boundaries of perceived classes are fuzzy; and (b) the perceived values of attributes are granular, with a granule being a clump of values drawn together by indistinguishability, similarity, proximity or functionality. What is not widely recognized is that the f-granularity of perceptions puts them well beyond the reach of computational bivalent-logic-based theories.
Dealing with world knowledge needs new tools. A new tool which is suggested for this purpose is the fuzzy-logic-based method of computing with words and perceptions (CWP), with the understanding that perceptions are described in a natural language. A concept which plays a key role in CWP is that of Precisiated Natural Language (PNL). It is this language that is the centerpiece of our approach to reasoning and decision-making with world knowledge.
A concept that plays a key role in the organization of world knowledge is that of an epistemic (knowledge-directed) lexicon (EL). Basically, an epistemic lexicon is a network of nodes and weighted links, with node i representing an object in the world knowledge database, and a weighted link from node i to node j representing the strength of association between i and j. The name of an object is a word or a composite word, e.g., car, passenger car or Ph.D. degree. An object is described by a relation or relations whose fields are attributes of the object. The values of an attribute may be granulated and associated with granulated probability and possibility distributions. For example, the values of a granular attribute may be labeled small, medium and large, and their probabilities may be described as low, high and low, respectively. Relations that are associated with an object serve as PNL-based descriptions of the world knowledge about the object. For example, a relation associated with an object labeled Ph.D. degree may contain attributes labeled Eligibility, Length.of.study, Granting.institution, etc. The knowledge associated with an object may be context-dependent. What should be stressed is that the concept of an epistemic lexicon is intended to be employed in representing world knowledge, which is largely perception-based, rather than Web knowledge, which is not.
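A minimal data-structure sketch of an epistemic lexicon follows, assuming nodes keyed by object name, attribute values stored as granular labels paired with granular probability labels, and directed links weighted in [0, 1]. The node "university", the link weights, and the pairing of the small/medium/large example with the Length.of.study attribute are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconNode:
    """An object in the world-knowledge base, described by attribute relations."""
    name: str
    # attribute -> {granular value: granular probability label}
    attributes: dict = field(default_factory=dict)


@dataclass
class EpistemicLexicon:
    nodes: dict = field(default_factory=dict)
    # (i, j) -> strength of association from node i to node j, in [0, 1]
    links: dict = field(default_factory=dict)

    def add(self, node):
        self.nodes[node.name] = node

    def link(self, i, j, weight):
        self.links[(i, j)] = weight

    def associations(self, i):
        """Nodes linked from i, strongest association first."""
        out = [(j, w) for (src, j), w in self.links.items() if src == i]
        return sorted(out, key=lambda pair: pair[1], reverse=True)


el = EpistemicLexicon()
el.add(LexiconNode("Ph.D. degree", {
    "Length.of.study": {"small": "low", "medium": "high", "large": "low"},
}))
el.add(LexiconNode("university"))
el.add(LexiconNode("car"))
# Hypothetical link weights.
el.link("Ph.D. degree", "university", 0.9)
el.link("Ph.D. degree", "car", 0.05)
print(el.associations("Ph.D. degree"))  # [('university', 0.9), ('car', 0.05)]
```

Keeping probabilities as linguistic labels rather than numbers reflects the perception-based character of the stored knowledge; a fuller treatment would attach fuzzy sets to labels such as "low" and "high".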
In conclusion, the main thrust of the fuzzy-logic-based approach to question-answering outlined in this abstract is that, to achieve significant question-answering capability, it is necessary to develop methods of dealing with the reality that much of world knowledge is perception-based. Dealing with perception-based information is more complex and more effort-intensive than dealing with measurement-based information. In this instance, as in many others, complexity is the price that has to be paid to achieve superior performance.
A key component of any autonomous system is a decision
module capable of handling large volumes of data at high speed
and with high reliability. This paper focuses
mainly on the development of a decision module capable of
functioning in an environment of imprecision, uncertainty and imperfect reliability. There will be two principal tasks: Task A, which
will be aimed at the development of novel methods of analysis and design;
and Task B, which will be focused on the development of a decision-support
system for ranking decision alternatives. The following
is an outline of the principal issues that will be addressed in Task A.
In this study, we introduced the BISC decision support
system as an alternative for ranking and predicting risk in applications that currently rely on
an imprecise and subjective process. The key features of the BISC
decision support system are 1) intelligent tools to assist decision-makers
in assessing the consequences of decisions made in an environment of imprecision,
uncertainty, and partial truth and providing a systematic risk analysis, 2)
Acknowledgements
Funding
for this research was provided by British Telecommunications (BT) and
the BISC Program of UC Berkeley.
In the world of information processing, we are faced with increasingly complex multi-domain problems involving either real-world or computer-generated data. For such problems, classical data processing tools may not be sufficient, and more advanced approaches need to be developed. A unified approach based on Soft Computing will help solve such problems by combining methodologies (Fuzzy Logic, Neuro-Computing, Evolutionary Computing and Probabilistic Reasoning) that collectively provide a foundation for the conception of Intelligent Systems.
Our aim is to develop intelligent computing techniques that address the problem of multi-criteria decision making with subjective and imprecise data. This kind of problem requires intelligent systems that can replace a human expert in a specific domain in the decision-making process. The intelligent system should therefore take into account the subjective and imprecise character of the data on the one hand, and represent the user's or expert's preferences and knowledge on the other. For this purpose, we developed a generic multi-criteria model based on fuzzy logic concepts for decision support systems. Our goal is to build such a model by 1) fitting real-world data and 2) representing the preferences of domain-specific users or experts. Toward this end, we used Evolutionary Computation techniques. Initially, we worked on a first-order aggregation model and learned its parameters using Genetic Algorithms, with the preferences represented by a weighting vector associated with the variables involved in the aggregation process. This model has been used in a specific application related to university admissions. Then, we developed a more advanced multi-aggregation model based on hierarchical decision trees, and for learning this model we developed a technique inspired by Genetic Programming. In this model, tree nodes represent aggregators, terminals or leaves correspond to variables, and weight values are attached to the child branches of each aggregator. The aggregation result over all the variables is then obtained by recursively evaluating the root aggregator of the tree.
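The recursive evaluation described above can be sketched as follows. The two aggregators (weighted arithmetic and weighted geometric means), the variable names and the weights are illustrative assumptions; the paper's actual set of aggregators is not specified here.

```python
import math
from dataclasses import dataclass
from typing import List, Union


def weighted_mean(values, weights):
    """Weighted arithmetic mean (weights assumed to sum to 1)."""
    return sum(w * v for v, w in zip(values, weights))


def weighted_geometric_mean(values, weights):
    """Weighted geometric mean (weights assumed to sum to 1)."""
    return math.prod(v ** w for v, w in zip(values, weights))


AGGREGATORS = {"wmean": weighted_mean, "wgeo": weighted_geometric_mean}


@dataclass
class Node:
    aggregator: str          # key into AGGREGATORS
    children: List["Tree"]   # subtrees or variable names (leaves)
    weights: List[float]     # one weight per child branch


Tree = Union[Node, str]


def evaluate(tree, x):
    """Recursively evaluate the tree on a record x: {variable: score in [0, 1]}."""
    if isinstance(tree, str):           # leaf: a variable
        return x[tree]
    values = [evaluate(child, x) for child in tree.children]
    return AGGREGATORS[tree.aggregator](values, tree.weights)


# Hypothetical admissions tree: academic and personal sub-scores, then combined.
tree = Node("wgeo",
            [Node("wmean", ["gpa", "test"], [0.6, 0.4]),
             Node("wmean", ["essay", "recommendation"], [0.5, 0.5])],
            [0.7, 0.3])
applicant = {"gpa": 0.9, "test": 0.8, "essay": 0.6, "recommendation": 0.7}
print(round(evaluate(tree, applicant), 3))
```

Mixing aggregators at different levels is what distinguishes this multi-aggregation tree from the first-order model, which would correspond to a single weighted node over all variables.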
The parameters characterizing this multi-aggregation model are the aggregators, the weights and their combination in the form of a tree structure. The learning process therefore has to find the optimal combination of these parameters based on training data. In this learning process, the evolution principle remains the same as in conventional GP, but the DNA encoding needs to be defined according to the problem considered. We need to define a more complex tree structure to represent the multi-aggregation model. In addition to the weights that have to be added to the classical tree, the nodes that represent aggregators require a variable number of arguments, because the number of arguments cannot be known before the tree is created. Therefore, during the evolution process, trees are generated randomly by selecting aggregators for the nodes while, at the same time, the corresponding numbers of arguments are chosen randomly. Moreover, weight values are set randomly in each branch of the tree during its creation.
These encoding properties provide a large search space for solving our problem. If this tree structure needs to be simplified according to some application constraints, these constraints can be added to the problem specification, and they will be checked during tree generation.
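Random generation of such variable-arity, weighted trees, with constraints checked during generation, can be sketched as below. The aggregator names, the uniform weight range, the leaf probability and the depth/arity limits (standing in for application constraints) are assumptions for illustration, not the authors' encoding.

```python
import random
from dataclasses import dataclass
from typing import List, Union

AGGREGATORS = ["wmean", "wgeo", "min", "max"]  # hypothetical aggregator names
VARIABLES = ["gpa", "test", "essay", "recommendation"]  # hypothetical leaves


@dataclass
class Node:
    aggregator: str
    children: List["Tree"]
    weights: List[float]


Tree = Union[Node, str]


def random_tree(variables, rng, max_depth=3, max_arity=4, leaf_prob=0.3, root=True):
    """Grow a random weighted multi-aggregation tree.

    With max_depth > 0 the root is always an aggregator; below it, a branch
    becomes a leaf with probability leaf_prob or when the depth limit is
    reached. Each aggregator draws its own number of arguments, and its
    branch weights are drawn at random and normalized to sum to 1.
    """
    if max_depth == 0 or (not root and rng.random() < leaf_prob):
        return rng.choice(variables)
    arity = rng.randint(2, max_arity)                     # variable number of arguments
    children = [random_tree(variables, rng, max_depth - 1, max_arity, leaf_prob, False)
                for _ in range(arity)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(arity)]   # strictly positive
    total = sum(raw)
    return Node(rng.choice(AGGREGATORS), children, [w / total for w in raw])


def is_valid(tree, max_depth=3, max_arity=4):
    """Check the structural constraints a generated tree must satisfy."""
    if isinstance(tree, str):
        return True
    return (max_depth > 0
            and tree.aggregator in AGGREGATORS
            and 2 <= len(tree.children) <= max_arity
            and len(tree.weights) == len(tree.children)
            and abs(sum(tree.weights) - 1.0) < 1e-9
            and all(is_valid(c, max_depth - 1, max_arity) for c in tree.children))


tree = random_tree(VARIABLES, random.Random(7))
print(tree)
print(is_valid(tree))
```

Enforcing the depth and arity limits inside the generator, rather than filtering trees afterwards, mirrors the idea of checking application constraints during tree generation.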
We are pursuing this work by considering many other
applications, and we aim to further develop the multi-aggregation model based on
decision trees. We defined this model as a basis
for a more general and more complex form of the problems considered, one that
operates with linguistic variables. Our approach is a first attempt toward
the use of Computing with Words and Perceptions.
Most existing search software is modeled using crisp logic and queries. We introduce fuzzy querying and ranking as a flexible tool that allows approximation: as in natural human behavior, the selected objects do not need to match the decision criteria exactly. The model consists of five major modules: the Fuzzy Search Engine, the Application Templates, the User Interface, the Database and the Evolutionary Computing module. The system is designed in a generic form to accommodate more diverse applications and to be delivered as stand-alone software to academia and businesses.
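The difference between crisp and fuzzy querying can be illustrated with a small ranking sketch. The membership functions, thresholds, applicant records and the use of min as the fuzzy AND are hypothetical choices; the BISC system's templates and aggregation operators are not reproduced here.

```python
def ramp_up(lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi (e.g., 'high')."""
    def mu(x):
        if x <= lo:
            return 0.0
        if x >= hi:
            return 1.0
        return (x - lo) / (hi - lo)
    return mu


# Hypothetical fuzzy query: "GPA is high AND test score is high".
gpa_high = ramp_up(3.0, 3.8)
score_high = ramp_up(1200, 1500)

applicants = {
    "A": {"gpa": 3.9, "score": 1450},
    "B": {"gpa": 3.4, "score": 1550},
    "C": {"gpa": 2.9, "score": 1600},
}


def match(rec):
    """Degree to which a record satisfies the query (fuzzy AND as minimum)."""
    return min(gpa_high(rec["gpa"]), score_high(rec["score"]))


ranking = sorted(applicants, key=lambda k: match(applicants[k]), reverse=True)
for name in ranking:
    print(name, round(match(applicants[name]), 3))

# The crisp version of the same query returns no exact match at all.
crisp = [k for k, r in applicants.items() if r["gpa"] >= 3.8 and r["score"] >= 1500]
print("crisp matches:", crisp)
```

The crisp query discards every applicant, while the fuzzy query still produces a graded ranking; replacing min with a weighted aggregator would let the evolutionary module tune how criteria trade off.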
In this study, we introduced fuzzy query, fuzzy aggregation, evolutionary computing and the BISC decision support system as an alternative for ranking and predicting risk for credit scoring, university admissions, and several other applications that currently rely on an imprecise and subjective process. The key features of the BISC decision support system are 1) intelligent tools to assist decision-makers in assessing the consequences of decisions made in an environment of imprecision, uncertainty, and partial truth and providing a systematic risk analysis, 2) intelligent tools to help decision-makers answer “what if” questions, examine numerous alternatives very quickly and find the input values needed to achieve a desired level of output, and 3) intelligent tools to be used with human interaction and feedback to achieve a capability to learn and adapt over time. In addition, the following important points were found in this study: 1) no single ranking function works well for all contexts, 2) most similarity measures work about the same regardless of the model, 3) there is little overlap between successful ranking functions, and 4) the same model can be used for other applications, such as the design of a more intelligent search engine that includes the user's preferences and profile (Nikravesh, 2001a and 2001b).
Acknowledgment
Funding for this research was provided by British Telecommunications (BT) and the BISC Program of UC Berkeley.
References
Fagin R. (1998) Fuzzy Queries in Multimedia Database Systems, Proc. ACM Symposium on Principles of Database Systems, pp. 1-10.
Fagin R. (1999) Combining Fuzzy Information from Multiple Systems, J. Computer and System Sciences 58, pp. 83-99.
Holland J. H. (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence, MIT Press (first published by University of Michigan Press, 1975).
Koza J. R. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, 819 pages.
Mizumoto M. (1989) Pictorial Representations of Fuzzy Connectives, Part I: Cases of T-norms, T-conorms and Averaging Operators, Fuzzy Sets and Systems 31, pp. 217-242.
Nikravesh M. (2001a) Perception-based Information Processing and Retrieval: Application to User Profiling, 2001 research summary, EECS, ERL, University of California, Berkeley, BT-BISC Project (http://zadeh.cs.berkeley.edu/, http://www.cs.berkeley.edu/~nikraves/, http://www-bisc.cs.berkeley.edu/).
Nikravesh M. (2001b) Credit Scoring for Billions of Financing Decisions, Joint 9th IFSA World Congress and 20th NAFIPS International Conference (IFSA/NAFIPS 2001), "Fuzziness and Soft Computing in the New Millennium", Vancouver, Canada, July 25-28, 2001.
Nikravesh M. and Azvine B. (2002) Fuzzy Queries, Search, and Decision Support System, Soft Computing, Vol. 6, No. 5, August 2002.
Nikravesh M., Azvine B., Yager R., and Zadeh L. A. (Eds.) (2003) New Directions in Enhancing the Power of the Internet, Studies in Fuzziness and Soft Computing, Physica-Verlag, Springer, to be published (August 2003).
Since a fuzzy set can be defined by enumerating its elements and the degree of membership of each element, we can use one to express word ambiguity: we enumerate all possible meanings of a word and then estimate the degree of compatibility between the word and each meaning.
Based on this approach, we have proposed using conceptual fuzzy sets (CFSs) to represent the various meanings of a concept, which change dynamically depending on the context. A CFS is realized as a neural network in which a node represents a concept and a link represents the strength of the relation between two connected concepts. The activation values, which correspond to the grades of membership, are determined through this associative memory. In a CFS, the meaning of a concept is represented by the distribution of the activation values of the other nodes; this distribution evolves from the activation of the node representing the concept of interest.
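How context can reshape the activation distribution of a concept is sketched below. This is a simplified stand-in, not the CFS associative memory itself: it substitutes max-product spreading activation over a weighted concept graph, so each node ends with the strength of its strongest path from the activated concepts. The concepts and link weights are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Undirected concept graph: graph[u][v] is the relation strength in [0, 1]."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def activate(graph: Graph, seeds: List[str], max_iter: int = 100) -> Dict[str, float]:
    """Max-product spreading activation from the seed concepts.

    Each node ends with the strength of its strongest path from any seed,
    which we read as its grade of membership in the seed's meaning.
    """
    act = {node: 0.0 for node in graph}
    for s in seeds:
        act[s] = 1.0
    for _ in range(max_iter):
        new = dict(act)
        for node, nbrs in graph.items():
            best = max(act[n] * w for n, w in nbrs.items())
            new[node] = max(new[node], best)
        if new == act:  # no activation changed: fixed point reached
            break
        act = new
    return act


# Hypothetical concept links around the ambiguous word "bank".
EDGES = [("bank", "money", 0.8), ("bank", "loan", 0.7), ("bank", "river", 0.6),
         ("bank", "shore", 0.5), ("money", "loan", 0.9), ("river", "water", 0.9),
         ("river", "shore", 0.8), ("shore", "water", 0.6)]
graph = build_graph(EDGES)

bank_alone = activate(graph, ["bank"])
bank_with_river = activate(graph, ["bank", "river"])
print(bank_alone["money"], bank_alone["water"])            # financial sense dominates
print(bank_with_river["money"], bank_with_river["water"])  # river context shifts the meaning
```

Activating "bank" alone gives "money" a higher grade than "water"; adding "river" as context reverses that order, which is the kind of context-dependent meaning the CFS is meant to capture.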
This presentation will start with my motivation for proposing CFSs and an algorithm for generating them. It will then describe how CFSs represent the context-dependent meaning of a word and measure the conceptual distance between documents. Next, information filtering and image search (Google-Based Search Engine for Multimedia Data) will be introduced as applications to information retrieval that use the capability of conceptual matching. Finally, we will introduce our approach to enhancing CFSs based on brain architecture.
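A conceptual distance between documents can be illustrated by expanding each document into a fuzzy set of related concepts and comparing those sets. In this sketch a hypothetical relation table stands in for CFS activation distributions, and the distance is one minus the fuzzy Jaccard similarity (sum of minima over sum of maxima); neither choice is claimed to be the measure used in the CFS work.

```python
from typing import Dict, Iterable

FuzzySet = Dict[str, float]

# Hypothetical concept relations standing in for CFS activation distributions.
RELATED: Dict[str, FuzzySet] = {
    "loan": {"loan": 1.0, "credit": 0.8, "bank": 0.6},
    "credit": {"credit": 1.0, "loan": 0.8, "bank": 0.5},
    "river": {"river": 1.0, "water": 0.9, "bank": 0.4},
}


def expand(terms: Iterable[str], related: Dict[str, FuzzySet]) -> FuzzySet:
    """Fuzzy set of concepts evoked by a document: max over its terms' relations."""
    fs: FuzzySet = {}
    for t in terms:
        for concept, degree in related.get(t, {t: 1.0}).items():
            fs[concept] = max(fs.get(concept, 0.0), degree)
    return fs


def conceptual_distance(a: FuzzySet, b: FuzzySet) -> float:
    """1 - fuzzy Jaccard similarity: 1 - sum(min) / sum(max) over the union of concepts."""
    keys = sorted(set(a) | set(b))
    top = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    if top == 0.0:
        return 0.0  # two empty sets are treated as identical
    common = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 - common / top


doc_loan, doc_credit, doc_river = ["loan"], ["credit"], ["river"]
# The documents share no words, so a literal comparison calls every pair maximally distant.
print(conceptual_distance({"loan": 1.0}, {"credit": 1.0}))  # 1.0
d_fin = conceptual_distance(expand(doc_loan, RELATED), expand(doc_credit, RELATED))
d_mix = conceptual_distance(expand(doc_loan, RELATED), expand(doc_river, RELATED))
print(d_fin < d_mix)  # True: "loan" is conceptually closer to "credit" than to "river"
```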
In this presentation, we will first present the role of fuzzy logic in the Internet. Then we will present an intelligent model that can mine the Internet to conceptually match and rank homepages based on predefined linguistic formulations and rules defined by experts, or based on a set of known homepages. The FCM model will be used for intelligent information and knowledge retrieval through conceptual matching of both text and images (here defined as "Concept"). The FCM can also be used to construct a fuzzy ontology, or terms related to the context of the query and search, to resolve ambiguity. This model can be used to calculate the conceptual degree of match to the object or query. We will also present the integration of our technology into commercial search engines such as Google™ and Yahoo!, as a framework that can be used to integrate our model into any other commercial search engine or to develop the next generation of search engines.
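Ranking homepages by expert-defined linguistic rules can be sketched with standard fuzzy operators. The abstract does not specify the internals of the FCM model, so this sketch assumes min for a rule's AND, a per-rule weight, and max across rules; the rules, concepts, and page degrees are hypothetical.

```python
from typing import Dict, List, Tuple

Page = Dict[str, float]         # concept -> degree of presence in the page, in [0, 1]
Rule = Tuple[List[str], float]  # (concepts that must all be present, rule weight)


def degree_of_match(page: Page, rules: List[Rule]) -> float:
    """max over rules of weight * min over the rule's concepts (fuzzy AND = min, OR = max)."""
    return max(
        (weight * min(page.get(c, 0.0) for c in concepts) for concepts, weight in rules),
        default=0.0,
    )


def rank_pages(pages: Dict[str, Page], rules: List[Rule]) -> List[Tuple[str, float]]:
    """Pages sorted by degree of match, best first."""
    scored = [(name, degree_of_match(page, rules)) for name, page in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical expert rules for the query "fuzzy search engines".
RULES: List[Rule] = [
    (["fuzzy", "search", "engine"], 1.0),
    (["search", "engine"], 0.6),
    (["fuzzy", "logic"], 0.5),
]
PAGES: Dict[str, Page] = {
    "p1": {"fuzzy": 0.9, "logic": 0.8, "search": 0.3},
    "p2": {"search": 0.9, "engine": 0.8},
    "p3": {"fuzzy": 0.5, "search": 0.7, "engine": 0.6},
}

for name, score in rank_pages(PAGES, RULES):
    print(name, round(score, 2))  # prints p3 0.5, then p2 0.48, then p1 0.4
```

Page p3 ranks first even though no single concept in it is strong, because it is the only page that partially satisfies the most specific rule.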
The remarkable growth of the World Wide Web (WWW) since its origin in the 1990s calls for efficient and effective tools for information retrieval. Dealing with the overwhelming amount of information on today's billions of webpages does not necessarily require developing entirely new technologies from scratch. In the 1970s and 1980s, initial research was performed on retrieving information from modest text collections, using fuzzy relations to represent dependencies between terms on the one hand, and between terms and documents on the other. Since then, the fuzzy mathematical machinery (i.e., fuzzy logical operators, fuzzy similarity measures, and operations with fuzzy relations) has come of age. We revisit the “old” information retrieval strategies and brush them up with new insights to prepare them for a great challenge: search in the biggest collection of text known to mankind.
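The classical use of fuzzy relations between terms, and between terms and documents, can be illustrated with sup-min (max-min) composition: a query is first composed with a term-term relation to expand it, and the expanded query is then composed with the term-document relation to obtain document relevance. This is a generic textbook illustration, not the specific strategy revisited here; the relations and degrees are hypothetical.

```python
from typing import Dict

FuzzySet = Dict[str, float]
Relation = Dict[str, Dict[str, float]]  # relation[x][y] = degree to which x relates to y


def sup_min_compose(a: FuzzySet, r: Relation) -> FuzzySet:
    """(A o R)(y) = max over x of min(A(x), R(x, y)); pairs absent from R count as 0."""
    out: FuzzySet = {}
    for x, ax in a.items():
        for y, rxy in r.get(x, {}).items():
            out[y] = max(out.get(y, 0.0), min(ax, rxy))
    return out


# Hypothetical term-term relation (only the row this query needs; reflexive on "car").
TERM_TERM: Relation = {
    "car": {"car": 1.0, "automobile": 0.9, "engine": 0.4},
}
# Hypothetical term-document relation, e.g. normalized term weights.
TERM_DOC: Relation = {
    "car": {"d3": 0.3},
    "automobile": {"d1": 0.8},
    "engine": {"d2": 0.7},
}

query: FuzzySet = {"car": 1.0}
literal = sup_min_compose(query, TERM_DOC)       # only d3 contains "car"
expanded = sup_min_compose(query, TERM_TERM)     # car 1.0, automobile 0.9, engine 0.4
relevance = sup_min_compose(expanded, TERM_DOC)  # d3 0.3, d1 0.8, d2 0.4
print(sorted(relevance.items(), key=lambda kv: kv[1], reverse=True))
```

Without expansion only d3 is retrieved; after composing with the term-term relation, d1 (which uses "automobile") becomes the best match.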
Decades of research in the field of Social Ne