From: Olivier Matz Date: Thu, 9 Jan 2014 19:28:28 +0000 (+0100) Subject: update with comments and new figures X-Git-Url: http://git.droids-corp.org/?a=commitdiff_plain;h=9e23a555ad9b9b732be5819f9c367c09671a0238;p=slides-virt.git update with comments and new figures --- diff --git a/_static/custom.css b/_static/custom.css index 7443f7b..f69da1b 100644 --- a/_static/custom.css +++ b/_static/custom.css @@ -34,7 +34,14 @@ img { /* bold is in red */ strong { - color: #B00000; + color: #0000FF; +} + +/* italic is in bluc */ +em { + color: #C00000; + font-style: normal; + font-weight: bold; } /* hacks for table of contents */ diff --git a/cpu-virt.svg b/cpu-virt.svg index e3922cf..3404426 100644 --- a/cpu-virt.svg +++ b/cpu-virt.svg @@ -38,9 +38,9 @@ fit-margin-left="20" fit-margin-right="20" fit-margin-bottom="20" - inkscape:zoom="0.95353535" - inkscape:cx="266.81499" - inkscape:cy="123.42837" + inkscape:zoom="1.3485026" + inkscape:cx="231.56944" + inkscape:cy="178.50521" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -127,24 +127,7 @@ style="fill:none;stroke:#3465af;stroke-width:28.22200012" inkscape:connector-curvature="0" id="path13877" - d="m 6769.9664,3357.4444 -2285,0 0,-2741.99996 4570,0 0,2741.99996 -2285,0 z" />Applications -Guest OS -Applications -Guest OS -Applications -Guest OS -VM0 -VM1 -VM2 -Virtual Machine Monitor (VMM) -Ring 0 -Ring 3 + xml:space="preserve" + style="font-size:395.1111145px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans" + x="6778.1655" + y="2095.4473" + id="text3253" + sodipodi:linespacing="125%">Applications +Guest OS +VM0 +VM1 +VM2 +Virtual Machine Monitor (VMM) +Ring 3 +Ring 0 +Applications +Applications +Guest OS +Guest OS \ No newline at end of file diff --git a/high-thput.svg b/high-thput.svg new file mode 100644 index 0000000..ca75c6f --- /dev/null +++ 
b/high-thput.svg @@ -0,0 +1,718 @@ + + + + + + + + image/svg+xml + + + + + + + + + + + + + + diff --git a/high-thput1.jpg b/high-thput1.jpg index 64fa83d..81d2363 100644 Binary files a/high-thput1.jpg and b/high-thput1.jpg differ diff --git a/hosted.svg b/hosted.svg index 373e88a..8fd2ae2 100644 --- a/hosted.svg +++ b/hosted.svg @@ -38,9 +38,9 @@ fit-margin-left="20" fit-margin-bottom="20" fit-margin-right="20" - inkscape:zoom="0.88080808" - inkscape:cx="358.56693" - inkscape:cy="47.641728" + inkscape:zoom="1.2456507" + inkscape:cx="349.41136" + inkscape:cy="267.80509" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -111,278 +111,47 @@ d="m 10403.555,12918.555 -5712.9996,0 0,-1828 11424.9996,0 0,1828 -5712,0 z" id="path3042" inkscape:connector-curvature="0" - style="fill:#c0c0c0;stroke:none" />Hardware -Native OS -VMM -Guest OS -Applications -VMM -Guest OS -Applications -VMM -Guest OS -Applications - \ No newline at end of file + style="fill:none;stroke:#3465af;stroke-width:152" />Applications +Applications +Applications +Guest OS +Guest OS +Guest OS +Native OS +VMM +VMM +VMM +Hardware + \ No newline at end of file diff --git a/ibm-as400.jpg b/ibm-as400.jpg new file mode 100644 index 0000000..5a0e974 Binary files /dev/null and b/ibm-as400.jpg differ diff --git a/ibm370.jpg b/ibm370.jpg new file mode 100644 index 0000000..92a3b6a Binary files /dev/null and b/ibm370.jpg differ diff --git a/index.rst b/index.rst index 1ab337b..30bc025 100644 --- a/index.rst +++ b/index.rst @@ -6,7 +6,7 @@ System Virtualization and OS Virtual Machines ============================================= -:Date: 2013-10-29 +:Date: 2013-12-19 :Authors: Ivan Boule, Olivier Matz Plan @@ -15,30 +15,120 @@ Plan Contents -------- -.. 
contents:: - :depth: 2 - :backlinks: none +- History of Virtualization +- Virtualization Usage and Taxonomy +- Process Level Virtualization -History -======= + - ABI Emulation + - Virtual Servers + +- System Level Virtualization + + - Transparent Hardware Emulation + - Transparent Hardware Virtualization + - Paravirtualization + - Hardware-Assisted Virtualization + +- Conclusion + +Who am I? +--------- + +- Olivier MATZ ```` +- Software engineer at 6WIND for 10 years +- 6WIND is a software company designing high performance network + software + + - http://www.6wind.com + +- I mainly develop low-level code: Linux kernel, drivers and + applications close to the operating system History of Virtual Machines ---------------------------- +=========================== + +Sixties: introduction of IBM/370 series +--------------------------------------- + +- Generalization of virtual memory +- Microprogramming of instructions on small models +- CP/CMS hypervisor + +.. figure:: ibm370.jpg + :width: 60% + +.. note:: + + - IBM/370: generalization of virtual memory + - IBM/370: microprogramming of some instructions on the + small models + - IBM/370: CP/CMS hypervisor (Control Program/Conversational + Monitoring System), managing virtual machines under which + CMS, DOS and OS instances could all be run + interchangeably. Offered to customers while they migrated from + DOS to OS, it was often kept afterwards because CMS was so + user-friendly as a time-sharing system. + + the VM/370 product, created by IBM in the 1970s, allowed + several users to time-share a computer + running the IBM DOS operating system. IBM DOS on its own + did not offer time-sharing. 
+ + - time sharing between VMs + +Eighties: IBM AS/400 +-------------------- + +- Many logical machines in one physical machine +- High level (virtual) ISA including I/Os (TIMI) + + - Take advantage of advances in hardware without recompilation + - User binaries contain both TIMI instructions and machine instructions + - Easier transition to PowerPC + +.. figure:: ibm-as400.jpg + :width: 40% + +.. note:: + + - IBM/AS-400: a minicomputer from the IBM range, late 1980s + - IBM/AS-400: several logical machines can be "carved out" + of one physical machine. + - IBM/AS-400: a program does not talk directly to the hardware, it + uses a high-level instruction set (ISA), which makes the + program independent of the CPU it runs on. This eased + the transition to PowerPC. + - http://en.wikipedia.org/wiki/IBM_System_i + - XXX IBM/AS-400: why "co-designed VM" ? XXX look it up online + - emulation of "high-level" CPU instructions + XXX check how it works: is it a hypervisor or + an interpreter. + +Nineties and later: application VMs +----------------------------------- + +.. figure:: java.png + :width: 15% + +- Java -- VM introduced in the sixties on IBM/370 series + - a Java program is compiled into a portable bytecode + - the JVM is an abstract computer that is able to run this bytecode -- Co-Designed VM: IBM AS/400 +- Microsoft Common Language Infrastructure (.Net) - - High level ISA including I/Os - - Proprietary CISC → PowerPC +.. note:: -- Application VMs + - http://en.wikipedia.org/wiki/Java_virtual_machine - - Sun Java, Microsoft Common Language Infrastructure +Now: OS virtual machines +------------------------ -- OS VMs +- Run a virtualized operating system on top of a virtual machine +- Examples: - - VMware (virtualized PC on x86) + - VMware products (virtualized PC on x86) + - KVM - Virtual PC (PC emulation on Mac OS/PowerPC) - Many others : Bochs, VirtualBox, Qemu, ... 
@@ -65,6 +155,8 @@ Goals of System Virtualization - Reduction of Total Cost of Ownership (TCO) - Increase utilisation of server resources + - Spawn new servers "on demand" (e.g. Amazon EC2 and Elastic Load + Balancer) - Reduction of Total Cost of Functioning @@ -72,38 +164,55 @@ Goals of System Virtualization - Cooling - Occupied Space -- Hardware Consolidation - -- Reduction of Build Of Material (BOM) for high-volume low-end - products - -- Isolation of OS for security purposes - +- Isolation of OS for security purposes (Qubes, Cells) + +.. note:: + + - TCO + TCF reduction: discuss the data center case. We can mention + live migration, elasticity, ... + - amazon ec2: + + - a customer can create virtual machines on demand + - Elastic Load Balancer: ELBs spread the load + across EC2 instances + - Autoscaling: automatically manages elasticity for + one or more groups of EC2 instances + - Cloud Watch: tracks and monitors metrics of + EC2 instances in order to send notifications or take + actions + + - "qubes" (security) http://qubes-os.org/trac/wiki/QubesScreenshots + + - Based on a secure bare-metal hypervisor (Xen) + - Networking code sand-boxed in an unprivileged VM (using IOMMU/VT-d) + - USB stacks and drivers sand-boxed in an unprivileged VM (currently + experimental feature) + - No networking code in the privileged domain (dom0) + - All user applications run in “AppVMs”, lightweight VMs based on + Linux + - Centralized updates of all AppVMs based on the same template + - Qubes GUI virtualization presents applications as if they were + running locally + - Qubes GUI provides isolation between apps sharing the same desktop + - Secure system boot based (optional) Virtualization in high-throughput network equipments ---------------------------------------------------- -.. figure:: high-thput1.jpg - -.. 
figure:: high-thput2.jpg - -Virtualization in Multimedia devices ------------------------------------- - -- Reduction of Build Of Material (BOM) for high-volume low-end - products +.. figure:: high-thput.svg + :width: 100% - - No need for a general purpose processor +.. note:: - - 20 to 25 % BOM reduction + - Initially, there is a system running on several old + boards (plus the management board running Linux). When upgrading + the hardware, if the new board is + more powerful, the old ones can be virtualized without modifying the + software. - - Run Linux together with OS supporting Codecs on a single TI DSP + dataplane + control plane -> on a single board - - Leverage Linux environment - - - Reuse existing DSP software - -XXX 2 images + - Recap what was said on the previous slide Usages of Virtual Machines -------------------------- @@ -112,11 +221,6 @@ Usages of Virtual Machines - Web sites hosting -- OS partitionning - - - Time sharing - - Security - - OS/kernel education & training - OS fault recovery @@ -127,11 +231,51 @@ Usages of Virtual Machines - Run applications not supported by host OS +- OS migration to new hardware without reinstalling it + +.. note:: + + - time sharing: we want to use several OSes on the same machine: + analogous to several processes. + + - education & training: picture a lab session, as presented + in the Linux Mag 140 article on libvirt: each student + works on a preconfigured virtual machine XXX to be reviewed + + - backward compatibility: point out that this is useful when the hardware + is no longer available, for example. + + - run app not supported by host OS: wine + + - Some services are only available at the OS level + (routing, filtering, ...). Having several OSes makes it possible to + duplicate them (e.g. TCP daisy chain with VRs) + Recovery Servers ---------------- +- Another example: one backup server to replace any machine + .. 
figure:: recovery.png + :width: 100% + +.. note:: + + - Virtualization provides high availability on the + cheap. Often it is the software that crashes. A whole + network architecture can be duplicated: + - apache + - mySQL + - mail + - etc... + - A single backup server on the right for all the other + servers. This avoids needing 8 machines. If one of the 4 + fails, the one on the right takes over. + + - indeed, each machine has its own + system/network/filtering configuration... It is not necessarily easy to + put the 4 services on a single machine without virtualization. Multi-Core CPU Issues (1) ------------------------- @@ -150,6 +294,26 @@ Multi-Core CPU Issues (1) - Adaptation to multi-pro even more difficult than RTOS +.. note:: + + - case of applications that are multi-threaded but designed with the + assumption that the machine has a single core. System virtualization + makes it possible to parallelize these applications on multicore physical + machines (each VM being single-core), as explained on the next slide. + + - Many applications are still single-processor. This drastically + simplifies coding: there are no race conditions, + no need for locks/mutexes. XXX + + - this problem is less of an issue on a mainstream system than on + legacy or real-time systems. Modern mainstream + systems support multicore very well, and it + would be enough to run several applications at once. XXX + + - some multithreaded RT applications rely on there being + only one CPU, and on 2 threads never being executed + truly concurrently + Multi-Core CPU Issues (2) ------------------------- @@ -164,13 +328,28 @@ Multi-Core CPU Issues (2) - Scalability managed at virtualization level +.. note:: + + - System virtualization makes it possible to run several instances + of a non-SMP operating system on a multicore processor. 
+ + - This can avoid rewriting software designed for a + single-core machine. The software in question here is more likely + RT software or a kernel, because for a standard application + the problem does not arise. + Virtualization Taxonomy ======================= +.. note:: + + taxonomy = "inventaire" in French (inventory) + Machines Interfaces ------------------- .. figure:: isa-abi.svg + :width: 70% - ISA = Instruction Set Architecture @@ -180,21 +359,60 @@ Machines Interfaces - ABI = Application Binary Interface - Process level interface - - User-level non privileged ISA instructions + OS systems 14 calls + - User-level non privileged ISA instructions + OS system calls + +.. note:: + + - ISA: Instruction Set Architecture + + the CPU instructions (give examples such as MOV, or CLI/STI + to mask interrupts), the peripherals, the MMU + (how it must be configured), ... + + This is the interface used by the operating system. + + - ABI: Application Binary Interface + + This is the interface that lets a process communicate with + the outside world. It mainly consists of system calls (read, + write, gettimeofday, execve, sleep). + + the ABI contains the non-privileged instructions + the OS API. + Other instructions such as cli/sti are not part of the ABI. + + - example of the compatibility layer for a 32-bit application + running on a 64-bit kernel. Virtualization Taxonomy ----------------------- -- Process level virtualization +- Virtualization at process level (ABI) - Emulation of Operating System ABI - - Emulation of OS ABI, cross-architecture - Virtual Servers -- System level virtualization +- Virtualization at system level (ISA) -- Standalone / Hosted Virtualization -- Machine Emulation / Machine Virtualization + - Standalone vs Hosted Virtualization + - Machine Emulation vs Machine Virtualization + +.. 
note:: + + - a process already runs in a virtual machine provided by + the OS, but not at the same level. Historically, the goal of a + multitasking operating system has been to provide virtual + machines for applications (and therefore users). Each + application "thinks" it is alone on the processor. + + Each application can access resources through system + calls, as if it were the only one talking to the + peripherals. It is up to the operating system to schedule + processes and their requests. + + - system virtualization works on the same principle but at a + different level. The following slides cover the + different kinds of virtualization (standalone vs hosted, and + emulation vs virtualization). Hosted versus Standalone Virtualization --------------------------------------- @@ -211,15 +429,32 @@ Hosted versus Standalone Virtualization - OS run in a VM is named a Guest OS +.. note:: + + - hosted = "hébergée" in French + + - guest = "invité" in French + + - standalone = "autonome" in French; smaller + + - in general, a "hosted" guest does not really access the hardware but + emulated peripherals + + - the kvm case is ambiguous: the kernel running in root mode + really executes on the bare hardware. + Hosted Virtualization --------------------- .. figure:: hosted.svg + :width: 100% Example: VMware Workstation ---------------------------- -.. figure:: vmware-wks.png +.. figure:: vmware-wks.svg + :width: 100% + :class: fill - Hosted VM - Unmodified OSes @@ -231,10 +466,15 @@ Standalone Virtualization ------------------------- .. figure:: standalone.svg + :width: 100% Example: VMware ESX ------------------- +.. 
figure:: vmware-esx.svg + :width: 100% + :class: fill + - Standalone VMM - Supports unmodified OS binaries @@ -245,31 +485,39 @@ Example: VMware ESX - Guest OS - runs in user mode -Process Level Virtualization -============================ +Process Level Virtualization: ABI Emulation +=========================================== Process level ABI Emulation --------------------------- - Goal: execute binary applications of a given system **X** on the ABI of - another system **Y** + another system *Y* -- Emulate system **X** ABI on top of system **Y** ABI +- Emulate system **X** ABI on top of system *Y* ABI - Emulation done by application-level code -- System **Y** must provide services equivalent to those of system +- System *Y* must provide services equivalent to those of system **X** (file system, sockets, etc...) +- Example: **X** = Windows and *Y* = Linux + +.. note:: + + - example of Windows' CreateFile(), which would be emulated by an open() + on a Unix system + Process Level (ABI) Emulators ----------------------------- -- Wine - Windows Emulator on Unix/Linux +- Wine: run Windows applications on POSIX-compliant operating + systems - Windows API in userland - Adobe Photoshop, Google Picasa, ... -- Cygwin +- Cygwin: recompile POSIX applications so they can run under Windows - Unix emulation on Windows - POSIX library @@ -277,6 +525,19 @@ Process Level (ABI) Emulators - GNU development tool chain (gcc, gdb) - X Window, GNOME, Apache, sshd, ... +.. note:: + + - **DEMO**: run a .exe with wine64 + - the ABI depends on the operating system but also on the architecture. + + - system calls differ between Linux and Windows + - but system calls are also not invoked the same way + on 2 different architectures. 
For example, on x86, + INT 0x80 is used (actually SYSENTER nowadays), and the + arguments are placed in specific registers + + - google picasa for linux bundles an embedded version of wine + Process Level Cross-architecture Emulators ------------------------------------------ @@ -285,38 +546,72 @@ Process Level Cross-architecture Emulators - Emulated OS and native OS are the same (ex: both are linux) - Emulated arch is different than native architecture (ex: x86 and powerpc) + - Note: "emulation" is defined later in the presentation + +- Example: qemu-user -- Example: qemu-user:: +.. code-block:: sh $ gcc hello.c $ ./a.out hello - $ powerpc-linux-gnu-gcc -static hello.c $ ./a.out bash: ./a.out: cannot execute binary file $ qemu-ppc ./a.out hello +.. note:: + + - for example, you get hold of a Freebox or a router based on + MIPS or ARM, and you want to run and debug an application. + +Process Level Virtualization: virtual servers +============================================= + Virtual Servers (1) ------------------- - Single OS kernel / Multiple resource instances + - can run several linux distributions on the same kernel + - Isolated kernel execution environments - Root file system - Network: Routing table, IP tables, interfaces... - - Process for signals + - Process signals - Solaris 10 Containers -- LXC, Linux-VServer, openVZ +- LXC, Linux-VServer, openVZ: namespaces and cgroups - FreeBSD Jail +.. note:: + + - all processes are seen by the kernel + + - processes have different views of the operating system and + are isolated from one another. They are unaware of adjacent domains and + have different views of the system (FS, network, ...). + + - Linux namespaces are a good example (lxc, openVZ). + + - XXX think about a demo... ? 
+ + - explain how this can be implemented in the kernel: an + additional parameter for each system call + + - mention that, security-wise, the isolation is not quite there yet. + + - see the drawing on the next slide + + - signal -> table of process ? + Virtual Servers (2) ------------------- .. figure:: virtual-servers.svg + :width: 100% Virtual Servers (3) ------------------- @@ -333,11 +628,11 @@ Virtual Servers (3) - Con's - - No OS heterogeneity (no GPOS/RTOS combination) + - No OS heterogeneity - Single OS binary instance (common point of failure) -Transparent Hardware Emulation -============================== +System Level Virtualization: Transparent Hardware Emulation +=========================================================== Transparent Hardware Emulation (1) ---------------------------------- @@ -357,35 +652,56 @@ Transparent Hardware Emulation (1) Transparent Hardware Emulation (2) ---------------------------------- -- Emulate machine X on top of machine Y +- Emulate machine **X** on top of machine *Y* -- Interpretation +- Interpretation: read, decode, execute - - 1 instruction of X executed by N instructions of Y + - 1 instruction of **X** executed by N instructions of *Y* - Huge slow down method - Dynamic Binary Translation - - Convert blocs of X instructions in Y instructions - -- Application-level emulator runs on a native OS -- One VM running a single Guest OS - -QEMU Architecture ------------------ - -.. figure:: qemu.svg + - Convert blocks of **X** instructions into *Y* instructions + - Conversion is done once per basic block + - Advanced: dynamic optimization of 'hot' blocks + +- The emulator is usually a standard application running on a native + OS + +.. note:: + + - Explain how an emulator can be implemented: it is a big + switch/case, each instruction must be parsed and its behavior + must be emulated. The emulator must keep the state of the registers + in variables. 
+ + - That is why we end up translating blocks of + code. Note that dynamic translation only happens on the fly; + taking the binary, converting it, and then running it + (static translation) is harder. + + - https://en.wikipedia.org/wiki/Binary_translation + - Dynamic binary translation looks at a short sequence of + code, typically on the order of a single basic block, then + translates it and caches the resulting sequence. + - Code is only translated as it is discovered and when possible, and + branch instructions are made to point to already translated and + saved code (memoization). + - Apple Computer implemented a dynamic translating emulator for M68K + code in their PowerPC line of Macintoshes, which achieved a very + high level of reliability, performance and compatibility + - Intel: IA32 over Itanium QEMU: Hosted Hardware Emulator ------------------------------ - Cross ISA Emulation - - Emulate machine X on top of machine Y + - Emulate machine **X** on top of machine *Y* - Interpretation + translation -- Intel x86, PowerPC, ARM, Sparc architectures +- Intel x86, PowerPC, ARM, Mips, Sparc, ... - Emulation of SMP architectures @@ -394,8 +710,35 @@ QEMU: Hosted Hardware Emulator - Hard Disk drives, CD-ROM, network controllers, USB controllers, ... - Synchronous emulation of device I/O operations -Transparent Hardware Virtualization -=================================== +.. 
note:: + + - **DEMO**: run Kid Icarus with mednafen + - ``mednafen -vdriver sdl -nes.xscale 4 -nes.yscale 4 ~/cours_ivan/cours_virt/Kid\ Icarus\ \(Europe\)\ \(Rev\ A\).zip`` + - see /usr/share/doc/mednafen/mednafen.html + - http://idoc64.free.fr/ASM/instruction.htm + - QSDZ = dir, ret=start, tab=select, OP=buttons + - Alt-D displays the debugger + - address A6 decreases when lives are lost + - shift-W: write breakpoint, R to run + - we can try to set a large value: + Poke A6 30 1 + - does not work, because it saturates + - breakpoint at A6 + - shift P (poke in rom): ED45 60 1 (we put an RTS) + this is the place that saturates + - Poke A6 30 1 + - address DB6C is where A6 is stored after being hit + by a monster:: + + LDA A6: load the value + SEC: set carry + SBC: sub with carry + BCS: branch on carry set (meaning that if it is < 0, 0 is stored) + + - 7E42 0 1 -> put 0 on the monsters' decrement + +System Level Virtualization: Transparent Hardware Virtualization +================================================================ Transparent Hardware Virtualization ----------------------------------- @@ -413,23 +756,38 @@ Transparent Hardware Virtualization - Share machine resources among multiple VMs +.. note:: + + - the slide describes the problem, which is the same as for emulation + + - maybe also give examples such as kqemu or virtualbox + (acceleration modules). 
Also say that this still does not involve + Intel VT, and that it is faster than emulation + + - share machine resource: example of copy-on-write memory pages + Full CPU Virtualization (1) --------------------------- - Present same functional CPU to all Guest OSes -- VMM manages a CPU context for each VM +- VMM manages a CPU context for each vCPU of each VM - saved copy of CPU registers - representation of software-emulated CPU context -- VMM shares physical CPUs among all VMs +- VMM shares physical CPUs among all vCPUs of all VMs - VMM includes a VM scheduler - round-robin - priority-based +.. note:: + + - representation of software-emulated CPU context: for example, knowing whether + interrupts are masked or not. + Full CPU Virtualization (2) --------------------------- @@ -450,6 +808,7 @@ CPU Virtualization - Run each Guest OS in non-privileged mode .. figure:: cpu-virt.svg + :width: 100% "Hardware-Sensitive" Instructions --------------------------------- @@ -466,6 +825,12 @@ CPU Virtualization - Done once, saved in Translation Cache - Example: Vmware +.. note:: + + - privileged instructions: e.g. masking interrupts + + - critical instructions: e.g. reading a status flag, CR3, ... + Privileged Instructions Virtualization -------------------------------------- @@ -506,6 +871,13 @@ Critical Instructions Virtualization (1) - But no exception for popf => VMM not aware of Guest OS action (unmask interrupts) +.. note:: + + - first problem: pushf is allowed and always pushes flags + saying that interrupts are enabled + - popf must also be intercepted because the interrupt + status has to be updated + Critical Instructions Virtualization (2) ---------------------------------------- @@ -517,6 +889,12 @@ Critical Instructions Virtualization (2) - VMM emulates expected effect of critical instruction, if any. +.. note:: + + - **PAUSE** + - XXX does the translation need to be done only on + code that is meant to run in ring 0 ? 
+ Full Memory Virtualization -------------------------- @@ -532,22 +910,62 @@ Full Memory Virtualization - 4 GB on most 32-bit architectures (Intel x86, PowerPC) - - Manages virtual page → physical case mappings + - Manages virtual page → physical page mappings - Manages « swap » space to extend physical memory -MMU & Virtual Address Space ---------------------------- +.. note:: + + - the MMU is a hardware component + +Reminder about MMU (1) +---------------------- + +- Here is a minimal code example: + + .. code-block:: sh + + # a program that takes x and y in memory, and + # computes the sum + mov 0x200000,%eax # retrieve x in eax + mov 0x200004,%ebx # retrieve y in ebx + add %ebx,%eax # compute x+y in eax + mov %eax,0x200008 # save the result in memory -.. figure:: mmu1.svg +- This program can run on one cpu +- If the addresses are physical, it is not possible to run multiple + instances of this program as they would modify the same memory -Intel x86 MMU -------------- +.. note:: + + - a bad solution is to modify the binary at each execution + +Reminder about MMU (2) +---------------------- + +.. figure:: mmu-slide1.svg + :width: 70% + +Reminder about MMU (3) +---------------------- + +.. figure:: mmu-slide2.svg + :width: 95% + +Reminder about MMU (4): Intel x86 MMU +------------------------------------- .. figure:: mmu2.svg + :width: 100% Memory Virtualization (1) ------------------------- +.. 
figure:: mmu-slide3.svg + :width: 70% + +Memory Virtualization (2) +------------------------- + - Machine Physical Memory - Physical memory available on the machine @@ -564,7 +982,7 @@ Memory Virtualization (1) - Guest OS manages virtual address spaces of its processes -Memory Virtualization (2) +Memory Virtualization (3) ------------------------- - Guest OS manages Guest Physical Pages @@ -578,12 +996,52 @@ Memory Virtualization (2) - VMM dynamically translates Guest Physical Pages into Machine Physical Pages -Memory Virtualization (3) +Memory Virtualization (4) ------------------------- -.. figure:: mem-virt.svg +.. figure:: mmu-slide4.svg + :width: 95% + +.. note:: + + - switch to the animated view, explain how translations + are done, talk about the TLB + show the chronological order of events + - zoom out, static view + virtual memory vs VM memory vs host physical memory + no MMU in this case + - show, animated and with a single MMU, how + the hypervisor configures the MMU + + use the same colors for the memory types + we are going to hijack the MMU to perform the translation + we want + + - put a number in CR3 + - put vertical bars in the page tables + - TLB as a table with empty rows + - find -> get + - wider mmu + - zoom on the PTEs on the right + - show the addresses + - separate the values of virtual and physical addresses, use + different colors for these addresses + - see whether we can show only the 20 significant bits + and not the 12 offset bits when talking about addresses + + - When the guest accesses CR3 (or a PTE), this triggers a fault + handled by the VMM. The VMM translates the address given by the VM's + OS and fills the CR3 register with the physical address + of the area the VM uses to store its page + tables. 
Every access to this page table must trigger a + fault so that the VMM is notified of any change and can + configure the real MMU accordingly (by performing the address + translation). (slide 47) + + - Reading CR3 does not generate a trap, so it must be handled like + pushf and popf, that is, with code translation. -Memory Virtualization (4) +Memory Virtualization (5) ------------------------- - VMM maintains Shadow Page Tables @@ -597,7 +1055,7 @@ Memory Virtualization (4) - Emulates operation in shadow page table - Updates effective MMU page table entry, if needed -Memory Virtualization (5) +Memory Virtualization (6) ------------------------- - PTE entries can be tagged with a context ID @@ -612,7 +1070,7 @@ Memory Virtualization (5) - VMM must flush TLB when switching VMs -Memory Virtualization (6) +Memory Virtualization (7) ------------------------- - VMM must respect Guest OS virtual page faults @@ -630,7 +1088,7 @@ Memory Virtualization (6) - Pages with same content's (e.g. zero-ed pages) -Memory Virtualization (7) +Memory Virtualization (8) ------------------------- - VMM can swap real pages of a VM @@ -647,67 +1105,63 @@ Memory Virtualization (7) - no more available for normal kernel allocation service - VMM assigns same amount of physical pages to other VM's -Paravirtualization -================== - -Paravirtualization (1) ----------------------- +.. note:: -- OS adaptation to avoid binary translation overhead -- Requires access to OS source code -- Include drivers of virtual devices -- Examples: + - ballooning: a kernel module lives in the guests and communicates + with the VMM. If the VMM needs physical memory for + another VM, it can ask the module to allocate memory, which + is then lost to the other services. This memory is + "given back" to the VMM. 
+ - this needs more details and sources - - Xen - - User Mode Linux (UML) +System Level Virtualization: Paravirtualization +=============================================== -Paravirtualization (2) +CPU Paravirtualization ---------------------- -- Still run each Guest OS in non-privileged mode - -- But with minimal virtualization overhead - -- => Modified Guest OS kernel +- Still run each Guest OS in non-privileged mode, but with minimal + virtualization overhead - - Remove Hardware-Sensitive Instructions - - - Use fast VMM system calls instead, if needed +- OS adaptation to avoid binary translation overhead - - Minimise usage of Privileged Instructions + - Remove Hardware-Sensitive Instructions, use fast VMM system calls + - Minimize/avoid usage of Privileged Instructions - Only affect Machine/CPU dependant part of OS -- OS portage on new architecture with same CPU +- OS portage on new architecture with same CPU, without system ISA - - Without system ISA +- Examples: Xen legacy, User Mode Linux (UML), CoLinux -Paravirtualization (3) +I/O Paravirtualization ---------------------- -- Guest OS only use Virtual I/O Devices, in a cooperative way +- Multiplexing VMM physical devices among VMs - Front-end driver in Guest OS - Back-end driver in VMM + - Virtual ethernet, virtual disks -- VMM multiplex VM Virtual Devices on physical devices +- Fast virtual devices for VM to VM communications - - Virtual Ethernet - - Virtual Disks + - Example: vmxnet3 -- Data transfer through I/O rings +- Data transfer through syscalls, shmem, rings, ... +- Pros: scalability, VM migration Virtual I/O Devices ------------------- .. figure:: virt-devices.svg + :width: 100% -Paravirtualization Example: Xen -------------------------------- +Paravirtualization Example: Xen Legacy +-------------------------------------- - Objectives - - Scalable, support more than 100 VM + - Scalable - Share resources of Server machines - Intel IA-32, x86-64, ARM, ... 
@@ -718,8 +1172,12 @@ Paravirtualization Example: Xen - Have access (and manages) all physical devices - Modified version of Linux, FreeBSD -Hardware-Assisted Virtualization -================================ +.. note:: + + XXX check the domain 0 part + +System Level Virtualization: Hardware-Assisted Virtualization +============================================================= Hardware Assisted Virtualization (1) ------------------------------------ @@ -750,7 +1208,7 @@ Hardware Assisted Virtualization (2) Hardware Assisted Virtualization (3) ------------------------------------ -- DMA virtualization +- Directed I/O virtualization - IO-MMU (Intel VT-d) @@ -781,6 +1239,7 @@ Intel VT-x Architecture Overview -------------------------------- .. figure:: vt-x.svg + :width: 100% Intel VT-x CPU Virtualization (1) --------------------------------- @@ -819,7 +1278,7 @@ Intel VT-x CPU Virtualization (2) - VM entries & VM exits use a new data structure - - Virtual Machine Control Structure (VMCS) per VM + - Virtual Machine Control Structure (VMCS) per VM CPU (vCPU) - Referenced with a memory physical address - Format and layout hidden - New VT-x instructions to access a VMCS @@ -830,19 +1289,69 @@ Intel VT-x CPU Virtualization (3) - Guest State Area - Saved value of registers before beeing changed by - - VM Exits (e.g., Segment Registers, CR3, IDTR) + VM Exits (e.g. Segment Registers, CR3, IDTR) -- Hidden CPU state (e.g., CPU Interruptibility State) + - Hidden CPU state (e.g., CPU Interruptibility State) - Host State Area - - VM Control Fields +- VM Control Fields + - Interrupt Virtualization - Exceptions bitmaps - I/O bitmaps - Model Specific Register R/W bitmaps - Execution rights for CPU Privileged Instructions +.. note:: + + - the host state area is where the VMM's processor state is + stored. It is restored on VM exit. + + - Switching from root mode to non-root mode is called "VM entry", the + switch back is "VM exit". 
The VMCS includes a guest and host state + area which is saved/restored at VM entry and exit. Most importantly, + the VMCS controls which guest operations will cause VM exits. + + The VMCS provides fairly fine-grained control over what the guests + can and can't do. For example, a hypervisor can allow a guest to + write certain bits in shadowed control registers, but not + others. This enables efficient virtualization in cases where guests + can be allowed to write control bits without disrupting the + hypervisor, while preventing them from altering control bits over + which the hypervisor needs to retain full control. The VMCS also + provides control over interrupt delivery and exceptions. + + Whenever an instruction or event causes a VM exit, the VMCS contains + information about the exit reason, often with accompanying + detail. For example, if a write to the CR0 register causes an exit, + the offending instruction is recorded, along with the fact that a + write access to a control register caused the exit, and information + about source and destination register. Thus the hypervisor can + efficiently handle the condition without needing advanced techniques + such as CSAM and PATM described above. + + VT-x inherently avoids several of the problems which software + virtualization faces. The guest has its own completely separate + address space not shared with the hypervisor, which eliminates + potential clashes. Additionally, guest OS kernel code runs at + privilege ring 0 in VMX non-root mode, obviating the problems by + running ring 0 code at less privileged levels. For example the + SYSENTER instruction can transition to ring 0 without causing + problems. Naturally, even at ring 0 in VMX non-root mode, any I/O + access by guest code still causes a VM exit, allowing for device + emulation. 
+ + - All of the visible processor state is saved to or restored from + memory: + + - all registers, including control registers + - interruptibility state + + - The VMCS describes what the VM is allowed to do + + - IO bitmaps = bitfield telling which IO ports (in and + out instructions) are allowed. Intel VT-x Interrupt Virtualization ----------------------------------- @@ -859,6 +1368,21 @@ Intel VT-x Interrupt Virtualization - Used by VMM to control VM interrupts +.. note:: + + - the window makes it possible to delay the hardware interrupt (and thus the VM + exit) as long as the guest has not unmasked its interrupts. + + - VT-x also includes an interrupt-window exiting VM-execution + control. When this control is set to 1, a VM exit occurs whenever + guest software is ready to receive interrupts. A VMM can set this + control when it has a virtual interrupt to deliver to a + guest. Similarly, VT-i includes a PAL service that a VMM can use to + register that it has a virtual interrupt pending. When guest + software is ready to receive such an interrupt, the service + transfers control to the VMM via the new virtual external interrupt + vector. + Intel VT-x MMU Virtualization ----------------------------- @@ -877,6 +1401,7 @@ Virtual Memory Virtualization ----------------------------- .. figure:: vt-x-mem.svg + :width: 100% Intel VT-x Extended Page Tables (1) ----------------------------------- @@ -899,11 +1424,31 @@ Intel VT-x Extended Page Tables (2) ----------------------------------- .. figure:: vt-x-mmu.svg + :width: 100% + +.. note:: + + - the TLB caches both translations, VA->GPA and GPA->MPA + + - There is only one downside: nested paging or EPT makes the virtual + to real physical address translation a lot more complex if the TLB + does not have the right entry. For each step we take in the blue + area, we need to do all the steps in the orange area. 
Thus, four + table searches in the "native situation" have become 16 searches + (for each of the four blue steps, four orange steps). + + http://www.anandtech.com/show/2480/10 TLB Flush Issue --------------- .. figure:: tlb-flush-issue.svg + :width: 100% + +.. note:: + + - 2 processes in different VMs can use the same + virtual address Intel VT-x Virtual Processor Identifier --------------------------------------- @@ -922,32 +1467,45 @@ Intel VT-x Virtual Processor Identifier - VPID loaded from VMCS on VM Enter -DMA Virtualization (1) ----------------------- +.. note:: + + - demo Windows running in KVM: browsing the + device manager shows that it is not at all what + I have on my PC. It also makes a good transition to + DMA virtualization. -- Enable Guest OS to manage I/O devices +.. Intel Virtualization Technology for Directed I/O + ================================================ - - I/O devices assigned by VMM to Guest OSes +Intel VT-d Principles +--------------------- -- Transparent mode +- Enable Guest OS to directly manage physical I/O devices - - Use native device driver of Guest OS - - Unaware of physical memory Virtualization + - Guest I/O operations bypass VMM -- Enforce isolation between Guest Oses +- In full transparent mode - - Guest OS only view hardware ressources assigned by VMM (memory, - devices) + - Use native device drivers of Guest OS + - Guest OS unaware of underlying physical memory virtualization -DMA Principles --------------- +- Enforce isolation between Guest VMs -.. figure:: dma.svg + - Guest OS can only access I/O resources (ports, PCI devices) assigned to it + - PCI I/O device can only perform DMA to machine physical pages assigned to + Guest VM owning that device. -DMA Virtualization (2) ----------------------- +Intel Directed IO +----------------- .. figure:: dma-virt.svg + :width: 100% + +DMA Principles +-------------- + +.. 
figure:: dma.svg + :width: 100% DMA Virtualization Issue ------------------------ @@ -958,24 +1516,26 @@ DMA Virtualization Issue - Guest Physical Address must be translated into its corresponding Machine Physical Address when used for DMA operations by device -- GPA Translation cannot be done by VMM +- GPA -> MPA translation cannot be done by VMM - VMM cannot catch device-specific driver operations to setup I/O buffers addresses +- GPA -> MPA translation done by IOMMU on the Bus Controller + Intel VT-d Protection Domains ----------------------------- -- Intel VT-d provides DMA Protection Domains +- Intel VT-d provides DMA Protection Domain - Extension of IOMMU translation mechanism - - Isolated context of a subset of the Machine Physical Memory (MPA) - - Correspond to the portion of Machine Physical Memory allocated to + - Isolated context of a subset of the Machine Physical Memory + - Corresponds to the portion of Machine Physical Memory allocated to a VM -- I/O devices assigned by VMM to a DMA Protection Domain +- I/O devices associated by VMM with a DMA Protection Domain - - Achieves DMA isolation by restricting memory view of I/O devices + - Achieves DMA isolation by restricting memory access of I/O devices through DMA address translation Intel VT-d DMA Translation @@ -998,6 +1558,7 @@ VT-d PCI Express North Bridge ----------------------------- .. figure:: vt-d.svg + :width: 100% PCI DMA Requester Identification -------------------------------- @@ -1006,6 +1567,7 @@ PCI DMA Requester Identification - 16-bit PCI DMA Requester Identifier .. figure:: dma-req-id.svg + :width: 80% - Assigned by PCI configuration software - Bus # indexes Bus Context Table in Root Context Table @@ -1016,11 +1578,12 @@ Device / Protection Domain Mapping ---------------------------------- .. 
figure:: device-domain-mapping.svg + :width: 100% Virtual DMA Address Translation ------------------------------- -- VDA ↔ MPA VT-d Page Tables similar to IA-32 processor Page Tables +- DVA ↔ MPA Page Tables similar to IA-32 processor Page Tables - 4KB or larger page size granularity @@ -1031,27 +1594,45 @@ Virtual DMA Address Translation - Initialized at VM creation time - With same translations of the VM Extended Page Table -Device Virtualization ---------------------- +VMM and Directed I/O +-------------------- + +- Unplugs assigned PCI device from VMM driver and reset it + +- Associates PCI device with VT-d Protection Domain of the Guest VM + +- Maps device memory BARs in Guest VM physical space + +- Arranges for OS of Guest VM to probe PCI device(s) assigned to it + +- Handles device interrupts and redirect them to Guest VM + +- Reset assigned PCI device upon Guest VM shut down -- Share I/O device among multiple VMs +.. Device Virtualization + ===================== + +Device Virtualization Principles +-------------------------------- + +- Share I/O device among multiple Guest VMs - With no performance lost - While enforcing VM isolation and protection - Move device virtualization from the VMM to the device itself -- Requires support from the device +- PCIe extension -- Example of Ethernet controllers +- PF/VF requires support from the device -Ethernet Device Virtualization ------------------------------- +Ethernet Device Virtual Functions +--------------------------------- .. 
figure:: ethernet-dev-virt.svg -Intel Single Root I/O Virtualization ------------------------------------- +Single Root I/O Virtualization +------------------------------ - SR-IOV capable PCI Device can be partitionned into multiple Virtual Functions @@ -1059,300 +1640,69 @@ Intel Single Root I/O Virtualization - SR-IOV Device appears in PCI configuration space as multiple PCI Virtual Functions -- Each Device Virtual Function includes +- Virtual Functions are "lightweight" PCI functions including - - PCI configuration registers + - PCI probing capabilities - DMA streams - Interrupts - Requires VT-d for DMA virtualization -Intel SR-IOV (1) ----------------- - -- VMM manages physical PCI device - -- Create a PCI Virtual Function for each VM - - - Include it into VM PCI configuration space to be probed by VM - GuestOS kernel - - Map it to Protection Domain of VM - -- Programs the sharing of physical devices ressources between VFs - -- PCI Device Virtual Functions directly managed by specific VF-Aware - GuestOS drivers (kind of Para-Virtualization) - -Intel SR-IOV (2) ----------------- - -.. 
figure:: eth-sr-iov.svg - :width: 80% - -Intel SR-IOV - Ethernet example -------------------------------- - -- Intel Kawela (1GB) / Niantic (10GB) Ethernet NICs - - - Multiple RX/TX packet queues per port - -- Virtual Device Machine Queues - - - 1 RX paquet queue per VF - -- Filters multiple unicast Ethernet Addresses - -- Layer-2 paquet filtering based on Ethernet Destination Address - -- Duplicate Broadcast / Multicast packets for all VFs - -- Load balancing between TX paquets sent by VFs - -Virtualization and Embedded Systems -=================================== - -Old Embedded Systems (1) ------------------------- - -- Relatively simple architecture - -- Single-purpose devices - -- Dominated by hardware constraints - - - Memory, battery charge - -- Dedicated functionalities, with moderated software size and - complexity - -- Real-time constraints - -Old Embedded Systems (2) ------------------------- - -- Closed environment (« black boxes ») - -- Fixed hardware configuration - -- Full software provided by device vendor - -- No dynamic loading of applications - -- Software updates rareful - -Embedded Systems Now (1) ------------------------- - -- Take on features of general-purpose OS's +- Virtual Functions have no configuration resources -- Growing functionalities => growing complexity and size - -- Run applications originally developed for PC's - - - Sophisticated Human Machine Interfaces (HMI) - - Safari Web browser on iPhones - -- Dynamic loading of applications - - - Iphone - - Google Android - -Embedded Systems Now (2) +SR-IOV Device Management ------------------------ -- Dynamically load device's owner specific applications - - - Games - -- Applications developped by engineers with no expertise - in embedded systems - - - Java applications - -- Need for exchanges with external world - - - USB, Bluetooth, Wi-Fi - - TCP/IP - -- Need for open API's, and openness in general - -- Need for high-level systems (Linux, Windows) - -Embedded Systems Challenges 
---------------------------- - -- Still Real-Time systems (part of it) - - - Baseband stack of mobile phones - -- Still hardware constraints - - - Battery - - Memory (to minimize device's cost) - -- Also used in mission/life critical situations - - - Weapons - - Cars +- VMM manages the physical PCI device -- High requirements on reliability and security - -Mobile Handsets ---------------- - -XXX - -- Run Android/Linux applications on baseband processor - -- Re-use existing legacy modem software stack with its RTOS (no - changes) - -- Support of Linux at a minimal development cost - -- Operating System independence for future evolutions - -- Security & Protection through OS isolation - -:: - - HMI: Human-Machine-Interface - PIM: Personal Information - -Virtualization in Embedded Systems (1) --------------------------------------- - -- Support for heterogeneous OS's environments - -- Real-time OS - - - Legacy software - - Dedicated applications whose real-time constraints cannot be - achieved by General-Purpose systems - - Licence issues (« GPL contamination ») - -- General Purpose OS - - - Openness - - HMI - -Virtualization in Embedded Systems (2) --------------------------------------- +- VMM creates a PCI Virtual Function for each VM -- Concurrent execution of RTOS and GP-OS on the same CPU + - Includes it into VM PCI configuration space to be probed by OS kernel + of Guest VM + - Associates VF with VT-d Protection Domain of the Guest VM -- Reduces cost (Bill Of Material) +- VMM programs the sharing of physical devices ressources between VFs -- Requires the underlying VMM to provide +- Virtual Functions managed by specific VF-aware drivers in kernel of + Guest OS (kind of Para-Virtualization) - - Memory isolation between OS's - - CPU scheduling among OS's, with higher priority to the RTOS - - Device partitionning - - Communication mechanism between OS's - -Virtualization in Embedded Systems (3) --------------------------------------- - -- Leverage multi-cores 
support with virtual machine abstraction - -- 1 core per OS => no need for CPU scheduling - -- 2 low-performance cores consume less power than a single high - performance CPU => simplify power management - -- New model of software distribution, shipping application with its own OS - - - No OS configuration/version incoherency - -Security Through Virtualization -------------------------------- - -- Notion of Trusted Computing Base (TCB) - - - Part of the system that provides security foundations - - Should only include hardware and VMM - - May also include RTOS, for performance/legacy reasons - -- Run GP OS in an isolated Virtual Machine - - - Avoid damaged GP OS to compromise the secure parts (data, - services) of the system - -Embedded + Virtualization Challenges (1) ----------------------------------------- - -- Full isolation of VM's does not fit cooperation requirements between OS's - -- Efficient communication mechanisms between VM's - -- Global scheduling, with interleaved priorities - -- Global Energy Management - -Embedded + Virtualization Challenges (2) ----------------------------------------- - -- Efficient communication mechanisms between VM's - - - Virtual Ethernet device not adapted - - Need VMM-controlled shared memory transfers - -- Example: Video streaming on a Smartphone - - - Video data received via the baseband managed by RTOS - - Video data displayed by a Media Player running on GPOS - - Avoid copy of video data transfered between the 2 OS's ! - -Task Scheduling Issues ----------------------- +Intel Niantic Virtual Functions (1) +----------------------------------- -- Standard server-oriented Virtualization model +.. 
figure:: eth-sr-iov.svg + :width: 80% - - The VMM schedules VM's on the CPU - - The OS on each VM runs its own scheduler +Intel Niantic Virtual Functions (2) +----------------------------------- -- Interleaved priorities in Embedded Systems +- Virtual Devices on Intel Niantic (10GB) NICs - - Baseband task of RTOS with a high priority - - But GPOS Media-Player must have a higher priority than some - low-priority tasks of RTOS - - Enable a VM to yield the CPU +- Layer-2 packet filtering based on destination MAC address - - Use a RT task as a proxy of GP OS application, and make it yield - the CPU +- Filters multiple unicast MAC addresses / VLAN identifiers -Multi-Users Devices -------------------- +- Can duplicate Broadcast / Multicast packets for all VFs -- Mobile phone has 3 types of users, each with specific private data - to protect from the others +- Multiple RX queues per VF (RSS) - - The person owning the device, with address book, emails, - documents, etc. - - Different wireless providers, for example private and - professionnal: network access properly authenticated, ensure - correct billing ! - - Third-party service providers, for instance multimedia providers. 
+- Load balancing between TX packets sent by VFs -- Owner and third-parties must be granted secure financial - transactions +- Anti-Spoofing mechanism on transmission -Virtualization in Hardware --------------------------- + - Source MAC address + - VLAN identifier -- Only way to build a real TCB +Pros/Cons of I/O hardware virtualization +---------------------------------------- - - Without penalizing performances +- Improves I/O performance on physical devices directly managed by Guest VMs -- Should include support for +- Only useful in specific configurations - - Memory Partitionning - - Physical Memory / Machine Memory mapping - - Coupled with multi-cores - - Device Partitioning +- PCI device Virtual Functions intended to scale, but require locking + of total VM physical memory into machine physical memory - - Interrupt routing - - I/O DMA coupled with memory partitioning & Physical Memory / - Machine Memory mapping +- Not compatible with transparent VM migration Conclusion / Evolution of Virtualization ======================================== @@ -1365,9 +1715,13 @@ Conclusion - Accelerated emulation : faster, code is executed natively, overhead for privilegied actions - Virtual servers : fast and scalable, but same OS and one kernel -- Paravirtualization : fast, needs a modified OS +- Paravirtualization : fast, needs a modified OS (or drivers) - HW-assisted virtualization : solves most of the issues +.. note:: + + - "needs a modified OS" is not true for devices + Evolutions of Virtualization ---------------------------- @@ -1383,3 +1737,8 @@ Evolutions of Virtualization - Virtualization on desktops and small devices - Security (isolates work and personal area) + +Thanks +------ + +- Any questions? 
diff --git a/java.png b/java.png new file mode 100644 index 0000000..ca0872d Binary files /dev/null and b/java.png differ
diff --git a/mmu-slide1.svg b/mmu-slide1.svg new file mode 100644 index 0000000..5f42cf0 --- /dev/null +++ b/mmu-slide1.svg
[new SVG figure, markup not preserved; text labels: process 1 virtual memory, process 2 virtual memory, machine physical memory, 0, 0xFFFFFFFF, MMU, CR3, OS kernel, page tables]
diff --git a/mmu-slide2.svg b/mmu-slide2.svg new file mode 100644 index 0000000..fad204f --- /dev/null +++ b/mmu-slide2.svg
[new SVG figure, markup not preserved; text labels: process 1 virtual memory, 0, 0xFFFFFFFF, MMU, CR3, TLB (Virt addr / Phys addr), 0x12345678, "1) no entry in TLB", "2) get page table directory of this process", "3) get page index", "0x12345678 -> 0001001000 1101000101 011001111000", 0x48, 0x245, 0x476, "4) get page", "5) update TLB", 0x40000, 0x243, 0x244, 0x246, 0x47, 0x48, 0x49, 0x50, 0x300000, 0x301000, 0x12345000, 0x1000, 0x50000]
diff --git a/mmu-slide3.svg b/mmu-slide3.svg new file mode 100644 index 0000000..7669c99 --- /dev/null +++ b/mmu-slide3.svg
[new SVG figure, markup not preserved; text labels: process 1 virtual memory, process 2 virtual memory (shown for each VM), VM 1 memory, VM 2 memory, machine physical memory, 0]
diff --git a/mmu-slide4.svg b/mmu-slide4.svg new file mode 100644 index 0000000..0652737 --- /dev/null +++ b/mmu-slide4.svg
[new SVG figure, markup not preserved; text labels: process 1 of VM1 virtual memory, guest physical memory, machine physical memory, 0, 0xFFFFFFFF, MMU, CR3, "Guest OS writes 0x1000000 (trapped by VMM)", 0x1000000, 0x40000, page tables used by MMU, shadow page tables, TLB (Virt addr / Phys addr), 0x300000, 0x301000, 0x12345000, 0x1000, 0x50000]
diff --git a/mmu2.svg b/mmu2.svg index 75a627b..c316aef 100644 --- a/mmu2.svg +++ b/mmu2.svg @@ -30,7 +30,7 @@ guidetolerance="10" inkscape:pageopacity="0" inkscape:pageshadow="2" - inkscape:window-width="1918" + inkscape:window-width="1433" inkscape:window-height="1059" id="namedview18523" showgrid="false" @@ -38,9 +38,9 @@ fit-margin-left="20" fit-margin-right="20" fit-margin-bottom="20" - inkscape:zoom="0.95353535" - inkscape:cx="319.76331" - inkscape:cy="312.7094" + inkscape:zoom="0.67425131" + inkscape:cx="575.75808" + inkscape:cy="254.2245" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -107,224 +107,19 @@ id="path17832" inkscape:connector-curvature="0" />0 -22 -31 -21 -12 -11 -DirectoryIndex -Page TableIndex -PageOffset -cr/st -1023 -cr/st -0 -1023 -VirtualAddress -Directory Page -CR3 -Directory Address -Physical Memory -cr/st = 
control & status -Page Table Entry (PTE) -cr/st -4KB page -Translation Lookaside Buffer (TLB) = cache for PTEs -0 -VirtualAddress + +Translation Lookaside Buffer (TLB)Cache for PTEs +cr/st = control &status +PhysicalMemory +cr/st +cr/st +1023 +1023 +0 +0 +10 bits +12 bits +10 bits +Page Table Entry (PTE) +cr/st +32 bits word +4KB page +DirectoryAddress +DirectoryPage +DirectoryIndex +PageTableIndex +PageOffset +CR3 +31 +22 +10 bits + id="tspan17479" + x="9369.998" + y="974.37146">21 10 bits + id="tspan17483" + x="13622.629" + y="974.37146">12 12 bits + id="tspan17487" + x="14473.156" + y="971.76697">11 32 bits word + id="tspan17491" + x="20253.684" + y="971.57404">0 \ No newline at end of file diff --git a/standalone.svg b/standalone.svg index 6a5e237..3a5f30b 100644 --- a/standalone.svg +++ b/standalone.svg @@ -38,9 +38,9 @@ fit-margin-left="20" fit-margin-right="20" fit-margin-bottom="20" - inkscape:zoom="0.47676768" - inkscape:cx="203.43084" - inkscape:cy="-3.1943552" + inkscape:zoom="0.95353536" + inkscape:cx="452.625" + inkscape:cy="179.86817" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -111,203 +111,35 @@ d="m 11414.444,10633.555 -5712.9996,0 0,-1827.9996 11424.9996,0 0,1827.9996 -5712,0 z" id="path4047" inkscape:connector-curvature="0" - style="fill:#c0c0c0;stroke:none" />Hardware -VMM -Guest OS -Applications -Guest OS -Applications -Guest OS -Applications - \ No newline at end of file + style="fill:none;stroke:#ff0000;stroke-width:102" />Hardware +Applications +Applications +Applications +Guest OS +Guest OS +Guest OS +VMM + \ No newline at end of file diff --git a/virtual-servers.svg b/virtual-servers.svg index 83554f2..f967312 100644 --- a/virtual-servers.svg +++ b/virtual-servers.svg @@ -31,13 +31,13 @@ guidetolerance="10" inkscape:pageopacity="0" inkscape:pageshadow="2" - inkscape:window-width="1918" + inkscape:window-width="1465" inkscape:window-height="1059" id="namedview6875" showgrid="false" inkscape:zoom="1.3485026" 
- inkscape:cx="297.93333" - inkscape:cy="155.67476" + inkscape:cx="411.02173" + inkscape:cy="216.48295" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -207,24 +207,7 @@ d="m 17392.444,11801.444 0,0 z" id="path5974" inkscape:connector-curvature="0" - style="fill:none;stroke:#3465af;stroke-width:28.22200012" />Kernel Code -P7 -P6 -10.16.0.0/16 -/roots/vm2 -10.17.0.0/16 -/roots/vm3 -10.18.0.0/16 -P3 -P8 -/ -74.125.0.0/16 -P1 -P2 -P5 + xml:space="preserve" + style="font-size:395.1111145px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans" + x="7694.4717" + y="1877.7703" + id="text5607" + sodipodi:linespacing="125%">P3 +P1 +P2 +P9 +P8 +P5 +P6 +P7 +74.125.0.0/16 +10.16.0.0/16 +10.17.0.0/16 +10.18.0.0/16 +/ +/roots/vm1 +/roots/vm2 P9 + xml:space="preserve" + style="font-size:395.1111145px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;fill-rule:evenodd;stroke:none;font-family:Sans" + x="19087.771" + y="8571.1357" + id="text5730-7" + sodipodi:linespacing="125%">/roots/vm3 /roots/vm1 + xml:space="preserve" + style="font-size:395.1111145px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans" + x="11232.069" + y="11016.243" + id="text5764" + sodipodi:linespacing="125%">Kernel Code \ No newline at end of file diff --git a/vmware-esx.svg b/vmware-esx.svg new file mode 100644 index 0000000..bcfde84 --- /dev/null +++ b/vmware-esx.svg @@ -0,0 +1,2047 @@ + + + + + + + + image/svg+xml + + + + + + + + + diff --git a/vmware-wks.png b/vmware-wks.png index 44ffead..da8497b 100644 Binary files a/vmware-wks.png and b/vmware-wks.png differ diff --git a/vmware-wks.svg b/vmware-wks.svg new file 
mode 100644 index 0000000..3f6f991 --- /dev/null +++ b/vmware-wks.svg @@ -0,0 +1,891 @@ + + + + + + + + image/svg+xml + + + + + + + + + diff --git a/vt-d.svg b/vt-d.svg index 0a51df9..5800c95 100644 --- a/vt-d.svg +++ b/vt-d.svg @@ -30,13 +30,13 @@ guidetolerance="10" inkscape:pageopacity="0" inkscape:pageshadow="2" - inkscape:window-width="1918" + inkscape:window-width="1433" inkscape:window-height="1059" id="namedview10451" showgrid="false" - inkscape:zoom="1.3485026" - inkscape:cx="235.55156" - inkscape:cy="197.43026" + inkscape:zoom="0.95353533" + inkscape:cx="444.99945" + inkscape:cy="319.33718" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -273,124 +273,124 @@ inkscape:connector-curvature="0" style="fill:none;stroke:#3465af;stroke-width:28.22200012" />CPU + id="tspan17721" + x="1561.229" + y="2512.8813">CPU SystemBUS + id="tspan17725" + x="4248.4985" + y="1994.0425">System BUS NorthNorthBridge + x="8162.1401" + y="1571.511" + id="tspan17731">Bridge Memory + id="tspan17735" + x="8290.4541" + y="2383.854">VT-d Device 1 + id="tspan17739" + x="8214.4893" + y="3855.6411">PCIe rootports Device 2 + id="tspan17745" + x="10052.053" + y="6096.041">PCI ExpressBus Device 3 + id="tspan17751" + x="4810.4941" + y="8215.3535">Device 1 PCI Express Bus + id="tspan17751-9" + x="9370.5371" + y="8215.3535">Device 2 VT-d + id="tspan17751-0" + x="13924.082" + y="8214.8535">Device 3 PCIe rootports + id="tspan17785" + x="14238.55" + y="1983.9811">Memory \ No newline at end of file diff --git a/vt-x.svg b/vt-x.svg index 10951f1..f88097b 100644 --- a/vt-x.svg +++ b/vt-x.svg @@ -35,8 +35,8 @@ id="namedview14829" showgrid="false" inkscape:zoom="1.3485026" - inkscape:cx="289.49673" - inkscape:cy="131.98018" + inkscape:cx="199.39679" + inkscape:cy="136.85033" inkscape:window-x="0" inkscape:window-y="19" inkscape:window-maximized="0" @@ -120,7 +120,7 @@ id="path14392" inkscape:connector-curvature="0" style="fill:none;stroke:#3465af;stroke-width:28.22200012" 
/>VMM + style="font-size:395.1111145px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;text-anchor:middle;fill:#000000;stroke:none;font-family:Sans;-inkscape-font-specification:Sans">VMM VMX root mode -VMX non-root mode -Intel-VT Hardware -VM Exit -VM Enter + style="font-size:395.1111145px;font-style:normal;font-variant:normal;font-weight:normal;font-stretch:normal;text-align:center;text-anchor:middle;fill:#000000;stroke:none;font-family:Sans;-inkscape-font-specification:Sans">Intel-VT Hardware rings 0 - 3 + xml:space="preserve" + style="font-size:395.1111145px;font-style:normal;font-weight:normal;text-align:center;line-height:125%;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans" + x="4686.6436" + y="2253.7368" + id="text7538" + sodipodi:linespacing="125%">Applications ring 3 + id="tspan7540-7" + x="10733.644" + y="2253.7368">Applications ring 3 + id="tspan7540-9" + x="16757.643" + y="2253.7368">Applications ring 3 + id="tspan7574" + x="2294.3784" + y="787.48706">VM 1 ring 0 + id="tspan7578" + x="8325.3789" + y="790.09155">VM 2 ring 0 + id="tspan7582" + x="14357.379" + y="787.29413">VM 3 ring 0 + id="tspan7586" + x="2151.7495" + y="3093.7961">ring 3 Applications + id="tspan7590" + x="2151.7495" + y="3801.0078">ring 0 Applications + id="tspan7594" + x="3118.3557" + y="10205.147">ring 0-3 Applications + id="tspan7586-0" + x="8191.7495" + y="3093.7961">ring 3 ring 3 +ring 0 +ring 0 +Guest OSkernel + id="tspan7660">kernel Guest OSkernel + id="tspan7660-3">kernel Guest OSkernel + id="tspan7660-39">kernel +VM non-root mode VM 1 + id="tspan7703" + x="2175.04" + y="7965.7915">VM root mode VM 2 + id="tspan7707" + x="7345.9258" + y="6814.7207">VM Exit VM 3 + id="tspan7711" + x="12578.066" + y="6805.8462">VM Enter \ No newline at end of file