2.6.0-test10-mm1 --- diff/Documentation/filesystems/Locking 2003-08-26 10:00:51.000000000 +0100 +++ source/Documentation/filesystems/Locking 2003-11-26 10:09:05.000000000 +0000 @@ -420,7 +420,7 @@ prototypes: void (*open)(struct vm_area_struct*); void (*close)(struct vm_area_struct*); - struct page *(*nopage)(struct vm_area_struct*, unsigned long, int); + struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *); locking rules: BKL mmap_sem --- diff/Documentation/filesystems/proc.txt 2003-09-17 12:28:01.000000000 +0100 +++ source/Documentation/filesystems/proc.txt 2003-11-26 10:09:05.000000000 +0000 @@ -900,6 +900,15 @@ Every mounted file system needs a super block, so if you plan to mount lots of file systems, you may want to increase these numbers. +aio-nr and aio-max-nr +--------------------- + +aio-nr is the running total of the number of events specified on the +io_setup system call for all currently active aio contexts. If aio-nr +reaches aio-max-nr then io_setup will fail with EAGAIN. Note that +raising aio-max-nr does not result in the pre-allocation or re-sizing +of any kernel data structures. + 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats ----------------------------------------------------------- --- diff/Documentation/kernel-parameters.txt 2003-10-09 09:47:33.000000000 +0100 +++ source/Documentation/kernel-parameters.txt 2003-11-26 10:09:05.000000000 +0000 @@ -24,6 +24,7 @@ HW Appropriate hardware is enabled. IA-32 IA-32 aka i386 architecture is enabled. IA-64 IA-64 architecture is enabled. + IOSCHED More than one I/O scheduler is enabled. IP_PNP IP DCHP, BOOTP, or RARP is enabled. ISAPNP ISA PnP code is enabled. ISDN Appropriate ISDN support is enabled. @@ -91,6 +92,11 @@ ht -- run only enough ACPI to enable Hyper Threading See also Documentation/pm.txt. + acpi_pic_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode + Format: { level | edge } + level Force PIC-mode SCI to Level Trigger (default) + edge Force PIC-mode SCI to Edge Trigge + ad1816= [HW,OSS] Format: ,,, See also Documentation/sound/oss/AD1816. @@ -214,7 +220,7 @@ Forces specified timesource (if avaliable) to be used when calculating gettimeofday(). If specicified timesource is not avalible, it defaults to PIT. - Format: { pit | tsc | cyclone | ... } + Format: { pit | tsc | cyclone | pmtmr } hpet= [IA-32,HPET] option to disable HPET and use PIT. Format: disable @@ -303,6 +309,10 @@ See comment before function elanfreq_setup() in arch/i386/kernel/cpu/cpufreq/elanfreq.c. + elevator= [IOSCHED] + Format: {"as"|"cfq"|"deadline"|"noop"} + See Documentation/as-iosched.txt for details + es1370= [HW,OSS] Format: [,] See also header of sound/oss/es1370.c. --- diff/Documentation/kobject.txt 2003-09-17 12:28:01.000000000 +0100 +++ source/Documentation/kobject.txt 2003-11-26 10:09:05.000000000 +0000 @@ -2,7 +2,7 @@ Patrick Mochel -Updated: 3 June 2003 +Updated: 12 November 2003 Copyright (c) 2003 Patrick Mochel @@ -128,7 +128,33 @@ (like the networking layer). -1.4 sysfs +1.4 Order dependencies + +Fields in a kobject must be initialized before they are used, as +indicated in this table: + + k_name Before kobject_add() (1) + name Before kobject_add() (1) + refcount Before kobject_init() (2) + entry Set by kobject_init() + parent Before kobject_add() (3) + kset Before kobject_init() (4) + ktype Before kobject_add() (4) + dentry Set by kobject_add() + +(1) If k_name isn't already set when kobject_add() is called, +it will be set to name. 
+ +(2) Although kobject_init() sets refcount to 1, a warning will be logged +if refcount is not equal to 0 beforehand. + +(3) If parent isn't already set when kobject_add() is called, +it will be set to kset's embedded kobject. + +(4) kset and ktype are optional. If they are used, they should be set +at the times indicated. + +1.5 sysfs Each kobject receives a directory in sysfs. This directory is created under the kobject's parent directory. @@ -210,7 +236,25 @@ name. The kobject, if found, is returned. -2.3 sysfs +2.3 Order dependencies + +Fields in a kset must be initialized before they are used, as indicated +in this table: + + subsys Before kset_add() + ktype Before kset_add() (1) + list Set by kset_init() + kobj Before kset_init() (2) + hotplug_ops Before kset_add() (1) + +(1) ktype and hotplug_ops are optional. If they are used, they should +be set at the times indicated. + +(2) kobj is passed to kobject_init() during kset_init() and to +kobject_add() during kset_add(); it must initialized accordingly. + + +2.4 sysfs ksets are represented in sysfs when their embedded kobjects are registered. They follow the same rules of parenting, with one @@ -352,7 +396,21 @@ - Sets obj->subsys.kset.kobj.kset to the subsystem's embedded kset. -4.4 sysfs +4.4 Order dependencies + +Fields in a subsystem must be initialized before they are used, +as indicated in this table: + + kset Before subsystem_register() (1) + rwsem Set by subsystem_register() + +(1) kset is minimally initialized by the decl_subsys macro. It is +passed to kset_init() and kset_add() by subsystem_register(). If its +subsys member isn't already set, subsystem_register() sets it to the +containing subsystem. + + +4.5 sysfs subsystems are represented in sysfs via their embedded kobjects. They follow the same rules as previously mentioned with no exceptions. They --- diff/Documentation/scsi/BusLogic.txt 2003-08-20 14:16:23.000000000 +0100 +++ source/Documentation/scsi/BusLogic.txt 2003-11-26 10:09:05.000000000 +0000 @@ -577,7 +577,7 @@ INSMOD Loadable Kernel Module Installation Facility: insmod BusLogic.o \ - 'BusLogic_Options="QueueDepth:[,7,15];QueueDepth:31,BusSettleTime:30"' + 'BusLogic="QueueDepth:[,7,15];QueueDepth:31,BusSettleTime:30"' NOTE: Module Utilities 2.1.71 or later is required for correct parsing of driver options containing commas. --- diff/Documentation/sysctl/fs.txt 2003-01-02 10:43:02.000000000 +0000 +++ source/Documentation/sysctl/fs.txt 2003-11-26 10:09:05.000000000 +0000 @@ -138,3 +138,13 @@ can have. You only need to increase super-max if you need to mount more filesystems than the current value in super-max allows you to. + +============================================================== + +aio-nr & aio-max-nr: + +aio-nr shows the current system-wide number of asynchronous io +requests. aio-max-nr allows you to change the maximum value +aio-nr can grow to. 
+ +============================================================== --- diff/MAINTAINERS 2003-11-25 15:24:57.000000000 +0000 +++ source/MAINTAINERS 2003-11-26 10:09:08.000000000 +0000 @@ -1154,6 +1154,12 @@ W: http://developer.osdl.org/rddunlap/kj-patches/ S: Maintained +KGDB FOR I386 PLATFORM +P: George Anzinger +M: george@mvista.com +L: linux-net@vger.kernel.org +S: Supported + KERNEL NFSD P: Neil Brown M: neilb@cse.unsw.edu.au @@ -1234,8 +1240,8 @@ S: Maintained LSILOGIC/SYMBIOS/NCR 53C8XX and 53C1010 PCI-SCSI drivers -P: Gerard Roudier -M: groudier@free.fr +P: Matthew Wilcox +M: matthew@wil.cx L: linux-scsi@vger.kernel.org S: Maintained --- diff/Makefile 2003-11-25 15:24:57.000000000 +0000 +++ source/Makefile 2003-11-26 10:09:08.000000000 +0000 @@ -1,7 +1,7 @@ VERSION = 2 PATCHLEVEL = 6 SUBLEVEL = 0 -EXTRAVERSION = -test10 +EXTRAVERSION = -test10-mm1 # *DOCUMENTATION* # To see a list of typical targets execute "make help" @@ -275,7 +275,7 @@ CPPFLAGS := -D__KERNEL__ -Iinclude \ $(if $(KBUILD_SRC),-Iinclude2 -I$(srctree)/include) -CFLAGS := -Wall -Wstrict-prototypes -Wno-trigraphs -O2 \ +CFLAGS := -Wall -Wstrict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -fno-common AFLAGS := -D__ASSEMBLY__ @@ -431,6 +431,12 @@ # --------------------------------------------------------------------------- +ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE +CFLAGS += -Os +else +CFLAGS += -O2 +endif + ifndef CONFIG_FRAME_POINTER CFLAGS += -fomit-frame-pointer endif --- diff/arch/alpha/kernel/irq.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/alpha/kernel/irq.c 2003-11-26 10:09:04.000000000 +0000 @@ -529,19 +529,21 @@ #ifdef CONFIG_SMP int j; #endif - int i; + int i = *(int *) v; struct irqaction * action; unsigned long flags; #ifdef CONFIG_SMP - seq_puts(p, " "); - for (i = 0; i < NR_CPUS; i++) - if (cpu_online(i)) - seq_printf(p, "CPU%d ", i); - seq_putc(p, '\n'); + if (i == 0) { + seq_puts(p, " "); + for (i = 0; i < NR_CPUS; i++) + if (cpu_online(i)) + seq_printf(p, "CPU%d ", i); + seq_putc(p, '\n'); + } #endif - for (i = 0; i < ACTUAL_NR_IRQS; i++) { + if (i < ACTUAL_NR_IRQS) { spin_lock_irqsave(&irq_desc[i].lock, flags); action = irq_desc[i].action; if (!action) @@ -568,15 +570,16 @@ seq_putc(p, '\n'); unlock: spin_unlock_irqrestore(&irq_desc[i].lock, flags); - } + } else if (i == ACTUAL_NR_IRQS) { #ifdef CONFIG_SMP - seq_puts(p, "IPI: "); - for (i = 0; i < NR_CPUS; i++) - if (cpu_online(i)) - seq_printf(p, "%10lu ", cpu_data[i].ipi_count); - seq_putc(p, '\n'); + seq_puts(p, "IPI: "); + for (i = 0; i < NR_CPUS; i++) + if (cpu_online(i)) + seq_printf(p, "%10lu ", cpu_data[i].ipi_count); + seq_putc(p, '\n'); #endif - seq_printf(p, "ERR: %10lu\n", irq_err_count); + seq_printf(p, "ERR: %10lu\n", irq_err_count); + } return 0; } --- diff/arch/alpha/kernel/traps.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/alpha/kernel/traps.c 2003-11-26 10:09:04.000000000 +0000 @@ -636,6 +636,7 @@ lock_kernel(); printk("Bad unaligned kernel access at %016lx: %p %lx %ld\n", pc, va, opcode, reg); + dump_stack(); do_exit(SIGSEGV); got_exception: --- diff/arch/arm/Makefile 2003-10-27 09:20:36.000000000 +0000 +++ source/arch/arm/Makefile 2003-11-26 10:09:04.000000000 +0000 @@ -14,8 +14,6 @@ GZFLAGS :=-9 #CFLAGS +=-pipe -CFLAGS :=$(CFLAGS:-O2=-Os) - ifeq ($(CONFIG_FRAME_POINTER),y) CFLAGS +=-fno-omit-frame-pointer -mapcs -mno-sched-prolog endif --- diff/arch/arm/kernel/irq.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/arm/kernel/irq.c 2003-11-26 10:09:04.000000000 +0000 @@ -169,11 +169,11 @@ int show_interrupts(struct 
seq_file *p, void *v) { - int i; + int i = *(int *) v; struct irqaction * action; unsigned long flags; - for (i = 0 ; i < NR_IRQS ; i++) { + if (i < NR_IRQS) { spin_lock_irqsave(&irq_controller_lock, flags); action = irq_desc[i].action; if (!action) @@ -187,12 +187,12 @@ seq_putc(p, '\n'); unlock: spin_unlock_irqrestore(&irq_controller_lock, flags); - } - + } else if (i == NR_IRQS) { #ifdef CONFIG_ARCH_ACORN - show_fiq_list(p, v); + show_fiq_list(p, v); #endif - seq_printf(p, "Err: %10lu\n", irq_err_count); + seq_printf(p, "Err: %10lu\n", irq_err_count); + } return 0; } --- diff/arch/arm26/kernel/irq.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/arm26/kernel/irq.c 2003-11-26 10:09:04.000000000 +0000 @@ -135,10 +135,10 @@ int show_interrupts(struct seq_file *p, void *v) { - int i; + int i = *(int *) v; struct irqaction * action; - for (i = 0 ; i < NR_IRQS ; i++) { + if (i < NR_IRQS) { action = irq_desc[i].action; if (!action) continue; @@ -148,10 +148,10 @@ seq_printf(p, ", %s", action->name); } seq_putc(p, '\n'); + } else if (i == NR_IRQS) { + show_fiq_list(p, v); + seq_printf(p, "Err: %10lu\n", irq_err_count); } - - show_fiq_list(p, v); - seq_printf(p, "Err: %10lu\n", irq_err_count); return 0; } --- diff/arch/cris/kernel/irq.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/cris/kernel/irq.c 2003-11-26 10:09:04.000000000 +0000 @@ -89,11 +89,11 @@ int show_interrupts(struct seq_file *p, void *v) { - int i; + int i = *(int *) v; struct irqaction * action; unsigned long flags; - for (i = 0; i < NR_IRQS; i++) { + if (i < NR_IRQS) { local_irq_save(flags); action = irq_action[i]; if (!action) --- diff/arch/h8300/Kconfig 2003-10-27 09:20:36.000000000 +0000 +++ source/arch/h8300/Kconfig 2003-11-26 10:09:04.000000000 +0000 @@ -5,6 +5,10 @@ mainmenu "uClinux/h8300 (w/o MMU) Kernel Configuration" +config H8300 + bool + default y + config MMU bool default n --- diff/arch/h8300/Makefile 2003-08-26 10:00:51.000000000 +0100 +++ source/arch/h8300/Makefile 2003-11-26 10:09:04.000000000 +0000 @@ -34,7 +34,7 @@ ldflags-$(CONFIG_CPU_H8S) := -mh8300self CFLAGS += $(cflags-y) -CFLAGS += -mint32 -fno-builtin -Os +CFLAGS += -mint32 -fno-builtin CFLAGS += -g CFLAGS += -D__linux__ CFLAGS += -DUTS_SYSNAME=\"uClinux\" --- diff/arch/h8300/platform/h8300h/ints.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/h8300/platform/h8300h/ints.c 2003-11-26 10:09:04.000000000 +0000 @@ -228,9 +228,9 @@ int show_interrupts(struct seq_file *p, void *v) { - int i; + int i = *(int *) v; - for (i = 0; i < NR_IRQS; i++) { + if (i < NR_IRQS) { if (irq_list[i]) { seq_printf(p, "%3d: %10u ",i,irq_list[i]->count); seq_printf(p, "%s\n", irq_list[i]->devname); --- diff/arch/h8300/platform/h8s/ints.c 2003-10-27 09:20:36.000000000 +0000 +++ source/arch/h8300/platform/h8s/ints.c 2003-11-26 10:09:04.000000000 +0000 @@ -280,9 +280,9 @@ int show_interrupts(struct seq_file *p, void *v) { - int i; + int i = *(int *) v; - for (i = 0; i < NR_IRQS; i++) { + if (i < NR_IRQS) { if (irq_list[i]) { seq_printf(p, "%3d: %10u ",i,irq_list[i]->count); seq_printf(p, "%s\n", irq_list[i]->devname); --- diff/arch/i386/Kconfig 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/Kconfig 2003-11-26 10:09:04.000000000 +0000 @@ -397,6 +397,54 @@ depends on MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 default y +config X86_4G + bool "4 GB kernel-space and 4 GB user-space virtual memory support" + help + This option is only useful for systems that have more than 1 GB + of RAM. 
+ + The default kernel VM layout leaves 1 GB of virtual memory for + kernel-space mappings, and 3 GB of VM for user-space applications. + This option ups both the kernel-space VM and the user-space VM to + 4 GB. + + The cost of this option is additional TLB flushes done at + system-entry points that transition from user-mode into kernel-mode. + I.e. system calls and page faults, and IRQs that interrupt user-mode + code. There's also additional overhead to kernel operations that copy + memory to/from user-space. The overhead from this is hard to tell and + depends on the workload - it can be anything from no visible overhead + to 20-30% overhead. A good rule of thumb is to count with a runtime + overhead of 20%. + + The upside is the much increased kernel-space VM, which more than + quadruples the maximum amount of RAM supported. Kernels compiled with + this option boot on 64GB of RAM and still have more than 3.1 GB of + 'lowmem' left. Another bonus is that highmem IO bouncing decreases, + if used with drivers that still use bounce-buffers. + + There's also a 33% increase in user-space VM size - database + applications might see a boost from this. + + But the cost of the TLB flushes and the runtime overhead has to be + weighed against the bonuses offered by the larger VM spaces. The + dividing line depends on the actual workload - there might be 4 GB + systems that benefit from this option. Systems with less than 4 GB + of RAM will rarely see a benefit from this option - but it's not + out of question, the exact circumstances have to be considered. + +config X86_SWITCH_PAGETABLES + def_bool X86_4G + +config X86_4G_VM_LAYOUT + def_bool X86_4G + +config X86_UACCESS_INDIRECT + def_bool X86_4G + +config X86_HIGH_ENTRY + def_bool X86_4G + config HPET_TIMER bool "HPET Timer Support" help @@ -784,6 +832,25 @@ See for more information. +config EFI + bool "Boot from EFI support (EXPERIMENTAL)" + depends on ACPI + default n + ---help--- + + This enables the the kernel to boot on EFI platforms using + system configuration information passed to it from the firmware. + This also enables the kernel to use any EFI runtime services that are + available (such as the EFI variable services). + + This option is only useful on systems that have EFI firmware + and will result in a kernel image that is ~8k larger. In addition, + you must use the latest ELILO loader available at + ftp.hpl.hp.com/pub/linux-ia64/ in order to take advantage of kernel + initialization using EFI information (neither GRUB nor LILO know + anything about EFI). However, even with this option, the resultant + kernel should continue to boot on existing non-EFI platforms. + config HAVE_DEC_LOCK bool depends on (SMP || PREEMPT) && X86_CMPXCHG @@ -793,7 +860,7 @@ # Summit needs it only when NUMA is on config BOOT_IOREMAP bool - depends on ((X86_SUMMIT || X86_GENERICARCH) && NUMA) + depends on (((X86_SUMMIT || X86_GENERICARCH) && NUMA) || (X86 && EFI)) default y endmenu @@ -1030,6 +1097,25 @@ depends on PCI && ((PCI_GODIRECT || PCI_GOANY) || X86_VISWS) default y +config PCI_USE_VECTOR + bool "Vector-based interrupt indexing" + depends on X86_LOCAL_APIC + default n + help + This replaces the current existing IRQ-based index interrupt scheme + with the vector-base index scheme. The advantages of vector base + over IRQ base are listed below: + 1) Support MSI implementation. 
+ 2) Support future IOxAPIC hotplug + + Note that this enables MSI, Message Signaled Interrupt, on all + MSI capable device functions detected if users also install the + MSI patch. Message Signal Interrupt enables an MSI-capable + hardware device to send an inbound Memory Write on its PCI bus + instead of asserting IRQ signal on device IRQ pin. + + If you don't know what to do here, say N. + source "drivers/pci/Kconfig" config ISA @@ -1187,6 +1273,15 @@ This results in a large slowdown, but helps to find certain types of memory corruptions. +config SPINLINE + bool "Spinlock inlining" + depends on DEBUG_KERNEL + help + This will change spinlocks from out of line to inline, making them + account cost to the callers in readprofile, rather than the lock + itself (as ".text.lock.filename"). This can be helpful for finding + the callers of locks. + config DEBUG_HIGHMEM bool "Highmem debugging" depends on DEBUG_KERNEL && HIGHMEM @@ -1203,20 +1298,208 @@ Say Y here only if you plan to use gdb to debug the kernel. If you don't debug the kernel, you can say N. +config LOCKMETER + bool "Kernel lock metering" + depends on SMP + help + Say Y to enable kernel lock metering, which adds overhead to SMP locks, + but allows you to see various statistics using the lockstat command. + config DEBUG_SPINLOCK_SLEEP bool "Sleep-inside-spinlock checking" help If you say Y here, various routines which may sleep will become very noisy if they are called with a spinlock held. +config KGDB + bool "Include kgdb kernel debugger" + depends on DEBUG_KERNEL + help + If you say Y here, the system will be compiled with the debug + option (-g) and a debugging stub will be included in the + kernel. This stub communicates with gdb on another (host) + computer via a serial port. The host computer should have + access to the kernel binary file (vmlinux) and a serial port + that is connected to the target machine. Gdb can be made to + configure the serial port or you can use stty and setserial to + do this. See the 'target' command in gdb. This option also + configures in the ability to request a breakpoint early in the + boot process. To request the breakpoint just include 'kgdb' + as a boot option when booting the target machine. The system + will then break as soon as it looks at the boot options. This + option also installs a breakpoint in panic and sends any + kernel faults to the debugger. For more information see the + Documentation/i386/kgdb.txt file. + +choice + depends on KGDB + prompt "Debug serial port BAUD" + default KGDB_115200BAUD + help + Gdb and the kernel stub need to agree on the baud rate to be + used. Some systems (x86 family at this writing) allow this to + be configured. + +config KGDB_9600BAUD + bool "9600" + +config KGDB_19200BAUD + bool "19200" + +config KGDB_38400BAUD + bool "38400" + +config KGDB_57600BAUD + bool "57600" + +config KGDB_115200BAUD + bool "115200" +endchoice + +config KGDB_PORT + hex "hex I/O port address of the debug serial port" + depends on KGDB + default 3f8 + help + Some systems (x86 family at this writing) allow the port + address to be configured. The number entered is assumed to be + hex, don't put 0x in front of it. The standard address are: + COM1 3f8 , irq 4 and COM2 2f8 irq 3. Setserial /dev/ttySx + will tell you what you have. It is good to test the serial + connection with a live system before trying to debug. + +config KGDB_IRQ + int "IRQ of the debug serial port" + depends on KGDB + default 4 + help + This is the irq for the debug port. 
If everything is working + correctly and the kernel has interrupts on a control C to the + port should cause a break into the kernel debug stub. + +config DEBUG_INFO + bool + depends on KGDB + default y + +config KGDB_MORE + bool "Add any additional compile options" + depends on KGDB + default n + help + Saying yes here turns on the ability to enter additional + compile options. + + +config KGDB_OPTIONS + depends on KGDB_MORE + string "Additional compile arguments" + default "-O1" + help + This option allows you enter additional compile options for + the whole kernel compile. Each platform will have a default + that seems right for it. For example on PPC "-ggdb -O1", and + for i386 "-O1". Note that by configuring KGDB "-g" is already + turned on. In addition, on i386 platforms + "-fomit-frame-pointer" is deleted from the standard compile + options. + +config NO_KGDB_CPUS + int "Number of CPUs" + depends on KGDB && SMP + default NR_CPUS + help + + This option sets the number of cpus for kgdb ONLY. It is used + to prune some internal structures so they look "nice" when + displayed with gdb. This is to overcome possibly larger + numbers that may have been entered above. Enter the real + number to get nice clean kgdb_info displays. + +config KGDB_TS + bool "Enable kgdb time stamp macros?" + depends on KGDB + default n + help + Kgdb event macros allow you to instrument your code with calls + to the kgdb event recording function. The event log may be + examined with gdb at a break point. Turning on this + capability also allows you to choose how many events to + keep. Kgdb always keeps the lastest events. + +choice + depends on KGDB_TS + prompt "Max number of time stamps to save?" + default KGDB_TS_128 + +config KGDB_TS_64 + bool "64" + +config KGDB_TS_128 + bool "128" + +config KGDB_TS_256 + bool "256" + +config KGDB_TS_512 + bool "512" + +config KGDB_TS_1024 + bool "1024" + +endchoice + +config STACK_OVERFLOW_TEST + bool "Turn on kernel stack overflow testing?" + depends on KGDB + default n + help + This option enables code in the front line interrupt handlers + to check for kernel stack overflow on interrupts and system + calls. This is part of the kgdb code on x86 systems. + +config KGDB_CONSOLE + bool "Enable serial console thru kgdb port" + depends on KGDB + default n + help + This option enables the command line "console=kgdb" option. + When the system is booted with this option in the command line + all kernel printk output is sent to gdb (as well as to other + consoles). For this to work gdb must be connected. For this + reason, this command line option will generate a breakpoint if + gdb has not yet connected. After the gdb continue command is + given all pent up console output will be printed by gdb on the + host machine. Neither this option, nor KGDB require the + serial driver to be configured. + +config KGDB_SYSRQ + bool "Turn on SysRq 'G' command to do a break?" + depends on KGDB + default y + help + This option includes an option in the SysRq code that allows + you to enter SysRq G which generates a breakpoint to the KGDB + stub. This will work if the keyboard is alive and can + interrupt the system. Because of constraints on when the + serial port interrupt can be enabled, this code may allow you + to interrupt the system before the serial port control C is + available. Just say yes here. 
+ config FRAME_POINTER bool "Compile the kernel with frame pointers" + default KGDB help If you say Y here the resulting kernel image will be slightly larger and slower, but it will give very useful debugging information. If you don't debug the kernel, you can say N, but we may not be able to solve problems without frame pointers. +config MAGIC_SYSRQ + bool + depends on KGDB_SYSRQ + default y + config X86_EXTRA_IRQS bool depends on X86_LOCAL_APIC || X86_VOYAGER --- diff/arch/i386/Makefile 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/Makefile 2003-11-26 10:09:04.000000000 +0000 @@ -84,6 +84,9 @@ # default subarch .h files mflags-y += -Iinclude/asm-i386/mach-default +mflags-$(CONFIG_KGDB) += -gdwarf-2 +mflags-$(CONFIG_KGDB_MORE) += $(shell echo $(CONFIG_KGDB_OPTIONS) | sed -e 's/"//g') + head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o libs-y += arch/i386/lib/ --- diff/arch/i386/boot/setup.S 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/boot/setup.S 2003-11-26 10:09:04.000000000 +0000 @@ -162,7 +162,7 @@ # can be located anywhere in # low memory 0x10000 or higher. -ramdisk_max: .long MAXMEM-1 # (Header version 0x0203 or later) +ramdisk_max: .long __MAXMEM-1 # (Header version 0x0203 or later) # The highest safe address for # the contents of an initrd --- diff/arch/i386/kernel/Makefile 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/Makefile 2003-11-26 10:09:04.000000000 +0000 @@ -7,13 +7,14 @@ obj-y := process.o semaphore.o signal.o entry.o traps.o irq.o vm86.o \ ptrace.o i8259.o ioport.o ldt.o setup.o time.o sys_i386.o \ pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \ - doublefault.o + doublefault.o entry_trampoline.o obj-y += cpu/ obj-y += timers/ obj-$(CONFIG_ACPI_BOOT) += acpi/ obj-$(CONFIG_X86_BIOS_REBOOT) += reboot.o obj-$(CONFIG_MCA) += mca.o +obj-$(CONFIG_KGDB) += kgdb_stub.o obj-$(CONFIG_X86_MSR) += msr.o obj-$(CONFIG_X86_CPUID) += cpuid.o obj-$(CONFIG_MICROCODE) += microcode.o @@ -30,6 +31,7 @@ obj-y += sysenter.o vsyscall.o obj-$(CONFIG_ACPI_SRAT) += srat.o obj-$(CONFIG_HPET_TIMER) += time_hpet.o +obj-$(CONFIG_EFI) += efi.o efi_stub.o EXTRA_AFLAGS := -traditional --- diff/arch/i386/kernel/acpi/boot.c 2003-11-25 15:24:57.000000000 +0000 +++ source/arch/i386/kernel/acpi/boot.c 2003-11-26 10:09:04.000000000 +0000 @@ -26,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -249,29 +250,66 @@ #ifdef CONFIG_ACPI_BUS /* - * Set specified PIC IRQ to level triggered mode. + * "acpi_pic_sci=level" (current default) + * programs the PIC-mode SCI to Level Trigger. + * (NO-OP if the BIOS set Level Trigger already) + * + * If a PIC-mode SCI is not recogznied or gives spurious IRQ7's + * it may require Edge Trigger -- use "acpi_pic_sci=edge" + * (NO-OP if the BIOS set Edge Trigger already) * * Port 0x4d0-4d1 are ECLR1 and ECLR2, the Edge/Level Control Registers * for the 8259 PIC. bit[n] = 1 means irq[n] is Level, otherwise Edge. * ECLR1 is IRQ's 0-7 (IRQ 0, 1, 2 must be 0) * ECLR2 is IRQ's 8-15 (IRQ 8, 13 must be 0) - * - * As the BIOS should have done this for us, - * print a warning if the IRQ wasn't already set to level. 
*/ -void acpi_pic_set_level_irq(unsigned int irq) +static int __initdata acpi_pic_sci_trigger; /* 0: level, 1: edge */ + +void __init +acpi_pic_sci_set_trigger(unsigned int irq) { unsigned char mask = 1 << (irq & 7); unsigned int port = 0x4d0 + (irq >> 3); unsigned char val = inb(port); + + printk(PREFIX "IRQ%d SCI:", irq); if (!(val & mask)) { - printk(KERN_WARNING PREFIX "IRQ %d was Edge Triggered, " - "setting to Level Triggerd\n", irq); - outb(val | mask, port); + printk(" Edge"); + + if (!acpi_pic_sci_trigger) { + printk(" set to Level"); + outb(val | mask, port); + } + } else { + printk(" Level"); + + if (acpi_pic_sci_trigger) { + printk(" set to Edge"); + outb(val | mask, port); + } + } + printk(" Trigger.\n"); +} + +int __init +acpi_pic_sci_setup(char *str) +{ + while (str && *str) { + if (strncmp(str, "level", 5) == 0) + acpi_pic_sci_trigger = 0; /* force level trigger */ + if (strncmp(str, "edge", 4) == 0) + acpi_pic_sci_trigger = 1; /* force edge trigger */ + str = strchr(str, ','); + if (str) + str += strspn(str, ", \t"); } + return 1; } + +__setup("acpi_pic_sci=", acpi_pic_sci_setup); + #endif /* CONFIG_ACPI_BUS */ @@ -326,11 +364,48 @@ } #endif +/* detect the location of the ACPI PM Timer */ +#ifdef CONFIG_X86_PM_TIMER +extern u32 pmtmr_ioport; + +static int __init acpi_parse_fadt(unsigned long phys, unsigned long size) +{ + struct fadt_descriptor_rev2 *fadt =0; + + fadt = (struct fadt_descriptor_rev2*) __acpi_map_table(phys,size); + if(!fadt) { + printk(KERN_WARNING PREFIX "Unable to map FADT\n"); + return 0; + } + + if (fadt->revision >= FADT2_REVISION_ID) { + /* FADT rev. 2 */ + if (fadt->xpm_tmr_blk.address_space_id != ACPI_ADR_SPACE_SYSTEM_IO) + return 0; + + pmtmr_ioport = fadt->xpm_tmr_blk.address; + } else { + /* FADT rev. 1 */ + pmtmr_ioport = fadt->V1_pm_tmr_blk; + } + if (pmtmr_ioport) + printk(KERN_INFO PREFIX "PM-Timer IO Port: %#x\n", pmtmr_ioport); + return 0; +} +#endif + + unsigned long __init acpi_find_rsdp (void) { unsigned long rsdp_phys = 0; + if (efi_enabled) { + if (efi.acpi20) + return __pa(efi.acpi20); + else if (efi.acpi) + return __pa(efi.acpi); + } /* * Scan memory looking for the RSDP signature. First search EBDA (low * memory) paragraphs and then search upper memory (E0000-FFFFF). @@ -519,5 +594,9 @@ acpi_table_parse(ACPI_HPET, acpi_parse_hpet); #endif +#ifdef CONFIG_X86_PM_TIMER + acpi_table_parse(ACPI_FADT, acpi_parse_fadt); +#endif + return 0; } --- diff/arch/i386/kernel/asm-offsets.c 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/asm-offsets.c 2003-11-26 10:09:04.000000000 +0000 @@ -4,9 +4,11 @@ * to extract and format the required data. 
*/ +#include #include #include #include "sigframe.h" +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -28,4 +30,17 @@ DEFINE(RT_SIGFRAME_sigcontext, offsetof (struct rt_sigframe, uc.uc_mcontext)); + DEFINE(TI_task, offsetof (struct thread_info, task)); + DEFINE(TI_exec_domain, offsetof (struct thread_info, exec_domain)); + DEFINE(TI_flags, offsetof (struct thread_info, flags)); + DEFINE(TI_preempt_count, offsetof (struct thread_info, preempt_count)); + DEFINE(TI_addr_limit, offsetof (struct thread_info, addr_limit)); + DEFINE(TI_real_stack, offsetof (struct thread_info, real_stack)); + DEFINE(TI_virtual_stack, offsetof (struct thread_info, virtual_stack)); + DEFINE(TI_user_pgd, offsetof (struct thread_info, user_pgd)); + + DEFINE(FIX_ENTRY_TRAMPOLINE_0_addr, __fix_to_virt(FIX_ENTRY_TRAMPOLINE_0)); + DEFINE(FIX_VSYSCALL_addr, __fix_to_virt(FIX_VSYSCALL)); + DEFINE(PAGE_SIZE_asm, PAGE_SIZE); + DEFINE(task_thread_db7, offsetof (struct task_struct, thread.debugreg[7])); } --- diff/arch/i386/kernel/cpu/common.c 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/cpu/common.c 2003-11-26 10:09:04.000000000 +0000 @@ -510,16 +510,20 @@ BUG(); enter_lazy_tlb(&init_mm, current); - load_esp0(t, thread->esp0); + load_esp0(t, thread); set_tss_desc(cpu,t); cpu_gdt_table[cpu][GDT_ENTRY_TSS].b &= 0xfffffdff; load_TR_desc(); - load_LDT(&init_mm.context); + if (cpu) + load_LDT(&init_mm.context); /* Set up doublefault TSS pointer in the GDT */ __set_tss_desc(cpu, GDT_ENTRY_DOUBLEFAULT_TSS, &doublefault_tss); cpu_gdt_table[cpu][GDT_ENTRY_DOUBLEFAULT_TSS].b &= 0xfffffdff; + if (cpu) + trap_init_virtual_GDT(); + /* Clear %fs and %gs. */ asm volatile ("xorl %eax, %eax; movl %eax, %fs; movl %eax, %gs"); --- diff/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c 2003-09-17 12:28:01.000000000 +0100 +++ source/arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c 2003-11-26 10:09:04.000000000 +0000 @@ -73,6 +73,16 @@ { .frequency = CPUFREQ_TABLE_END } }; +/* Ultra Low Voltage Intel Pentium M processor 1000MHz */ +static struct cpufreq_frequency_table op_1000[] = + { + OP(600, 844), + OP(800, 972), + OP(900, 988), + OP(1000, 1004), + { .frequency = CPUFREQ_TABLE_END } + }; + /* Low Voltage Intel Pentium M processor 1.10GHz */ static struct cpufreq_frequency_table op_1100[] = { @@ -165,6 +175,7 @@ static const struct cpu_model models[] = { _CPU( 900, " 900"), + CPU(1000), CPU(1100), CPU(1200), CPU(1300), --- diff/arch/i386/kernel/cpu/intel.c 2003-11-25 15:24:57.000000000 +0000 +++ source/arch/i386/kernel/cpu/intel.c 2003-11-26 10:09:04.000000000 +0000 @@ -8,11 +8,10 @@ #include #include #include +#include #include "cpu.h" -extern int trap_init_f00f_bug(void); - #ifdef CONFIG_X86_INTEL_USERCOPY /* * Alignment at which movsl is preferred for bulk memory copies. 
@@ -157,7 +156,7 @@ c->f00f_bug = 1; if ( !f00f_workaround_enabled ) { - trap_init_f00f_bug(); + trap_init_virtual_IDT(); printk(KERN_NOTICE "Intel Pentium with F0 0F bug - workaround enabled.\n"); f00f_workaround_enabled = 1; } --- diff/arch/i386/kernel/dmi_scan.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/dmi_scan.c 2003-11-26 10:09:04.000000000 +0000 @@ -16,6 +16,7 @@ int is_sony_vaio_laptop; int is_unsafe_smbus; +int es7000_plat = 0; struct dmi_header { @@ -1011,6 +1012,7 @@ printk(KERN_NOTICE "ACPI disabled because your bios is from %s and too old\n", s); printk(KERN_NOTICE "You can enable it with acpi=force\n"); acpi_disabled = 1; + acpi_ht = 0; } } } --- diff/arch/i386/kernel/doublefault.c 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/doublefault.c 2003-11-26 10:09:04.000000000 +0000 @@ -7,12 +7,13 @@ #include #include #include +#include #define DOUBLEFAULT_STACKSIZE (1024) static unsigned long doublefault_stack[DOUBLEFAULT_STACKSIZE]; #define STACK_START (unsigned long)(doublefault_stack+DOUBLEFAULT_STACKSIZE) -#define ptr_ok(x) ((x) > 0xc0000000 && (x) < 0xc1000000) +#define ptr_ok(x) (((x) > __PAGE_OFFSET && (x) < (__PAGE_OFFSET + 0x01000000)) || ((x) >= FIXADDR_START)) static void doublefault_fn(void) { @@ -38,8 +39,8 @@ printk("eax = %08lx, ebx = %08lx, ecx = %08lx, edx = %08lx\n", t->eax, t->ebx, t->ecx, t->edx); - printk("esi = %08lx, edi = %08lx\n", - t->esi, t->edi); + printk("esi = %08lx, edi = %08lx, ebp = %08lx\n", + t->esi, t->edi, t->ebp); } } --- diff/arch/i386/kernel/entry.S 2003-11-25 15:24:57.000000000 +0000 +++ source/arch/i386/kernel/entry.S 2003-11-26 10:09:04.000000000 +0000 @@ -43,11 +43,25 @@ #include #include #include +#include #include #include +#include #include #include #include "irq_vectors.h" + /* We do not recover from a stack overflow, but at least + * we know it happened and should be able to track it down. + */ +#ifdef CONFIG_STACK_OVERFLOW_TEST +#define STACK_OVERFLOW_TEST \ + testl $7680,%esp; \ + jnz 10f; \ + call stack_overflow; \ +10: +#else +#define STACK_OVERFLOW_TEST +#endif #define nr_syscalls ((syscall_table_size)/4) @@ -87,7 +101,102 @@ #define resume_kernel restore_all #endif -#define SAVE_ALL \ +#ifdef CONFIG_X86_HIGH_ENTRY + +#ifdef CONFIG_X86_SWITCH_PAGETABLES + +#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) +/* + * If task is preempted in __SWITCH_KERNELSPACE, and moved to another cpu, + * __switch_to repoints %esp to the appropriate virtual stack; but %ebp is + * left stale, so we must check whether to repeat the real stack calculation. + */ +#define repeat_if_esp_changed \ + xorl %esp, %ebp; \ + testl $0xffffe000, %ebp; \ + jnz 0b +#else +#define repeat_if_esp_changed +#endif + +/* clobbers ebx, edx and ebp */ + +#define __SWITCH_KERNELSPACE \ + cmpl $0xff000000, %esp; \ + jb 1f; \ + \ + /* \ + * switch pagetables and load the real stack, \ + * keep the stack offset: \ + */ \ + \ + movl $swapper_pg_dir-__PAGE_OFFSET, %edx; \ + \ + /* GET_THREAD_INFO(%ebp) intermixed */ \ +0: \ + movl %esp, %ebp; \ + movl %esp, %ebx; \ + andl $0xffffe000, %ebp; \ + andl $0x00001fff, %ebx; \ + orl TI_real_stack(%ebp), %ebx; \ + repeat_if_esp_changed; \ + \ + movl %edx, %cr3; \ + movl %ebx, %esp; \ +1: + +#endif + + +#define __SWITCH_USERSPACE \ + /* interrupted any of the user return paths? 
*/ \ + \ + movl EIP(%esp), %eax; \ + \ + cmpl $int80_ret_start_marker, %eax; \ + jb 33f; /* nope - continue with sysexit check */\ + cmpl $int80_ret_end_marker, %eax; \ + jb 22f; /* yes - switch to virtual stack */ \ +33: \ + cmpl $sysexit_ret_start_marker, %eax; \ + jb 44f; /* nope - continue with user check */ \ + cmpl $sysexit_ret_end_marker, %eax; \ + jb 22f; /* yes - switch to virtual stack */ \ + /* return to userspace? */ \ +44: \ + movl EFLAGS(%esp),%ecx; \ + movb CS(%esp),%cl; \ + testl $(VM_MASK | 3),%ecx; \ + jz 2f; \ +22: \ + /* \ + * switch to the virtual stack, then switch to \ + * the userspace pagetables. \ + */ \ + \ + GET_THREAD_INFO(%ebp); \ + movl TI_virtual_stack(%ebp), %edx; \ + movl TI_user_pgd(%ebp), %ecx; \ + \ + movl %esp, %ebx; \ + andl $0x1fff, %ebx; \ + orl %ebx, %edx; \ +int80_ret_start_marker: \ + movl %edx, %esp; \ + movl %ecx, %cr3; \ + \ + __RESTORE_ALL; \ +int80_ret_end_marker: \ +2: + +#else /* !CONFIG_X86_HIGH_ENTRY */ + +#define __SWITCH_KERNELSPACE +#define __SWITCH_USERSPACE + +#endif + +#define __SAVE_ALL \ cld; \ pushl %es; \ pushl %ds; \ @@ -102,7 +211,7 @@ movl %edx, %ds; \ movl %edx, %es; -#define RESTORE_INT_REGS \ +#define __RESTORE_INT_REGS \ popl %ebx; \ popl %ecx; \ popl %edx; \ @@ -111,29 +220,28 @@ popl %ebp; \ popl %eax -#define RESTORE_REGS \ - RESTORE_INT_REGS; \ -1: popl %ds; \ -2: popl %es; \ +#define __RESTORE_REGS \ + __RESTORE_INT_REGS; \ +111: popl %ds; \ +222: popl %es; \ .section .fixup,"ax"; \ -3: movl $0,(%esp); \ - jmp 1b; \ -4: movl $0,(%esp); \ - jmp 2b; \ +444: movl $0,(%esp); \ + jmp 111b; \ +555: movl $0,(%esp); \ + jmp 222b; \ .previous; \ .section __ex_table,"a";\ .align 4; \ - .long 1b,3b; \ - .long 2b,4b; \ + .long 111b,444b;\ + .long 222b,555b;\ .previous - -#define RESTORE_ALL \ - RESTORE_REGS \ +#define __RESTORE_ALL \ + __RESTORE_REGS \ addl $4, %esp; \ -1: iret; \ +333: iret; \ .section .fixup,"ax"; \ -2: sti; \ +666: sti; \ movl $(__USER_DS), %edx; \ movl %edx, %ds; \ movl %edx, %es; \ @@ -142,10 +250,19 @@ .previous; \ .section __ex_table,"a";\ .align 4; \ - .long 1b,2b; \ + .long 333b,666b;\ .previous +#define SAVE_ALL \ + __SAVE_ALL; \ + __SWITCH_KERNELSPACE; \ + STACK_OVERFLOW_TEST; + +#define RESTORE_ALL \ + __SWITCH_USERSPACE; \ + __RESTORE_ALL; +.section .entry.text,"ax" ENTRY(lcall7) pushfl # We get a different stack layout with call @@ -163,7 +280,7 @@ movl %edx,EIP(%ebp) # Now we move them to their "normal" places movl %ecx,CS(%ebp) # andl $-8192, %ebp # GET_THREAD_INFO - movl TI_EXEC_DOMAIN(%ebp), %edx # Get the execution domain + movl TI_exec_domain(%ebp), %edx # Get the execution domain call *4(%edx) # Call the lcall7 handler for the domain addl $4, %esp popl %eax @@ -208,7 +325,7 @@ cli # make sure we don't miss an interrupt # setting need_resched or sigpending # between sampling and the iret - movl TI_FLAGS(%ebp), %ecx + movl TI_flags(%ebp), %ecx andl $_TIF_WORK_MASK, %ecx # is there any work to be done on # int/exception return? jne work_pending @@ -216,18 +333,18 @@ #ifdef CONFIG_PREEMPT ENTRY(resume_kernel) - cmpl $0,TI_PRE_COUNT(%ebp) # non-zero preempt_count ? + cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? jnz restore_all need_resched: - movl TI_FLAGS(%ebp), %ecx # need_resched set ? + movl TI_flags(%ebp), %ecx # need_resched set ? testb $_TIF_NEED_RESCHED, %cl jz restore_all testl $IF_MASK,EFLAGS(%esp) # interrupts off (exception path) ? 
jz restore_all - movl $PREEMPT_ACTIVE,TI_PRE_COUNT(%ebp) + movl $PREEMPT_ACTIVE,TI_preempt_count(%ebp) sti call schedule - movl $0,TI_PRE_COUNT(%ebp) + movl $0,TI_preempt_count(%ebp) cli jmp need_resched #endif @@ -246,37 +363,50 @@ pushl $(__USER_CS) pushl $SYSENTER_RETURN -/* - * Load the potential sixth argument from user stack. - * Careful about security. - */ - cmpl $__PAGE_OFFSET-3,%ebp - jae syscall_fault -1: movl (%ebp),%ebp -.section __ex_table,"a" - .align 4 - .long 1b,syscall_fault -.previous - pushl %eax SAVE_ALL GET_THREAD_INFO(%ebp) cmpl $(nr_syscalls), %eax jae syscall_badsys - testb $_TIF_SYSCALL_TRACE,TI_FLAGS(%ebp) + testb $_TIF_SYSCALL_TRACE,TI_flags(%ebp) jnz syscall_trace_entry call *sys_call_table(,%eax,4) movl %eax,EAX(%esp) cli - movl TI_FLAGS(%ebp), %ecx + movl TI_flags(%ebp), %ecx testw $_TIF_ALLWORK_MASK, %cx jne syscall_exit_work + +#ifdef CONFIG_X86_SWITCH_PAGETABLES + + GET_THREAD_INFO(%ebp) + movl TI_virtual_stack(%ebp), %edx + movl TI_user_pgd(%ebp), %ecx + movl %esp, %ebx + andl $0x1fff, %ebx + orl %ebx, %edx +sysexit_ret_start_marker: + movl %edx, %esp + movl %ecx, %cr3 +#endif + /* + * only ebx is not restored by the userspace sysenter vsyscall + * code, it assumes it to be callee-saved. + */ + movl EBX(%esp), %ebx + /* if something modifies registers it must also disable sysexit */ + movl EIP(%esp), %edx movl OLDESP(%esp), %ecx + sti sysexit +#ifdef CONFIG_X86_SWITCH_PAGETABLES +sysexit_ret_end_marker: + nop +#endif # system call handler stub @@ -287,7 +417,7 @@ cmpl $(nr_syscalls), %eax jae syscall_badsys # system call tracing in operation - testb $_TIF_SYSCALL_TRACE,TI_FLAGS(%ebp) + testb $_TIF_SYSCALL_TRACE,TI_flags(%ebp) jnz syscall_trace_entry syscall_call: call *sys_call_table(,%eax,4) @@ -296,10 +426,23 @@ cli # make sure we don't miss an interrupt # setting need_resched or sigpending # between sampling and the iret - movl TI_FLAGS(%ebp), %ecx + movl TI_flags(%ebp), %ecx testw $_TIF_ALLWORK_MASK, %cx # current->work jne syscall_exit_work restore_all: +#ifdef CONFIG_TRAP_BAD_SYSCALL_EXITS + movl EFLAGS(%esp), %eax # mix EFLAGS and CS + movb CS(%esp), %al + testl $(VM_MASK | 3), %eax + jz resume_kernelX # returning to kernel or vm86-space + + cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ? + jz resume_kernelX + + int $3 + +resume_kernelX: +#endif RESTORE_ALL # perform work that needs to be done immediately before resumption @@ -312,7 +455,7 @@ cli # make sure we don't miss an interrupt # setting need_resched or sigpending # between sampling and the iret - movl TI_FLAGS(%ebp), %ecx + movl TI_flags(%ebp), %ecx andl $_TIF_WORK_MASK, %ecx # is there any work to be done other # than syscall tracing? 
jz restore_all @@ -327,6 +470,22 @@ # vm86-space xorl %edx, %edx call do_notify_resume + +#if CONFIG_X86_HIGH_ENTRY + /* + * Reload db7 if necessary: + */ + movl TI_flags(%ebp), %ecx + testb $_TIF_DB7, %cl + jnz work_db7 + + jmp restore_all + +work_db7: + movl TI_task(%ebp), %edx; + movl task_thread_db7(%edx), %edx; + movl %edx, %db7; +#endif jmp restore_all ALIGN @@ -382,7 +541,7 @@ */ .data ENTRY(interrupt) -.text +.previous vector=0 ENTRY(irq_entries_start) @@ -392,7 +551,7 @@ jmp common_interrupt .data .long 1b -.text +.previous vector=vector+1 .endr @@ -433,12 +592,17 @@ movl ES(%esp), %edi # get the function address movl %eax, ORIG_EAX(%esp) movl %ecx, ES(%esp) - movl %esp, %edx pushl %esi # push the error code - pushl %edx # push the pt_regs pointer movl $(__USER_DS), %edx movl %edx, %ds movl %edx, %es + +/* clobbers edx, ebx and ebp */ + __SWITCH_KERNELSPACE + + leal 4(%esp), %edx # prepare pt_regs + pushl %edx # push pt_regs + call *%edi addl $8, %esp jmp ret_from_exception @@ -529,7 +693,7 @@ pushl %edx call do_nmi addl $8, %esp - RESTORE_ALL + jmp restore_all nmi_stack_fixup: FIX_STACK(12,nmi_stack_correct, 1) @@ -606,6 +770,8 @@ pushl $do_spurious_interrupt_bug jmp error_code +.previous + .data ENTRY(sys_call_table) .long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */ --- diff/arch/i386/kernel/head.S 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/head.S 2003-11-26 10:09:04.000000000 +0000 @@ -16,6 +16,7 @@ #include #include #include +#include #define OLD_CL_MAGIC_ADDR 0x90020 #define OLD_CL_MAGIC 0xA33F @@ -330,7 +331,7 @@ /* This is the default interrupt "handler" :-) */ int_msg: - .asciz "Unknown interrupt\n" + .asciz "Unknown interrupt or fault at EIP %p %p %p\n" ALIGN ignore_int: cld @@ -342,9 +343,17 @@ movl $(__KERNEL_DS),%eax movl %eax,%ds movl %eax,%es + pushl 16(%esp) + pushl 24(%esp) + pushl 32(%esp) + pushl 40(%esp) pushl $int_msg call printk popl %eax + popl %eax + popl %eax + popl %eax + popl %eax popl %ds popl %es popl %edx @@ -377,23 +386,27 @@ .fill NR_CPUS-1,8,0 # space for the other GDT descriptors /* - * This is initialized to create an identity-mapping at 0-8M (for bootup - * purposes) and another mapping of the 0-8M area at virtual address + * This is initialized to create an identity-mapping at 0-16M (for bootup + * purposes) and another mapping of the 0-16M area at virtual address * PAGE_OFFSET. */ .org 0x1000 ENTRY(swapper_pg_dir) .long 0x00102007 .long 0x00103007 - .fill BOOT_USER_PGD_PTRS-2,4,0 - /* default: 766 entries */ + .long 0x00104007 + .long 0x00105007 + .fill BOOT_USER_PGD_PTRS-4,4,0 + /* default: 764 entries */ .long 0x00102007 .long 0x00103007 - /* default: 254 entries */ - .fill BOOT_KERNEL_PGD_PTRS-2,4,0 + .long 0x00104007 + .long 0x00105007 + /* default: 252 entries */ + .fill BOOT_KERNEL_PGD_PTRS-4,4,0 /* - * The page tables are initialized to only 8MB here - the final page + * The page tables are initialized to only 16MB here - the final page * tables are set up later depending on memory size. */ .org 0x2000 @@ -402,15 +415,21 @@ .org 0x3000 ENTRY(pg1) +.org 0x4000 +ENTRY(pg2) + +.org 0x5000 +ENTRY(pg3) + /* * empty_zero_page must immediately follow the page tables ! (The * initialization loop counts until empty_zero_page) */ -.org 0x4000 +.org 0x6000 ENTRY(empty_zero_page) -.org 0x5000 +.org 0x7000 /* * Real beginning of normal "text" segment @@ -419,12 +438,12 @@ ENTRY(_stext) /* - * This starts the data section. 
Note that the above is all - * in the text section because it has alignment requirements - * that we cannot fulfill any other way. + * This starts the data section. */ .data +.align PAGE_SIZE_asm + /* * The Global Descriptor Table contains 28 quadwords, per-CPU. */ @@ -439,7 +458,9 @@ .quad 0x00cf9a000000ffff /* kernel 4GB code at 0x00000000 */ .quad 0x00cf92000000ffff /* kernel 4GB data at 0x00000000 */ #endif - .align L1_CACHE_BYTES + +.align PAGE_SIZE_asm + ENTRY(cpu_gdt_table) .quad 0x0000000000000000 /* NULL descriptor */ .quad 0x0000000000000000 /* 0x0b reserved */ --- diff/arch/i386/kernel/i386_ksyms.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/i386_ksyms.c 2003-11-26 10:09:04.000000000 +0000 @@ -98,7 +98,6 @@ EXPORT_SYMBOL_NOVERS(__down_failed_trylock); EXPORT_SYMBOL_NOVERS(__up_wakeup); /* Networking helper routines. */ -EXPORT_SYMBOL(csum_partial_copy_generic); /* Delay loops */ EXPORT_SYMBOL(__ndelay); EXPORT_SYMBOL(__udelay); @@ -112,13 +111,17 @@ EXPORT_SYMBOL(strpbrk); EXPORT_SYMBOL(strstr); +#if !defined(CONFIG_X86_UACCESS_INDIRECT) EXPORT_SYMBOL(strncpy_from_user); -EXPORT_SYMBOL(__strncpy_from_user); +EXPORT_SYMBOL(__direct_strncpy_from_user); EXPORT_SYMBOL(clear_user); EXPORT_SYMBOL(__clear_user); EXPORT_SYMBOL(__copy_from_user_ll); EXPORT_SYMBOL(__copy_to_user_ll); EXPORT_SYMBOL(strnlen_user); +#else /* CONFIG_X86_UACCESS_INDIRECT */ +EXPORT_SYMBOL(direct_csum_partial_copy_generic); +#endif EXPORT_SYMBOL(dma_alloc_coherent); EXPORT_SYMBOL(dma_free_coherent); --- diff/arch/i386/kernel/i387.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/i387.c 2003-11-26 10:09:04.000000000 +0000 @@ -218,6 +218,7 @@ static int convert_fxsr_to_user( struct _fpstate __user *buf, struct i387_fxsave_struct *fxsave ) { + struct _fpreg tmp[8]; /* 80 bytes scratch area */ unsigned long env[7]; struct _fpreg __user *to; struct _fpxreg *from; @@ -234,23 +235,25 @@ if ( __copy_to_user( buf, env, 7 * sizeof(unsigned long) ) ) return 1; - to = &buf->_st[0]; + to = tmp; from = (struct _fpxreg *) &fxsave->st_space[0]; for ( i = 0 ; i < 8 ; i++, to++, from++ ) { unsigned long *t = (unsigned long *)to; unsigned long *f = (unsigned long *)from; - if (__put_user(*f, t) || - __put_user(*(f + 1), t + 1) || - __put_user(from->exponent, &to->exponent)) - return 1; + *t = *f; + *(t + 1) = *(f+1); + to->exponent = from->exponent; } + if (copy_to_user(buf->_st, tmp, sizeof(struct _fpreg [8]))) + return 1; return 0; } static int convert_fxsr_from_user( struct i387_fxsave_struct *fxsave, struct _fpstate __user *buf ) { + struct _fpreg tmp[8]; /* 80 bytes scratch area */ unsigned long env[7]; struct _fpxreg *to; struct _fpreg __user *from; @@ -258,6 +261,8 @@ if ( __copy_from_user( env, buf, 7 * sizeof(long) ) ) return 1; + if (copy_from_user(tmp, buf->_st, sizeof(struct _fpreg [8]))) + return 1; fxsave->cwd = (unsigned short)(env[0] & 0xffff); fxsave->swd = (unsigned short)(env[1] & 0xffff); @@ -269,15 +274,14 @@ fxsave->fos = env[6]; to = (struct _fpxreg *) &fxsave->st_space[0]; - from = &buf->_st[0]; + from = tmp; for ( i = 0 ; i < 8 ; i++, to++, from++ ) { unsigned long *t = (unsigned long *)to; unsigned long *f = (unsigned long *)from; - if (__get_user(*t, f) || - __get_user(*(t + 1), f + 1) || - __get_user(to->exponent, &from->exponent)) - return 1; + *t = *f; + *(t + 1) = *(f + 1); + to->exponent = from->exponent; } return 0; } --- diff/arch/i386/kernel/i8259.c 2003-06-30 10:07:18.000000000 +0100 +++ source/arch/i386/kernel/i8259.c 2003-11-26 10:09:04.000000000 +0000 @@ 
-419,8 +419,10 @@ * us. (some of these will be overridden and become * 'special' SMP interrupts) */ - for (i = 0; i < NR_IRQS; i++) { + for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) { int vector = FIRST_EXTERNAL_VECTOR + i; + if (i >= NR_IRQS) + break; if (vector != SYSCALL_VECTOR) set_intr_gate(vector, interrupt[i]); } --- diff/arch/i386/kernel/init_task.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/init_task.c 2003-11-26 10:09:04.000000000 +0000 @@ -26,7 +26,7 @@ */ union thread_union init_thread_union __attribute__((__section__(".data.init_task"))) = - { INIT_THREAD_INFO(init_task) }; + { INIT_THREAD_INFO(init_task, init_thread_union) }; /* * Initial task structure. @@ -44,5 +44,5 @@ * section. Since TSS's are completely CPU-local, we want them * on exact cacheline boundaries, to eliminate cacheline ping-pong. */ -struct tss_struct init_tss[NR_CPUS] __cacheline_aligned = { [0 ... NR_CPUS-1] = INIT_TSS }; +struct tss_struct init_tss[NR_CPUS] __attribute__((__section__(".data.tss"))) = { [0 ... NR_CPUS-1] = INIT_TSS }; --- diff/arch/i386/kernel/io_apic.c 2003-10-27 09:20:43.000000000 +0000 +++ source/arch/i386/kernel/io_apic.c 2003-11-26 10:09:04.000000000 +0000 @@ -76,6 +76,14 @@ int apic, pin, next; } irq_2_pin[PIN_MAP_SIZE]; +#ifdef CONFIG_PCI_USE_VECTOR +int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1}; +#define vector_to_irq(vector) \ + (platform_legacy_irq(vector) ? vector : vector_irq[vector]) +#else +#define vector_to_irq(vector) (vector) +#endif + /* * The common case is 1:1 IRQ<->pin mappings. Sometimes there are * shared ISA-space IRQs, so we have to support them. We are super @@ -249,7 +257,7 @@ clear_IO_APIC_pin(apic, pin); } -static void set_ioapic_affinity(unsigned int irq, cpumask_t cpumask) +static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t cpumask) { unsigned long flags; int pin; @@ -288,7 +296,7 @@ extern cpumask_t irq_affinity[NR_IRQS]; -static cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS]; +cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS]; #define IRQBALANCE_CHECK_ARCH -999 static int irqbalance_disabled = IRQBALANCE_CHECK_ARCH; @@ -670,13 +678,11 @@ __setup("noirqbalance", irqbalance_disable); -static void set_ioapic_affinity(unsigned int irq, cpumask_t mask); - static inline void move_irq(int irq) { /* note - we hold the desc->lock */ if (unlikely(!cpus_empty(pending_irq_balance_cpumask[irq]))) { - set_ioapic_affinity(irq, pending_irq_balance_cpumask[irq]); + set_ioapic_affinity_irq(irq, pending_irq_balance_cpumask[irq]); cpus_clear(pending_irq_balance_cpumask[irq]); } } @@ -853,7 +859,7 @@ if (irq_entry == -1) continue; irq = pin_2_irq(irq_entry, ioapic, pin); - set_ioapic_affinity(irq, mask); + set_ioapic_affinity_irq(irq, mask); } } @@ -1141,7 +1147,8 @@ /* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. 
*/ u8 irq_vector[NR_IRQ_VECTORS] = { FIRST_DEVICE_VECTOR , 0 }; -static int __init assign_irq_vector(int irq) +#ifndef CONFIG_PCI_USE_VECTOR +int __init assign_irq_vector(int irq) { static int current_vector = FIRST_DEVICE_VECTOR, offset = 0; BUG_ON(irq >= NR_IRQ_VECTORS); @@ -1158,11 +1165,36 @@ } IO_APIC_VECTOR(irq) = current_vector; + return current_vector; } +#endif + +static struct hw_interrupt_type ioapic_level_type; +static struct hw_interrupt_type ioapic_edge_type; -static struct hw_interrupt_type ioapic_level_irq_type; -static struct hw_interrupt_type ioapic_edge_irq_type; +#define IOAPIC_AUTO -1 +#define IOAPIC_EDGE 0 +#define IOAPIC_LEVEL 1 + +static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger) +{ + if (use_pci_vector() && !platform_legacy_irq(irq)) { + if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || + trigger == IOAPIC_LEVEL) + irq_desc[vector].handler = &ioapic_level_type; + else + irq_desc[vector].handler = &ioapic_edge_type; + set_intr_gate(vector, interrupt[vector]); + } else { + if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) || + trigger == IOAPIC_LEVEL) + irq_desc[irq].handler = &ioapic_level_type; + else + irq_desc[irq].handler = &ioapic_edge_type; + set_intr_gate(vector, interrupt[irq]); + } +} void __init setup_IO_APIC_irqs(void) { @@ -1220,13 +1252,7 @@ if (IO_APIC_IRQ(irq)) { vector = assign_irq_vector(irq); entry.vector = vector; - - if (IO_APIC_irq_trigger(irq)) - irq_desc[irq].handler = &ioapic_level_irq_type; - else - irq_desc[irq].handler = &ioapic_edge_irq_type; - - set_intr_gate(vector, interrupt[irq]); + ioapic_register_intr(irq, vector, IOAPIC_AUTO); if (!apic && (irq < 16)) disable_8259A_irq(irq); @@ -1273,7 +1299,7 @@ * The timer IRQ doesn't have to know that behind the * scene we have a 8259A-master in AEOI mode ... */ - irq_desc[0].handler = &ioapic_edge_irq_type; + irq_desc[0].handler = &ioapic_edge_type; /* * Add it to the IO-APIC irq-routing table: @@ -1624,10 +1650,6 @@ unsigned char old_id; unsigned long flags; - if (acpi_ioapic) - /* This gets done during IOAPIC enumeration for ACPI. */ - return; - /* * This is broken; anything with a real cpu count has to * circumvent this idiocy regardless. @@ -1763,9 +1785,6 @@ * that was delayed but this is now handled in the device * independent code. */ -#define enable_edge_ioapic_irq unmask_IO_APIC_irq - -static void disable_edge_ioapic_irq (unsigned int irq) { /* nothing */ } /* * Starting up a edge-triggered IO-APIC interrupt is @@ -1776,7 +1795,6 @@ * This is not complete - we should be able to fake * an edge even if it isn't on the 8259A... */ - static unsigned int startup_edge_ioapic_irq(unsigned int irq) { int was_pending = 0; @@ -1794,8 +1812,6 @@ return was_pending; } -#define shutdown_edge_ioapic_irq disable_edge_ioapic_irq - /* * Once we have recorded IRQ_PENDING already, we can mask the * interrupt for real. This prevents IRQ storms from unhandled @@ -1810,9 +1826,6 @@ ack_APIC_irq(); } -static void end_edge_ioapic_irq (unsigned int i) { /* nothing */ } - - /* * Level triggered interrupts can just be masked, * and shutting down and starting up the interrupt @@ -1834,10 +1847,6 @@ return 0; /* don't check for pending */ } -#define shutdown_level_ioapic_irq mask_IO_APIC_irq -#define enable_level_ioapic_irq unmask_IO_APIC_irq -#define disable_level_ioapic_irq mask_IO_APIC_irq - static void end_level_ioapic_irq (unsigned int irq) { unsigned long v; @@ -1864,6 +1873,7 @@ * The idea is from Manfred Spraul. 
--macro */ i = IO_APIC_VECTOR(irq); + v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1)); ack_APIC_irq(); @@ -1898,7 +1908,57 @@ } } -static void mask_and_ack_level_ioapic_irq (unsigned int irq) { /* nothing */ } +#ifdef CONFIG_PCI_USE_VECTOR +static unsigned int startup_edge_ioapic_vector(unsigned int vector) +{ + int irq = vector_to_irq(vector); + + return startup_edge_ioapic_irq(irq); +} + +static void ack_edge_ioapic_vector(unsigned int vector) +{ + int irq = vector_to_irq(vector); + + ack_edge_ioapic_irq(irq); +} + +static unsigned int startup_level_ioapic_vector (unsigned int vector) +{ + int irq = vector_to_irq(vector); + + return startup_level_ioapic_irq (irq); +} + +static void end_level_ioapic_vector (unsigned int vector) +{ + int irq = vector_to_irq(vector); + + end_level_ioapic_irq(irq); +} + +static void mask_IO_APIC_vector (unsigned int vector) +{ + int irq = vector_to_irq(vector); + + mask_IO_APIC_irq(irq); +} + +static void unmask_IO_APIC_vector (unsigned int vector) +{ + int irq = vector_to_irq(vector); + + unmask_IO_APIC_irq(irq); +} + +static void set_ioapic_affinity_vector (unsigned int vector, + unsigned long cpu_mask) +{ + int irq = vector_to_irq(vector); + + set_ioapic_affinity_irq(irq, cpu_mask); +} +#endif /* * Level and edge triggered IO-APIC interrupts need different handling, @@ -1908,26 +1968,25 @@ * edge-triggered handler, without risking IRQ storms and other ugly * races. */ - -static struct hw_interrupt_type ioapic_edge_irq_type = { +static struct hw_interrupt_type ioapic_edge_type = { .typename = "IO-APIC-edge", - .startup = startup_edge_ioapic_irq, - .shutdown = shutdown_edge_ioapic_irq, - .enable = enable_edge_ioapic_irq, - .disable = disable_edge_ioapic_irq, - .ack = ack_edge_ioapic_irq, - .end = end_edge_ioapic_irq, + .startup = startup_edge_ioapic, + .shutdown = shutdown_edge_ioapic, + .enable = enable_edge_ioapic, + .disable = disable_edge_ioapic, + .ack = ack_edge_ioapic, + .end = end_edge_ioapic, .set_affinity = set_ioapic_affinity, }; -static struct hw_interrupt_type ioapic_level_irq_type = { +static struct hw_interrupt_type ioapic_level_type = { .typename = "IO-APIC-level", - .startup = startup_level_ioapic_irq, - .shutdown = shutdown_level_ioapic_irq, - .enable = enable_level_ioapic_irq, - .disable = disable_level_ioapic_irq, - .ack = mask_and_ack_level_ioapic_irq, - .end = end_level_ioapic_irq, + .startup = startup_level_ioapic, + .shutdown = shutdown_level_ioapic, + .enable = enable_level_ioapic, + .disable = disable_level_ioapic, + .ack = mask_and_ack_level_ioapic, + .end = end_level_ioapic, .set_affinity = set_ioapic_affinity, }; @@ -1947,7 +2006,13 @@ * 0x80, because int 0x80 is hm, kind of importantish. ;) */ for (irq = 0; irq < NR_IRQS ; irq++) { - if (IO_APIC_IRQ(irq) && !IO_APIC_VECTOR(irq)) { + int tmp = irq; + if (use_pci_vector()) { + if (!platform_legacy_irq(tmp)) + if ((tmp = vector_to_irq(tmp)) == -1) + continue; + } + if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) { /* * Hmm.. We don't have an entry for this, * so default to an old-fashioned 8259 @@ -2217,12 +2282,14 @@ /* * Set up IO-APIC IRQ routing. 
*/ - setup_ioapic_ids_from_mpc(); + if (!acpi_ioapic) + setup_ioapic_ids_from_mpc(); sync_Arb_IDs(); setup_IO_APIC_irqs(); init_IO_APIC_traps(); check_timer(); - print_IO_APIC(); + if (!acpi_ioapic) + print_IO_APIC(); } /* @@ -2379,10 +2446,12 @@ "IRQ %d Mode:%i Active:%i)\n", ioapic, mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq, edge_level, active_high_low); + if (use_pci_vector() && !platform_legacy_irq(irq)) + irq = IO_APIC_VECTOR(irq); if (edge_level) { - irq_desc[irq].handler = &ioapic_level_irq_type; + irq_desc[irq].handler = &ioapic_level_type; } else { - irq_desc[irq].handler = &ioapic_edge_irq_type; + irq_desc[irq].handler = &ioapic_edge_type; } set_intr_gate(entry.vector, interrupt[irq]); --- diff/arch/i386/kernel/irq.c 2003-10-27 09:20:43.000000000 +0000 +++ source/arch/i386/kernel/irq.c 2003-11-26 10:09:04.000000000 +0000 @@ -45,6 +45,7 @@ #include #include #include +#include /* * Linux has a controller-independent x86 interrupt architecture. @@ -138,17 +139,19 @@ int show_interrupts(struct seq_file *p, void *v) { - int i, j; + int i = *(int *) v, j; struct irqaction * action; unsigned long flags; - seq_printf(p, " "); - for (j=0; j + * Copyright (C) 1999, 2003 Ingo Molnar */ #include @@ -18,6 +18,8 @@ #include #include #include +#include +#include #ifdef CONFIG_SMP /* avoids "defined but not used" warnig */ static void flush_ldt(void *null) @@ -29,34 +31,31 @@ static int alloc_ldt(mm_context_t *pc, int mincount, int reload) { - void *oldldt; - void *newldt; - int oldsize; + int oldsize, newsize, i; if (mincount <= pc->size) return 0; + /* + * LDT got larger - reallocate if necessary. + */ oldsize = pc->size; mincount = (mincount+511)&(~511); - if (mincount*LDT_ENTRY_SIZE > PAGE_SIZE) - newldt = vmalloc(mincount*LDT_ENTRY_SIZE); - else - newldt = kmalloc(mincount*LDT_ENTRY_SIZE, GFP_KERNEL); - - if (!newldt) - return -ENOMEM; - - if (oldsize) - memcpy(newldt, pc->ldt, oldsize*LDT_ENTRY_SIZE); - oldldt = pc->ldt; - memset(newldt+oldsize*LDT_ENTRY_SIZE, 0, (mincount-oldsize)*LDT_ENTRY_SIZE); - pc->ldt = newldt; - wmb(); + newsize = mincount*LDT_ENTRY_SIZE; + for (i = 0; i < newsize; i += PAGE_SIZE) { + int nr = i/PAGE_SIZE; + BUG_ON(i >= 64*1024); + if (!pc->ldt_pages[nr]) { + pc->ldt_pages[nr] = alloc_page(GFP_HIGHUSER); + if (!pc->ldt_pages[nr]) + return -ENOMEM; + clear_highpage(pc->ldt_pages[nr]); + } + } pc->size = mincount; - wmb(); - if (reload) { #ifdef CONFIG_SMP cpumask_t mask; + preempt_disable(); load_LDT(pc); mask = cpumask_of_cpu(smp_processor_id()); @@ -67,21 +66,20 @@ load_LDT(pc); #endif } - if (oldsize) { - if (oldsize*LDT_ENTRY_SIZE > PAGE_SIZE) - vfree(oldldt); - else - kfree(oldldt); - } return 0; } static inline int copy_ldt(mm_context_t *new, mm_context_t *old) { - int err = alloc_ldt(new, old->size, 0); - if (err < 0) + int i, err, size = old->size, nr_pages = (size*LDT_ENTRY_SIZE + PAGE_SIZE-1)/PAGE_SIZE; + + err = alloc_ldt(new, size, 0); + if (err < 0) { + new->size = 0; return err; - memcpy(new->ldt, old->ldt, old->size*LDT_ENTRY_SIZE); + } + for (i = 0; i < nr_pages; i++) + copy_user_highpage(new->ldt_pages[i], old->ldt_pages[i], 0); return 0; } @@ -96,6 +94,7 @@ init_MUTEX(&mm->context.sem); mm->context.size = 0; + memset(mm->context.ldt_pages, 0, sizeof(struct page *) * MAX_LDT_PAGES); old_mm = current->mm; if (old_mm && old_mm->context.size > 0) { down(&old_mm->context.sem); @@ -107,23 +106,21 @@ /* * No need to lock the MM as we are the last user + * Do not touch the ldt register, we are already + * in the next thread. 
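The reworked alloc_ldt() above grows the LDT by filling in an array of individual pages instead of reallocating one contiguous buffer, so an enlargement never moves descriptors that are already live. Below is a standalone sketch of that grow-in-place pattern; PAGE_SZ, ENTRY_SZ, MAX_PAGES and demo_grow() are invented for the illustration, and calloc() stands in for alloc_page() plus clear_highpage().

  #include <stdio.h>
  #include <stdlib.h>

  #define PAGE_SZ   4096
  #define ENTRY_SZ  8                 /* like LDT_ENTRY_SIZE   */
  #define MAX_PAGES 16

  struct demo_ctx {
          int   size;                 /* entries currently covered */
          void *pages[MAX_PAGES];     /* NULL == not allocated yet */
  };

  /* Ensure at least mincount entries are backed; never move old pages. */
  static int demo_grow(struct demo_ctx *ctx, int mincount)
  {
          int bytes, i;

          if (mincount <= ctx->size)
                  return 0;

          bytes = mincount * ENTRY_SZ;
          for (i = 0; i < bytes; i += PAGE_SZ) {
                  int nr = i / PAGE_SZ;

                  if (nr >= MAX_PAGES)
                          return -1;
                  if (!ctx->pages[nr]) {
                          ctx->pages[nr] = calloc(1, PAGE_SZ);  /* arrives zeroed */
                          if (!ctx->pages[nr])
                                  return -1;
                  }
          }
          ctx->size = mincount;
          return 0;
  }

  int main(void)
  {
          struct demo_ctx ctx = { 0 };
          int i;

          if (demo_grow(&ctx, 600) == 0)     /* 600 entries -> two pages */
                  printf("ctx now covers %d entries\n", ctx.size);
          for (i = 0; i < MAX_PAGES; i++)
                  free(ctx.pages[i]);
          return 0;
  }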
 */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-		if (mm == current->active_mm)
-			clear_LDT();
-		if (mm->context.size*LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			kfree(mm->context.ldt);
-		mm->context.size = 0;
-	}
+	int i, nr_pages = (mm->context.size*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;
+
+	for (i = 0; i < nr_pages; i++)
+		__free_page(mm->context.ldt_pages[i]);
+	mm->context.size = 0;
 }
 
 static int read_ldt(void __user * ptr, unsigned long bytecount)
 {
-	int err;
+	int err, i;
 	unsigned long size;
 	struct mm_struct * mm = current->mm;
 
@@ -138,8 +135,25 @@
 	size = bytecount;
 
 	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
+	/*
+	 * This is necessary just in case we got here straight from a
+	 * context-switch where the ptes were set but no tlb flush
+	 * was done yet. We rather avoid doing a TLB flush in the
+	 * context-switch path and do it here instead.
+	 */
+	__flush_tlb_global();
+
+	for (i = 0; i < size; i += PAGE_SIZE) {
+		int nr = i / PAGE_SIZE, bytes;
+		char *kaddr = kmap(mm->context.ldt_pages[nr]);
+
+		bytes = size - i;
+		if (bytes > PAGE_SIZE)
+			bytes = PAGE_SIZE;
+		if (copy_to_user(ptr + i, kaddr, bytes))
+			err = -EFAULT;
+		kunmap(mm->context.ldt_pages[nr]);
+	}
 	up(&mm->context.sem);
 	if (err < 0)
 		return err;
@@ -158,7 +172,7 @@
 
 	err = 0;
 	address = &default_ldt[0];
-	size = 5*sizeof(struct desc_struct);
+	size = 5*LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
@@ -200,7 +214,15 @@
 		goto out_unlock;
 	}
 
-	lp = (__u32 *) ((ldt_info.entry_number << 3) + (char *) mm->context.ldt);
+	/*
+	 * No rescheduling allowed from this point to the install.
+	 *
+	 * We do a TLB flush for the same reason as in the read_ldt() path.
+	 */
+	preempt_disable();
+	__flush_tlb_global();
+	lp = (__u32 *) ((ldt_info.entry_number << 3) +
+			(char *) __kmap_atomic_vaddr(KM_LDT_PAGE0));
 
 	/* Allow LDTs to be cleared by the user. */
 	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
@@ -221,6 +243,7 @@
 	*lp = entry_1;
 	*(lp+1) = entry_2;
 	error = 0;
+	preempt_enable();
 
 out_unlock:
 	up(&mm->context.sem);
@@ -248,3 +271,26 @@
 	}
 	return ret;
 }
+
+/*
+ * load one particular LDT into the current CPU
+ */
+void load_LDT_nolock(mm_context_t *pc, int cpu)
+{
+	struct page **pages = pc->ldt_pages;
+	int count = pc->size;
+	int nr_pages, i;
+
+	if (likely(!count)) {
+		pages = &default_ldt_page;
+		count = 5;
+	}
+	nr_pages = (count*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;
+
+	for (i = 0; i < nr_pages; i++) {
+		__kunmap_atomic_type(KM_LDT_PAGE0 - i);
+		__kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
+	}
+	set_ldt_desc(cpu, (void *)__kmap_atomic_vaddr(KM_LDT_PAGE0), count);
+	load_LDT_desc();
+}
--- diff/arch/i386/kernel/mpparse.c	2003-11-25 15:24:57.000000000 +0000
+++ source/arch/i386/kernel/mpparse.c	2003-11-26 10:09:04.000000000 +0000
@@ -668,7 +668,7 @@
 	 * Read the physical hardware table. Anything here will
 	 * override the defaults.
 	 */
-	if (!smp_read_mpc((void *)mpf->mpf_physptr)) {
+	if (!smp_read_mpc((void *)phys_to_virt(mpf->mpf_physptr))) {
 		smp_found_config = 0;
 		printk(KERN_ERR "BIOS bug, MP table errors detected!...\n");
 		printk(KERN_ERR "... disabling SMP support. (tell your hw vendor)\n");
@@ -1129,8 +1129,11 @@
 			continue;
 		ioapic_pin = irq - mp_ioapic_routing[ioapic].irq_start;
 
-		if (!ioapic && (irq < 16))
-			irq += 16;
+		if (es7000_plat) {
+			if (!ioapic && (irq < 16))
+				irq += 16;
+		}
+
 		/*
 		 * Avoid pin reprogramming.
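Because read_ldt() above copies out of individual LDT pages, each chunk handed to copy_to_user() is clamped to the current page before the copy. The standalone sketch below shows the same chunking with memcpy() standing in for copy_to_user(); all names here are invented for the example.

  #include <stdio.h>
  #include <string.h>

  #define PAGE_SZ 4096

  /* Copy `size` bytes out of an array of PAGE_SZ-sized source pages. */
  static void demo_copy_out(char *dst, char *const pages[], size_t size)
  {
          size_t i;

          for (i = 0; i < size; i += PAGE_SZ) {
                  size_t bytes = size - i;

                  if (bytes > PAGE_SZ)
                          bytes = PAGE_SZ;    /* never read past this page */
                  memcpy(dst + i, pages[i / PAGE_SZ], bytes);
          }
  }

  int main(void)
  {
          static char page0[PAGE_SZ], page1[PAGE_SZ];
          static char out[PAGE_SZ + 100];
          char *pages[] = { page0, page1 };

          memset(page0, 'a', sizeof(page0));
          memset(page1, 'b', sizeof(page1));
          demo_copy_out(out, pages, sizeof(out));
          printf("last byte: %c\n", out[sizeof(out) - 1]);   /* prints 'b' */
          return 0;
  }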
PRTs typically include entries * with redundant pin->irq mappings (but unique PCI devices); @@ -1147,21 +1150,29 @@ if ((1<irq = irq; + if (use_pci_vector() && !platform_legacy_irq(irq)) + irq = IO_APIC_VECTOR(irq); + entry->irq = irq; continue; } mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<irq = irq; - + if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq, edge_level, active_high_low)) { + if (use_pci_vector() && !platform_legacy_irq(irq)) + irq = IO_APIC_VECTOR(irq); + entry->irq = irq; + } printk(KERN_DEBUG "%02x:%02x:%02x[%c] -> %d-%d -> IRQ %d\n", entry->id.segment, entry->id.bus, entry->id.device, ('A' + entry->pin), mp_ioapic_routing[ioapic].apic_id, ioapic_pin, entry->irq); } + + print_IO_APIC(); + + return; } #endif /*CONFIG_ACPI_PCI*/ --- diff/arch/i386/kernel/nmi.c 2003-10-09 09:47:16.000000000 +0100 +++ source/arch/i386/kernel/nmi.c 2003-11-26 10:09:04.000000000 +0000 @@ -31,7 +31,16 @@ #include #include +#ifdef CONFIG_KGDB +#include +#ifdef CONFIG_SMP +unsigned int nmi_watchdog = NMI_IO_APIC; +#else +unsigned int nmi_watchdog = NMI_LOCAL_APIC; +#endif +#else unsigned int nmi_watchdog = NMI_NONE; +#endif static unsigned int nmi_hz = HZ; unsigned int nmi_perfctr_msr; /* the MSR to reset in NMI handler */ extern void show_registers(struct pt_regs *regs); @@ -408,6 +417,9 @@ for (i = 0; i < NR_CPUS; i++) alert_counter[i] = 0; } +#ifdef CONFIG_KGDB +int tune_watchdog = 5*HZ; +#endif void nmi_watchdog_tick (struct pt_regs * regs) { @@ -421,12 +433,24 @@ sum = irq_stat[cpu].apic_timer_irqs; +#ifdef CONFIG_KGDB + if (! in_kgdb(regs) && last_irq_sums[cpu] == sum ) { + +#else if (last_irq_sums[cpu] == sum) { +#endif /* * Ayiee, looks like this CPU is stuck ... * wait a few IRQs (5 seconds) before doing the oops ... */ alert_counter[cpu]++; +#ifdef CONFIG_KGDB + if (alert_counter[cpu] == tune_watchdog) { + kgdb_handle_exception(2, SIGPWR, 0, regs); + last_irq_sums[cpu] = sum; + alert_counter[cpu] = 0; + } +#endif if (alert_counter[cpu] == 5*nmi_hz) { spin_lock(&nmi_print_lock); /* --- diff/arch/i386/kernel/process.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/process.c 2003-11-26 10:09:04.000000000 +0000 @@ -47,6 +47,7 @@ #include #include #include +#include #ifdef CONFIG_MATH_EMULATION #include #endif @@ -302,6 +303,9 @@ struct task_struct *tsk = current; memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8); +#ifdef CONFIG_X86_HIGH_ENTRY + clear_thread_flag(TIF_DB7); +#endif memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); /* * Forget coprocessor state.. @@ -315,9 +319,8 @@ if (dead_task->mm) { // temporary debugging check if (dead_task->mm->context.size) { - printk("WARNING: dead process %8s still has LDT? <%p/%d>\n", + printk("WARNING: dead process %8s still has LDT? <%d>\n", dead_task->comm, - dead_task->mm->context.ldt, dead_task->mm->context.size); BUG(); } @@ -352,7 +355,17 @@ p->thread.esp = (unsigned long) childregs; p->thread.esp0 = (unsigned long) (childregs+1); + /* + * get the two stack pages, for the virtual stack. + * + * IMPORTANT: this code relies on the fact that the task + * structure is an 8K aligned piece of physical memory. 
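The nmi_watchdog_tick() changes above hook KGDB into the existing stuck-CPU heuristic: when the per-CPU timer-interrupt count stops advancing, an alert counter is bumped on every NMI and the CPU is reported (or handed to the debugger) once a threshold of ticks is reached. Below is a standalone sketch of that heuristic, with an invented threshold and report path.

  #include <stdio.h>

  #define STUCK_TICKS 5   /* hypothetical threshold, in the spirit of 5*nmi_hz */

  static unsigned long last_sum;
  static int alert_counter;

  /* Called once per watchdog tick with the current interrupt count. */
  static void demo_watchdog_tick(unsigned long irq_sum)
  {
          if (irq_sum == last_sum) {
                  /* No interrupts handled since the last tick. */
                  if (++alert_counter == STUCK_TICKS)
                          printf("CPU appears stuck (no progress for %d ticks)\n",
                                 STUCK_TICKS);
          } else {
                  last_sum = irq_sum;
                  alert_counter = 0;
          }
  }

  int main(void)
  {
          int t;

          demo_watchdog_tick(100);          /* progress seen             */
          for (t = 0; t < STUCK_TICKS; t++)
                  demo_watchdog_tick(100);  /* count stops advancing     */
          return 0;
  }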
+ */ + p->thread.stack_page0 = virt_to_page((unsigned long)p->thread_info); + p->thread.stack_page1 = virt_to_page((unsigned long)p->thread_info + PAGE_SIZE); + p->thread.eip = (unsigned long) ret_from_fork; + p->thread_info->real_stack = p->thread_info; savesegment(fs,p->thread.fs); savesegment(gs,p->thread.gs); @@ -504,10 +517,41 @@ __unlazy_fpu(prev_p); +#ifdef CONFIG_X86_HIGH_ENTRY + /* + * Set the ptes of the virtual stack. (NOTE: a one-page TLB flush is + * needed because otherwise NMIs could interrupt the + * user-return code with a virtual stack and stale TLBs.) + */ + __kunmap_atomic_type(KM_VSTACK0); + __kunmap_atomic_type(KM_VSTACK1); + __kmap_atomic(next->stack_page0, KM_VSTACK0); + __kmap_atomic(next->stack_page1, KM_VSTACK1); + + /* + * NOTE: here we rely on the task being the stack as well + */ + next_p->thread_info->virtual_stack = + (void *)__kmap_atomic_vaddr(KM_VSTACK0); + +#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP) + /* + * If next was preempted on entry from userspace to kernel, + * and now it's on a different cpu, we need to adjust %esp. + * This assumes that entry.S does not copy %esp while on the + * virtual stack (with interrupts enabled): which is so, + * except within __SWITCH_KERNELSPACE itself. + */ + if (unlikely(next->esp >= TASK_SIZE)) { + next->esp &= THREAD_SIZE - 1; + next->esp |= (unsigned long) next_p->thread_info->virtual_stack; + } +#endif +#endif /* * Reload esp0, LDT and the page table pointer: */ - load_esp0(tss, next->esp0); + load_virtual_esp0(tss, next_p); /* * Load the per-thread Thread-Local Storage descriptor. --- diff/arch/i386/kernel/reboot.c 2003-10-09 09:47:33.000000000 +0100 +++ source/arch/i386/kernel/reboot.c 2003-11-26 10:09:04.000000000 +0000 @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include "mach_reboot.h" @@ -154,12 +155,11 @@ CMOS_WRITE(0x00, 0x8f); spin_unlock_irqrestore(&rtc_lock, flags); - /* Remap the kernel at virtual address zero, as well as offset zero - from the kernel segment. This assumes the kernel segment starts at - virtual address PAGE_OFFSET. */ - - memcpy (swapper_pg_dir, swapper_pg_dir + USER_PGD_PTRS, - sizeof (swapper_pg_dir [0]) * KERNEL_PGD_PTRS); + /* + * Remap the first 16 MB of RAM (which includes the kernel image) + * at virtual address zero: + */ + setup_identity_mappings(swapper_pg_dir, 0, 16*1024*1024); /* * Use `swapper_pg_dir' as our page directory. @@ -263,7 +263,12 @@ disable_IO_APIC(); #endif - if(!reboot_thru_bios) { + if (!reboot_thru_bios) { + if (efi_enabled) { + efi.reset_system(EFI_RESET_COLD, EFI_SUCCESS, 0, 0); + __asm__ __volatile__("lidt %0": :"m" (no_idt)); + __asm__ __volatile__("int3"); + } /* rebooting needs to touch the page at absolute addr 0 */ *((unsigned short *)__va(0x472)) = reboot_mode; for (;;) { @@ -273,6 +278,8 @@ __asm__ __volatile__("int3"); } } + if (efi_enabled) + efi.reset_system(EFI_RESET_WARM, EFI_SUCCESS, 0, 0); machine_real_restart(jump_to_bios, sizeof(jump_to_bios)); } @@ -287,6 +294,8 @@ void machine_power_off(void) { + if (efi_enabled) + efi.reset_system(EFI_RESET_SHUTDOWN, EFI_SUCCESS, 0, 0); if (pm_power_off) pm_power_off(); } --- diff/arch/i386/kernel/setup.c 2003-10-27 09:20:43.000000000 +0000 +++ source/arch/i386/kernel/setup.c 2003-11-26 10:09:04.000000000 +0000 @@ -36,6 +36,8 @@ #include #include #include +#include +#include #include
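The __switch_to() hunk in process.c above rebases a saved %esp from the task's real stack into the per-CPU virtual-stack alias with a mask-and-or, which only works because THREAD_SIZE is a power of two and both mappings are aligned to it. The standalone sketch below shows the same pointer rebase; the aligned_alloc() buffers stand in for the kernel's fixed kmap slots and the numbers are arbitrary.

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define STACK_SZ 8192UL          /* power of two, like THREAD_SIZE */

  /*
   * Recompute a pointer into one STACK_SZ-aligned region as the equivalent
   * pointer into another aligned alias of it: keep the offset, swap the base.
   */
  static void *rebase(void *ptr, void *new_base)
  {
          uintptr_t off = (uintptr_t)ptr & (STACK_SZ - 1);

          return (char *)new_base + off;
  }

  int main(void)
  {
          char *real = aligned_alloc(STACK_SZ, STACK_SZ);
          char *alias = aligned_alloc(STACK_SZ, STACK_SZ);
          char *sp;

          if (!real || !alias)
                  return 1;
          sp = real + 1234;                 /* a "saved stack pointer" */
          sp = rebase(sp, alias);
          printf("offset into alias: %ld\n", (long)(sp - alias));   /* 1234 */
          free(real);
          free(alias);
          return 0;
  }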