• November 16, 17, 18, 2005

    2006-01-04

    Copyright notice: when reposting, please use a hyperlink to credit the original source, the author, and this notice
    http://mmmmn.blogbus.com/logs/1785239.html

    1、Ultra2 shoots itself in the foot and sticks tongue out


    Ulf Tropp wrote:
    > I have acquired (2nd hand) a 2-processor Ultra2. After running for a
    > while, it shuts its own power off. Cycling (off-on) the power button or
    > pulling/putting back the power cord brings it up again. 'A while'
    > currently means a couple of days, but for a while it meant just up to
    > the 'Initializing memory' message. The better time came after running
    > with diag-switch?=true.
    > This has developed with time (I didn't run it for several months),
    > though maybe it was there when I got the machine (I never ran it for
    > more than a day).
    > Since I gather the machine will remember power on/off status through a
    > power outage, there must be some non-volatile storage somewhere. Could
    > that be faulty?
    > Also, it has developed a habit of not keeping the CD tray in, with or
    > without a CD.


    You could hold the [STOP]-[D] keys down when you power up the machine. This
    can indicate some hardware problems, depending on which LEDs are lit. A
    complete list should be in the U2 manual (see docs.sun.com if you don't have
    one), but here it is:


    Num Lock = Memory failed
    Scroll Lock = Memory failed
    Num Lock + Scroll Lock = Memory failed
    Compose = Memory failed
    Compose + Num Lock = Memory failed
    Compose + Scroll Lock = Memory failed
    Compose + Num Lock + Scroll Lock = Memory failed
    Caps Lock = Memory failed
    Caps Lock + Scroll Lock = No memory found
    (It's the different memories that are tested)


    Caps Lock + Num Lock = Main board failed
    Caps Lock + Compose + Scroll Lock = CPU 1 failed
    Caps Lock + Compose + Scroll Lock + Num Lock = CPU 2 failed


    The Caps Lock LED blinks on and off to indicate that the POSTs are
    running.
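
    As a convenience, the LED list above can be kept as a small lookup table.
    This is only a sketch transcribed from the list in this post; the helper
    name decode_leds is made up for illustration and is not part of any Sun tool.

```python
# Lookup table transcribed from the LED list in this post.
# Keys are the sets of keyboard LEDs that stay lit after a [STOP]-[D] power-on.
LED_CODES = {
    frozenset({"Num Lock"}): "Memory failed",
    frozenset({"Scroll Lock"}): "Memory failed",
    frozenset({"Num Lock", "Scroll Lock"}): "Memory failed",
    frozenset({"Compose"}): "Memory failed",
    frozenset({"Compose", "Num Lock"}): "Memory failed",
    frozenset({"Compose", "Scroll Lock"}): "Memory failed",
    frozenset({"Compose", "Num Lock", "Scroll Lock"}): "Memory failed",
    frozenset({"Caps Lock"}): "Memory failed",
    frozenset({"Caps Lock", "Scroll Lock"}): "No memory found",
    frozenset({"Caps Lock", "Num Lock"}): "Main board failed",
    frozenset({"Caps Lock", "Compose", "Scroll Lock"}): "CPU 1 failed",
    frozenset({"Caps Lock", "Compose", "Scroll Lock", "Num Lock"}): "CPU 2 failed",
}


def decode_leds(*lit):
    """Return the message for the lit LEDs (hypothetical helper, order-independent)."""
    return LED_CODES.get(frozenset(lit), "Unknown LED combination")


print(decode_leds("Caps Lock", "Num Lock"))
print(decode_leds("Scroll Lock", "Caps Lock"))
```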


    I hope that can give you some indication of troubles.

    ---


    2、syslog: /dev/console doesn't work???
    I am using an SB1500 and Solaris 10.
    The problem is that I am getting nothing on my console.
    I checked /etc/syslog.conf and compared it with other servers.
    The other servers work fine.


    I tested with an "echo" from another dt terminal.
    # echo HELLO >> /dev/console
    I can see "HELLO" on all the other servers; it fails on only one server.


    What seems to be the problem?
    Thanks in advance...


    ---
    Thanks Colin


    That's it.
    I checked the permissions and ownership of /dev/console and of the file it
    links to, and it worked after I changed the owner of the
    /devices/pseudo/cn@0:console file to the proper owner.
    I put a script in /etc/rc3.d because the owner is automatically reset to
    root after a reboot.


    [ftp:/ 33%]ls -l /dev/console
    lrwxrwxrwx 1 root other 30 Jun  9 17:20 /dev/console -> ../devices/pseudo/cn@0:console
    [ftp:/ 34%]ls -l /devices/pseudo/cn@0:console
    crw--w---- 1 root tty 0, 0 Nov 28 13:40 /devices/pseudo/cn@0:console
    [ftp:/ 35%]


    3、> I've got some requirements to set up a new data center (mixed 440's,
    > 2900's, 25K's) using IPMP to provide IP connectivity redundancy.
    > Anybody got any warstories/advice/landmines to avoid?
    >
    > TIA,
    >
    > BV


    Make sure that you are autonegotiating your interfaces to your switch,
    whenever possible (to avoid spurious errors from IPMP when it tries to
    act upon the interfaces that have not been set/forced yet).


    Make sure that you are using unique MAC addresses on your interfaces.


    The advice from the OP about using static hosts is good especially if
    your defaultrouter tends to drop or degrade ICMP messages under load or
    is configured to drop ICMP messages. (IPMP probes that test the link are
    ICMP messages)


    Go here: http://www.sun.com/blueprints/browsesubject.html


    and take a look at this document (as well as the product documentation and
    Sun infodoc 70062):


    Internet Protocol Network Multipathing (Update) (November 2002)
    -by Mark Garner
    This article looks at the features of Internet Protocol network
    multipathing and the steps required to configure it for network adapter
    resilience. <snipped>


    (there are a couple of others on that page as well; search for "multipath")


    HTH!

    ---
    > I've got some requirements to set up a new data center (mixed 440's,
    > 2900's, 25K's) using IPMP to provide IP connectivity redundancy.
    > Anybody got any warstories/advice/landmines to avoid?


    We configured all our Suns with IPMP. Works like a charm. However, there
    was one issue with firewalls. Outgoing TCP connections on IPMP round-robin
    on the (data? test?) IPs. I.e. the first connection originates from one of
    the (data? test?) addresses, the next connection from the other (data?
    test?) address, etc. Your firewall has to be aware of that in order not to
    block every other outbound connection. Maybe this can be avoided by using
    just one data address, but anyway, this is something to be aware of.


    For some time we wondered whether Sun Network Trunking
    (http://www.sun.com/products/networking/ethernet/suntrunking/) was an
    alternative. It probably is. Even a combination of IPMP and trunking may be
    interesting. However, between two machines, trunking of four 1 Gb links
    does not get one 4 Gbps of bandwidth. Apparently, trunking hashes foreign
    MACs to local ports, so that between two machines one still only gets 1
    Gbps; between one trunking server and four or more other machines one would
    get 4 Gbps as four times 1 Gbps.
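
    The effect described above can be illustrated with a toy model. This is
    only a sketch: the hashing policy (low byte of the peer MAC modulo the
    number of ports) and the MAC addresses are assumptions made up for the
    example, and the real Sun Trunking policy may differ.

```python
LINK_GBPS = 1   # speed of each physical link in the trunk
N_PORTS = 4     # number of links in the trunk


def pick_port(peer_mac, n_ports=N_PORTS):
    """Map a peer MAC to a local port (hypothetical policy: low byte mod ports)."""
    return int(peer_mac.split(":")[-1], 16) % n_ports


def aggregate_gbps(peer_macs):
    """Upper bound on throughput: one link's worth per distinct port in use."""
    ports_used = {pick_port(mac) for mac in peer_macs}
    return len(ports_used) * LINK_GBPS


# Made-up example addresses.
one_peer = ["08:00:20:aa:bb:01"]
four_peers = [
    "08:00:20:aa:bb:00",
    "08:00:20:aa:bb:01",
    "08:00:20:aa:bb:02",
    "08:00:20:aa:bb:03",
]

print(aggregate_gbps(one_peer))    # a single peer always lands on one port
print(aggregate_gbps(four_peers))  # four peers can spread over four ports
```

    With a single peer every frame hashes to the same port, so the trunk
    cannot exceed one link's bandwidth for that pair of machines.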

    ---


    4、ohaya wrote:
    >
    > ohaya wrote:
    > >
    > > ohaya wrote:
    > > > ohaya wrote:
    > > > >
    > > > > Hi,
    > > > >
    > > > > I just got a used Sunblade 100 and am starting to familiarize myself
    > > > > with it. I did a clean installation of Solaris 9 (12/03) on it, and I
    > > > > wanted to try to do an image backup to an external USB hard drive.
    > > > >
    > > > > I've been searching, and I understand that there are a number of
    > > > > possible ways to do this, including ufsdump and flar, but from what I
    > > > > understand, it seems like flar should work the way that I want.
    > > > >
    > > > > Has anyone tried something like this before, and if so, can you provide
    > > > > the steps I need?
    > > > >
    > > > > From what I can tell, I need to do something like the following:
    > > > >
    > > > > 1) Boot from CD to single user mode
    > > > >
    > > > > 2) Mount the external drive:
    > > > >
    > > > > mount -F pcfs /dev/dsk/cXt0d0p0:c /mnt
    > > > >
    > > > > 3) Run flar?
    > > > >
    > > > > Does the above look right? Also, a couple of questions:
    > > > >
    > > > > a) How do I determine the device for the USB hard drive (i.e., the
    > > > > "cXt0d0p0")?
    > > > >
    > > > > b) Can anyone provide the exact flar command line that I'd need?
    > > > >
    > > > > Thanks!
    > > > >
    > > > > Jim
    > > >
    > > >
    > > > Hi,
    > > >
    > > > I guess that I was getting ahead of myself :(...
    > > >
    > > > I'm having trouble mounting the USB hard drive. Can anyone tell me how
    > > > to do this?
    > > >
    > > > "cfgadm -n" seems to show "usb-storage".
    > > >
    > > > Jim
    > >
    > > Hi,
    > >
    > > Here's some of the info from the drive:
    > >
    > > bash-2.05# cfgadm -nv
    > > Ap_Id Receptacle Occupant Condition
    > > Information
    > > When Type Busy Phys_Id
    > > c0 connected configured unknown
    > > unavailable scsi-bus n /devices/pci@1f,0/ide@d:scsi
    > > usb0/1 empty unconfigured ok
    > > unavailable unknown n /devices/pci@1f,0/usb@c,3:1
    > > usb0/2 connected configured ok
    > > Mfg: <undef> Product: <undef> NConfigs: 1 Config: 0 <no cfg str
    > > descr>
    > > unavailable usb-mouse n /devices/pci@1f,0/usb@c,3:2
    > > usb0/3 connected configured ok
    > > Mfg: Cypress Semiconductor Product: USB2.0 Storage Device NConfigs: 1
    > > Config: 0 <no cfg str descr>
    > > unavailable usb-storage n /devices/pci@1f,0/usb@c,3:3
    > > usb0/4 connected configured ok
    > > Mfg: <undef> Product: <undef> NConfigs: 1 Config: 0 <no cfg str
    > > descr>
    > > unavailable usb-kbd n /devices/pci@1f,0/usb@c,3:4
    > >
    > > From "prtconf -D":
    > >
    > > usb, instance #0 (driver name: ohci)
    > > mouse, instance #0 (driver name: hid)
    > > keyboard, instance #1 (driver name: hid)
    > > storage, instance #0 (driver name: scsa2usb)
    > > disk, instance #1 (driver name: sd)
    > >
    > > bash-2.05# ls -l|grep usb
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s0 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:a
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s1 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:b
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s2 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:c
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s3 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:d
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s4 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:e
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s5 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:f
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s6 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:g
    > > lrwxrwxrwx 1 root root 51 Nov 17 20:44 c1t0d0s7 ->
    > > ../../devices/pci@1f,0/usb@c,3/storage@3/disk@0,0:h
    > >
    > > The drive is a Seagate 60GB drive (ST360021), and under Windows, I had
    > > 2 partitions. The 1st partition is an NTFS partition, and the 2nd is a
    > > FAT32 partition. Both about 30GB.
    > >
    > > Is there no way for me to mount the drive and be able to access the 2nd
    > > FAT32 partition from within Solaris??
    > >
    > > Thanks,
    > > Jim
    >
    > Hi,
    >
    > Does anyone have any ideas on this?
    >
    > Right now, I'm re-installing Solaris 9, using the latest (9/05) version,
    > as from some stuff that I found, there were some changes in the USB
    > support after the 12/03 version. Beyond that, and if that doesn't work,
    > I'm kind of stuck, so any suggestions would be appreciated.
    >
    > The thing that seems strange to me is that Solaris *seems* to see the
    > USB hard drive all right, but for whatever reason, I just can't seem to
    > mount the drive.
    >
    > I've tried re-creating the partition under Windows again (it's a primary
    > partition), but still can't mount it.
    >
    > Jim


    Hi,


    Just for the record, and in case anyone ever encounters this problem in
    the future:


    I just finished the Solaris installation, to the 9/05 version, and I can
    now mount the partitions on the USB hard drive:


    mount -F pcfs /dev/dsk/c1t0d0s2:c /mnt for the 1st partition and
    mount -F pcfs /dev/dsk/c1t0d0s2:d /mnt-d for the 2nd partition.


    So, it appears that the 12/03 version just didn't work for some reason,
    but 9/05 version is good.


    Jim

    5、> For instance, I am running a program TEST;
    > how can I know how much memory TEST occupies?


    > I use ps -ef|grep TEST to get the PID of TEST,
    > then I use top -p PID,
    > but it does not seem to be working right.


    Why do you think it's not working? Can you show the figures and why you
    think they're incorrect?


    'ps' may be a good choice. The -o format fields vsz, osz, and rss each
    report different aspects of memory use.


    'pmap -x <PID>' can give a very detailed report.


    6、tonij67@hotmail.com wrote:
    > 9665 oracle 882M 247M sleep 53 2 0:00:07 0.1% oracle/12
    > 9675 oracle 880M 247M sleep 53 2 0:00:00 0.0% oracle/1
    > 9673 oracle 880M 247M sleep 53 2 0:00:00 0.0% oracle/1
    > 9663 oracle 880M 247M sleep 53 2 0:00:00 0.0% oracle/22
    > ...(there is a lot more but I don't want to clutter this post)


    > I took the SIZE and RSS columns and added up the totals. SIZE amounts
    > to 19 GB and RSS amounts to 8 GB. This doesn't make sense to me because
    > there is only 2 GB of RAM and 4 GB of swap space. Speaking of swap
    > space, I looked at swap -s right after I ran this prstat command and it
    > told me only 1.5 GB of swap is in use.


    > What am I missing here? How can the memory usage stats that prstat is
    > showing me exceed my total RAM+swap?


    prstat is doing the same thing that 'ps' does when calculating the size
    of a process. It looks at all pages in use by a process and sums them
    up. In this case, many of those pages are shared among all oracle
    processes. So for each process, it's summing the same pages again.


    When you take those sums and add them together, you're adding up
    multiple copies of the same pages. Each figure for a process is correct
    in some ways, but not in a way that lets you add them together.
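
    A rough numeric sketch of the double counting. The 247 MB per-process RSS
    matches the prstat output quoted above; the split into 240 MB shared and
    7 MB private, and the count of 30 processes, are assumptions chosen only
    to make the arithmetic concrete.

```python
SHARED_MB = 240   # assumed: one shared segment mapped by every process
PRIVATE_MB = 7    # assumed: private memory per process
N_PROCS = 30      # assumed: number of oracle processes

# What a per-process tool reports: shared pages are counted for each process.
per_process_rss = [SHARED_MB + PRIVATE_MB for _ in range(N_PROCS)]

# Adding the per-process figures counts the shared segment N_PROCS times.
naive_total = sum(per_process_rss)

# Physical memory actually needed: the shared segment once, plus private pages.
actual_total = SHARED_MB + PRIVATE_MB * N_PROCS

print(per_process_rss[0], naive_total, actual_total)
```

    Each per-process number is right on its own, but the naive sum is many
    times larger than what is really resident.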


    You may want to look at 'pmap -x <PID>' output. It tries to do more of
    a breakdown of pages that are shared and private.


    Basically, there's no tool out there that I know of that can say "Out of
    the 1000 pages in this system, oracle processes are using 462." It
    would be handy if there were one.


    --

    8、> Adding to this... does anyone use luupgrade with SDS? I keep reading
    > conflicting statements about if this is possible or not?


    That's because there's conflicting truths about it.


    In Solaris 8, and early Solaris 9 it wasn't possible - you could LU
    from an SDS volume, but not to one.
    In recent Solaris 9 and later you can LU to/from SDS/SVM, and even from
    (but not to) VxVM.


    And as this is based on the version of Solaris you're going to, unless
    you're upgrading to Solaris 8 the general answer is that you can use
    SDS/SVM volumes.


      Scott

    9、> We are considering Sunray for our company and I would like to hear some
    > horror stories if anyone cares to share.


    No horror story here - just a few things that showed up after moving
    a few users from workstations to SunRays:


    When moving from a Blade 1500 to a SunRay 1g with a V240 as the
    SunRay server, I have seen no real speed penalty in everyday work.
    People who moved from Blade 100s to the same SunRay server
    were impressed by the enhanced speed of application startup.


    The USB ports on the SunRay clients are USB 1.1 only, and even
    for that they seem to be pretty slow. While it's easy to use
    USB memory sticks on the SunRay, I've had multiple complaints
    that it is painfully slooow.


    In a SunRay-only office, there's no way to access data CDs. I've
    kept one workstation in my office just for that. It would be nice
    if there was a fan-less NAS device which could share CDs/DVDs
    via Ethernet, or if there was a USB 2.0 SunRay supporting local
    access to CD/DVD drives.


    NSCM (non-smartcard mobility) is great - when users have a problem
    on their desktop, they can bring their session into my office so
    that I can help them.


    Setting up SunRays behind a firewall (e.g. at home) is quite
    complicated (or impossible without a VPN) as it is now.


    With workstations, it was easy to tell one user to log out over
    night so I could re-install (or install patches on) one machine.
    With SunRays, I have to ask all SunRay users to log out instead.


    An all-in-one SunRay client with a larger screen (>= 19") would
    be nice. As of now there's only the SunRay 170 (17").


    hth, mp.

    --
    >> The USB ports on the SunRay clients are USB 1.1 only, and even
    >> for that they seem to be pretty slow. While it's easy to use
    >> USB memory sticks on the SunRay, I've had multiple complaints
    >> that it is painfully slooow.
    >
    > Yes they are slow, but if only used for memorysticks no big deal of that.


    I've made some quick tests now:


      % cd /tmp/SUNWut/mnt/martin/unnamed/


      % ptime mkfile 10m tt
      real 1:36.684


      % ptime cp tt /tmp
      real 54.633


      % ptime cp /tmp/tt .
      real 48.032


    This shows about 100-200 kB/sec, which is far from the 12 MBit/s that
    USB 1.1 should provide. It's so slow that copying just a few photos
    etc. is a real PITA.
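
    The arithmetic behind those figures, as a small sketch. It assumes that
    "mkfile 10m" creates a 10 x 1024 x 1024 byte file and takes kB as 1000
    bytes; the timings are the "real" values from the ptime runs above.

```python
SIZE_BYTES = 10 * 1024 * 1024  # file created by "mkfile 10m" (assumed 10 MiB)

# "real" times from the ptime runs above, in seconds (1:36.684 = 96.684 s).
timings = {
    "mkfile 10m tt": 96.684,
    "cp tt /tmp": 54.633,
    "cp /tmp/tt .": 48.032,
}

# Nominal USB 1.1 full-speed signalling rate, 12 Mbit/s, expressed in kB/s.
USB11_KB_PER_SEC = 12_000_000 / 8 / 1000


def kb_per_sec(seconds):
    """Throughput in kB/s (1 kB = 1000 bytes) for moving the test file."""
    return SIZE_BYTES / seconds / 1000


for command, seconds in timings.items():
    rate = kb_per_sec(seconds)
    share = 100 * rate / USB11_KB_PER_SEC
    print(f"{command}: {rate:.0f} kB/s ({share:.0f}% of the nominal USB 1.1 rate)")
```

    This gives roughly 108, 192 and 218 kB/s, against a nominal 1500 kB/s.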


    Is it faster for other Sun Ray users? Maybe I just have a problem
    in my local setup? It's SRSS 3.0 on a Sun V240 with Solaris 9 4/04
    with all R/S patches installed.


    mp

    --
    Martin Paul wrote:
    > Michael Laajanen <michael_laajanen@yahoo.com> wrote:
    >
    >>Martin Paul wrote:
    >>
    >>>The USB ports on the SunRay clients are USB 1.1 only, and even
    >>>for that they seem to be pretty slow. While it's easy to use
    >>>USB memory sticks on the SunRay, I've had multiple complaints
    >>>that it is painfully slooow.
    >>
    >>Yes they are slow, but if only used for memorysticks no big deal of that.
    >
    >
    > I've made some quick tests now:
    >
    > % cd /tmp/SUNWut/mnt/martin/unnamed/
    >
    > % ptime mkfile 10m tt
    > real 1:36.684


    $ ptime mkfile 10m tt


    real 2:44.056
    user 0.004
    sys 0.120

    So my install (3.1) is even slower, but I'm still not that unhappy about it.
    That of course depends on whether you do this often; I don't.


    More annoying is that the keyboard and mouse lose data, so if I write
    abcdefg during a copy it will be abcefg jusas Iwrite ow:)


    That is bad!


    But most people I meet like the Sun Rays. The silence, I think, is the best
    part, along with being able to move from one office to another and then to
    the lab.


    Only now in Sweden a WS is better, since when it's getting cold you
    can rest your feet on the WS under the table :)

    /michael

    ---

    10、> This is not a VxFS issue, but a VxVM (Volume Manager) issue.


    > Veritas by default creates a separate log plex for RAID 5. You have to
    > create a RAID 5 volume without a log plex:


    Right.


    > vxassist -g <DG> maxsize layout=raid5,nolog ncol=4
    > vxassist -g <DG> make <volume> <size - see above> layout=raid5,nolog ncol=4


    > Beware that in the case of a system crash RAID 5 recovery takes much longer
    > without a dedicated log plex.


    No. Recovery of raid 5 by parity playback will take the same amount of
    time either way. (The log is too small to affect that).


    However there are failure modes without the log in place that will mean
    that recovery cannot be guaranteed. While writing a particular stripe,
    some columns may be updated before other columns, so that for an
    instant, parity is not correct. If a disk is lost and the system
    crashes at that moment, then parity replay will not be able to
    reconstruct the data.


    The log protects the in-use stripe data during the time until all the
    writes have been completed.


    Without the log, you can force the reconstruction, but the data may not
    be correct.


    Since this is a general issue with software raid 5, I have no idea if
    ODS/SVM addresses this issue. I am guessing that it does not and would
    simply assume playback is correct.

    --
    >> Beware that in the case of a system crash RAID 5 recovery takes much longer
    >> without a dedicated log plex.
    >
    > No. Recovery of raid 5 by parity playback will take the same amount of
    > time either way. (The log is too small to affect that).


    I think you misunderstood me. With crash I don't mean disk failure but
    system crash (kernel panic, power outage, etc.). With a dedicated log
    plex in place, data just has to be replayed from the log.


    With no log, parity has to be recalculated for each block.

    > However there are failure modes without the log in place that will mean
    > that recovery cannot be guaranteed. While writing a particular stripe,
    > some columns may be updated before other columns, so that for an
    > instant, parity is not correct. If a disk is lost and the system
    > crashes at that moment, then parity replay will not be able to
    > reconstruct the data.


    Even worse: The recovered data might be completely garbage. With mirror
    recovery the recovered data on disk might be old or new data. But with
    RAID 5 you might end up with corrupt data. If the dirty block is filled with
    filesystem metadata you might corrupt your whole filesystem.

    > The log protects the in-use stripe data during the time until all the
    > writes have been completed.


    So with a log during replay after a crash you don't have to recalculate
    parity of the whole raid, just rewrite the blocks in the log - so
    recovery should be much faster.

    > Since this is a general issue with software raid 5, I have no idea if
    > ODS/SVM addresses this issue. I am guessing that it does not and would
    > simply assume playback is correct.


    SDS/SVM allocates a pre-write area at the start of each device of the
    RAID 5 (usually enough for at least 10 full columns). Therefore RAID 5
    writes are incredibly slow with SDS because of the massive amount of
    disk seeks required.


    ---
    > Darren Dunham <ddunham@redwood.taos.com> wrote:
    >>> Beware that in the case of a system crash RAID 5 recovery takes much longer
    >>> without a dedicated log plex.
    >>
    >> No. Recovery of raid 5 by parity playback will take the same amount of
    >> time either way. (The log is too small to affect that).


    > I think you misunderstood me. With crash I don't mean disk failure but
    > system crash (kernel panic, power outage, etc.). With a dedicated log
    > plex in place, data just has to be replayed from the log.


    Ah, yes. Good point.


    > Even worse: The recovered data might be completely garbage. With mirror
    > recovery the recovered data on disk might be old or new data. But with
    > RAID 5 you might end up with corrupt data. If the dirty block is filled with
    > filesystem metadata you might corrupt your whole filesystem.


    Mirroring is also no guarantee against corruption. Because the system
    has no way of telling which mirror was most recently updated (a DRL is
    not an intent log), it simply picks one side. The combination of two
    blocks, each correct at one point in time, can lead to global
    inconsistencies. It may not scramble a block the way a bad raid
    reconstruct can, but it can be just as annoying.


    Hopefully any such blocks will also be in the filesystem log as well, so
    that they can be re-written consistently.


    > So with a log during replay after a crash you don't have to recalculate
    > parity of the whole raid, just rewrite the blocks in the log - so
    > recovery should be much faster.


    Yes.


    >> Since this is a general issue with software raid 5, I have no idea if
    >> ODS/SVM addresses this issue. I am guessing that it does not and would
    >> simply assume playback is correct.


    > SDS/SVM allocates a pre-write area at the start of each device of the
    > RAID 5 (usually enough for at least 10 full columns). Therefore RAID 5
    > writes are incredibly slow with SDS because of the massive amount of
    > disk seeks required.


    Very interesting. I had no idea. So this would be similar to creating
    a VM volume with several R5 log plexes on the same devices as the data
    subdisks?

    --


    > Mirroring is also no guarantee against corruption. Because the system
    > has no way of telling which mirror was most recently updated (a DRL is
    > not an intent log), it simply picks one side. The combination of two
    > blocks, each correct at one point in time, can lead to global
    > inconsistencies. It may not scramble a block the way a bad raid
    > reconstruct can, but it can be just as annoying.


    After further thinking it should make no difference, regardless if your
    data is mirrored, RAID 5 protected or not protected at all.


    - Data at the time of crash could be in the buffer cache in the kernel
    - Synchronous I/O should be acknowledged to the caller only after it
      has been written to all disks
    - If the system crashes in the middle of a transaction, the application has
      no guarantee if data has been written or not (or how much data has been
      written).


    Therefore:
    - In a mirror setup it doesn't matter which mirror side is synchronized from
      the other. But it should be synced so that two reads of the same block
      always return the same data
    - For RAID-5: After a crash parity has to be recalculated from the data
      blocks - not a single data block from a parity block


    Especially the last one is crucial. Suppose the following happens:


    - A RAID 5 stripe with blocks D1 D2 D3 P is to be written
    - D1, D2 and P are written to disk, but not D3
    - The parity has to be recalculated as D1^D2^D3
    - Consider what happens if D1 is instead reconstructed as D2^D3^P:
      since D3 is stale, D1 will contain only garbage
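
    The scenario in this list can be replayed with plain XOR on a few bytes.
    This is only an illustrative sketch: the block contents are made up, and
    real volume managers work on much larger blocks.

```python
def xor(*blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Made-up old stripe contents (already on disk) and new contents being written.
old_d1, old_d2, old_d3 = b"aaaa", b"bbbb", b"cccc"
new_d1, new_d2, new_d3 = b"HELO", b"2222", b"3333"
new_p = xor(new_d1, new_d2, new_d3)

# Crash in mid-write: D1, D2 and P reached the disks, D3 did not.
on_disk = {"D1": new_d1, "D2": new_d2, "D3": old_d3, "P": new_p}

# Safe recovery: recompute parity from the data blocks as they are on disk.
fixed_p = xor(on_disk["D1"], on_disk["D2"], on_disk["D3"])
stripe_consistent = xor(on_disk["D1"], on_disk["D2"], on_disk["D3"], fixed_p) == bytes(4)

# Unsafe case: the disk holding D1 is lost and D1 is rebuilt from the stale
# D3 and the new parity. The result is neither the old nor the new D1.
rebuilt_d1 = xor(on_disk["D2"], on_disk["D3"], on_disk["P"])

print(stripe_consistent)
print(rebuilt_d1 in (old_d1, new_d1))
print(rebuilt_d1)
```

    Had D3 also reached the disk, the same reconstruction would have returned
    the new D1 exactly.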


    This could happen in one scenario: a power outage. After power-on one disk
    fails (not uncommon - most disks fail during a power cycle). If this disk
    contains D1 or D2 (see above) you might end up with corrupt data: neither the
    old data nor the new data, but some random garbage.


    So a log is always a good idea. On VxVM this log plex has to be on a separate
    disk from the data volume (otherwise it wouldn't make any sense)

    >> SDS/SVM allocates a pre-write area at the start of each device of the
    >> RAID 5 (usually enough for at least 10 full columns). Therefore RAID 5
    >> writes are incredibly slow with SDS because of the massive amount of
    >> disk seeks required.
    >
    > Very interesting. I had no idea. So this would be similar to creating
    > a VM volume with several R5 log plexes on the same devices as the data
    > subdisks?


    I don't think so. Otherwise there would still be a SPOF. Logically it
    should be more like a mirrored log plex with the mirror spread over the
    same disks as the data subdisks.


    For maximizing performance the VxVM approach is much better. But in contrast
    to SVM you'll need space on an additional disk. If you already have a
    mirrored root disk you could use some free space there - the log is really
    small (just a few MBytes).


    Otherwise I would strongly recommend not using software RAID-5.
    I just started some tests with RAID-Z on ZFS. First impressions are
    interesting at least.


    --


    Daniel Rock <abuse@deadcafe.de> wrote:
    > - Data at the time of crash could be in the buffer cache in the kernel
    > - Synchronous I/O should be acknowledged to the caller only after it
    > has been written to all disks
    > - If the system crashes in the middle of a transaction, the application has
    > no guarantee if data has been written or not (or how much data has been
    > written).


    > Therefore:
    > - In a mirror setup it doesn't matter which mirror side is synchronized from
    > the other. But it should be synced so that two reads of the same block
    > always return the same data


    Yes, I mentioned that if a filesystem log replay is present that will
    re-create the consistency. But that's a layer on top of the mirror.
    The mirror itself isn't providing it.


    > So a log is always a good idea. On VxVM this log plex has to be on a separate
    > disk from the data volume (otherwise it wouldn't make any sense)


    Well, you can have multiple (mirrored) plexes on the volume. So putting
    a separate copy on multiple disks of the volume would give you access
    even in case of a disk failure, at the cost of performance for seeking to
    it all the time on writes.


    >>> SDS/SVM allocates a pre-write area at the start of each device of the
    >>> RAID 5 (usually enough for at least 10 full columns). Therefore RAID 5
    >>> writes are incredibly slow with SDS because of the massive amount of
    >>> disk seeks required.
    >>
    >> Very interesting. I had no idea. So this would be similar to creating
    >> a VM volume with several R5 log plexes on the same devices as the data
    >> subdisks?


    > I don't think so. Otherwise there would still be a SPOF. Logically it
    > should be more like a mirrored log plex with the mirror spread over the
    > same disks as the data subdisks.


    Right. If you have multiple log plexes for a volume, they're all
    mirrors of each other so no SPOF.


