EMC and Microsoft at the SharePoint Conference!

October 4, 2011

Time:  Yesterday, 8.30 Pacific
Where:  SharePoint Conference 2011 in Anaheim, California
Venue: Conference kick-off keynote

On stage: A DJ and a high performance SharePoint farm in two racks.

Backbone: EMC VNX 5700 Unified Storage Array with NEC high density Servers.

So what? How about a 14TB content database with FAST Search and 100 million documents, that’s what!

I’m delighted to say that EMC was center stage at the SharePoint Conference and a main focus of the main SharePoint demo.
The 10-minute demo showed a large-scale SharePoint farm, with an extremely large 14TB content database and FAST Search Server, running at full load and withstanding a SQL Server node outage.
The environment used SQL Server Denali CTP3, and SQL’s AlwaysOn technology was key in failing over the SQL content databases in seconds.

EMC was key in maintaining performance around the clock with this solution, and the VNX array was not stressed at all even though a very high IOPS load was generated.

SharePoint 2010 SP1 brings much larger supported content database sizes: up to 4TB for general use,
and UNLIMITED for Document Center and Records Center sites in specific circumstances (less than 5% of content accessed and less than 1% of content modified per month, on average).
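The SP1 sizing rule above can be written as a simple check. This is a minimal sketch that encodes only the conditions quoted in this post; the function name and template strings are hypothetical, and the TechNet article linked below lists further requirements (such as disk subsystem performance) that it does not model.

```python
# Hypothetical helper: encodes only the SP1 content database limits quoted in
# this post, not the full TechNet requirements.
GENERAL_LIMIT_TB = 4
ARCHIVE_TEMPLATES = {"Document Center", "Records Center"}


def content_db_size_supported(size_tb: float, site_template: str,
                              monthly_access_pct: float,
                              monthly_modified_pct: float) -> bool:
    """Return True if a content database of size_tb fits the quoted SP1 limits."""
    if size_tb <= GENERAL_LIMIT_TB:
        # Up to 4TB: supported for general use (as quoted here).
        return True
    # Beyond 4TB: only archive-style sites with low average monthly activity.
    return (site_template in ARCHIVE_TEMPLATES
            and monthly_access_pct < 5
            and monthly_modified_pct < 1)


if __name__ == "__main__":
    print(content_db_size_supported(14, "Records Center", 2.0, 0.5))  # True
    print(content_db_size_supported(14, "Team Site", 2.0, 0.5))       # False
```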

More information
================
www.emc.com/sharepoint

whitepaper – Managing Multi-Terabyte Environments – http://go.microsoft.com/fwlink/?LinkId=223599

SP1 Announcement
http://blogs.technet.com/b/office_sustained_engineering/archive/2011/05/11/announcing-service-pack-1-for-office-2010-and-sharepoint-2010.aspx
Joel’s quick blog entry for the new limits –  http://www.sharepointjoel.com/Lists/Posts/Post.aspx?ID=457
Technet article on the details –  http://technet.microsoft.com/en-us/library/cc262787.aspx


SharePoint Conference Season

September 12, 2011

Hi all,

While May, June and August belong to the big platform events such as EMC World, TechEd and VMworld,
October is the season for my two major application events.

I am happy to announce that EMC are proud Gold Sponsors at:-

  • SharePoint Conference USA                    Anaheim, California – Oct 3-6
  • SQL PASS Summit                                    Seattle, WA – Oct 11-14

EMC @ the SharePoint Conference

  • A large booth where key experts from the EMC business units can show you how to make your life easier with SharePoint
  • Demonstrations, mini-lectures, and Q&As
  • Free giveaways. Yes, again, like TechEd, we will have free T-shirts, and on the final day many, many cash spot prizes for wearing your EMC T-shirt

Two Sessions

Speaker(s):  James Baldwin, Eyal Sharon  (James & Eyal show)
Level: 200
Understand technical best practices to design and deploy a virtualized SharePoint farm that leverages FAST Search. Understand how to design a flexible and robust architecture that supports your advanced collaboration requirements. Understand how to architect a solution that addresses IT challenges for data growth, application availability and simplified management, and that also enables your users to find and leverage the right business information to make better decisions.

Speakers:- Matt Roberts, Nate Treloar
Level: 300
Demonstrate how to integrate external video metadata generation services with native SharePoint Search capabilities

Don’t forget Europe!

The European SharePoint Conference is taking place in Berlin, Germany – October 17-20.

I will be there presenting the following session:-

Optimize, Store and Protect SharePoint 2010 Server…Best Practices     Wednesday 15:00 – Session W21

Learn about the critical best practices and considerations for optimizing and growing SharePoint farms and storing user data efficiently and securely, while backing up terabytes of data in minutes. RBS (Remote BLOB Storage) and virtualization are just two of the many techniques discussed in this session. Understand the considerations for providing fast, automated disaster recovery for the entire SharePoint environment through SAN-based technology.

EMC @ SQL PASS Summit

We will have something kinda special at the SQL PASS Summit.  Can’t say more.

But what I can say…

  • A large booth area in the Pavilion, with SQL experts from EMC, including two heroes from our team, Tony Wu and Bruce Ye, travelling all the way from Shanghai.
  • Demos, booths, best practices and, most importantly, application-led conversations around:
      • SQL Server scalability – Infrastructure
      • Optimized Data Protection
      • High availability to where? Same SAN? Same site? Next door? Next state? Next country? – All of the above <--
      • Something Flashy
  • Proven Solutions around high-speed SQL deployments, one of which is being built right now with Michael and David in our Cork labs.

Hope to see you there.

James.


EMC World 2010 – Boston

May 10, 2010

Hi all,

If you find yourself at EMC World, why not drop into the Solutions Pavilion, where we are showcasing SharePoint and SQL solutions.

I’m also presenting the following sessions:

Tuesday 08:00         SharePoint Storage Best Practices

Wednesday 08:00   Birds of a Feather – Expert Panel – SharePoint, SQL, Oracle and SAP

Thursday 13:00   SharePoint Storage Best Practices (repeat)

I’ll post the slides here once we are done.

Cheers!

J


The iSCSI performance issue…

April 9, 2010

Just an update to the iSCSI performance issue.

Microsoft has released the KB article we worked on last week:

http://support.microsoft.com/kb/2020559

Because of the significant change required, the design change request will not make it into Windows 7, but it is currently being considered for the Windows 8 platform.


Best Practice for Hyper-V with iSCSI

March 10, 2010

Hello folks.

In EMC Proven Solution testing for a given use case, I came across a serious issue with iSCSI responses, which inherently causes slow storage response times and very slow cluster polling and enumeration.

Test config
Windows 2008 R2 with Hyper-V.   6-node Hyper-V cluster.  65 Disks.  2x iSCSI NIC per node.

What I saw was slowness in the cluster when bringing online a VM’s Virtual Machine Configuration cluster resource that had a large number of disks configured. These resources would time out as they passed their default pending and deadlock timeouts.

First, when you bring online, refresh (or fail over) a VM configuration, Hyper-V performs a sanity check to ensure the underlying components of the VM (network, storage, etc.) are available. This means scanning all the disks.
In my case, a VM with 25 disks took over 10 minutes to come online!

Why? Well, working with Microsoft OEM Support, I was asked to try tightening TcpAckFrequency to 1 (millisecond). I said OK, I’ll try it.
This brought the online time from 10 minutes to 19 seconds!  Result…or maybe…

I needed to fully understand the issue, so out came Wireshark to run some Ethernet traces…

Let me explain what the actual issue is…

The problem is basically that iSCSI is a victim of Nagle’s algorithm, an optimization of TCP networks that minimizes congestion caused by TCP overhead. iSCSI is essentially stung by send coalescing.

From Windows 2003 onwards, the default TCP acknowledgement (delayed ACK) time is 200ms.
This means that if a TCP segment (1462 bytes) is not full, it may need to wait up to 200ms before the data is actually sent from the Windows host.
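To make the interaction concrete, here is a toy model of Nagle’s send rule (simplified from RFC 896) combined with a peer that delays its ACK by up to 200ms. It is a sketch for intuition only, not the Windows TCP stack; the function names are hypothetical, and the MSS of 1460 bytes is an assumption (a typical value for a 1500-byte Ethernet MTU).

```python
def nagle_should_send(pending_bytes: int, unacked_bytes: int, mss: int,
                      nodelay: bool = False) -> bool:
    """Toy Nagle rule: may a small write go on the wire right now?"""
    if pending_bytes <= 0:
        return False              # nothing to send
    if nodelay:
        return True               # TCP_NODELAY set: Nagle bypassed
    if pending_bytes >= mss:
        return True               # a full segment always goes out
    return unacked_bytes == 0     # small data waits while earlier data is unACKed


def worst_case_send_delay_ms(pending_bytes: int, unacked_bytes: int, mss: int,
                             peer_delayed_ack_ms: int = 200,
                             nodelay: bool = False) -> int:
    """Upper bound on the wait, assuming the peer delays its ACK up to peer_delayed_ack_ms."""
    if pending_bytes <= 0:
        return 0
    if nagle_should_send(pending_bytes, unacked_bytes, mss, nodelay):
        return 0
    return peer_delayed_ack_ms


if __name__ == "__main__":
    MSS = 1460  # assumption: typical MSS for a 1500-byte Ethernet MTU
    print(worst_case_send_delay_ms(10, 66, MSS))                # 200
    print(worst_case_send_delay_ms(10, 66, MSS, nodelay=True))  # 0
```

A tiny 10-byte command stuck behind unacknowledged data pays the whole timer, while a full segment or a socket with TCP_NODELAY does not.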

This is a problem when using iSCSI for two reasons:
1) You need the fastest possible response time from storage for your application.
2) iSCSI payloads can be very small, especially SCSI control opcodes (CDBs).

Now, the iSCSI CDB (control/query) opcode commands involved in enumerating the disks during the online action have a tiny payload (10 bytes).

From looking at the Ethernet trace, the cluster disk driver is performing SCSI(10) read commands (LUN read inquiry, read capacity, etc.). It does this sequentially, at least twice for each disk involved in the virtual machine.
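For a sense of how small these commands are, here is a sketch that builds a READ CAPACITY(10) CDB (opcode 0x25 in the SCSI Block Commands standard) and decodes its 8-byte response. It is illustrative only: it is not taken from the trace above, and the function names are hypothetical.

```python
import struct

READ_CAPACITY_10 = 0x25  # SBC opcode for READ CAPACITY(10)


def build_read_capacity10_cdb(control: int = 0) -> bytes:
    """Return the 10-byte READ CAPACITY(10) CDB (LBA and PMI left at zero)."""
    # Layout: opcode, reserved, 4-byte LBA, 2 reserved bytes, PMI byte, control byte.
    return struct.pack(">BBIHBB", READ_CAPACITY_10, 0, 0, 0, 0, control)


def parse_read_capacity10(data: bytes) -> tuple[int, int, int]:
    """Decode the 8-byte response into (last_lba, block_length, capacity_bytes)."""
    last_lba, block_length = struct.unpack(">II", data[:8])
    return last_lba, block_length, (last_lba + 1) * block_length


if __name__ == "__main__":
    cdb = build_read_capacity10_cdb()
    print(len(cdb), cdb.hex())  # 10 25000000000000000000
    # Example response: last LBA 0x000FFFFF with 512-byte blocks.
    print(parse_read_capacity10(bytes.fromhex("000fffff00000200")))  # (1048575, 512, 536870912)
```

Ten bytes of command is nowhere near a full TCP segment, which is exactly why these requests end up waiting on the ACK timer.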

With the default TCP ACK time of 200ms, for each SCSI Read command issued by the cluster node, the payload of the command is 10 bytes (66 bytes on the wire). The SCSI Read command does not fill a TCP segment, so while the TCP payload is sent to the storage controller and the controller responds, the node waits to send the ACK until a segment on that NIC is filled or the maximum ACK time of 200ms is reached.

So, let’s say I am the cluster disk driver and want to read LUN metadata as part of onlining a cluster disk…
…I send the command via the iSCSI Initiator, Storage Controller responds…wait….wait….wait…TcpAckFrequency trigger fires after a max of 200ms….only then is my ACK sent to storage….and the TCP transmission completes and the data is returned to the Cluster disk driver process.

This means that for each SCSI command we attempt to send to storage (target and target LUN), we will typically end up waiting for the 200ms timer! This happens regardless of how busy SCSI data I/O is, because this algorithm works at Winsock (socket) granularity.

In a Windows iSCSI cluster, this issue really comes to light because the cluster operates (controls/validates) on resources in the cluster in a sequential manner.
So, for say 65 LUNs, each LUN takes significantly longer to validate/control, because the 200ms ACK timer delay is incurred multiple times per LUN.
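A back-of-envelope model shows why this hurts: when commands are issued one after another and each one stalls for the full ACK timer, the stall time simply multiplies out. This is a minimal sketch assuming a fixed per-command stall and ignoring real storage latency; the inputs are the figures from this post, and the helper names are hypothetical.

```python
def stall_seconds(luns: int, stalled_cmds_per_lun: int,
                  ack_wait_ms: float = 200.0) -> float:
    """Time lost to ACK waits when commands are issued sequentially."""
    return luns * stalled_cmds_per_lun * ack_wait_ms / 1000.0


def implied_stalls(observed_seconds: float, ack_wait_ms: float = 200.0) -> float:
    """How many full ACK-timer waits would account for an observed duration."""
    return observed_seconds * 1000.0 / ack_wait_ms


if __name__ == "__main__":
    # Floor: 65 LUNs, at least two stalled commands each.
    print(stall_seconds(65, 2))      # 26.0
    # The 25-disk VM took over 10 minutes to come online.
    print(implied_stalls(10 * 60))   # 3000.0
```

3000 waits over 25 disks is roughly 120 per disk, far more than two, so treat "twice per disk" as a floor: either many more round trips per disk are stalling, or other delays add to the total.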

Storage response times should be in the sub-10ms range, not 200ms 🙂

This is also why I did not see this issue in Fibre Channel environments: Fibre Channel does not carry SCSI commands over TCP, so the ACK timer never comes into play.

So, while hardcoding the TCP ACK frequency to 1ms per NIC helps, it may have adverse performance implications in terms of wire and storage port congestion.

But…for iSCSI Networks, this should not really be of concern because by best practice, you should have isolated iSCSI networks with minimal hops between host and storage.

 

I believe the real fix is to use the TCP_NODELAY socket option (set via setsockopt) in the iSCSI Initiator. This bypasses Nagle’s algorithm completely for that socket, so you don’t even need to wait for that 1ms (it seems a short time, but it is still a trigger time); the SCSI command fires immediately.
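At the sockets level, TCP_NODELAY is a per-socket option set with setsockopt. The iSCSI Initiator is Microsoft’s code, so this is not a switch you can flip on it yourself; the sketch below only shows what the option looks like on an ordinary TCP client using Python’s standard socket module (which sits on top of Winsock on Windows). The helper names are hypothetical.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Send small writes immediately instead of coalescing them (TCP_NODELAY)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """True if TCP_NODELAY is currently set on this socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def open_low_latency(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Connect to host:port with Nagle's algorithm turned off for this socket only."""
    sock = socket.create_connection((host, port), timeout=timeout)
    disable_nagle(sock)
    return sock
```

For example, `open_low_latency("storage.example", 3260)` would connect to the standard iSCSI port on a placeholder host. Because the option is per socket, other connections on the same NIC keep their default behaviour, unlike the per-interface registry change.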

I have a design change request in with Microsoft to consider this as the way forward. Probably won’t see it until Windows 8, but hey!

How to set the TCPAckFrequency on your iSCSI NICs

Regedit: back up your registry before proceeding 🙂

Subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<Interface GUID>
Entry: TcpAckFrequency
Value Type: REG_DWORD, number
Valid Range: 0-255
Set to 1.

Do this only for your iSCSI NICs, unless directed otherwise.
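The same change can be scripted. This is a minimal sketch using Python’s standard winreg module (Windows only, run with administrative rights); the helper names are hypothetical, and you still need to look up the GUID of each iSCSI NIC yourself. Back up the registry first, as above.

```python
import re

INTERFACES_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"
GUID_RE = re.compile(r"^\{[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}$")


def interface_subkey(guid: str) -> str:
    """Build the per-interface Tcpip subkey for a NIC GUID in {braces}."""
    if not GUID_RE.match(guid):
        raise ValueError(f"not an interface GUID: {guid!r}")
    return INTERFACES_KEY + "\\" + guid


def set_tcp_ack_frequency(guid: str, value: int = 1) -> None:
    """Write TcpAckFrequency (REG_DWORD, valid range 0-255) for one iSCSI NIC."""
    if not 0 <= value <= 255:
        raise ValueError("TcpAckFrequency must be in the range 0-255")
    subkey = interface_subkey(guid)
    import winreg  # Windows-only module; imported here so the checks above run anywhere
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "TcpAckFrequency", 0, winreg.REG_DWORD, value)
```

Validating the GUID and the range before touching the registry keeps a typo from creating a stray key or an out-of-range value.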

Hope this helps

James.


EMC Live! WebCast Series – SharePoint

March 4, 2010

Hi Folks

Back again. It’s been a while, aye, but I have been working on some exciting stuff, which I will share with you in due course.

So today if anyone has an hour to kill, Eyal and I are presenting an EMC webcast (part 1 of a 4-part series) on SharePoint and touching on what’s coming in SharePoint 2010.

http://info.emc.com/mk/get/DBM6473-4099_raf_lp?reg_src=JamesB

 

Today will be

Thursday, March 4, 2010 – 12 pm PT / 3 pm ET
Learn how to design your SharePoint infrastructure to ensure optimal performance and scalability, as well as leverage the benefits of virtualization.

Part 2 is Dave from our SourceOne engineering wing

Thursday, March 11, 2010 – 12 pm PT / 3 pm ET
Find out how to mitigate risk, reduce costs, and improve SharePoint performance with EMC SourceOne.

Part 3 is Eyal and I again

Thursday, March 18, 2010 – 12 pm PT / 3 pm ET
Learn how to design, deploy, and manage your SharePoint infrastructure to ensure availability and rapid recovery, as well as understand what options are available—from native SQL Server functionality to array-based replication.

Part 4 is another Dave from another engineering wing, EMC Documentum
 
Enhancing SharePoint to Meet Your Information Management Needs

Thursday, March 25, 2010 – 12 pm PT / 3 pm ET
Discover how EMC Documentum integrates SharePoint into your broader information infrastructure, enabling you to cut operational costs and rein in server sprawl.

Hope to see ye there!

But I will share the content with ye later anyways 🙂

Thanks

James.


EMC Replication Manager for SharePoint – watch this!!!

October 16, 2009

I mentioned before that I was working on a special project; well, this is it. SharePoint DBAs SHOULD be excited.

Imagine:
* configuring backup protection for the whole SharePoint farm in less than 5 minutes!

* a full backup of an active SharePoint farm (240,000 heavy users)
   – 3 hours 11 minutes, online, no disruption
   – 1.5TB of user content, 2.5TB of SharePoint files

* an incremental backup (with a daily change rate of 1%)
   – 11 minutes

* restoring a 100GB content database
   – 7 minutes

* performing item-level recovery from a backup in minutes
   – without disruption, without a recovery farm
 

This blog is all about making life easier for the SharePoint Admin, users and architects.  This product brings that thought much closer to reality.  EMC Replication Manager for SharePoint 5.2 SP2.

A single application, central console, simple, easy. A storage guy, a DBA, a windows guy – they can all relate and understand it….

While the Blueprint documentation has not yet hit EMC.com, I share it here with you now.

h6600-backup-recovery-ms-sharepoint-clariion-cx4-replication-manager-ontrack-hyper-v-blueprint

For more information on what EMC can offer for SharePoint, go to http://tinyurl.com/EMCMOSS
Here is a diagram of the production farm.

 

To illustrate the point, I have created 4 video demonstrations:

1) Creating that application protection (backup configuration & scheduling)

2) Running a backup against a very busy SharePoint farm (worst-case scenario test)

3) Restoring a content database from a single user interface

4) Using the combination of Kroll Ontrack PowerControls and EMC Replication Manager to simplify item-level recovery


Good Day to you..

June 30, 2009

Hello there.

Before I start writing ….

I want to set the stage here…
 I didn’t think I liked blogs; I actually thought they were a bit self-indulgent!
   I certainly STILL know I don’t like writing about myself!
     But every day I now see how blogs _help_ people like you and me.
       To me, they are a great way for people to distil information in a friendly format.

The reason for this blog is you

I feel like I am in a privileged position in what I do as a profession, and I want to share my experiences and information in order to help you in your daily endeavours. People had asked me before if I had a blog, why I didn’t, and whether I would consider writing one. I resisted, in hindsight wrongly. I said to myself, the next person who asks me, that will be the trigger-pull I need to start…..that was today…

To state clearly, I am not out to promote myself. Some of my blog posts might be total hogwash or not what you see in your environment (I want to hear about that), and some will indecently try to promote some of the technology my company has to offer.

My name is James Baldwin and I work at EMC Corporation in Cork, Ireland. I am (wait for this title!) EMC’s Global Solutions SQL and SharePoint Lead Engineer. Waiting on the business cards; it will be a riot 🙂

As you can guess, I don’t take myself seriously, but I DO take what I do very seriously. I suppose people call me, amongst other things, an EMC and application evangelist; I actually prefer to call myself a customer evangelist. The former falls into place; the latter is far more important in my eyes, and I hope that reflects in my subsequent blog posts.

I started working at EMC in 2001, when the shares were still soaring and business class travel on flights was standard.

Before that, I came from 3 years in a special wing of Dell engineering where we built custom or complex desktop and server builds. All OS’es, all hardware; I engineered the first Red Hat 6.0.x orderable on a PowerEdge server, drowned in OS/2 Warp for a very special customer for a bit, and, importantly, delved into all kinds of challenges which customers had.

If you really want to rewind further, I did a Bart Simpson on my dad to force him to buy me a Spectrum 48K at the age of 13. Now he has his own back on me any time he has an “anomaly” with his home PC. IT Karma. I remember loading VMware on Slackware 4.0, seeing my own PC booting inside itself, displaying a gammy pseudo Phoenix BIOS et al, and saying to myself “Jeez, that’ll never catch on!” 🙂 Think of the shares….Think of the shares….Forget the Sports Almanac; if I get a working DeLorean, I want the IT Almanac to go to 1996 with. Enough of that…

I arrived at EMC in a technical support capacity, supporting their enterprise backup product at the time, EMC Data Manager (EDM), which ran on Solaris: a slightly different beast from Linux, but a great OS, I must say, for multi-threaded applications, with some really well thought-out debugging tools. We backed up everything, all mainstream applications, all mainstream OS’es. I quickly understood that regardless of the severity of a call, absolutely nothing is trivial to a customer.

It may well be trivial to someone preaching the topic, but when you are the customer, responsible for a live user environment where a critical business application depends on you and your team members, it’s a whole lot more serious. Go on, see if you can crack a joke with a customer who called you looking for help and guidance because their SAP instance is down and needs to be recovered ASAP. My record for affected users: 325,000. Won’t say why, who or how, but we got it fixed and afterwards figured out what went wrong, why, and how to prevent it. I can easily say that quality and customer focus were driven into us in technical support. While the job was sometimes stressful, I loved every minute of it because of the satisfaction of helping solve problems for people. That, and the fact that the next support call was like a box of chocolates…yes, the guy on the bench…

I remember having a customer call me directly and say “James, don’t laugh, I just blicked SG2 on Exch-04”. 1,400 users. These things happen every single day.

Along came technology, disk costs lowered, and this funky thing of point-in-time replication became a household name, well, OK, in the storage nerd’s house.
EMC Replication Manager came along and changed things for us in support. 

I changed role slightly and had more of a free hand in making things better for the support team in documentation, training and mentoring.  In this role, I now understood more.  I understood the customer’s problems, but as importantly, I understood the challenges of my fellow technical support people in dealing with such events, in gaining experience and knowledge and being able to apply it.  Just as we stabilized our perceptions on Replication Manager and these Virtual Tape Library (VTL) Units, along came RecoverPoint!   Again, another significant step in technology.  Time to write more “uncovered” documentation to help ourselves figure this stuff out and to chase engineering groups to make their products more sustainable.

I started to understand how technology could actually help us (you and me) in our daily grind. Those poor customers who lost data once had to recover from tape, then more recently could recover from a point in time, and could now recover to any point in time. Ignore vendors; the story is there. Technology was getting better. In some circumstances, it was really helping, i.e. recovery. But technology was not helping in many respects: added complexity, the same old human traits (mistakes), misplacement of technology, and so on. It’s there today and will be there tomorrow.

I began to travel to customer sites to assist on the ground with highly technical issues, nothing in a specific area, but more around what environment or “solution” the customer was operating in.  I now had the four angles.  The customer, the technical support person, the product, and the person who designed the environment (be that person a customer, fellow employee or a third party consultant).

I thought I had a pretty good idea and more importantly, an appreciation of what problems exist in the IT industry…now how could I apply my knowledge more effectively, to prevent those customer issues in the first place?…

Wind forward to today…
I joined the EMC Global Solutions Center in 2007, after 6 good years in support.  When I saw the job posting and read what it was about, I said to myself “this is for me”.  The position was for a SharePoint solutions engineer.

The Global Solutions Centers (6 of them scattered around the world) are engineering centers of people who take a given solution, design it, build it, test it, break it, analyze it and document it. In a nutshell. We hope to find bugs, and when we do, we work them out with the product owners (EMC, VMware, Microsoft, Oracle, etc.). We test the environment to scale. We document the best practices. We document what not to do, and why not. We try to take the guesswork out of a solution for a customer.

“What do I need to run 240,000 heavy SharePoint users in a tiered environment on Hyper-V with rapid backup?” – we answer these questions.

To apply that in context…
We work with customers, all EMC practice and engineering groups and application vendors (e.g. Microsoft, Oracle, SAP, etc) in trying to understand what the most common use cases are for certain applications and environments.  Through customer and field feedback, we then decide on what demands and requirements customers have on such an environment. 

My team is focused on SQL and SharePoint solutions, but don’t let your questioning stop there. Half the battle is knowing I am in a good team of multi-disciplined people; I’ll get the answer.

We go ahead and build out the racks, servers, network and Fibre Channel switches, storage, etc. in our customer integration labs. “For this project, I’ll have six 24-core servers, please :-)”
Using industry-standard load generation tools, we perform iterative tests on a good datacenter day and on a bad, armageddon datacenter day: functional testing, core switches dying, clustered physical or virtual servers being unplugged (oops!), half the storage array going down (would never happen 🙂). You get the picture.

We test and profile the bad things, so you as a customer don’t have to guess what if….

Remember, we test with two things in the forefront of our mind at all times – application and customer.
As a customer, I don’t really care that an EMC switch, disk, cable or software component has died; I just care about my app.

We take that approach. If we see something we don’t like, we let the relevant people know. “You’ve got to fix this, don’t let a customer suffer this” is the message.
We are the “external” customer that product groups always wanted, but they don’t really know it.

I will share our experiences with you.  I will share “look what we found” with you in concise, technical details.  I want you to share your experiences with me.  The more I know about your struggles, the more I hopefully will be able to help.  I am entirely willing to test something you hit in our labs if we have the given time and capacity.

 

In summary, I’m here to help…

Thank you

James.