BigFix Server Client Report Ingestion Stalls at Midnight

From RiceFamily Wiki
Jump to: navigation, search

Summary

This has been resolved. There was a Scheduled SQL Job that was indexing the Fixlets and it was causing the server to stall each night. We disabled that Scheduled Job and it is no longer a problem.

We have been seeing a problem where every night, starting at midnight the all the clients appear to go "off line" (Last Report Time's all show no later than 12:00 Midnight) for a period of about an hour, then they start to report again.

If I look on the server, the BufferDir folder is full of client report files (several hundred files), but for some reason FillDB is not importing the client reports.

I have verified that all of the scheduled tasks (remove duplicate computers, etc) and scheduled SQL jobs (Re-Index DB currently temporarily disabled due to this issue) are NOT scheduled for midnight. Neither the server nor the database are currently being backed up. The server is part of a DSA pair structure. The only other thing I could think of was scheduled Web Reports, but those come from the WebReport database and run at random times of the day, not BFEnterprise. Same with SOAP queries.

To me, the symptoms look like some process somewhere is placing an Exclusive Lock on the BESEnterprise database for about an hour. We have asked our DB team to look at the server but they say that they can't see what is causing the issue.

We have Actions that run overnight (11:30pm to 6:00am) and the Console Operators that need to monitor these tasks are "blind" from midnight to about 1am each night.

Research

POTENTIAL Solution

Taken from https://stackoverflow.com/questions/8749426/how-to-find-out-what-is-locking-my-tables

 --Create Procedure WhoLock
 --AS
 set nocount on
 if object_id('tempdb..#locksummary') is not null Drop table #locksummary
 if object_id('tempdb..#lock') is not null Drop table #lock
 create table #lock (    spid int,    dbid int,    objId int,    indId int,    Type char(4),    resource nchar(32),    Mode char(8),    status char(6))
 Insert into #lock exec sp_lock
 if object_id('tempdb..#who') is not null Drop table #who
 create table #who (     spid int, ecid int, status char(30),
             loginame char(128), hostname char(128),
             blk char(5), dbname char(128), cmd char(16)
             --
             , request_id INT --Needed for SQL 2008 onwards
             --
          )
 Insert into #who exec sp_who
 Print '-----------------------------------------'
 Print 'Lock Summary for ' + @@servername  + ' (excluding tempdb):'
 Print '-----------------------------------------' + Char(10)
 Select     left(loginame, 28) as loginame, 
     left(db_name(dbid),128) as DB,
     left(object_name(objID),30) as object,
     max(mode) as [ToLevel],
     Count(*) as [How Many],
     Max(Case When mode= 'X' Then cmd Else null End) as [Xclusive lock for command],
     l.spid, hostname
 into #LockSummary
 from #lock l join #who w on l.spid= w.spid
 where dbID != db_id('tempdb') and l.status='GRANT'
 group by dbID, objID, l.spid, hostname, loginame
 
 Select * from #LockSummary order by [ToLevel] Desc, [How Many] Desc, loginame, DB, object
 
 Print '--------'
 Print 'Who is blocking:'
 Print '--------' + char(10)
 SELECT p.spid
 ,convert(char(12), d.name) db_name
 , program_name
 , p.loginame
 , convert(char(12), hostname) hostname
 , cmd
 , p.status
 , p.blocked
 , login_time
 , last_batch
 , p.spid
 FROM      master..sysprocesses p
 JOIN      master..sysdatabases d ON p.dbid =  d.dbid
 WHERE     EXISTS (  SELECT 1
           FROM      master..sysprocesses p2
           WHERE     p2.blocked = p.spid )
 
 Print '--------'
 Print 'Details:'
 Print '--------' + char(10)
 Select     left(loginame, 30) as loginame,  l.spid,
     left(db_name(dbid),15) as DB,
     left(object_name(objID),40) as object,
     mode ,
     blk,
     l.status
 from #lock l join #who w on l.spid= w.spid
 where dbID != db_id('tempdb') and blk <>0
 Order by mode desc, blk, loginame, dbID, objID, l.status